[c++-delayed-folding] cp_fold_r
I think we want to clear *walk_subtrees a lot more often in cp_fold_r; as it is, for most expressions we end up calling cp_fold on the full-expression, then uselessly on the subexpressions after we already folded the containing expression. Jason
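The *walk_subtrees protocol being discussed can be illustrated outside of GCC. Below is a toy model of a walk_tree-style pre-order walker (hypothetical names, not the real cp_fold_r or tree nodes), showing how a callback that clears *walk_subtrees after folding a node keeps the walker from uselessly revisiting subexpressions that the containing fold already consumed:

```cpp
#include <cassert>
#include <memory>

// Toy expression tree: leaves hold constants, inner nodes add two children.
struct Expr {
  int value = 0;
  std::unique_ptr<Expr> lhs, rhs;
  bool is_leaf () const { return !lhs && !rhs; }
};

static int fold_calls = 0;

// "Fold" an expression: collapse a node whose children are both leaves.
static void fold (Expr *e)
{
  ++fold_calls;
  if (!e->is_leaf () && e->lhs->is_leaf () && e->rhs->is_leaf ())
    {
      e->value = e->lhs->value + e->rhs->value;
      e->lhs.reset ();
      e->rhs.reset ();
    }
}

// Pre-order walk in the style of GCC's walk_tree callbacks: the callback
// may clear *walk_subtrees to stop the walker descending into children.
static void walk (Expr *e, void (*cb) (Expr *, int *))
{
  int walk_subtrees = 1;
  cb (e, &walk_subtrees);
  if (walk_subtrees)
    {
      if (e->lhs) walk (e->lhs.get (), cb);
      if (e->rhs) walk (e->rhs.get (), cb);
    }
}

// Callback mirroring the suggestion: after folding, clear *walk_subtrees
// so the already-folded operands are not walked again.
static void fold_r (Expr *e, int *walk_subtrees)
{
  fold (e);
  if (e->is_leaf ())
    *walk_subtrees = 0;  // the fold consumed the children; don't revisit
}
```

With the flag cleared, folding `2 + 3` costs one fold call instead of one per node.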
[c++-delayed-folding] code review
@@ -216,6 +216,7 @@
 libgcov-driver-tool.o-warn = -Wno-error
 libgcov-merge-tool.o-warn = -Wno-error
 gimple-match.o-warn = -Wno-unused
 generic-match.o-warn = -Wno-unused
+insn-modes.o-warn = -Wno-error

This doesn't seem to be needed anymore.

@@ -397,11 +397,13 @@ convert_to_real (tree type, tree expr)
    EXPR must be pointer, integer, discrete (enum, char, or bool), float,
    fixed-point or vector; in other cases error is called.
+   If DOFOLD is TRZE, we try to simplify newly-created patterns by folding.

"TRUE"

+  if (!dofold)
+    {
+      expr = build1 (CONVERT_EXPR,
+                     lang_hooks.types.type_for_size
+                       (TYPE_PRECISION (intype), 0),
+                     expr);
+      return build1 (CONVERT_EXPR, type, expr);
+    }

When we're not folding, I don't think we want to do the two-step conversion, just the second one.  And we might want to use NOP_EXPR instead of CONVERT_EXPR, but I'm not sure about that.

@@ -818,10 +828,15 @@ convert_to_integer (tree type, tree expr)
       if (TYPE_UNSIGNED (typex))
         typex = signed_type_for (typex);
     }
-  return convert (type,
-                  fold_build2 (ex_form, typex,
-                               convert (typex, arg0),
-                               convert (typex, arg1)));
+  if (dofold)
+    return convert (type,
+                    fold_build2 (ex_form, typex,
+                                 convert (typex, arg0),
+                                 convert (typex, arg1)));
+  arg0 = build1 (CONVERT_EXPR, typex, arg0);
+  arg1 = build1 (CONVERT_EXPR, typex, arg1);
+  expr = build2 (ex_form, typex, arg0, arg1);
+  return build1 (CONVERT_EXPR, type, expr);

This code path seems to be for pushing a conversion down into a binary expression.  We shouldn't do this at all when we aren't folding.

@@ -845,9 +860,14 @@ convert_to_integer (tree type, tree expr)
       if (!TYPE_UNSIGNED (typex))
         typex = unsigned_type_for (typex);
+      if (!dofold)
+        return build1 (CONVERT_EXPR, type,
+                       build1 (ex_form, typex,
+                               build1 (CONVERT_EXPR, typex,
+                                       TREE_OPERAND (expr, 0))));

Likewise.

@@ -867,6 +887,14 @@ convert_to_integer (tree type, tree expr)
         the conditional and never loses.  A COND_EXPR may have a throw
         as one operand, which then has void type.
         Just leave void operands as they are.  */
+      if (!dofold)
+        return build3 (COND_EXPR, type, TREE_OPERAND (expr, 0),
+                       VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 1)))
+                       ? TREE_OPERAND (expr, 1)
+                       : build1 (CONVERT_EXPR, type, TREE_OPERAND (expr, 1)),
+                       VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 2)))
+                       ? TREE_OPERAND (expr, 2)
+                       : build1 (CONVERT_EXPR, type, TREE_OPERAND (expr, 2)));

Likewise.

@@ -903,6 +933,10 @@ convert_to_integer (tree type, tree expr)
       return build1 (FIXED_CONVERT_EXPR, type, expr);

     case COMPLEX_TYPE:
+      if (!dofold)
+        return build1 (CONVERT_EXPR, type,
+                       build1 (REALPART_EXPR,
+                               TREE_TYPE (TREE_TYPE (expr)), expr));

Why can't we call convert here rather than build1 a CONVERT_EXPR?

It would be good to ask a fold/convert maintainer to review the changes to this file, too.

@@ -5671,8 +5668,8 @@ build_new_op_1 (location_t loc, enum tree_code code, int flags, tree arg1,
          decaying an enumerator to its value.  */
       if (complain & tf_warning)
         warn_logical_operator (loc, code, boolean_type_node,
-                               code_orig_arg1, arg1,
-                               code_orig_arg2, arg2);
+                               code_orig_arg1, fold (arg1),
+                               code_orig_arg2, fold (arg2));

       arg2 = convert_like (conv, arg2, complain);
     }

@@ -5710,7 +5707,7 @@ build_new_op_1 (location_t loc, enum tree_code code, int flags, tree arg1,
     case TRUTH_OR_EXPR:
       if (complain & tf_warning)
         warn_logical_operator (loc, code, boolean_type_node,
-                               code_orig_arg1, arg1, code_orig_arg2, arg2);
+                               code_orig_arg1, fold (arg1), code_orig_arg2, fold (arg2));
       /* Fall through.  */
     case GT_EXPR:
     case LT_EXPR:

@@ -5721,9 +5718,10 @@ build_new_op_1 (location_t loc, enum tree_code code, int flags, tree arg1,
 if
Re: [PATCH, rs6000] Add expansions for min/max vector reductions
On Wed, 16 Sep 2015, Alan Lawrence wrote: > On 16/09/15 17:10, Bill Schmidt wrote: > > > > On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote: > > > On 16/09/15 15:28, Bill Schmidt wrote: > > > > 2015-09-16 Bill Schmidt> > > > > > > > * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, > > > > UNSPEC_REDUC_SMIN, > > > > UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, UNSPEC_REDUC_SMAX_SCAL, > > > > UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL, > > > > UNSPEC_REDUC_UMIN_SCAL): New enumerated constants. > > > > (reduc_smax_v2di): New define_expand. > > > > (reduc_smax_scal_v2di): Likewise. > > > > (reduc_smin_v2di): Likewise. > > > > (reduc_smin_scal_v2di): Likewise. > > > > (reduc_umax_v2di): Likewise. > > > > (reduc_umax_scal_v2di): Likewise. > > > > (reduc_umin_v2di): Likewise. > > > > (reduc_umin_scal_v2di): Likewise. > > > > (reduc_smax_v4si): Likewise. > > > > (reduc_smin_v4si): Likewise. > > > > (reduc_umax_v4si): Likewise. > > > > (reduc_umin_v4si): Likewise. > > > > (reduc_smax_v8hi): Likewise. > > > > (reduc_smin_v8hi): Likewise. > > > > (reduc_umax_v8hi): Likewise. > > > > (reduc_umin_v8hi): Likewise. > > > > (reduc_smax_v16qi): Likewise. > > > > (reduc_smin_v16qi): Likewise. > > > > (reduc_umax_v16qi): Likewise. > > > > (reduc_umin_v16qi): Likewise. > > > > (reduc_smax_scal_): Likewise. > > > > (reduc_smin_scal_): Likewise. > > > > (reduc_umax_scal_): Likewise. > > > > (reduc_umin_scal_): Likewise. > > > > > > You shouldn't need the non-_scal reductions. Indeed, they shouldn't be > > > used if > > > the _scal are present. The non-_scal's were previously defined as > > > producing a > > > vector with one element holding the result and the other elements all > > > zero, and > > > this was only ever used with a vec_extract immediately after; the _scal > > > pattern > > > now includes the vec_extract as well. Hence the non-_scal patterns are > > > deprecated / considered legacy, as per md.texi. 
> > > > Thanks -- I had misread the description of the non-scalar versions, > > missing the part where the other elements are zero. What I really > > want/need is an optab defined as computing the maximum value in all > > elements of the vector. > > Yes, indeed. It seems reasonable to me that this would coexist with an optab > which computes only a single value (i.e. a scalar). So just to clarify - you need to reduce the vector with max to a scalar but want the (same) result in all vector elements? > At that point it might be appropriate to change the cond-reduction code to > generate the reduce-to-vector in all cases, and optabs.c expand it to > reduce-to-scalar + broadcast if reduce-to-vector was not available. Along with > the (parallel) changes to cost model already proposed, does that cover all the > cases? It does add a new tree code, yes, but I'm feeling that could be > justified if we go down this route. I'd rather have combine recognize an insn that does both (if it exists). As I understand powerpc doesn't have reduction to scalar (I think no ISA actually has this, but all ISAs have reduce to one vector element plus a cheap way of extraction (effectively a subreg)) but it's reduction already has all result vector elements set to the same value (as opposed to having some undefined or zero or whatever). > However, another point that springs to mind: if you reduce a loop containing > OR or MUL expressions, the vect_create_epilog_for_reduction reduces these > using shifts, and I think will also use shifts for platforms not possessing a > reduc_plus/min/max. If shifts could be changed for rotates, the code there > would do your reduce-to-a-vector-of-identical-elements in the midend...can we > (sensibly!) bring all of these together? > > > Perhaps the practical thing is to have the vectorizer also do an > > add_stmt_cost with some new token that indicates the cost model should > > make an adjustment if the back end doesn't need the extract/broadcast. 
> > Targets like PowerPC and AArch32 could then subtract the unnecessary > > cost, and remove the unnecessary code in simplify-rtx. > > I think it'd be good if we could do it before simplify-rtx, really, although > I'm not sure I have a strong argument as to why, as long as we can cost it > appropriately. > > > In any case, I will remove implementing the deprecated optabs, and I'll > > also try to look at Alan L's patch shortly. > > That'd be great, thanks :) As for the cost modeling the simple solution is of course to have these as "high-level" costs (a new enum entry), but the sustainable solution is to have the target see the vectorized code and decide for itself. Iff we'd go down the simple route then the target may as well create the
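As a sketch of the semantics under discussion — a reduction whose result lands in every vector lane, expanded as reduce-to-scalar plus broadcast when the target lacks a native reduce-to-vector — here is a minimal scalar model. This is illustrative only, not optab or vectorizer code; the function names are invented:

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Reduce-to-scalar: the maximum across all lanes.
template <typename T, std::size_t N>
T reduc_max_scal (const std::array<T, N> &v)
{
  return *std::max_element (v.begin (), v.end ());
}

// Reduce-to-vector in the semantics Bill asks for: every lane holds the
// maximum.  Modelled here as reduce-to-scalar followed by a broadcast,
// which is the proposed midend expansion for targets that only provide
// the scalar form.
template <typename T, std::size_t N>
std::array<T, N> reduc_max_vec (const std::array<T, N> &v)
{
  std::array<T, N> r;
  r.fill (reduc_max_scal (v));  // broadcast the scalar result
  return r;
}
```

A target like PowerPC whose reduction instruction already leaves the result in all lanes would implement `reduc_max_vec` directly and get `reduc_max_scal` as a cheap lane extract.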
[patch committed SH] Fix build failure
I've committed the attached obvious fix for a build failure on SH. object_allocator was changed so as to remove the 2nd argument of its constructor.

Regards,
kaz
--
2015-09-17  Kaz Kojima

	* config/sh/sh.c (label_ref_list_d_pool): Adjust to
	object_allocator change.

diff --git a/config/sh/sh.c b/config/sh/sh.c
index 25149a6..ec0abc5 100644
--- a/config/sh/sh.c
+++ b/config/sh/sh.c
@@ -4659,7 +4659,7 @@ typedef struct label_ref_list_d
 } *label_ref_list_t;

 static object_allocator label_ref_list_d_pool
-  ("label references list", 30);
+  ("label references list");

 /* The SH cannot load a large constant into a register, constants have to
    come from a pc relative load.  The reference of a pc relative load
Re: [PATCH] Convert SPARC to LRA
From: David Miller
Date: Mon, 14 Sep 2015 11:30:29 -0700 (PDT)

> There are some other issues I'm having troubles resolving for 64-bit
> native bootstraps as well, and I am probably going to revert the LRA
> sparc changes unless I can resolve them by the end of today.

So I was actually able to resolve these problems, and committed the following patch.

The big take-aways are:

1) Since we can only access SFmode/SImode/etc. in FP_REGS but not necessarily in EXTRA_FP_REGS, we have to tell LRA that secondary memory is needed when moving such a mode between those two classes.

2) HARD_REGNO_CALLER_SAVE_MODE's default should really be reconsidered. If the caller has an explicit mode in mind we should just use it instead of just going to choose_hard_reg_mode(). That causes stupid things like using a DFmode register to spill an SFmode value and all the subregging that results from that.

3) It's amazing how much we get away with in RELOAD, particularly wrt. addressing. Here all of the "(set x (HIGH (SYMBOLIC...)))" were always available even when flag_pic, and this was never noticed before.

[PATCH] Fix LRA regressions on 64-bit SPARC.

gcc/
	* config/sparc/sparc-protos.h (sparc_secondary_memory_needed):
	Declare.
	* config/sparc/sparc.c (sparc_secondary_memory_needed): New
	function.
	* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): Use it.
	(HARD_REGNO_CALLER_SAVE_MODE): Define.
	* config/sparc/sparc.md (sethi_di_medlow, losum_di_medlow, seth44)
	(setm44, setl44, sethh, setlm, sethm, setlo, embmedany_sethi)
	(embmedany_losum, embmedany_brsum, embmedany_textuhi)
	(embmedany_texthi, embmedany_textulo, embmedany_textlo): Do not
	provide when flag_pic.
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@227847 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog | 14 ++ gcc/config/sparc/sparc-protos.h | 2 ++ gcc/config/sparc/sparc.c| 20 gcc/config/sparc/sparc.h| 14 +- gcc/config/sparc/sparc.md | 32 5 files changed, 61 insertions(+), 21 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 76566ca..42faf2e 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,17 @@ +2015-09-17 David S. Miller + + * config/sparc/sparc-protos.h (sparc_secondary_memory_needed): + Declare. + * config/sparc/sparc.c (sparc_secondary_memory_needed): New + function. + * config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED): Use it. + (HARD_REGNO_CALLER_SAVE_MODE): Define. + * config/sparc/sparc.md (sethi_di_medlow, losum_di_medlow, seth44) + (setm44, setl44, sethh, setlm, sethm, setlo, embmedany_sethi) + (embmedany_losum, embmedany_brsum, embmedany_textuhi) + (embmedany_texthi, embmedany_textulo, embmedany_textlo): Do not + provide when flag_pic. + 2015-09-17 Kaz Kojima * config/sh/sh.c (label_ref_list_d_pool): Adjust to diff --git a/gcc/config/sparc/sparc-protos.h b/gcc/config/sparc/sparc-protos.h index 1431437..18192fd 100644 --- a/gcc/config/sparc/sparc-protos.h +++ b/gcc/config/sparc/sparc-protos.h @@ -62,6 +62,8 @@ extern bool constant_address_p (rtx); extern bool legitimate_pic_operand_p (rtx); extern rtx sparc_legitimize_reload_address (rtx, machine_mode, int, int, int, int *win); +extern bool sparc_secondary_memory_needed (enum reg_class, enum reg_class, + machine_mode); extern void load_got_register (void); extern void sparc_emit_call_insn (rtx, rtx); extern void sparc_defer_case_vector (rtx, rtx, int); diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index b41800c..f4ad68d 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -12283,6 +12283,26 @@ sparc_expand_vector_init (rtx target, rtx vals) emit_move_insn (target, mem); } +bool sparc_secondary_memory_needed (enum reg_class class1, enum 
reg_class class2, + machine_mode mode) +{ + if (FP_REG_CLASS_P (class1) != FP_REG_CLASS_P (class2)) +{ + if (! TARGET_VIS3 + || GET_MODE_SIZE (mode) > 8 + || GET_MODE_SIZE (mode) < 4) + return true; + return false; +} + + if (GET_MODE_SIZE (mode) == 4 + && ((class1 == FP_REGS && class2 == EXTRA_FP_REGS) + || (class1 == EXTRA_FP_REGS && class2 == FP_REGS))) +return true; + + return false; +} + /* Implement TARGET_SECONDARY_RELOAD. */ static reg_class_t diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index 8343671..1f26232 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -747,6 +747,12 @@ extern int sparc_mode_class[]; register window instruction in the prologue. */ #define
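The predicate in the patch can be modelled standalone. In the sketch below, the enum values, `fp_reg_class_p`, and the `mode_size`/`target_vis3` parameters are simplified stand-ins for sparc.h's `reg_class`, `FP_REG_CLASS_P`, `GET_MODE_SIZE`, and `TARGET_VIS3`, used only to make the three cases checkable:

```cpp
#include <cassert>

// Simplified stand-ins for the real sparc.h register classes.
enum reg_class { GENERAL_REGS, FP_REGS, EXTRA_FP_REGS };

static bool fp_reg_class_p (reg_class c)
{
  return c == FP_REGS || c == EXTRA_FP_REGS;
}

// Model of sparc_secondary_memory_needed from the patch above.
static bool secondary_memory_needed (reg_class class1, reg_class class2,
                                     unsigned mode_size, bool target_vis3)
{
  if (fp_reg_class_p (class1) != fp_reg_class_p (class2))
    {
      // GPR<->FPR moves need memory unless VIS3 provides direct
      // 4- and 8-byte moves between the register files.
      if (!target_vis3 || mode_size > 8 || mode_size < 4)
        return true;
      return false;
    }

  // Take-away (1): 4-byte modes can live in FP_REGS but not necessarily
  // in EXTRA_FP_REGS, so moving them between the two FP classes also
  // needs secondary memory.
  if (mode_size == 4
      && ((class1 == FP_REGS && class2 == EXTRA_FP_REGS)
          || (class1 == EXTRA_FP_REGS && class2 == FP_REGS)))
    return true;

  return false;
}
```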
Re: [PATCH, ARM]: Fix static interworking call
On 17/09/15 09:46, Christian Bruel wrote: As obvious, bad operand number. OK for trunk ? Christian p1.patch 2015-09-18 Christian Bruel* config/arm/arm.md (*call_value_symbol): Fix operand for interworking. 2015-09-18 Christian Bruel * gcc.target/arm/attr_thumb-static2.c: New test. Ok, assuming testing is clean. Thanks, Kyrill
Re: Openacc launch API
Since Jakub appears to be busy, I'll give my 2 cents.

On 08/25/2015 03:29 PM, Nathan Sidwell wrote:
> I did rename the GOACC_parallel entry point to GOACC_parallel_keyed and
> provide a forwarding function. However, as the mkoffload data is
> incompatible, this is probably overkill. I've had to increment the (just
> committed) version number to detect the change in data representation. So
> any attempt to run an old binary with a new libgomp will fail at the
> loading point.

Fail how? Jakub has requested that it works but falls back to unaccelerated execution, can you confirm this is what you expect to happen with this patch?

+/* Varadic launch arguments.  */
+#define GOMP_LAUNCH_END  0  /* End of args, no dev or op */
+#define GOMP_LAUNCH_DIM  1  /* Launch dimensions, op = mask */
+#define GOMP_LAUNCH_ASYNC  2  /* Async, op = cst val if not MAX */
+#define GOMP_LAUNCH_WAIT  3  /* Waits, op = num waits.  */
+#define GOMP_LAUNCH_CODE_SHIFT  28
+#define GOMP_LAUNCH_DEVICE_SHIFT  16
+#define GOMP_LAUNCH_OP_SHIFT  0
+#define GOMP_LAUNCH_PACK(CODE,DEVICE,OP) \
+  (((CODE) << GOMP_LAUNCH_CODE_SHIFT) \
+   | ((DEVICE) << GOMP_LAUNCH_DEVICE_SHIFT) \
+   | ((OP) << GOMP_LAUNCH_OP_SHIFT))
+#define GOMP_LAUNCH_CODE(X) (((X) >> GOMP_LAUNCH_CODE_SHIFT) & 0xf)
+#define GOMP_LAUNCH_DEVICE(X) (((X) >> GOMP_LAUNCH_DEVICE_SHIFT) & 0xfff)
+#define GOMP_LAUNCH_OP(X) (((X) >> GOMP_LAUNCH_OP_SHIFT) & 0xffff)
+#define GOMP_LAUNCH_OP_MAX 0xffff

I probably would have used something simpler, like a code/device/op argument triplet, but I guess this is ok.

-  if (num_waits)
+  va_start (ap, kinds);

+  /* TODO: This will need amending when device_type is implemented.  */

I'd expect that this will check whether the device type in the argument is either zero (or whatever indicates all devices) or matches the current device. Is that what you intend?

+  while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)
+         != (tag = va_arg (ap, unsigned)))

That's a somewhat non-idiomatic way to write this, with the constant first and not obviously a constant.
I'd initialize a variable with the constant before the loop.

+  assert (!GOMP_LAUNCH_DEVICE (tag));

Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a bit more gracefully? (Alternatively, implement the device_type stuff now so that we don't have TODOs in the code and don't have to worry about compatibility issues.)

+  if (num_waits > 8)
+    gomp_fatal ("Too many waits for legacy interface");

How did you arrive at this number?

+GOACC_2.0,1 {
+  global:
+	GOACC_parallel_keyed;
+} GOACC_2.0;

Did you mean to use a comma?

+  if (dims[GOMP_DIM_GANG] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported",
+		       dims[GOMP_DIM_GANG]);
+  if (dims[GOMP_DIM_WORKER] != 1)
+    GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported",
+		       dims[GOMP_DIM_WORKER]);

I see that this is just moved here (which is good), but is this still a limitation? Or is that on trunk only?

+  for (comma = "", id = var_ids; id; comma = ",", id = id->next)
+    fprintf (out, "%s\n\t%s", comma, id->ptx_name);

The comma trick is new to me, I'll have to remember this one.

+static void
+set_oacc_fn_attrib (tree fn, tree clauses, vec *args)

+tree
+get_oacc_fn_attrib (tree fn)

These need function comments.

+{
+  /* Must match GOMP_DIM ordering.  */
+  static const omp_clause_code ids[] =
+    {OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH};

Formatting. No = at the end of a line, and whitespace around braces.

@@ -9150,6 +9245,7 @@ expand_omp_target (struct omp_region *re
 	}

   gimple g;
+  bool tagging = false;
   /* The maximum number used by any start_ix, without varargs.  */

That looks misindented, but may be an email client thing.

+      else if (!tagging)

Oh... so tagging controls two different methods for constructing argument lists, one for GOACC_parallel and the other for whatever OMP uses? That's a bit unfortunate, I'll need to think about it for a bit or defer to Jakub.

Looks reasonable otherwise.

Bernd
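The GOMP_LAUNCH_PACK layout quoted above can be exercised standalone. In this sketch the 0xffff op mask is an assumption inferred from the shift layout (a 32-bit tag with the code at bit 28, device at bit 16, op at bit 0 — the archived mail truncates that constant), and the names are re-creations, not libgomp's own:

```cpp
#include <cassert>

// Re-creation of the launch-tag packing: a 32-bit tag holding a 4-bit
// code, a 12-bit device, and (assumed) a 16-bit op.
constexpr unsigned CODE_SHIFT = 28;
constexpr unsigned DEVICE_SHIFT = 16;
constexpr unsigned OP_SHIFT = 0;

constexpr unsigned launch_pack (unsigned code, unsigned device, unsigned op)
{
  return (code << CODE_SHIFT) | (device << DEVICE_SHIFT) | (op << OP_SHIFT);
}

constexpr unsigned launch_code (unsigned x)
{ return (x >> CODE_SHIFT) & 0xf; }
constexpr unsigned launch_device (unsigned x)
{ return (x >> DEVICE_SHIFT) & 0xfff; }
constexpr unsigned launch_op (unsigned x)
{ return (x >> OP_SHIFT) & 0xffff; }
```

The `GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0)` sentinel Bernd comments on is just the all-zero tag under this layout.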
Re: [PATCH v2][GCC] Algorithmic optimization in match and simplify
On Thu, Sep 3, 2015 at 1:11 PM, Andre Vieirawrote: > On 01/09/15 15:01, Richard Biener wrote: >> >> On Tue, Sep 1, 2015 at 3:40 PM, Andre Vieira >> wrote: >>> >>> Hi Marc, >>> >>> On 28/08/15 19:07, Marc Glisse wrote: (not a review, I haven't even read the whole patch) On Fri, 28 Aug 2015, Andre Vieira wrote: > 2015-08-03 Andre Vieira > >* match.pd: Added new patterns: > ((X {&,<<,>>} C0) {|,^} C1) {^,|} C2) > (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} > C1) +(for op0 (rshift rshift lshift lshift bit_and bit_and) + op1 (bit_ior bit_xor bit_ior bit_xor bit_ior bit_xor) + op2 (bit_xor bit_ior bit_xor bit_ior bit_xor bit_ior) You can nest for-loops, it seems clearer as: (for op0 (rshift lshift bit_and) (for op1 (bit_ior bit_xor) op2 (bit_xor bit_ior) >>> >>> >>> >>> Will do, thank you for pointing it out. +(simplify + (op2:c + (op1:c + (op0 @0 INTEGER_CST@1) INTEGER_CST@2) INTEGER_CST@3) I suspect you will want more :s (single_use) and less :c (canonicalization should put constants in second position). >>> I can't find the definition of :s (single_use). >> >> >> Sorry for that - didn't get along updating it yet :/ It restricts >> matching to >> sub-expressions that have a single-use. So >> >> + a &= 0xd123; >> + unsigned short tem = a ^ 0x6040; >> + a = tem | 0xc031; /* Simplify _not_ to ((a & 0xd123) | 0xe071). */ >> ... use of tem ... >> >> we shouldn't do the simplifcation here because the expression >> (a & 0x123) ^ 0x6040 is kept live by 'tem'. >> >>> GCC internals do point out >>> that canonicalization does put constants in the second position, didnt >>> see >>> that first. Thank you for pointing it out. >>> + C1 = wi::bit_and_not (C1,C2); Space after ','. >>> Will do. >>> Having wide_int_storage in many places is surprising, I can't find similar code anywhere else in gcc. 
>>> >>> I tried looking for examples of something similar, I think I ended up >>> using >>> wide_int because I was able to convert easily to and from it and it has >>> the >>> "mask" and "wide_int_to_tree" functions. I welcome any suggestions on >>> what I >>> should be using here for integer constant transformations and >>> comparisons. >> >> >> Using wide-ints is fine, but you shouldn't need 'wide_int_storage' >> constructors - those >> are indeed odd. Is it just for the initializers of wide-ints? >> >> +wide_int zero_mask = wi::zero (prec); >> +wide_int C0 = wide_int_storage (@1); >> +wide_int C1 = wide_int_storage (@2); >> +wide_int C2 = wide_int_storage (@3); >> ... >> + zero_mask = wide_int_storage (wi::mask (C0.to_uhwi (), false, >> prec)); >> >> tree_to_uhwi (@1) should do the trick as well >> >> + C1 = wi::bit_and_not (C1,C2); >> + cst_emit = wi::bit_or (C1, C2); >> >> the ops should be replacable with @2 and @3, the store to C1 obviously not >> (but you can use a tree temporary and use wide_int_to_tree here to avoid >> the back-and-forth for the case where C1 is not assigned to). >> >> Note that transforms only doing association are prone to endless recursion >> in case some other pattern does the reverse op... >> >> Richard. >> >> >>> BR, >>> Andre >>> >> > Thank you for all the comments, see reworked version: > > Two new algorithmic optimisations: > 1.((X op0 C0) op1 C1) op2 C2) > with op0 = {&, >>, <<}, op1 = {|,^}, op2 = {|,^} and op1 != op2 > zero_mask has 1's for all bits that are sure to be 0 in (X op0 C0) > and 0's otherwise. > if (op1 == '^') C1 &= ~C2 (Only changed if actually emitted) > if ((C1 & ~zero_mask) == 0) then emit (X op0 C0) op2 (C1 op2 C2) > if ((C2 & ~zero_mask) == 0) then emit (X op0 C0) op1 (C1 op2 C2) > 2. (X {|,^,&} C0) {<<,>>} C1 -> (X {<<,>>} C1) {|,^,&} (C0 {<<,>>} C1) > > > This patch does two algorithmic optimisations that target patterns like: > (((x << 24) | 0x00FF) ^ 0xFF00) and ((x ^ 0x4002) >> 1) | > 0x8000. 
> > The transformation uses the knowledge of which bits are zero after > operations like (X {&,<<,(unsigned)>>}) to combine constants, reducing > run-time operations. > The two examples above would be transformed into (X << 24) ^ 0x and > (X >> 1) ^ 0xa001 respectively. > > The second transformation enables more applications of the first. Also some > targets may benefit from delaying shift operations. I am aware that such an > optimization, in combination with one or more optimizations that cause the > reverse transformation, may lead to an infinite loop. Though such behavior > has not been detected during regression testing and bootstrapping on > aarch64. +/* (X bit_op C0) rshift C1 -> (X rshift C0) bit_op (C0 rshift C1)
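The second example's 16-bit constants are self-consistent (the archive may have truncated wider originals), so the claimed simplification can be checked exhaustively. A brute-force verification, assuming 16-bit unsigned arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Check the second example: ((x ^ 0x4002) >> 1) | 0x8000 simplifies to
// (x >> 1) ^ 0xa001.  The reasoning: an unsigned right shift distributes
// over XOR (0x4002 >> 1 == 0x2001), and after the shift the top bit is
// known zero, so the OR with 0x8000 behaves like an XOR, combining the
// constants into 0x2001 | 0x8000 == 0xa001.
static uint16_t before (uint16_t x)
{
  return ((x ^ 0x4002u) >> 1) | 0x8000u;
}

static uint16_t after (uint16_t x)
{
  return (x >> 1) ^ 0xa001u;
}

static bool equivalent_for_all_u16 ()
{
  for (uint32_t x = 0; x <= 0xffff; ++x)
    if (before ((uint16_t) x) != after ((uint16_t) x))
      return false;
  return true;
}
```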
[PATCH] Improve genmatch errors
For a misplaced :s like

(simplify
  (plus (minus@1:s @1 @2) (minus @2 @0))
  (plus @1 @0))

we were giving

test.pd:2:16 error: not implemented: predicate on leaf operand
(plus (minus@1:s @1 @2) (minus @2 @0))
               ^

which is at least confusing.  The following patch improves this to

test.pd:2:16 error: expected expression operand
(plus (minus@1:s @1 @2) (minus @2 @0))
               ^

also handling any other garbage after operands and rejecting

test.pd:2:21 error: expected expression operand
(plus (minus:s@1 @1@2) (minus @2 @0))
                    ^

which was previously accepted.

Bootstrap on x86_64-unknown-linux-gnu in progress, no changes in generated files.

Richard.

2015-09-17  Richard Biener

	* genmatch.c (parser::parse_expr): Improve error message for
	mis-placed flags.

Index: gcc/genmatch.c
===
--- gcc/genmatch.c	(revision 227779)
+++ gcc/genmatch.c	(working copy)
@@ -3857,6 +3858,9 @@ parser::parse_expr ()
 	  e->expr_type = expr_type;
 	  return op;
 	}
+      else if (!(token->flags & PREV_WHITE))
+	fatal_at (token, "expected expression operand");
+
       e->append_op (parse_op ());
     }
   while (1);
Re: [PATCH] Target hook for disabling the delay slot filler.
On 09/17/2015 11:52 AM, Simon Dardis wrote:
> The profitability of using an ordinary branch over a delay slot branch
> depends on how the delay slot is filled. If a delay slot can be filled
> from an instruction preceding the branch, or from instructions following
> it that must be executed on both sides, then it is profitable to use a
> delay slot branch. For cases when instructions are chosen from one side
> of the branch, the proposed optimization strategy is to not speculatively
> execute instructions when ordinary branches could be used.
>
> Performance-wise this avoids executing instructions which the eager delay
> filler picked wrongly. Since most branches have a compact form, disabling
> the eager delay filler should be no worse than altering it not to fill
> delay slots in this case.

Ok, so in that case I think the patch would be reasonable if the target hook was named appropriately to say something like don't speculate when filling delay slots. It looks like fill_eager_delay_slots always speculates, so you needn't change your approach in reorg.c. Possibly place the hook after rtx_cost/address_cost in target.def since it's cost related.

Bernd
Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully
Hi Kyrill, > On 11/09/15 09:51, Rainer Orth wrote: >> Kyrill Tkachovwrites: >> >>> On 10/09/15 12:43, Rainer Orth wrote: Hi Kyrill, > Rainer, could you please check that this patch still fixes the SPARC > regressions? unfortunately, it breaks sparc-sun-solaris2.10 bootstrap: compiling stage2 libiberty/regex.c FAILs: >>> Thanks for providing the preprocessed file. >>> I've reproduced and fixed the ICE in this version of the patch. >>> The problem was that I was taking the mode of x before the check >>> of whether a and b are MEMs, after which we would change x to an >>> address_mode reg, >>> thus confusing emit_move_insn. >>> >>> The fix is to take the mode of x and perform the >>> can_conditionally_move_p check >>> after that transformation. >>> >>> Bootstrapped and tested on aarch64 and x86_64. >>> The preprocessed regex.i that Rainer provided now compiles successfully >>> for me >>> on a sparc-sun-solaris2.10 stage-1 cross-compiler. >>> >>> Rainer, thanks for your help so far, could you please try out this patch? >> While bootstrap succeeds again, the testsuite regression in >> gcc.c-torture/execute/20071216-1.c reoccured. > Right, so I dug into the RTL dumps and I think this is a separate issue > that's being exacerbated by my patch. > The code tries to if-convert a block which contains a compare instruction > i.e. sets the CC register. > Now, bb_valid_for_noce_process_p should have caught this, and in particular > insn_valid_noce_process_p > which should check that the instruction doesn't set the CC > register. However, on SPARC the > cc_in_cond returns NULL! This is due to the canonicalize_comparison > implementation that seems to > remove the CC register from the condition expression and returns something > like: > (leu (reg/v:SI 109 [ b ]) > (const_int -4096 [0xf000]) > > Therefore the set_of (cc_in_cond (cond), insn) check doesn't get triggered > because cc_in_cond returns NULL. 
> Regardless of how the branch condition got canonicalized, I think we still > want to reject any insn in the block > that sets a condition code register, so this patch checks the destination > of every set in the block for a MODE_CC > expression and cancels if-conversion if that's the case. > > Oleg pointed me to the older PR 58517 affecting SH which seems similar and > I think my previous ifcvt patch would expose > this problem more. > > Anyway, with this patch the failing SPARC testcase > gcc.c-torture/execute/20071216-1.c generates the same assembly > as before r227368 and bootstrap and test on aarch64 and x86_64 passes ok for > me. > > Rainer, could you try this patch on top of the previous patch? > (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html) > The two together should fix all of PR 67456, 67464, 67465 and 67481. sorry this took me so long: we've had a major switch failure and my sparc machines are way slow. Anyway, here's what I found: the two patches on top of each other do bootstrap just fine and the gcc.c-torture/execute/20071216-1.c regressions are gone. However, it introduces a new one: FAIL: gcc.dg/torture/stackalign/sibcall-1.c -O1 -fpic execution test It fails for both 32 and 64-bit. The testcase SEGVs: Program received signal SIGSEGV, Segmentation fault. 
[Switching to Thread 1 (LWP 1)] 0x00010bb0 in ix86_split_ashr (mode=1) at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:20 20: gen_x86_64_shrd) (0); (gdb) where #0 0x00010bb0 in ix86_split_ashr (mode=1) at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:20 #1 0x00010be4 in main () at /vol/gcc/src/hg/trunk/local/gcc/testsuite/gcc.dg/torture/stackalign/sibcall-1.c:27 1: x/i $pc => 0x10bb0 :ld [ %g1 ], %g1 (gdb) p/x $g1 $1 = 0x317f8 truss reveals: Incurred fault #6, FLTBOUNDS %pc = 0x00010BB0 siginfo: SIGSEGV SEGV_MAPERR addr=0x000317F8 Received signal #11, SIGSEGV [default] siginfo: SIGSEGV SEGV_MAPERR addr=0x000317F8 with #define FLTBOUNDS 6 /* Memory bounds (invalid address) */ and indeed that address isn't mapped according to ro@apoc 58 > pmap 26536 26536: /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe 0001 8K r-x-- /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe 0002 8K rwx-- /var/gcc/regression/trunk/11-gcc/build/gcc/testsuite/gcc/sibcall-1.exe FEE6 696K r-x-- /lib/libm.so.2 FEF1C000 16K rwx-- /lib/libm.so.2 FF181464K r-x-- /lib/libc.so.1 FF2FE000 48K rwx-- /lib/libc.so.1 FF35 24K rwx--[ anon ] FF36 8K rw---[ anon ] FF37 8K rw---[ anon ] FF38 8K rw---[ anon ] FF39 8K rw---[ anon ] FF3A 248K r-x-- /lib/ld.so.1 FF3EE000 16K rwx-- /lib/ld.so.1 FFBFC000 16K rwx--[ stack ] total
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Sun, 13 Sep 2015, Mark Wielaard wrote:
> Slightly adjusted patch attached. Now it is explicit that the warning is
> enabled by -Wunused-variable for C, but not C++. There are testcases for
> both C and C++ to check the defaults. And the hardcoded override is
> removed for C++, so the user could enable it if they want.

I believe making -Wunused-const-variable part of -Wall is not a good idea. For example, I have a nightly build of Wine with a nightly build of GCC, and my notification mail went from twenty lines of warnings to 6500 -- all coming from header files.

Gerald
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 16/09/15 17:42 -0600, Martin Sebor wrote: I see now the first exists test will detect symlink loops in the original path. But I'm not convinced there isn't a corner case that's subject to a TOCTOU race condition between the first exists test and the while loop during which a symlink loop can be introduced. Suppose we call the function with /foo/bar as an argument and the path exists and contains no symlinks. result is / and cmpts is set to { foo, bar }. Just as the loop is entered, /foo/bar is replaced with a symlink containing /foo/bar. The loop then proceeds like so: 1. The first iteration removes foo from cmpts and sets result to /foo. cmpts is { bar }. 2. The second iteration removes bar from cmpts, sets result to /foo/bar, determines it's a symlink, reads its contents, sees it's an absolute pathname and replaces result with /. It then inserts the symlink's components { foo, bar } into cmpts. cmpts becomes { foo, bar }. exists(result) succeeds. 3. The next iteration of the loop has the same initial state as the first. But I could have very easily missed something that takes care of this corner case. If I did, sorry for the false alarm! No, you're right. The TS says such filesystem races are undefined: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4099.html#fs.race.behavior but it would be nice to fail gracefully rather than DOS the application. The simplest approach would be to increment a counter every time we follow a symlink, and if it reaches some limit decide something is wrong and fail with ELOOP. I don't see how anything else can be 100% bulletproof, because a truly evil attacker could just keep altering the contents of symlinks so we keep ping-ponging between two or more paths. If we keep track of paths we've seen before the attacker could just keep changing the contents to a unique path each time, that initially exists as a file, but by the time we get to is_symlink() its become a symlink to a new path. 
So if we use a counter, what's a sane maximum? Is MAXSYMLINKS the value the kernel uses? 20 seems quite low; I was thinking of a much higher number.
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 17/09/15 12:16 +0100, Jonathan Wakely wrote: So if we use a counter, what's a sane maximum? Is MAXSYMLINKS the value the kernel uses? 20 seems quite low; I was thinking of a much higher number. Until very recently Linux seemed to hardcode it to 40: https://github.com/torvalds/linux/blob/v4.1/fs/namei.c#L865 That changed in v4.2 and I'm not sure what it does now; if this is the relevant bit of code, it uses MAXSYMLINKS: https://github.com/torvalds/linux/blob/v4.2/fs/namei.c#L1647
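The counter approach Jonathan proposes can be sketched in isolation. This is a hypothetical illustration, not the libstdc++ implementation: the symlink table is simulated with a map rather than read via readlink(), and the limit of 40 mirrors the pre-v4.2 Linux hardcoded value mentioned above.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Follow symlinks, but give up with an ELOOP-style error after a fixed
// number of hops, so a racing attacker who keeps rewriting links cannot
// make us loop forever.  The link table is simulated; real code would
// call readlink() and re-check existence each iteration.
constexpr int max_symlink_hops = 40;  // Linux hardcoded 40 up to v4.1

std::string resolve(const std::map<std::string, std::string>& links,
                    std::string path)
{
    int hops = 0;
    for (auto it = links.find(path); it != links.end(); it = links.find(path))
    {
        if (++hops > max_symlink_hops)
            throw std::runtime_error("ELOOP: too many levels of symbolic links");
        path = it->second;  // follow the link and try again
    }
    return path;  // not a symlink: fully resolved
}
```

The point of the counter is exactly the one made in the thread: tracking previously-seen paths cannot help against an attacker who generates a fresh target each time, but a hop bound always terminates.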
Re: [PATCH] Fix default_binds_local_p_2 for extern protected data
Ping 2. This patch is needed for a working visibility ("protected") attribute for extern data on targets using default_binds_local_p_2. https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01871.html On 10/08/15 12:04, Szabolcs Nagy wrote: ping. On 22/07/15 18:01, Szabolcs Nagy wrote: The commit https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=222184 changed a true to false in varasm.c: bool default_binds_local_p_2 (const_tree exp) { - return default_binds_local_p_3 (exp, flag_shlib != 0, true, true); + return default_binds_local_p_3 (exp, flag_shlib != 0, true, false, + !flag_pic); } where default_binds_local_p_3 (const_tree exp, bool shlib, bool weak_dominate, -bool extern_protected_data) +bool extern_protected_data, bool common_local_p) { false means that extern protected data binds locally, which is wrong if the target can have copy relocations against it (then the address must be loaded from the GOT, otherwise the main executable will see a different address). Currently the S/390, ARM and AArch64 targets use this predicate and the current default is wrong for all of them (they can have copy relocs), so I changed the default instead of doing it in a target-specific way. The equivalent x86_64 bug was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65248 the default was changed for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65780 now I opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66912 for arm and aarch64. Needs a further binutils patch too to emit R_*_GLOB_DAT instead of R_*_RELATIVE relocs for protected data. The glibc elf/tst-protected1a and elf/tst-protected1b tests depend on this. Tested ARM and AArch64 targets. gcc/ChangeLog: 2015-07-22 Szabolcs Nagy PR target/66912 * varasm.c (default_binds_local_p_2): Turn on extern_protected_data. gcc/testsuite/ChangeLog: 2015-07-22 Szabolcs Nagy PR target/66912 * gcc.target/aarch64/pr66912.c: New. * gcc.target/arm/pr66912.c: New.
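For readers unfamiliar with the attribute under discussion, here is a minimal, hypothetical illustration of protected visibility. The interesting behavior (whether the compiler assumes the definition's address cannot be moved by a copy relocation) only shows up when this object is built into a shared library; compiled standalone it simply works:

```cpp
// "protected" visibility promises this definition is not preempted by
// another definition at dynamic-link time.  The bug discussed above is
// that GCC additionally assumed the main executable could not have a
// copy relocation against it, so it skipped the GOT indirection.
__attribute__((visibility("protected"))) int exported_counter = 41;

// Accessor defined next to the data; in a shared library the address
// used here and the address seen by the executable must agree.
int bump() { return ++exported_counter; }
```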
Re: [patch match.pd c c++]: Ignore results of 'shorten_compare' and move missing patterns in match.pd
2015-09-17 11:12 GMT+02:00 Richard Biener: > On Wed, Sep 16, 2015 at 9:51 PM, Jeff Law wrote: >> On 09/15/2015 03:42 AM, Kai Tietz wrote: > > 2015-09-08 Kai Tietz > > * gcc.dg/tree-ssa/vrp23.c: Adjust testcase to reflect that > pattern is matching now already within forward-propagation pass. > * gcc.dg/tree-ssa/vrp24.c: Likewise. So for the new patterns, I would have expected testcases to ensure they're getting used. >>> >>> >>> We were talking about that. My approach was to disable the shortening >>> code and see what regressions we get. For C++ our test coverage isn't >>> that good, as we didn't have any regressions here, but for C testcases >>> we got some. E.g. the testcase vrp25.c is one of them >> >> But as I noted, I think that simply looking at testsuite regressions after >> disabling shorten_compare is not sufficient as I don't think we have >> significant coverage of this code. >> >>> >>> By disabling "shorten_compare" one of the simplest testcases, which is >>> failing, is: >>> >>> int foo (short s, short t) >>> { >>>int a = (int) s; >>>int b = (int) t; >>> >>>return a >= b; >>> } >> >> And I would argue it's precisely this kind of stuff we should be building >> tests around and resolving prior to actually disabling shorten_compare. >> >> >>> >>> Where we miss narrowing down the SSA-tree comparison to simply s >= t; >>> If we disable the fre pass (-fno-tree-fre) for current trunk, we can see >>> that this pattern gets resolved in the forward-propagation pass due to the >>> invocation of shorten_compare. >>> >>> Another simple one (with disabled "shorten_compare") is: >>> >>> The following testcase: >>> >>> int foo (unsigned short a) >>> { >>>unsigned int b = 0; >>>return ((unsigned int) a) < b; >>> } >>> >>> Here we lacked the detection of ((unsigned int) a) < b being a < 0 >>> (which is always false). >> >> And again, this deserves a test and resolving prior to disabling >> shorten_compare. 
>> >> >> >>> In *theory* one ought to be able to look at the dumps or .s files before and after this patch for a blob of tests and see that nothing significant has changed. Unfortunately, so much changes that it's hard to evaluate if this patch is a step forward or a step back. >>> >>> >>> Hmm, actually we deal here with obvious patterns from >>> "shorten_compare". Of interest here are the narrowing lines on top of >>> this function, which we need to reflect in match.pd too, before we can >>> deprecate it. I don't get that there are so many changes? These are in >>> fact just 3 basic patterns '(convert) X (convert) Y', >>> '((convert) X) CST', and a more specialized variant for the last >>> pattern for the '==/!=' case. >>> I wonder if we'd do better to first add new match.pd patterns, one at a time, with tests, and evaluating them along the way by looking at the dumps or .s files across many systems. Then when we think we've got the right set, then look at what happens to those dumps/.s files if we make the changes so that shorten_compare really just emits warnings. >>> >>> >>> As the shorten_compare function covers those transformations, I don't >>> see that this leads to something useful. As long as we >>> call shorten_compare on folding in FEs, we won't see those patterns in >>> the ME that easily. And even more important is that patterns get >>> resolved when the function is invoked. >> >> It will tell you what will be missed from a code generation standpoint if >> you disable shorten_compare. And that is the best proxy we have for >> performance regressions related to disabling shorten_compare. >> >> ie, in a local tree, you disable shorten_compare. Then compare the code we >> generate with and without shorten compare. Reduce to a simple testcase. >> Resolve the issue with a match.pd pattern (most likely) and repeat the >> process. 
In theory at each step the number of things to deeply investigate >> should be diminishing while at the same time you're building match.pd >> patterns and tests we can include in the testsuite. And each step is likely >> a new patch in the patch series -- the final patch of which disables >> shorten_compare. >> >> It's a lot of work, but IMHO, it's necessary to help ensure we don't have >> code generation regressions. > > As said in the other reply I successfully used gcc_unreachable ()s in > code I intend to remove and then inspect/fix all fallout from that during > bootstrap & test ... Yes, that would be fine if shorten_compare were part of the classical fold machinery. But its use isn't there. It is just used in the front end's binary-operation generation, in C/C++'s build_binary_op function. As we don't want (and can't) remove shorten_compare completely, because we still need at least its diagnostics, using gcc_unreachable () doesn't look suitable. Also the request to
Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
On Wed, Sep 16, 2015 at 5:45 PM, Manuel López-Ibáñez wrote: > On 16 September 2015 at 15:33, Richard Biener > wrote: >> On Wed, Sep 16, 2015 at 3:22 PM, Michael Matz wrote: if we suggest 'foo' instead of foz then we'll get a more confusing followup error if we actually use it. >>> >>> This particular case could be solved by ruling out candidates of the wrong >>> kind (here, something that can be assigned to, vs. a function). But it >>> might actually be too early in parsing to say that there will be an >>> assignment. I don't think _this_ problem should block the patch. > > Indeed. The patch by David does not try to fix-up the code, it merely > suggests a possible candidate. The follow-up errors should be the same > before and after. Such suggestions will never be 100% right, even if > the suggestion makes the code compile and run, it may still be the > wrong one. A wrong suggestion is far less serious than a wrong > uninitialized or Warray-bounds warning and we can live with those. Why > does this need to be perfect from the very beginning? > > BTW, there is a PR for this: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52277 > >> I wonder if we can tentatively parse with the choice at hand, only allowing >> (and even suggesting?) it if that works out. > > This would require to queue the error, fix-up the wrong name and > continue parsing. If there is another error, ignore that one and emit > the original error without suggestion. The problem here is that we do > not know if the additional error is actually caused by the fix-up we > did or it is an already existing error. It would be equally terrible > to emit errors caused by the fix-up or emit just a single error for > the typo. We would need to roll-back the tentative parse and do a > definitive parse anyway. This does not seem possible at the moment > because the parsers maintain a lot of global state that is not easy to > roll-back. 
We cannot simply create a copy of the parser state and > throw it away later to continue as if the tentative parse has not > happened. > > I'm not even sure if, in general, one can stop at the statement level > or we would need to parse the whole function (or translation unit) to > be able to tell if the suggestion is a valid candidate. I was suggesting to only tentatively finish parsing the "current construct". No idea how to best figure that out to the extent needed to make the tentative parse useful. Say, if we have "a + s.foz" and the field foz is not there but foo is, so if we continue parsing with 'foo' instead but 'foo' will have a type that makes "a + s.foo" invalid then we probably shouldn't suggest it. It _might_ be reasonably "easy" to implement that, but I'm not sure. There might be a field named fz (with the same or a bigger Levenshtein distance) with the correct type. Of course it might have been that I misspelled 's' and meant 'r' instead, which has a field foz of correct type... (and 's' is available as well). I agree that we don't have to solve all this in the first iteration. Richard. > Cheers, > > Manuel.
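For context, the edit distance driving these suggestions is the classic dynamic-programming computation. This is a hypothetical standalone version, not the code from David's patch; two rolling rows keep memory proportional to one string:

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions turning `a` into `b`.
size_t levenshtein_distance(const std::string& a, const std::string& b)
{
    std::vector<size_t> prev(b.size() + 1), cur(b.size() + 1);
    for (size_t j = 0; j <= b.size(); ++j)
        prev[j] = j;                      // distance from the empty prefix of a
    for (size_t i = 1; i <= a.size(); ++i)
    {
        cur[0] = i;                       // delete all i characters of a
        for (size_t j = 1; j <= b.size(); ++j)
        {
            size_t subst = prev[j - 1] + (a[i - 1] != b[j - 1]);
            cur[j] = std::min({prev[j] + 1,      // delete a[i-1]
                               cur[j - 1] + 1,   // insert b[j-1]
                               subst});          // substitute (or match)
        }
        std::swap(prev, cur);
    }
    return prev[b.size()];
}
```

A front end would compute this from the misspelled identifier to each in-scope candidate and suggest the closest one (here, distance 1 from "foz" to "foo"), which is exactly where the wrong-kind and wrong-type concerns above come in.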
****ping****: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux
Could somebody review this please? Thanks Paul -- Forwarded message -- From: Paul Richard Thomas Date: 6 September 2015 at 18:40 Subject: Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux To: Dominique Dhumieres , "fort...@gcc.gnu.org" , gcc-patches It helps to attach the patch :-) Paul On 6 September 2015 at 13:42, Paul Richard Thomas wrote: > Dear All, > > The attached patch more or less implements the assignment of > expressions to the result of a pointer function. To wit: > > my_ptr_fcn (arg1, arg2...) = expr > > arg1 would usually be the target, pointed to by the function. The > patch parses these statements and resolves them into: > > temp_ptr => my_ptr_fcn (arg1, arg2...) > temp_ptr = expr > > I say more or less implemented because I have ducked one of the > headaches here. At the end of the specification block, there is an > ambiguity between statement functions and pointer function > assignments. I do not even try to resolve this ambiguity and require > that there be at least one other type of executable statement before > these beasts. This can undoubtedly be fixed but the effort seems to me > to be unwarranted at the present time. > > This version of the patch extends the coverage of allowed rvalues to > any legal expression. Also, all the problems with error.c have been > dealt with by Manuel's patch. > > I am grateful to Dominique for reminding me of PR40054 and pointing > out PR63921. After a remark of his on #gfortran, I fixed the checking > of the standard to pick up all the offending lines with F2003 and > earlier. > > > Bootstraps and regtests on FC21/x86_64 - OK for trunk? > > Cheers > > Paul > > 2015-09-06 Paul Thomas > > PR fortran/40054 > PR fortran/63921 > * decl.c (get_proc_name): Return if statement function is > found. > * match.c (gfc_match_ptr_fcn_assign): New function. > * match.h : Add prototype for gfc_match_ptr_fcn_assign. > * parse.c : Add static flag 'in_specification_block'. 
> (decode_statement): If in specification block match a statement > function, otherwise if standard embraces F2008 try to match a > pointer function assignment. > (parse_interface): Set 'in_specification_block' on exiting from > parse_spec. > (parse_spec): Set and then reset 'in_specification_block'. > (gfc_parse_file): Set 'in_specification_block'. > * resolve.c (get_temp_from_expr): Extend to include other > expressions than variables and constants as rvalues. > (resolve_ptr_fcn_assign): New function. > (gfc_resolve_code): Call resolve_ptr_fcn_assign. > * symbol.c (gfc_add_procedure): Add a sentence to the error to > flag up the ambiguity between a statement function and pointer > function assignment at the end of the specification block. > > 2015-09-06 Paul Thomas > > PR fortran/40054 > PR fortran/63921 > * gfortran.dg/fmt_tab_1.f90: Change from run to compile and set > standard as legacy. > * gfortran.dg/ptr_func_assign_1.f08: New test. > * gfortran.dg/ptr_func_assign_2.f08: New test. -- Outside of a dog, a book is a man's best friend. Inside of a dog it's too dark to read. Groucho Marx Index: gcc/fortran/decl.c === *** gcc/fortran/decl.c (revision 227508) --- gcc/fortran/decl.c (working copy) *** get_proc_name (const char *name, gfc_sym *** 901,906 --- 901,908 return rc; sym = *result; + if (sym->attr.proc == PROC_ST_FUNCTION) + return rc; if (sym->attr.module_procedure && sym->attr.if_source == IFSRC_IFBODY) Index: gcc/fortran/match.c === *** gcc/fortran/match.c (revision 227508) --- gcc/fortran/match.c (working copy) *** match *** 4886,4892 gfc_match_st_function (void) { gfc_error_buffer old_error; - gfc_symbol *sym; gfc_expr *expr; match m; --- 4886,4891 *** gfc_match_st_function (void) *** 4926,4931 --- 4925,4990 return MATCH_YES; undo_error: + gfc_pop_error (&old_error); + return MATCH_NO; + } + + + /* Match an assignment to a pointer function (F2008). 
This could, in +general be ambiguous with a statement function. In this implementation +it remains so if it is the first statement after the specification +block. */ + + match + gfc_match_ptr_fcn_assign (void) + { + gfc_error_buffer old_error; + locus old_loc; + gfc_symbol *sym; + gfc_expr *expr; + match m; + char name[GFC_MAX_SYMBOL_LEN + 1]; + + old_loc = gfc_current_locus; + m = gfc_match_name (name); +
Re: [patch match.pd c c++]: Ignore results of 'shorten_compare' and move missing patterns in match.pd
On Wed, Sep 16, 2015 at 9:51 PM, Jeff Law wrote: > On 09/15/2015 03:42 AM, Kai Tietz wrote: 2015-09-08 Kai Tietz * gcc.dg/tree-ssa/vrp23.c: Adjust testcase to reflect that pattern is matching now already within forward-propagation pass. * gcc.dg/tree-ssa/vrp24.c: Likewise. >>> >>> >>> So for the new patterns, I would have expected testcases to ensure >>> they're >>> getting used. >> >> >> We were talking about that. My approach was to disable the shortening >> code and see what regressions we get. For C++ our test coverage isn't >> that good, as we didn't have any regressions here, but for C testcases >> we got some. E.g. the testcase vrp25.c is one of them > > But as I noted, I think that simply looking at testsuite regressions after > disabling shorten_compare is not sufficient as I don't think we have > significant coverage of this code. > >> >> By disabling "shorten_compare" one of the simplest testcases, which is >> failing, is: >> >> int foo (short s, short t) >> { >>int a = (int) s; >>int b = (int) t; >> >>return a >= b; >> } > > And I would argue it's precisely this kind of stuff we should be building > tests around and resolving prior to actually disabling shorten_compare. > > >> >> Where we miss narrowing down the SSA-tree comparison to simply s >= t; >> If we disable the fre pass (-fno-tree-fre) for current trunk, we can see >> that this pattern gets resolved in the forward-propagation pass due to the >> invocation of shorten_compare. >> >> Another simple one (with disabled "shorten_compare") is: >> >> The following testcase: >> >> int foo (unsigned short a) >> { >>unsigned int b = 0; >>return ((unsigned int) a) < b; >> } >> >> Here we lacked the detection of ((unsigned int) a) < b being a < 0 >> (which is always false). > > And again, this deserves a test and resolving prior to disabling > shorten_compare. 
> > > >> >>> In *theory* one ought to be able to look at the dumps or .s files before >>> and after this patch for a blob of tests and see that nothing significant has >>> changed. Unfortunately, so much changes that it's hard to evaluate if >>> this >>> patch is a step forward or a step back. >> >> >> Hmm, actually we deal here with obvious patterns from >> "shorten_compare". Of interest here are the narrowing lines on top of >> this function, which we need to reflect in match.pd too, before we can >> deprecate it. I don't get that there are so many changes? These are in >> fact just 3 basic patterns '(convert) X (convert) Y', >> '((convert) X) CST', and a more specialized variant for the last >> pattern for the '==/!=' case. >> >>> I wonder if we'd do better to first add new match.pd patterns, one at a >>> time, with tests, and evaluating them along the way by looking at the >>> dumps >>> or .s files across many systems. Then when we think we've got the right >>> set, then look at what happens to those dumps/.s files if we make the >>> changes so that shorten_compare really just emits warnings. >> >> >> As the shorten_compare function covers those transformations, I don't >> see that this leads to something useful. As long as we >> call shorten_compare on folding in FEs, we won't see those patterns in >> the ME that easily. And even more important is that patterns get >> resolved when the function is invoked. > > It will tell you what will be missed from a code generation standpoint if > you disable shorten_compare. And that is the best proxy we have for > performance regressions related to disabling shorten_compare. > > ie, in a local tree, you disable shorten_compare. Then compare the code we > generate with and without shorten compare. Reduce to a simple testcase. > Resolve the issue with a match.pd pattern (most likely) and repeat the > process. 
In theory at each step the number of things to deeply investigate > should be diminishing while at the same time you're building match.pd > patterns and tests we can include in the testsuite. And each step is likely > a new patch in the patch series -- the final patch of which disables > shorten_compare. > > It's a lot of work, but IMHO, it's necessary to help ensure we don't have > code generation regressions. As said in the other reply I successfully used gcc_unreachable ()s in code I intend to remove and then inspect/fix all fallout from that during bootstrap & test ... Yeah, it's a lot of work (sometimes). But it's way easier than investigating code changes (which may also miss cases as followup transforms may end up causing the same code to be generated). Richard. > > > >> >> So I would suggest here to disable the shorten_compare invocation and >> clean up the fallout detectable in the C FE's testsuite. > But without deeper analysis at the start & patches+testcases for those > issues, we have a real risk of regressing the code we generate. > >> >>> My worry is that we get part way through the conversion and end up with >>> the >>> match.pd
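The two testcases quoted in this thread can be checked as ordinary semantics, independent of which pass performs the transform. This is a toy model of the equivalences the match.pd patterns need to capture, not compiler code; the function names are illustrative only:

```cpp
// First testcase: (int)s >= (int)t with s, t of the same narrower
// signed type is equivalent to s >= t, because the widening
// conversion short -> int is strictly monotonic and lossless.
bool cmp_wide(short s, short t)   { return (int)s >= (int)t; }
bool cmp_narrow(short s, short t) { return s >= t; }

// Second testcase: a zero-extended unsigned short compared against an
// unsigned int zero.  ((unsigned int)a) < 0u is constant false, which
// is the fold the thread says was missed with shorten_compare disabled.
bool always_false(unsigned short a, unsigned int b0)
{
    return (unsigned int)a < b0;  // with b0 == 0, never true
}
```

Exhaustively checking a few boundary values (including SHRT_MIN/SHRT_MAX) is how one would build the regression tests Jeff asks for before disabling shorten_compare.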
RE: [PATCH] Target hook for disabling the delay slot filler.
The profitability of using an ordinary branch over a delay slot branch depends on how the delay slot is filled. If a delay slot can be filled from an instruction preceding the branch, or from instructions following it that must be executed on both sides, then it is profitable to use a delay slot branch. For cases when instructions are chosen from one side of the branch, the proposed optimization strategy is to not speculatively execute instructions when ordinary branches could be used. Performance-wise this avoids executing instructions which the eager delay filler picked wrongly. Since most branches have a compact form, disabling the eager delay filler should be no worse than altering it not to fill delay slots in this case. Thanks, Simon -Original Message- From: Jeff Law [mailto:l...@redhat.com] Sent: 15 September 2015 16:02 To: Bernd Schmidt; Simon Dardis; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Target hook for disabling the delay slot filler. On 09/15/2015 08:27 AM, Bernd Schmidt wrote: > On 09/15/2015 04:19 PM, Simon Dardis wrote: >> This patch adds a target hook for disabling the eager delay slot >> filler which when disabled can give better code. No new regressions. >> Ok to commit? > > Hmm. Whether a branch was filled by the simple or eager filler is an > implementation detail - is there some better way to describe which > kind of branch is profitable? And more importantly, it's far better to be able to describe when it is not profitable to use eager filling rather than just disabling it completely. Jeff
Re: Openacc launch API
On 09/17/15 05:36, Bernd Schmidt wrote: Fail how? Jakub has requested that it works but falls back to unaccelerated execution, can you confirm this is what you expect to happen with this patch? Yes, that is the failure mode. - if (num_waits) + va_start (ap, kinds); + /* TODO: This will need amending when device_type is implemented. */ I'd expect that this will check whether the device type in the argument is either zero (or whatever indicates all devices) or matches the current device. Is that what you intend? correct. + while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0) + != (tag = va_arg (ap, unsigned))) That's a somewhat non-idiomatic way to write this, with the constant first and not obviously a constant. I'd initialize a variable with the constant before the loop. Hm, yeah, that is a little unpleasant. The alternative is IIRC a mid-loop break. (If only this was C++ then we could write: while (int tag = va_arg (ap, unsigned)) { ... } relying on GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0) being zero. Maybe the C-ified version of that would be clearer?) + assert (!GOMP_LAUNCH_DEVICE (tag)); Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a I suppose. I was thinking of this as an internal interface from the compiler, but I guess down the road some one could try running a device_type implemented binary from a future compiler with an old libgomp. + if (num_waits > 8) +gomp_fatal ("Too many waits for legacy interface"); How did you arrive at this number? The voice in my head. I've only seen code that had up to 2 waits, so I figured 8 was way plenty. +GOACC_2.0,1 { + global: +GOACC_parallel_keyed; +} GOACC_2.0; Did you mean to use a comma? Probably. 
+ if (dims[GOMP_DIM_GANG] != 1) +GOMP_PLUGIN_fatal ("non-unity num_gangs (%d) not supported", + dims[GOMP_DIM_GANG]); + if (dims[GOMP_DIM_WORKER] != 1) +GOMP_PLUGIN_fatal ("non-unity num_workers (%d) not supported", + dims[GOMP_DIM_WORKER]); I see that this is just moved here (which is good), but is this still a limitation? Or is that on trunk only? Trunk only. It'll go away shortly when more patches get merged. +static void +set_oacc_fn_attrib (tree fn, tree clauses, vec<tree> *args) +tree +get_oacc_fn_attrib (tree fn) These need function comments. oops. + /* Must match GOMP_DIM ordering. */ + static const omp_clause_code ids[] = +{OMP_CLAUSE_NUM_GANGS, OMP_CLAUSE_NUM_WORKERS, OMP_CLAUSE_VECTOR_LENGTH}; Formatting. No = at the end of a line, and whitespace around braces. oh, I thought initialization placed the = where I had -- didn't know that nor the space-brace rule. @@ -9150,6 +9245,7 @@ expand_omp_target (struct omp_region *re } gimple g; + bool tagging = false; /* The maximum number used by any start_ix, without varargs. */ That looks misindented, but may be an email client thing. It does, doesn't it. Appears to be email artifact or something. +else if (!tagging) Oh... so tagging controls two different methods for constructing argument lists, one for GOACC_parallel and the other for whatever OMP uses? That's a bit unfortunate, I'll need to think about it for a bit or defer to Jakub. It's the (new) difference between how the following 3 OpenACC builtins handle async & wait args. case BUILT_IN_GOACC_PARALLEL: { set_oacc_fn_attrib (child_fn, clauses, &args); tagging = true; } /* FALLTHRU */ case BUILT_IN_GOACC_ENTER_EXIT_DATA: case BUILT_IN_GOACC_UPDATE: All 3 pass info about memory copies etc, async and optional waits. For E_E_D and UPDATE the async is always passed (with a special value for 'synchronous'), followed by a count and then variadic wait ints. 
An alternative I suppose would be to break out the meminfo arg pushing to a helper function and have 2 separate code paths, or something like that. Looks reasonable otherwise. thanks for your review. I'll repost shortly nathan
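The va_arg loop Bernd commented on (sentinel compared constant-first, mid-expression assignment) can be restructured as he suggests by naming the terminator before the loop. This is an illustrative sketch with a simplified tag encoding, not the libgomp code; GOMP_LAUNCH_PACK and the real tag layout are only alluded to in the comment:

```cpp
#include <cstdarg>
#include <vector>

// Simplified stand-ins for the launch tags; in libgomp the END tag is
// GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0), which packs to zero.
enum { TAG_END = 0, TAG_WAIT = 1, TAG_ASYNC = 2 };

// Read tags from the variadic argument list until the END marker.
// The sentinel is bound to a named constant up front, so the loop
// condition reads naturally instead of constant-first.
std::vector<unsigned> collect_tags(int dummy, ...)
{
    std::vector<unsigned> tags;
    va_list ap;
    va_start(ap, dummy);
    const unsigned end_tag = TAG_END;
    unsigned tag;
    while ((tag = va_arg(ap, unsigned)) != end_tag)
        tags.push_back(tag);
    va_end(ap);
    return tags;
}
```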
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 09/17/2015 05:16 AM, Jonathan Wakely wrote: On 16/09/15 17:42 -0600, Martin Sebor wrote: I see now the first exists test will detect symlink loops in the original path. But I'm not convinced there isn't a corner case that's subject to a TOCTOU race condition between the first exists test and the while loop during which a symlink loop can be introduced. Suppose we call the function with /foo/bar as an argument and the path exists and contains no symlinks. result is / and cmpts is set to { foo, bar }. Just as the loop is entered, /foo/bar is replaced with a symlink containing /foo/bar. The loop then proceeds like so: 1. The first iteration removes foo from cmpts and sets result to /foo. cmpts is { bar }. 2. The second iteration removes bar from cmpts, sets result to /foo/bar, determines it's a symlink, reads its contents, sees it's an absolute pathname and replaces result with /. It then inserts the symlink's components { foo, bar } into cmpts. cmpts becomes { foo, bar }. exists(result) succeeds. 3. The next iteration of the loop has the same initial state as the first. But I could have very easily missed something that takes care of this corner case. If I did, sorry for the false alarm! No, you're right. The TS says such filesystem races are undefined: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4099.html#fs.race.behavior but it would be nice to fail gracefully rather than DOS the application. The simplest approach would be to increment a counter every time we follow a symlink, and if it reaches some limit decide something is wrong and fail with ELOOP. I don't see how anything else can be 100% bulletproof, because a truly evil attacker could just keep altering the contents of symlinks so we keep ping-ponging between two or more paths. 
If we keep track of paths we've seen before, the attacker could just keep changing the contents to a unique path each time, one that initially exists as a file, but by the time we get to is_symlink() it's become a symlink to a new path. So if we use a counter, what's a sane maximum? Is MAXSYMLINKS the value the kernel uses? 20 seems quite low, I was thinking of a much higher number. Yes, it is a corner case, and it's not really avoidable in the case of hard links. For symlinks, POSIX defines the SYMLOOP_MAX constant as the maximum, with the _SC_SYMLOOP_MAX and _PC_SYMLOOP_MAX sysconf and pathconf variables. Otherwise 40 seems reasonable. With this, I'll let you get back to work -- I think we've beat this function to death ;) Martin
Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249)
On 09/17/2015 10:05 AM, Marek Polacek wrote: The patch doesn't diagnose some more involved cases like the one below: if (i < 0) return 1; else if (!(i > 0 || i == 0)) return 2; even though it does diagnose some other such cases, including: if (i < 0) return 1; else if (!(i >= 0)) return 2; and even: int foo (int a, int b, int c) { if (a + c == b) return 1; else if (a + c - b == 0) return 2; return 0; } I'm guessing this is due to operand_equal_p returning true for some pairs of equivalent expressions and false for others. Would making this work be feasible? You're right that this is limited by what operand_equal_p considers equal and what not. I'm not sure if there is something more convoluted I could use in the FE, but I think not. It certainly doesn't look terribly important to me. And you'll run into a point of diminishing returns quickly. There's many ways to write this stuff in ways that are equivalent, but a pain to uncover. I think relying on operand_equal_p is probably sufficient at this stage. You probably didn't intend to diagnose the final else clause in the following case but I wonder if it should be (as an extension of the original idea): if (i) return 1; else if (!i) return 2; else return 0; Diagnosing this wasn't my intention. I'm afraid it would be kinda hard to do that. This is the "unreachable code" problem. We used to have a warning for that, but in practice it was so unstable it was removed. (In fact, I wonder if it might be worth diagnosing even the 'if (!i)' part since the condition is the inverse of the one in the if and thus redundant). This, on the other hand, seems doable provided we have some predicate that would tell us whether an expression is a logical inverse of another expression. E.g. "a > 3" and "a <= 3". Something akin to pred_equal_p -- invert one expr and check whether they're equal. I think you just want to select a canonical form. So you canonicalize, then compare. jeff
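The invert-then-compare predicate Marek and Jeff are discussing can be sketched on a toy comparison representation. The enum below is a stand-in for GCC's tree comparison codes, not the real ones; the idea is exactly "invert one expression and check whether the results are equal":

```cpp
// Stand-in comparison codes (the real ones are GCC tree codes).
enum cmp_code { LT, LE, GT, GE, EQ, NE };

// Map a comparison to its logical inverse: !(a < b) is (a >= b), etc.
// For integer operands this inversion is exact.
cmp_code invert(cmp_code c)
{
    switch (c) {
    case LT: return GE;
    case LE: return GT;
    case GT: return LE;
    case GE: return LT;
    case EQ: return NE;
    case NE: return EQ;
    }
    return EQ;  // unreachable; silences -Wreturn-type
}

// True if (a c1 b) is the logical inverse of (a c2 b), assuming the
// operands have already been matched with something like operand_equal_p.
bool inverse_conditions(cmp_code c1, cmp_code c2)
{
    return invert(c1) == c2;
}
```

Canonicalizing first (as Jeff suggests) would also catch pairs like "a > 3" vs "3 < a" before the inversion test, by rewriting every comparison into a single preferred operand order.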
[AArch64][PATCH 3/5] Add atomic load-operate instructions.
Hello, ARMv8.1 adds atomic swap and atomic load-operate instructions with optional memory ordering specifiers. This patch adds the ARMv8.1 atomic load-operate instructions. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check. Also tested for aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1 emulator with +lse enabled by default. Ok for trunk? Matthew 2015-09-17 Matthew Wahab * config/aarch64/atomics.md (UNSPECV_ATOMIC_LDOP): New. (UNSPECV_ATOMIC_LDOP_OR): New. (UNSPECV_ATOMIC_LDOP_BIC): New. (UNSPECV_ATOMIC_LDOP_XOR): New. (UNSPECV_ATOMIC_LDOP_PLUS): New. (ATOMIC_LDOP): New. (atomic_ldop): New. (aarch64_atomic_load): New. >From 6a8a83c4efbd607924f0630779d4915c9dad079c Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Mon, 10 Aug 2015 17:02:08 +0100 Subject: [PATCH 3/5] Add atomic load-operate instructions. Change-Id: I3746875bad7469403bee7df952f0ba565e4abc71 --- gcc/config/aarch64/atomics.md | 41 + 1 file changed, 41 insertions(+) diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md index 0e71002..b7b6fb5 100644 --- a/gcc/config/aarch64/atomics.md +++ b/gcc/config/aarch64/atomics.md @@ -29,8 +29,25 @@ UNSPECV_ATOMIC_CAS ; Represent an atomic CAS. UNSPECV_ATOMIC_SWP ; Represent an atomic SWP. UNSPECV_ATOMIC_OP ; Represent an atomic operation. +UNSPECV_ATOMIC_LDOP ; Represent an atomic load-operation +UNSPECV_ATOMIC_LDOP_OR ; Represent an atomic load-or +UNSPECV_ATOMIC_LDOP_BIC ; Represent an atomic load-bic +UNSPECV_ATOMIC_LDOP_XOR ; Represent an atomic load-xor +UNSPECV_ATOMIC_LDOP_PLUS ; Represent an atomic load-add ]) +;; Iterators for load-operate instructions. + +(define_int_iterator ATOMIC_LDOP + [UNSPECV_ATOMIC_LDOP_OR UNSPECV_ATOMIC_LDOP_BIC + UNSPECV_ATOMIC_LDOP_XOR UNSPECV_ATOMIC_LDOP_PLUS]) + +(define_int_attr atomic_ldop + [(UNSPECV_ATOMIC_LDOP_OR "set") (UNSPECV_ATOMIC_LDOP_BIC "clr") + (UNSPECV_ATOMIC_LDOP_XOR "eor") (UNSPECV_ATOMIC_LDOP_PLUS "add")]) + +;; Instruction patterns. 
+ (define_expand "atomic_compare_and_swap" [(match_operand:SI 0 "register_operand" "") ;; bool out (match_operand:ALLI 1 "register_operand" "") ;; val out @@ -541,3 +558,27 @@ else return "casal\t%0, %2, %1"; }) + +;; Atomic load-op: Load data, operate, store result, keep data. + +(define_insn "aarch64_atomic_load" + [(set (match_operand:ALLI 0 "register_operand" "=r") + (match_operand:ALLI 1 "aarch64_sync_memory_operand" "+Q")) + (set (match_dup 1) + (unspec_volatile:ALLI +[(match_dup 1) + (match_operand:ALLI 2 "register_operand") + (match_operand:SI 3 "const_int_operand")] +ATOMIC_LDOP))] + "TARGET_LSE && reload_completed" + { + enum memmodel model = memmodel_from_int (INTVAL (operands[3])); + if (is_mm_relaxed (model)) + return "ld\t%2, %0, %1"; + else if (is_mm_acquire (model) || is_mm_consume (model)) + return "lda\t%2, %0, %1"; + else if (is_mm_release (model)) + return "ldl\t%2, %0, %1"; + else + return "ldal\t%2, %0, %1"; + }) -- 2.1.4
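At the C level, the load-operate instructions added here back GCC's `__atomic_fetch_<op>` builtins. A minimal sketch of that mapping (the instruction names in the comments assume an ARMv8.1 target built with +lse; on other targets the same builtins compile to compare-and-swap or load/store-exclusive loops):

```cpp
#include <cassert>

// Each __atomic_fetch_<op> atomically loads the old value, applies the
// operation, stores the result, and returns the old value -- the semantics
// the ATOMIC_LDOP patterns implement with ldset/ldclr/ldeor/ldadd.
int fetch_op_demo ()
{
  int v = 12;                                               // 0b1100
  int old = __atomic_fetch_or (&v, 3, __ATOMIC_RELAXED);    // ldset
  assert (old == 12 && v == 15);
  old = __atomic_fetch_and (&v, ~4, __ATOMIC_ACQUIRE);      // ldclr (BIC form)
  assert (old == 15 && v == 11);
  old = __atomic_fetch_xor (&v, 1, __ATOMIC_RELEASE);       // ldeor
  assert (old == 11 && v == 10);
  old = __atomic_fetch_add (&v, 1, __ATOMIC_SEQ_CST);       // ldadd
  assert (old == 10);
  return v;
}
```

Note how `__atomic_fetch_and` with a complemented mask is the BIC shape that patch 2/5 adds an expander for.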
[C++ Patch/RFC] PR 67576 ("[4.9/5/6 Regression] expression of typeid( expression ) is evaluated twice")
Hi, submitter noticed that the fix for c++/25466: https://gcc.gnu.org/ml/gcc-patches/2013-04/msg00553.html caused a double evaluation of the typeid, at least in some cases. I had a quick look and wondered if it could make sense to just use the new code when we are outside the straightforward INDIRECT_REF cases which we already handled correctly... The below passes testing, anyway. Thanks, Paolo. // Index: cp/rtti.c === --- cp/rtti.c (revision 227849) +++ cp/rtti.c (working copy) @@ -336,10 +336,25 @@ build_typeid (tree exp, tsubst_flags_t complain) { /* So we need to look into the vtable of the type of exp. Make sure it isn't a null lvalue. */ - exp = cp_build_addr_expr (exp, complain); - exp = stabilize_reference (exp); - cond = cp_convert (boolean_type_node, exp, complain); - exp = cp_build_indirect_ref (exp, RO_NULL, complain); + if (INDIRECT_REF_P (exp) + && TYPE_PTR_P (TREE_TYPE (TREE_OPERAND (exp, 0)))) + { + exp = mark_lvalue_use (exp); + exp = stabilize_reference (exp); + cond = cp_convert (boolean_type_node, TREE_OPERAND (exp, 0), +complain); + } + else + { + /* The standard is somewhat unclear here, but it makes sense +to always check whether the address is null rather than +confine the check to when the immediate operand of typeid +is an INDIRECT_REF. */ + exp = cp_build_addr_expr (exp, complain); + exp = stabilize_reference (exp); + cond = cp_convert (boolean_type_node, exp, complain); + exp = cp_build_indirect_ref (exp, RO_NULL, complain); + } } exp = get_tinfo_decl_dynamic (exp, complain); Index: testsuite/g++.dg/rtti/typeid11.C === --- testsuite/g++.dg/rtti/typeid11.C (revision 0) +++ testsuite/g++.dg/rtti/typeid11.C (working copy) @@ -0,0 +1,17 @@ +// PR c++/67576 +// { dg-do run } + +#include <typeinfo> + +struct Base { virtual void foo() {} }; + +int main() +{ + Base b; + Base *ary[] = { &b, &b, &b }; + + int iter = 0; + typeid(*ary[iter++]); + if (iter != 1) +__builtin_abort(); +}
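For reference, the null check being built here implements the language rule that a polymorphic typeid applied to a dereferenced null pointer must throw std::bad_typeid rather than look through a null vtable pointer. A small self-contained illustration (not part of the patch):

```cpp
#include <typeinfo>
#include <cassert>

struct B { virtual ~B() {} };

// typeid(*p) for a null p of polymorphic type must throw std::bad_typeid;
// this is exactly the condition `cond` guards before the vtable lookup.
bool throws_bad_typeid ()
{
  B *p = nullptr;
  try
    {
      (void) typeid (*p);
    }
  catch (const std::bad_typeid &)
    {
      return true;
    }
  return false;
}
```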
[patch] libstdc++/65913 Handle alignment in __atomic_is_lock_free
This fixes a 5/6 regression that causes atomic<T>::is_lock_free() to require libatomic even on targets where the compiler knows the answer. The new test only runs on x86_64-linux and powerpc*-linux. It isn't actually OS-dependent but as long as it runs somewhere we should pick up regressions so those commonly-tested targets are sufficient. Tested x86_64-linux and powerpc64le-linux. Richard, this is your middle-end change plus your suggestion for the std::atomic members, which works just as you said it would. OK for trunk? commit db33491a59064c80104d8482f92cbf80d0d9775c Author: Jonathan Wakely Date: Thu Sep 17 00:21:34 2015 +0100 Handle alignment in __atomic_is_lock_free gcc: 2015-09-17 Richard Henderson PR libstdc++/65913 * builtins.c (fold_builtin_atomic_always_lock_free): Handle fake pointers that encode the alignment of the object. libstdc++-v3: 2015-09-17 Jonathan Wakely PR libstdc++/65913 * include/bits/atomic_base.h (__atomic_base<_TTp>::is_lock_free(), __atomic_base<_PTp*>::is_lock_free()): Call the built-in with the immediate pointer value, not a variable. * include/std/atomic (atomic<T*>::is_lock_free()): Likewise. * testsuite/29_atomics/atomic/65913.cc: New. diff --git a/gcc/builtins.c b/gcc/builtins.c index d79372c..aeec170 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -5635,8 +5635,20 @@ fold_builtin_atomic_always_lock_free (tree arg0, tree arg1) mode = mode_for_size (size, MODE_INT, 0); mode_align = GET_MODE_ALIGNMENT (mode); - if (TREE_CODE (arg1) == INTEGER_CST && INTVAL (expand_normal (arg1)) == 0) -type_align = mode_align; + if (TREE_CODE (arg1) == INTEGER_CST) +{ + unsigned HOST_WIDE_INT val = UINTVAL (expand_normal (arg1)); + + /* Either this argument is null, or it's a fake pointer encoding + the alignment of the object. 
*/ + val = val & -val; + val *= BITS_PER_UNIT; + + if (val == 0 || mode_align < val) +type_align = mode_align; + else +type_align = val; +} else { tree ttype = TREE_TYPE (arg1); diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h index 79769cf..75a7ca7 100644 --- a/libstdc++-v3/include/bits/atomic_base.h +++ b/libstdc++-v3/include/bits/atomic_base.h @@ -350,17 +350,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION bool is_lock_free() const noexcept { - // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_cast<void *>(-__alignof(_M_i)); - return __atomic_is_lock_free(sizeof(_M_i), __a); + // Use a fake, minimally aligned pointer. + return __atomic_is_lock_free(sizeof(_M_i), + reinterpret_cast<void *>(-__alignof(_M_i))); } bool is_lock_free() const volatile noexcept { - // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_cast<void *>(-__alignof(_M_i)); - return __atomic_is_lock_free(sizeof(_M_i), __a); + // Use a fake, minimally aligned pointer. + return __atomic_is_lock_free(sizeof(_M_i), + reinterpret_cast<void *>(-__alignof(_M_i))); } _GLIBCXX_ALWAYS_INLINE void @@ -666,16 +666,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_cast<void *>(-__alignof(_M_p)); - return __atomic_is_lock_free(sizeof(_M_p), __a); + return __atomic_is_lock_free(sizeof(_M_p), + reinterpret_cast<void *>(-__alignof(_M_p))); } bool is_lock_free() const volatile noexcept { // Produce a fake, minimally aligned pointer. 
- void *__a = reinterpret_cast<void *>(-__alignof(_M_p)); - return __atomic_is_lock_free(sizeof(_M_p), __a); + return __atomic_is_lock_free(sizeof(_M_p), + reinterpret_cast<void *>(-__alignof(_M_p))); } _GLIBCXX_ALWAYS_INLINE void diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic index 125e37a..cdd1f0b 100644 --- a/libstdc++-v3/include/std/atomic +++ b/libstdc++-v3/include/std/atomic @@ -208,16 +208,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION is_lock_free() const noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_cast<void *>(-__alignof(_M_i)); - return __atomic_is_lock_free(sizeof(_M_i), __a); + return __atomic_is_lock_free(sizeof(_M_i), + reinterpret_cast<void *>(-__alignof(_M_i))); } bool is_lock_free() const volatile noexcept { // Produce a fake, minimally aligned pointer. - void *__a = reinterpret_cast<void *>(-__alignof(_M_i)); - return __atomic_is_lock_free(sizeof(_M_i), __a); + return __atomic_is_lock_free(sizeof(_M_i), + reinterpret_cast<void *>(-__alignof(_M_i))); } void diff --git a/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc b/libstdc++-v3/testsuite/29_atomics/atomic/65913.cc new file mode 100644 index 000..dbdd9cf --- /dev/null +++
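The trick both sides of this patch rely on: a "fake" pointer whose value is the negated alignment, e.g. `(void *)-__alignof(T)`, has the object's guaranteed alignment as its lowest set bit, which is what the new `val & -val` decoding in the builtin recovers. A sketch of the decoding (assuming 64-bit values and power-of-two alignments):

```cpp
#include <cassert>

typedef unsigned long long u64;

// Mirror of the decoding added to fold_builtin_atomic_always_lock_free:
// the lowest set bit of the fake pointer value is the encoded alignment in
// bytes (the builtin then scales it by BITS_PER_UNIT to bits), and a null
// argument (val == 0) decodes to "no alignment information".
u64 decode_fake_pointer (u64 val)
{
  return val & -val;   // isolates the lowest set bit
}
```

For instance, `decode_fake_pointer((u64)-4)` is 4, matching `__alignof(int)` on common targets, while `decode_fake_pointer(0)` is 0, the null-pointer case the old code handled.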
Re: [PATCH] Fix PR64078
On 09/17/2015 09:00 AM, Marek Polacek wrote: On Wed, Sep 09, 2015 at 07:48:15PM +0200, Bernd Edlinger wrote: Hi, On Wed, 9 Sep 2015 09:31:33, Jeff Law wrote: You could probably make the function static or change its visibility via a function attribute (there's a visibility attribute which can take the values default, hidden, protected or internal). Default visibility essentially means the function can be overridden. I think changing it to "protected" might work. Note if we do that, we may need some kind of target selector on the test since not all targets support the various visibility attributes. Yes, it works both ways: static works, and __attribute__ ((visibility ("protected"))) works too: make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c --target_board='unix{-fpic,-mcmodel=medium,-fpic\ -mcmodel=medium,-mcmodel=large,-fpic\ -mcmodel=large}'" passes all tests, but... make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c --target_board='unix{-fno-inline}'" still fails in the same way for all workarounds: inline, static, and __attribute__ ((visibility ("protected"))). Maybe "static" would be preferable? Yeah, I'd go with static if that helps. I'd rather avoid playing games with visibility. Static is certainly easier and doesn't rely on targets implementing all the visibility capabilities. So static is probably the best approach. jeff
[AArch64][PATCH 2/5] Add BIC instruction.
Hello, ARMv8.1 adds atomic swap and atomic load-operate instructions with optional memory ordering specifiers. This patch adds an expander to generate a BIC instruction that can be explicitly called when implementing the atomic__fetch pattern to calculate the value to be returned by the operation. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check. Also tested for aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1 emulator with +lse enabled by default. Ok for trunk? Matthew 2015-09-17 Matthew Wahab* config/aarch64/aarch64.md (bic_3): New. >From 14e122ee98aa20826ee070d20c58c94206cdd91b Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Mon, 17 Aug 2015 17:48:27 +0100 Subject: [PATCH 2/5] Add BIC instruction Change-Id: Ibef049bfa1bfe5e168feada3dc358f28383e6410 --- gcc/config/aarch64/aarch64.md | 13 + 1 file changed, 13 insertions(+) diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 88ba72e..bae4af4 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3351,6 +3351,19 @@ (set_attr "simd" "*,yes")] ) +(define_expand "bic_3" + [(set (match_operand:GPI 0 "register_operand" "=r") + (and:GPI +(not:GPI + (SHIFT:GPI + (match_operand:GPI 1 "register_operand" "r") + (match_operand:QI 2 "aarch64_shift_imm_si" "n"))) +(match_operand:GPI 3 "register_operand" "r")))] + "" + "" + [(set_attr "type" "logics_shift_imm")] +) + (define_insn "*and_one_cmpl3_compare0" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ -- 2.1.4
[Ada] Language-defined checks and side effects in gigi
As diagnosed by Richard B., emit_check wrongly resets the TREE_SIDE_EFFECTS flag on the COND_EXPR built for language-defined checks in Ada, leading to the omission of required checks in peculiar cases. Tested on x86_64-suse-linux, applied on the mainline. 2015-09-17 Eric Botcazou* gcc-interface/trans.c (emit_check): Do not touch TREE_SIDE_EFFECTS. 2015-09-17 Eric Botcazou * gnat.dg/overflow_sum3.adb: New test. -- Eric BotcazouIndex: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 227819) +++ gcc-interface/trans.c (working copy) @@ -8798,29 +8798,32 @@ emit_index_check (tree gnu_array_object, gnu_expr, CE_Index_Check_Failed, gnat_node); } -/* GNU_COND contains the condition corresponding to an access, discriminant or - range check of value GNU_EXPR. Build a COND_EXPR that returns GNU_EXPR if - GNU_COND is false and raises a CONSTRAINT_ERROR if GNU_COND is true. - REASON is the code that says why the exception was raised. GNAT_NODE is - the GNAT node conveying the source location for which the error should be - signaled. */ +/* GNU_COND contains the condition corresponding to an index, overflow or + range check of value GNU_EXPR. Build a COND_EXPR that returns GNU_EXPR + if GNU_COND is false and raises a CONSTRAINT_ERROR if GNU_COND is true. + REASON is the code that says why the exception is raised. GNAT_NODE is + the node conveying the source location for which the error should be + signaled. + + We used to propagate TREE_SIDE_EFFECTS from GNU_EXPR to the COND_EXPR, + overwriting the setting inherited from the call statement, on the ground + that the expression need not be evaluated just for the check. 
However + that's incorrect because, in the GCC type system, its value is presumed + to be valid so its comparison against the type bounds always yields true + and, therefore, could be done without evaluating it; given that it can + be a computation that overflows the bounds, the language may require the + check to fail and thus the expression to be evaluated in this case. */ static tree emit_check (tree gnu_cond, tree gnu_expr, int reason, Node_Id gnat_node) { tree gnu_call = build_call_raise (reason, gnat_node, N_Raise_Constraint_Error); - tree gnu_result -= fold_build3 (COND_EXPR, TREE_TYPE (gnu_expr), gnu_cond, - build2 (COMPOUND_EXPR, TREE_TYPE (gnu_expr), gnu_call, - convert (TREE_TYPE (gnu_expr), integer_zero_node)), - gnu_expr); - - /* GNU_RESULT has side effects if and only if GNU_EXPR has: - we don't need to evaluate it just for the check. */ - TREE_SIDE_EFFECTS (gnu_result) = TREE_SIDE_EFFECTS (gnu_expr); - - return gnu_result; + return +fold_build3 (COND_EXPR, TREE_TYPE (gnu_expr), gnu_cond, + build2 (COMPOUND_EXPR, TREE_TYPE (gnu_expr), gnu_call, + convert (TREE_TYPE (gnu_expr), integer_zero_node)), + gnu_expr); } /* Return an expression that converts GNU_EXPR to GNAT_TYPE, doing overflow -- { dg-do run } -- { dg-options "-gnato" } procedure Overflow_Sum3 is function Ident (I : Integer) return Integer is begin return I; end; X : Short_Short_Integer := Short_Short_Integer (Ident (127)); begin if X+1 <= 127 then raise Program_Error; end if; exception when Constraint_Error => null; end;
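The rationale in the new comment has a simple C-level analogue (a sketch, not from the patch): for a signed char holding 127, whether x + 1 stays in range can only be decided by actually evaluating the addition, so a check that assumes the expression is valid and folds the bounds comparison away without evaluating it would miss the required failure. `__builtin_add_overflow` makes the evaluation explicit:

```cpp
#include <cassert>

// Analogue of the gnat.dg/overflow_sum3.adb test: x + 1 for x == 127 in a
// signed char overflows the type, which is precisely what the check must
// detect -- and it cannot detect it without evaluating the sum.
bool add1_overflows (signed char x)
{
  signed char r;
  return __builtin_add_overflow (x, 1, &r);   // true iff x + 1 overflows
}
```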
Re: [patch] libstdc++/65913 Handle alignment in __atomic_is_lock_free
On 09/17/2015 07:04 AM, Jonathan Wakely wrote: > Handle alignment in __atomic_is_lock_free > > gcc: > > 2015-09-17 Richard Henderson> > PR libstdc++/65913 > * builtins.c (fold_builtin_atomic_always_lock_free): Handle fake > pointers that encode the alignment of the object. > > libstdc++-v3: > > 2015-09-17 Jonathan Wakely > > PR libstdc++/65913 > * include/bits/atomic_base.h (__atomic_base<_TTp>::is_lock_free(), > __atomic_base<_PTp*>::is_lock_free()): Call the built-in with the > immediate pointer value, not a variable. > * include/std/atomic (atomic::is_lock_free()): Likewise. > * testsuite/29_atomics/atomic/65913.cc: New. Yes, this is ok. r~
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 16/09/15 23:50 +0100, Jonathan Wakely wrote: On 16/09/15 19:58 +0100, Jonathan Wakely wrote: commit ef25038796485298ff8f040bc79e0d9a371171fa Author: Jonathan WakelyDate: Wed Sep 16 18:07:32 2015 +0100 Implement filesystem::canonical() without realpath PR libstdc++/67173 * acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Check _XOPEN_VERSION and PATH_MAX for _GLIBCXX_USE_REALPATH. * config.h.in: Regenerate. * configure: Regenerate. * src/filesystem/ops.cc: (canonical) [!_GLIBCXX_USE_REALPATH]: Add alternative implementation. * testsuite/experimental/filesystem/operations/canonical.cc: New. * testsuite/experimental/filesystem/operations/exists.cc: Add more tests. * testsuite/experimental/filesystem/operations/absolute.cc: Add test variables. * testsuite/experimental/filesystem/operations/copy.cc: Likewise. * testsuite/experimental/filesystem/operations/current_path.cc: Likewise. * testsuite/experimental/filesystem/operations/file_size.cc: Likewise. * testsuite/experimental/filesystem/operations/status.cc: Likewise. * testsuite/experimental/filesystem/operations/temp_directory_path.cc: Likewise. Committed to trunk. I'm removing part of the new canonical.cc test as it fails occasionally with: terminate called after throwing an instance of 'std::experimental::filesystem::v1::__cxx11::filesystem_error' what(): filesystem error: cannot canonicalize: No such file or directory [/dev/stdin] FAIL: experimental/filesystem/operations/canonical.cc execution test This is odd, as I check with exists() before calling canonical(), but rather than try to understand what is happening I'm just going to remove that part. Committed to trunk. commit a250423d1964952312bf97e6be3de987308a5164 Author: Jonathan Wakely Date: Thu Sep 17 16:17:11 2015 +0100 Remove non-deterministic part of canonical() test * testsuite/experimental/filesystem/operations/canonical.cc: Remove non-deterministic part of the test. 
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/operations/canonical.cc b/libstdc++-v3/testsuite/experimental/filesystem/operations/canonical.cc index d752feb..5091a70 100644 --- a/libstdc++-v3/testsuite/experimental/filesystem/operations/canonical.cc +++ b/libstdc++-v3/testsuite/experimental/filesystem/operations/canonical.cc @@ -57,17 +57,6 @@ test01() p = canonical( p, ec ); VERIFY( p == "/" ); VERIFY( !ec ); - - p = "/dev/stdin"; - if (exists(p)) -{ - auto p2 = canonical(p); - if (is_symlink(p)) -VERIFY( p != p2 ); - else -VERIFY( p == p2 ); - VERIFY( canonical(p2) == p2 ); -} } int
Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249)
On Wed, Sep 16, 2015 at 02:59:12PM -0600, Martin Sebor wrote: > On 09/16/2015 09:59 AM, Marek Polacek wrote: > >This patch implements a new warning, -Wduplicated-cond. It warns for > >code such as > > > > if (x) > > // ... > > else if (x) > > // ... > > As usual, I like this improvement. I think it's worth extending > to conditional expressions (e.g., x ? f() : x ? g() : h() should > be diagnosed as well). Maybe, probably. I admit I wasn't thinking of conditional expressions at all. > The patch currently issues a false positive for the test case > below. I suspect the chain might need to be cleared after each > condition that involves a side-effect. > > int foo (int a) > { > if (a) return 1; else if (++a) return 2; else if (a) return 3; > return 0; > } But the last branch here can never be reached, right? If a == 0, foo returns 2, otherwise it just returns 1. So I think we should diagnose this. > The patch doesn't diagnose some more involved cases like the one > below: > > if (i < 0) return 1; else if (!(i > 0 || i == 0)) return 2; > > even though it does diagnose some other such cases, including: > > if (i < 0) return 1; else if (!(i >= 0)) return 2; > > and even: > > int foo (int a, int b, int c) { > if (a + c == b) return 1; else if (a + c - b == 0) return 2; > return 0; > } > > I'm guessing this is due to operand_equal_p returning true for > some pairs of equivalent expressions and false for others. Would > making this work be feasible? You're right that this is limited by what operand_equal_p considers equal and what not. I'm not sure if there is something else, more convoluted, that I could use in the FE, but I think not. It certainly doesn't look terribly important to me. > You probably didn't intend to diagnose the final else clause in > the following case but I wonder if it should be (as an extension > of the original idea): > > if (i) return 1; else if (!i) return 2; else return 0; Diagnosing this wasn't my intention. 
I'm afraid it would be kinda hard to do that. > (In fact, I wonder if it might be worth diagnosing even the > 'if (!i)' part since the condition is the inverse of the one > in the if and thus redundant). This, on the other hand, seems doable provided we have some predicate that would tell us whether an expression is a logical inverse of another expression. E.g. "a > 3" and "a <= 3". Something akin to pred_equal_p -- invert one expr and check whether they're equal. But that sounds like another warning ;). And it smells of doing some kind of VRP in the FE - ick. > Is diagnosing things like 'if (a) if (a);' or 'if (a); else { > if (a); }' not feasible or too involved, or did you choose not > to because you expect it might generate too many warnings? (If > the latter, perhaps it could be disabled by default and enabled > by -Wduplicated-cond=2). I intentionally avoided that. It would certainly be harder and I'm not sure it's worth it. We'd need to be careful to not warn on e.g. if (a > 5) { --a; if (a < 5) // ... } etc. Thanks a lot for looking into this. Marek
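A minimal example of the shape the proposed warning targets (the diagnostic text in the comment is illustrative; it assumes the new -Wduplicated-cond flag is enabled):

```cpp
#include <cassert>

// With the proposed -Wduplicated-cond, the second test of `x` would be
// diagnosed: when it is reached, the first test has already established
// that x is zero, so `return 2` is unreachable.
int classify (int x)
{
  if (x)
    return 1;
  else if (x)   /* would warn: duplicated 'if' condition */
    return 2;
  return 0;
}
```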
Re: [C/C++ PATCH] RFC: Implement -Wduplicated-cond (PR c/64249)
The patch currently issues a false positive for the test case below. I suspect the chain might need to be cleared after each condition that involves a side-effect. int foo (int a) { if (a) return 1; else if (++a) return 2; else if (a) return 3; return 0; } But the last branch here can never be reached, right? If a == 0, foo returns 2, otherwise it just returns 1. So I think we should diagnose this. It probably wasn't the best example. The general issue here is that the second condition has a side-effect that can change (in this case clearly does) the value of the expression. Here's a better example: int a; int bar (void) { a = 1; return 0; } int foo (void) { if (a) return 1; else if (bar ()) return 2; else if (a) return 3; return 0; } Since we don't know bar's side-effects we must assume they change the value of a and so we must avoid diagnosing the third if. Martin
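Martin's corrected example as a runnable sketch: because the second condition's call has a side effect on `a`, the third test of `a` is genuinely different from the first and must not be reported as a duplicate — which is why the remembered condition chain has to be cleared after any condition with side effects:

```cpp
#include <cassert>

static int a;

static int bar (void)
{
  a = 1;   // side effect invalidates any remembered conditions on `a`
  return 0;
}

// The third `if (a)` is reachable (and true on the first call), so a
// duplicated-condition warning here would be a false positive.
int chain (void)
{
  if (a)
    return 1;
  else if (bar ())
    return 2;
  else if (a)
    return 3;
  return 0;
}
```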
[AArch64][PATCH 1/5] Use atomic instructions for swap and fetch-update operations.
Hello, ARMv8.1 adds atomic swap and atomic load-operate instructions with optional memory ordering specifiers. This patch series adds the instructions to GCC, making them available with -march=armv8.1-a or -march=armv8-a+lse, and uses them to implement the __sync and __atomic builtins. The ARMv8.1 swap instruction swaps the value in a register with a value in memory. The load-operate instructions load a value from memory, update it with the result of an operation and store the result in memory. This series uses the swap instruction to implement the atomic_exchange patterns and the load-operate instructions to implement the atomic_fetch_<op> and atomic_<op>_fetch patterns. For the atomic_<op>_fetch patterns, the value returned as the result of the operation has to be recalculated from the loaded data. The ARMv8 BIC instruction is added so that it can be used for this recalculation. The patches in this series - add and use the atomic swap instruction, - add the AArch64 BIC instruction, - add the ARMv8.1 load-operate instructions, - use the load-operate instructions to implement the atomic_fetch_<op> patterns, - use the load-operate instructions to implement the atomic_<op>_fetch patterns. The code-generation changes in this series are based around a new function, aarch64_gen_atomic_ldop, which takes the operation to be implemented and emits the appropriate code, making use of the atomic instruction. This follows the existing use of aarch64_split_atomic_op for the same purpose when atomic instructions aren't available. This patch adds the ARMv8.1 SWAP instruction and function aarch64_gen_atomic_ldop and changes the implementation of the atomic_exchange pattern to use the atomic instruction when it is available. The general form of the code generated for an atomic_exchange, with destination D, source S, memory address A and memory order MO is: swp<mo><sz> S, D, [A] where <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO and <sz> is one of {'', 'b', 'h'} depending on the data size. 
This patch also adds tests for the changes. These reuse the support code introduced for the atomic CAS tests, adding macros to test functions taking one memory ordering argument. These are used to iteratively define functions using the __atomic_exchange builtins, which should be implemented using the atomic swap. Tested the series for aarch64-none-linux-gnu with native bootstrap and make check. Also tested for aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1 emulator with +lse enabled by default. Ok for trunk? Matthew gcc/ 2015-09-17 Matthew Wahab* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop): Declare. * config/aarch64/aarch64.c (aarch64_emit_atomic_swp): New. (aarch64_gen_atomic_ldop): New. (aarch64_split_atomic_op): Fix whitespace and add a comment. * config/aarch64/atomics.md (UNSPECV_ATOMIC_SWP): New. (atomic_compare_and_swap_lse): Remove comments and fix whitespace. (atomic_exchange): Replace with an expander. (aarch64_atomic_exchange): New. (aarch64_atomic_exchange_lse): New. (aarch64_atomic_): Fix some whitespace. (aarch64_atomic_swp): New. gcc/testsuite/ 2015-09-17 Matthew Wahab * gcc.target/aarch64/atomic-inst-ops.inc: (TEST_MODEL): New. (TEST_ONE): New. * gcc.target/aarch64/atomic-inst-swap.c: New. 
>From 425fa05a5e3656c8d6d0d085628424b4c846cd49 Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Fri, 7 Aug 2015 17:18:37 +0100 Subject: [PATCH 1/5] Add atomic SWP instruction Change-Id: I87bf48526cb11e65edd15691f5eab20446e418c9 --- gcc/config/aarch64/aarch64-protos.h| 1 + gcc/config/aarch64/aarch64.c | 46 ++- gcc/config/aarch64/atomics.md | 92 +++--- .../gcc.target/aarch64/atomic-inst-ops.inc | 13 +++ gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c | 44 +++ 5 files changed, 183 insertions(+), 13 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-swp.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index ff19851..eba4c76 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -378,6 +378,7 @@ rtx aarch64_load_tp (rtx); void aarch64_expand_compare_and_swap (rtx op[]); void aarch64_split_compare_and_swap (rtx op[]); void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx); +void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4d2126b..dc05c6e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11185,11
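The builtin the atomic_exchange patterns serve can be exercised directly. A sketch (on ARMv8.1 with +lse this should compile to a single SWP with the appropriate ordering suffix; elsewhere, to a load/store-exclusive loop):

```cpp
#include <cassert>

// __atomic_exchange_n atomically stores the new value and returns the old
// contents -- the operation the SWP-based patterns implement.
int exchange_demo ()
{
  int mem = 10;
  int old = __atomic_exchange_n (&mem, 42, __ATOMIC_ACQ_REL);
  assert (old == 10);
  return mem;
}
```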
Re: [Aarch64] Use vector wide add for mixed-mode adds
On Mon, Sep 07, 2015 at 06:54:30AM +0100, Michael Collison wrote: > This patch is designed to address code that was not being vectorized due > to missing widening patterns in the aarch64 backend. Code such as: > > int t6(int len, void * dummy, short * __restrict x) > { >len = len & ~31; >int result = 0; >__asm volatile (""); >for (int i = 0; i < len; i++) > result += x[i]; >return result; > } > > Validated on aarch64-none-elf, aarch64_be-none-elf, and > aarch64-none-linux-gnu. > > Note that there are three non-execution tree dump vectorization > regressions where previously code was being vectorized. They are: I'd like to understand these better before taking the patch... > > Passed now fails [PASS => FAIL]: >gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vectorized 1 loops" 1 >gcc.dg/vect/slp-multitypes-4.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vectorizing stmts using SLP" 1 >gcc.dg/vect/slp-multitypes-4.c scan-tree-dump-times vect "vectorized 1 > loops" 1 >gcc.dg/vect/slp-multitypes-4.c scan-tree-dump-times vect "vectorizing > stmts using SLP" 1 >gcc.dg/vect/slp-multitypes-5.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vectorized 1 loops" 1 >gcc.dg/vect/slp-multitypes-5.c -flto -ffat-lto-objects > scan-tree-dump-times vect "vectorizing stmts using SLP" 1 >gcc.dg/vect/slp-multitypes-5.c scan-tree-dump-times vect "vectorized 1 > loops" 1 >gcc.dg/vect/slp-multitypes-5.c scan-tree-dump-times vect "vectorizing > stmts using SLP" 1 These look like weaknesses in SLP trying to build a widening add by a constant in the wider mode (i.e. (short) w+ (int)). I'd like to understand what the issue is here. 
>gcc.dg/vect/slp-reduc-3.c -flto -ffat-lto-objects scan-tree-dump-times > vect "vectorizing stmts using SLP" 1 >gcc.dg/vect/slp-reduc-3.c scan-tree-dump-times vect "vectorizing stmts > using SLP" 1 Is this one as simple as setting check_effective_target_vect_widen_sum_hi_to_si_pattern in testsuite/lib/target-supports.exp ? >gcc.dg/vect/vect-125.c -flto -ffat-lto-objects scan-tree-dump vect > "vectorized 1 loops" >gcc.dg/vect/vect-125.c scan-tree-dump vect "vectorized 1 loops" These look like a failure to match the widening optab for operand types (int) w+ (short) -> (int), so I'm also concerned here. > I would like to treat these as saperate bugs and resolve them separately. I think we shouldn't write these off without at least partially understanding the issues. > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 9777418..d6c5d61 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -2636,6 +2636,60 @@ > > ;; w. > > +(define_expand "widen_ssum3" > + [(set (match_operand: 0 "register_operand" "") > +(plus: (sign_extend: (match_operand:VQW 1 > "register_operand" "")) Please check your mail settings, as the patch currently does not apply. The last time I saw this issue it was a mail client turning on format=flowed which caused carnage for patch files ( https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00179.html ) > + (match_operand: 2 "register_operand" "")))] > + "TARGET_SIMD" > + { > +rtx p = aarch64_simd_vect_par_cnst_half (mode, false); > +rtx temp = gen_reg_rtx (GET_MODE (operands[0])); > + > +emit_insn (gen_aarch64_saddw_internal (temp, operands[2], > +operands[1], p)); Replace 8 spaces with a tab (mail client?). 
Thanks, James > +emit_insn (gen_aarch64_saddw2 (operands[0], temp, operands[1])); > +DONE; > + } > +) > + > +(define_expand "widen_ssum3" > + [(set (match_operand: 0 "register_operand" "") > +(plus: (sign_extend: > + (match_operand:VD_BHSI 1 "register_operand" "")) > + (match_operand: 2 "register_operand" "")))] > + "TARGET_SIMD" > +{ > + emit_insn (gen_aarch64_saddw (operands[0], operands[2], > operands[1])); > + DONE; > +}) > + > +(define_expand "widen_usum3" > + [(set (match_operand: 0 "register_operand" "=") > +(plus: (zero_extend: (match_operand:VQW 1 > "register_operand" "")) > + (match_operand: 2 "register_operand" "")))] > + "TARGET_SIMD" > + { > +rtx p = aarch64_simd_vect_par_cnst_half (mode, false); > +rtx temp = gen_reg_rtx (GET_MODE (operands[0])); > + > +emit_insn (gen_aarch64_uaddw_internal (temp, operands[2], > + operands[1], p)); > +emit_insn (gen_aarch64_uaddw2 (operands[0], temp, operands[1])); > +DONE; > + } > +) > + > +(define_expand "widen_usum3" > + [(set (match_operand: 0 "register_operand" "") > +(plus: (zero_extend: > + (match_operand:VD_BHSI 1 "register_operand" "")) > + (match_operand: 2 "register_operand" "")))] > + "TARGET_SIMD" > +{ > + emit_insn (gen_aarch64_uaddw (operands[0], operands[2], > operands[1])); > +
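For reference, the scalar shape the widen_ssum/widen_usum expanders under discussion vectorize: each narrow element is extended and accumulated into a wider sum (a sketch; the function name is illustrative):

```cpp
#include <cassert>

// Mixed-mode reduction: an int accumulator summing short elements.  This is
// the loop shape from t6 above; the widening patterns let the vectorizer
// emit saddw/saddw2 (or uaddw/uaddw2) instead of separate extends and adds.
// The wider accumulator also matters for correctness: partial sums can
// exceed the range of short.
int widening_sum (const short *x, int len)
{
  int result = 0;
  for (int i = 0; i < len; i++)
    result += x[i];   // (int) += (short): mixed-mode widening add
  return result;
}
```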
Re: [PATCH][RTL-ifcvt] PR rtl-optimization/67465: Handle pairs of complex+simple blocks and empty blocks more gracefully
Hi Rainer, On 17/09/15 12:33, Rainer Orth wrote: Hi Kyrill, On 11/09/15 09:51, Rainer Orth wrote: Kyrill Tkachov writes: On 10/09/15 12:43, Rainer Orth wrote: Hi Kyrill, Rainer, could you please check that this patch still fixes the SPARC regressions? unfortunately, it breaks sparc-sun-solaris2.10 bootstrap: compiling stage2 libiberty/regex.c FAILs: Thanks for providing the preprocessed file. I've reproduced and fixed the ICE in this version of the patch. The problem was that I was taking the mode of x before the check of whether a and b are MEMs, after which we would change x to an address_mode reg, thus confusing emit_move_insn. The fix is to take the mode of x and perform the can_conditionally_move_p check after that transformation. Bootstrapped and tested on aarch64 and x86_64. The preprocessed regex.i that Rainer provided now compiles successfully for me on a sparc-sun-solaris2.10 stage-1 cross-compiler. Rainer, thanks for your help so far, could you please try out this patch? While bootstrap succeeds again, the testsuite regression in gcc.c-torture/execute/20071216-1.c reoccured. Right, so I dug into the RTL dumps and I think this is a separate issue that's being exacerbated by my patch. The code tries to if-convert a block which contains a compare instruction i.e. sets the CC register. Now, bb_valid_for_noce_process_p should have caught this, and in particular insn_valid_noce_process_p which should check that the instruction doesn't set the CC register. However, on SPARC the cc_in_cond returns NULL! This is due to the canonicalize_comparison implementation that seems to remove the CC register from the condition expression and returns something like: (leu (reg/v:SI 109 [ b ]) (const_int -4096 [0xf000])) Therefore the set_of (cc_in_cond (cond), insn) check doesn't get triggered because cc_in_cond returns NULL. 
Regardless of how the branch condition got canonicalized, I think we still want to reject any insn in the block that sets a condition code register, so this patch checks the destination of every set in the block for a MODE_CC expression and cancels if-conversion if that's the case. Oleg pointed me to the older PR 58517 affecting SH which seems similar and I think my previous ifcvt patch would expose this problem more. Anyway, with this patch the failing SPARC testcase gcc.c-torture/execute/20071216-1.c generates the same assembly as before r227368 and bootstrap and test on aarch64 and x86_64 passes ok for me. Rainer, could you try this patch on top of the previous patch? (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html) The two together should fix all of PR 67456, 67464, 67465 and 67481. sorry this took me so long: we've had a major switch failure and my sparc machines are way slow. No problem. You're doing me a huge favour by testing the iterations of the patches. Sorry for causing the regression in the first place. The issues I'm finding are exposed due to the way the sparc backend does some things, so my aarch64 and x86_64 testing is unlikely to catch them. Anyway, here's what I found: the two patches on top of each other do bootstrap just fine and the gcc.c-torture/execute/20071216-1.c regressions are gone. However, it introduces a new one: FAIL: gcc.dg/torture/stackalign/sibcall-1.c -O1 -fpic execution test It fails for both 32 and 64-bit. The testcase SEGVs: Indeed, I can see if-conversion triggering here and doing something funky with the first patch that I posted (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00689.html) In this testcase we trigger the is_mem path through noce_try_cmove_arith and we have the 'a' and 'b' having the form reg+symbol. 
When we try to move them into a register with noce_emit_move_insn the resulting move is not recognised (presumably sparc doesn't have any instructions/patterns to do this operation) and the alternative tricks that noce_emit_move_insn tries to create the move end up generating a bizarre sequence that involves loading from the memory at the reg+symbol expression and adding the base reg to it! In any case, this patch doesn't try calling noce_emit_move_insn and instead generates a simple SET expression, emits that and relies on end_ifcvt_sequence to call recog on it and cancel the transformation if it's not a valid instruction. IMO this is the desired behaviour since the move in question is supposed to be a simple move that would ideally be eliminated by the register allocator later on if the dependencies work out. If it actually expands to more complex sequences it's not going to be a win to if-convert anyway. TLDR: This updated patch generates the same code for the sibcall-1.c testcase on sparc as before the bad transformation and all the previous regressions are still fixed. Bootstrapped and tested on aarch64 and x86_64. Rainer, could you please try this patch in combination with the one I sent earlier at:
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On Thu, 2015-09-17 at 18:12 +0200, Bernd Schmidt wrote: > On 09/17/2015 02:01 PM, Gerald Pfeifer wrote: > > On Sun, 13 Sep 2015, Mark Wielaard wrote: > >> Slightly adjusted patch attached. Now it is explicit that the warning is > >> enabled by -Wunused-variable for C, but not C++. There are testcases for > >> both C and C++ to check the defaults. And the hardcoded override is > >> removed for C++, so the user could enable it if they want. > > > > I believe making -Wunused-const-variable part of -Wall is not > > a good idea. > > > > For example, I have a nightly build of Wine with a nightly build > > of GCC. And my notification mail went from twenty lines of warning > > to 6500 -- all coming from header files. > > What does it warn about, do the warnings look valid? I was just taking a look. They come from constructs like: static const WCHAR INSTALLPROPERTY_VERSIONSTRINGW[] = {'V','e','r','s','i','o','n','S','t','r','i','n','g',0}; Using the more idiomatic and a bit more readable: #define INSTALLPROPERTY_VERSIONSTRINGW u"VersionString" gets rid of most of the issues. Cheers, Mark
Amend documentation of DF LIVE problem
As discussed in the audit trail of PR rtl-optimization/66790, the doc of the DF_LIVE problem is confusing/wrong so the attached patch amends it. Approved by Kenneth and applied on all active branches. 2015-09-17 Eric Botcazou PR rtl-optimization/66790 * df-problems.c (LIVE): Amend documentation. -- Eric Botcazou Index: df-problems.c === --- df-problems.c (revision 227819) +++ df-problems.c (working copy) @@ -1309,22 +1309,23 @@ df_lr_verify_transfer_functions (void) /* - LIVE AND MUST-INITIALIZED REGISTERS. + LIVE AND MAY-INITIALIZED REGISTERS. This problem first computes the IN and OUT bitvectors for the - must-initialized registers problems, which is a forward problem. - It gives the set of registers for which we MUST have an available - definition on any path from the entry block to the entry/exit of - a basic block. Sets generate a definition, while clobbers kill + may-initialized registers problems, which is a forward problem. + It gives the set of registers for which we MAY have an available + definition, i.e. for which there is an available definition on + at least one path from the entry block to the entry/exit of a + basic block. Sets generate a definition, while clobbers kill a definition. In and out bitvectors are built for each basic block and are indexed by regnum (see df.h for details). In and out bitvectors in struct - df_live_bb_info actually refers to the must-initialized problem; + df_live_bb_info actually refers to the may-initialized problem; Then, the in and out sets for the LIVE problem itself are computed. These are the logical AND of the IN and OUT sets from the LR problem - and the must-initialized problem. + and the may-initialized problem. */ /* Private data used to verify the solution for this problem. */ @@ -1531,7 +1532,7 @@ df_live_confluence_n (edge e) } -/* Transfer function for the forwards must-initialized problem. */ +/* Transfer function for the forwards may-initialized problem. 
*/ static bool df_live_transfer_function (int bb_index) @@ -1555,7 +1556,7 @@ df_live_transfer_function (int bb_index) } -/* And the LR info with the must-initialized registers, to produce the LIVE info. */ +/* And the LR info with the may-initialized registers to produce the LIVE info. */ static void df_live_finalize (bitmap all_blocks)
Re: [PATCH] PR28901 -Wunused-variable ignores unused const initialised variables
On 09/17/2015 02:01 PM, Gerald Pfeifer wrote: On Sun, 13 Sep 2015, Mark Wielaard wrote: Slightly adjusted patch attached. Now it is explicit that the warning is enabled by -Wunused-variable for C, but not C++. There are testcases for both C and C++ to check the defaults. And the hardcoded override is removed for C++, so the user could enable it if they want. I believe making -Wunused-const-variable part of -Wall is not a good idea. For example, I have a nightly build of Wine with a nightly build of GCC. And my notification mail went from twenty lines of warning to 6500 -- all coming from header files. What does it warn about, do the warnings look valid? Bernd
Re: [PATCH, rs6000] Add expansions for min/max vector reductions
On Thu, 2015-09-17 at 09:18 -0500, Bill Schmidt wrote: > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote: > > On Wed, 16 Sep 2015, Alan Lawrence wrote: > > > > > On 16/09/15 17:10, Bill Schmidt wrote: > > > > > > > > On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote: > > > > > On 16/09/15 15:28, Bill Schmidt wrote: > > > > > > 2015-09-16 Bill Schmidt> > > > > > > > > > > > * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, > > > > > > UNSPEC_REDUC_SMIN, > > > > > > UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, > > > > > > UNSPEC_REDUC_SMAX_SCAL, > > > > > > UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL, > > > > > > UNSPEC_REDUC_UMIN_SCAL): New enumerated constants. > > > > > > (reduc_smax_v2di): New define_expand. > > > > > > (reduc_smax_scal_v2di): Likewise. > > > > > > (reduc_smin_v2di): Likewise. > > > > > > (reduc_smin_scal_v2di): Likewise. > > > > > > (reduc_umax_v2di): Likewise. > > > > > > (reduc_umax_scal_v2di): Likewise. > > > > > > (reduc_umin_v2di): Likewise. > > > > > > (reduc_umin_scal_v2di): Likewise. > > > > > > (reduc_smax_v4si): Likewise. > > > > > > (reduc_smin_v4si): Likewise. > > > > > > (reduc_umax_v4si): Likewise. > > > > > > (reduc_umin_v4si): Likewise. > > > > > > (reduc_smax_v8hi): Likewise. > > > > > > (reduc_smin_v8hi): Likewise. > > > > > > (reduc_umax_v8hi): Likewise. > > > > > > (reduc_umin_v8hi): Likewise. > > > > > > (reduc_smax_v16qi): Likewise. > > > > > > (reduc_smin_v16qi): Likewise. > > > > > > (reduc_umax_v16qi): Likewise. > > > > > > (reduc_umin_v16qi): Likewise. > > > > > > (reduc_smax_scal_): Likewise. > > > > > > (reduc_smin_scal_): Likewise. > > > > > > (reduc_umax_scal_): Likewise. > > > > > > (reduc_umin_scal_): Likewise. > > > > > > > > > > You shouldn't need the non-_scal reductions. Indeed, they shouldn't be > > > > > used if > > > > > the _scal are present. 
The non-_scal's were previously defined as > > > > > producing a > > > > > vector with one element holding the result and the other elements all > > > > > zero, and > > > > > this was only ever used with a vec_extract immediately after; the > > > > > _scal > > > > > pattern > > > > > now includes the vec_extract as well. Hence the non-_scal patterns are > > > > > deprecated / considered legacy, as per md.texi. > > > > > > > > Thanks -- I had misread the description of the non-scalar versions, > > > > missing the part where the other elements are zero. What I really > > > > want/need is an optab defined as computing the maximum value in all > > > > elements of the vector. > > > > > > Yes, indeed. It seems reasonable to me that this would coexist with an > > > optab > > > which computes only a single value (i.e. a scalar). > > > > So just to clarify - you need to reduce the vector with max to a scalar > > but want the (same) result in all vector elements? > > Yes. Alan Hayward's cond-reduction patch is set up to perform a > reduction to scalar, followed by a scalar broadcast to get the value > into all positions. It happens that our most efficient expansion to > reduce to scalar will naturally produce the value in all positions. > Using the reduc_smax_scal_ expander, we are required to extract > this into a scalar and then perform the broadcast. Those two operations > are unnecessary. We can clean up after the fact, but the cost model > will still reflect the unnecessary activity. > > > > > > At that point it might be appropriate to change the cond-reduction code to > > > generate the reduce-to-vector in all cases, and optabs.c expand it to > > > reduce-to-scalar + broadcast if reduce-to-vector was not available. Along > > > with > > > the (parallel) changes to cost model already proposed, does that cover > > > all the > > > cases? It does add a new tree code, yes, but I'm feeling that could be > > > justified if we go down this route. 
> > > > I'd rather have combine recognize an insn that does both (if it > > exists). As I understand powerpc doesn't have reduction to scalar > > (I think no ISA actually has this, but all ISAs have reduce to > > one vector element plus a cheap way of extraction (effectively a subreg)) > > but it's reduction already has all result vector elements set to the > > same value (as opposed to having some undefined or zero or whatever). > > The extraction is not necessarily so cheap. For floating-point values > in a register, we can indeed use a subreg. But in this case we are > talking about integers, and an extract must move them from the vector > registers to the GPRs. With POWER8, we can do this with a 5-cycle move > instruction. Prior to that, we can only do this with a very expensive > path through memory, as previous processors had no
Re: [PATCH, fortran] PR 53379 Backtrace on error termination
On Wed, Sep 9, 2015 at 12:12 AM, Ian Lance Taylor wrote: > Mike Stump writes: > >> Not a big issue, but slightly better if (O_CLOEXEC>>32) != 0 is also >> true. See, if AIX should ever define this to a sensible value, the >> above would disappear the feature. However, if they did, then this >> expression should then be false. > > Yes, I think this might be even better in code. How about something > like > > /* On some versions of AIX O_CLOEXEC does not fit in int, so use a > cast to force it. */ > descriptor = open (filename, (int) (O_RDONLY | O_BINARY | O_CLOEXEC)); > > Does that work on AIX? Any decision about this patch? Thanks, David
Re: Split up optabs.[hc]
Bernd Schmidt writes: > On 09/14/2015 07:54 PM, Richard Sandiford wrote: >> This patch splits optabs up as follows: >> >>- optabs-query.[hc]: IL-independent functions for querying what a target >>can do natively. >>- optabs-tree.[hc]: tree and gimple query functions (an extension of >>optabs-query.[hc]). >>- optabs-libfuncs.[hc]: optabs-specific libfuncs (an extension of >>libfuncs.h) >>- optabs.h: For now includes optabs-query.h and optabs-libfuncs.h. > > This seems like a good change. > >> I changed can_conditionally_move_p from returning an int to returning >> a bool and fixed a few formatting glitches. There should be no other >> changes to the functions themselves. > > I'm taking your word for it. The patch is slightly confusing in one area > of optabs.c (it looks like debug_optab_libfuncs got moved around, it > might be better for patch readability not to do that). Hmm, yeah. I think that was just diff getting confused between very similar blocks of code: the retained functions are in the same order. The diff seems to be clean if I use --diff-algorithm=minimal instead. > The only thing I really wondered about... > >> --- /dev/null >> +++ b/gcc/optabs-tree.h >> @@ -0,0 +1,45 @@ >> + >> +#include "optabs-query.h" > > I haven't quite followed amacleod's work on the #includes, so I wasn't > quite sure whether headers are supposed to include other headers these > days. But as far as I can tell that's fine. So, patch ok. Thanks, applied (slightly later than planned). Richard
Re: Openacc launch API
Updated patch addressing your points. Some further comments though ... + while (GOMP_LAUNCH_PACK (GOMP_LAUNCH_END, 0, 0) + != (tag = va_arg (ap, unsigned))) That's a somewhat non-idiomatic way to write this, with the constant first and not obviously a constant. I'd initialize a variable with the constant before the loop. I went with while ((tag = va_arg (...)) != 0) ... and killed GOMP_LAUNCH_END throughout, using explicit '0'. + assert (!GOMP_LAUNCH_DEVICE (tag)); Uh, that seems unfriendly, and not exactly forwards compatible. Can that fail a bit more gracefully? (Alternatively, implement the device_type stuff now so that we don't have TODOs in the code and don't have to worry about compatibility issues.) Added call to gomp_fatal, indicating libgomp is out of date. Also added a default to the switch following with the same effect. The trouble with implementing handling of device_type here now is the difficulty in testing its correctness. If it were buggy we'd be in a worse position than not having it. +GOACC_2.0,1 { + global: +GOACC_parallel_keyed; +} GOACC_2.0; Did you mean to use a comma? I misunderstood your comment as 'did you mean to use a comma where you used something else', not 'is that comma a typo?' well spotted! +else if (!tagging) Oh... so tagging controls two different methods for constructing argument lists, one for GOACC_parallel and the other for whatever OMP uses? That's a bit unfortunate, I'll need to think about it for a bit or defer to Jakub. My earlier description was lacking. The memory arguments have already been pushed before that switch. This is just dealing with async & wait args. I found it easier to modify the existing code path and have a tagging flag, rather than duplicate it. nathan 2015-09-17 Nathan Sidwell include/ * gomp-constants.h (GOMP_VERSION_NVIDIA_PTX): Increment. (GOMP_DIM_GANG, GOMP_DIM_WORKER, GOMP_DIM_VECTOR, GOMP_DIM_MAX, GOMP_DIM_MASK): New. (GOMP_LAUNCH_DIM, GOMP_LAUNCH_ASYNC, GOMP_LAUNCH_WAIT): New. 
(GOMP_LAUNCH_CODE_SHIFT, GOMP_LAUNCH_DEVICE_SHIFT, GOMP_LAUNCH_OP_SHIFT): New. (GOMP_LAUNCH_PACK, GOMP_LAUNCH_CODE, GOMP_LAUNCH_DEVICE, GOMP_LAUNCH_OP): New. (GOMP_LAUNCH_OP_MAX): New. libgomp/ * libgomp.h (acc_dispatch_t): Replace separate geometry args with array. * libgomp.map (GOACC_parallel_keyed): New. * oacc-parallel.c (goacc_wait): Take pointer to va_list. Adjust all callers. (GOACC_parallel_keyed): New interface. Lose geometry arguments and take keyed varargs list. Adjust call to exec_func. (GOACC_parallel): Forward to GACC_parallel_keyed. * libgomp_g.h (GOACC_parallel): Remove. (GOACC_parallel_keyed): Declare. * plugin/plugin-nvptx.c (struct targ_fn_launch): New struct. (stuct targ_gn_descriptor): Replace name field with launch field. (nvptx_exec): Lose separate geometry args, take array. Process dynamic dimensions and adjust. (struct nvptx_tdata): Replace fn_names field with fn_descs. (GOMP_OFFLOAD_load_image): Adjust for change in function table data. (GOMP_OFFLOAD_openacc_parallel): Adjust for change in dimension passing. * oacc-host.c (host_openacc_exec): Adjust for change in dimension passing. gcc/ * config/nvptx/nvptx.c: Include omp-low.h and gomp-constants.h. (nvptx_record_offload_symbol): Record function execution geometry. * config/nvptx/mkoffload.c (process): Include launch geometry in function data. * omp-low.c (oacc_launch_pack): New. (replace_oacc_fn_attrib): New. (set_oacc_fn_attrib): New. (get_oacc_fn_attrib): New. (expand_omp_target): Create keyed varargs for GOACC_parallel call generation. * omp-low.h (get_oacc_fn_attrib): Declare. * builtin-types.def (DEF_FUNCTION_TyPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. * tree.h (OMP_CLAUSE_EXPR): New. * omp-builtins.def (BUILT_IN_GOACC_PARALLEL): Change target fn name. gcc/lto/ * lto-lang.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. gcc/c-family/ * c-common.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. 
gcc/fortran/ * f95-lang.c (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. * types.def (DEF_FUNCTION_TYPE_VAR_6): New. (DEF_FUNCTION_TYPE_VAR_11): Delete. Index: include/gomp-constants.h === --- include/gomp-constants.h (revision 227862) +++ include/gomp-constants.h (working copy) @@ -115,11 +115,33 @@ enum gomp_map_kind /* Versions of libgomp and device-specific plugins. */ #define GOMP_VERSION 0 -#define GOMP_VERSION_NVIDIA_PTX 0 +#define GOMP_VERSION_NVIDIA_PTX 1 #define GOMP_VERSION_INTEL_MIC 0 #define GOMP_VERSION_PACK(LIB, DEV) (((LIB) << 16) | (DEV)) #define GOMP_VERSION_LIB(PACK) (((PACK) >> 16) & 0x) #define GOMP_VERSION_DEV(PACK) ((PACK) & 0x) +#define GOMP_DIM_GANG 0 +#define GOMP_DIM_WORKER 1 +#define GOMP_DIM_VECTOR 2 +#define GOMP_DIM_MAX 3 +#define
[patch] Only do shrink_to_fit() when exceptions enabled
When exceptions are disabled a failed allocation while trying to shrink_to_fit() will abort the program. Since shrink_to_fit() is a non-binding request we should just ignore it rather than risk taking down the whole process. Tested powerpc64le-linux, committed to trunk. commit 13cf19282acf42a52d5eacd2c293a944bd3e5ebe Author: Jonathan WakelyDate: Thu Sep 17 00:07:33 2015 +0100 Only do shrink_to_fit() when exceptions enabled * include/bits/allocator.h (__shrink_to_fit_aux ::_S_do_it): Do nothing if exceptions are disabled. * include/bits/basic_string.h (basic_string::shrink_to_fit): Likewise. diff --git a/libstdc++-v3/include/bits/allocator.h b/libstdc++-v3/include/bits/allocator.h index 6fd3214..0131521 100644 --- a/libstdc++-v3/include/bits/allocator.h +++ b/libstdc++-v3/include/bits/allocator.h @@ -209,15 +209,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION static bool _S_do_it(_Tp& __c) noexcept { - __try +#if __cpp_exceptions + try { _Tp(__make_move_if_noexcept_iterator(__c.begin()), __make_move_if_noexcept_iterator(__c.end()), __c.get_allocator()).swap(__c); return true; } - __catch(...) + catch(...) { return false; } +#else + return false; +#endif } }; #endif diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h index e6e7bb5..b5e7e36 100644 --- a/libstdc++-v3/include/bits/basic_string.h +++ b/libstdc++-v3/include/bits/basic_string.h @@ -833,13 +833,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11 void shrink_to_fit() noexcept { +#if __cpp_exceptions if (capacity() > size()) { - __try + try { reserve(0); } - __catch(...) + catch(...) { } } +#endif } #endif @@ -3282,12 +3284,14 @@ _GLIBCXX_END_NAMESPACE_CXX11 void shrink_to_fit() _GLIBCXX_NOEXCEPT { +#if __cpp_exceptions if (capacity() > size()) { - __try + try { reserve(0); } - __catch(...) + catch(...) { } +#endif } } #endif
Re: [C++] Coding rule enforcement
>>>+ else if (warn_multiple_inheritance) >>>+warning (OPT_Wmultiple_inheritance, >>>+ "%qT defined with multiple direct bases", ref); >>You don't need to guard the warning with a check of the warning flag; warning >>will only give the warning if the option is enabled. >the spelling mistake on the option I used to experiment on didn't help, but I >>discovered one must have the Var clause in the options file too, which I >didn't expect. >Anyway, fixed now. ok? Note that there are a couple of places where 'namespace' is misspelled as 'namepace' in the patch.
Re: [patch] libstdc++/65142 Check read() result in std::random_device.
On 15/09/15 12:47 +0100, Jonathan Wakely wrote: On 11/09/15 14:44 +0100, Jonathan Wakely wrote: We should not silently ignore a failure to read from the random device. Tested powerpc64le-linux, committed to trunk. I'm going to commit this to the gcc-5 branch too. commit 2d2f7012dc3744dafef0de94dd845bd190253dbd Author: Jonathan WakelyDate: Fri Feb 20 17:29:50 2015 + Check read() result in std::random_device. PR libstdc++/65142 * src/c++11/random.cc (random_device::_M_getval()): Check read result. diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc index edf900f..1d102c7 100644 --- a/libstdc++-v3/src/c++11/random.cc +++ b/libstdc++-v3/src/c++11/random.cc @@ -130,13 +130,17 @@ namespace std _GLIBCXX_VISIBILITY(default) #endif result_type __ret; + #ifdef _GLIBCXX_HAVE_UNISTD_H -read(fileno(static_cast (_M_file)), -static_cast (&__ret), sizeof(result_type)); +auto e = read(fileno(static_cast (_M_file)), + static_cast (&__ret), sizeof(result_type)); #else -std::fread(static_cast (&__ret), sizeof(result_type), - 1, static_cast (_M_file)); +auto e = std::fread(static_cast (&__ret), sizeof(result_type), + 1, static_cast (_M_file)); #endif +if (e != sizeof(result_type)) + __throw_runtime_error(__N("random_device could not read enough bytes")); + return __ret; } Florian pointed out that this code should handle short reads (or EINTR) by retrying in a loop, so here's another attempt to fix it. This also fixes the bug I introduced in the std::fread() case where it expects e == sizeof(result_type) but fread will only return 0 or 1. We could loop in the fread case too, but I'm not doing that. If someone on non-POSIX targets cares enough they can make that change later. Any comments on this version? Committed to trunk. 
commit 6700c8c652da94332562fff465a1569d8fa9c94d Author: Jonathan Wakely Date: Tue Sep 15 11:02:42 2015 +0100 Fix handling of short reads in std::random_device PR libstdc++/65142 * src/c++11/random.cc (random_device::_M_getval()): Retry after short reads. diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc index 1d102c7..f1d6125 100644 --- a/libstdc++-v3/src/c++11/random.cc +++ b/libstdc++-v3/src/c++11/random.cc @@ -130,16 +130,26 @@ namespace std _GLIBCXX_VISIBILITY(default) #endif result_type __ret; - +void* p = &__ret; +size_t n = sizeof(result_type); #ifdef _GLIBCXX_HAVE_UNISTD_H -auto e = read(fileno(static_cast (_M_file)), - static_cast (&__ret), sizeof(result_type)); +do + { + const int e = read(fileno(static_cast (_M_file)), p, n); + if (e > 0) + { + n -= e; + p = static_cast (p) + e; + } + else if (e != -1 || errno != EINTR) + __throw_runtime_error(__N("random_device could not be read")); + } +while (n > 0); #else -auto e = std::fread(static_cast (&__ret), sizeof(result_type), - 1, static_cast (_M_file)); +const size_t e = std::fread(p, n, 1, static_cast (_M_file)); +if (e != 1) + __throw_runtime_error(__N("random_device could not be read")); #endif -if (e != sizeof(result_type)) - __throw_runtime_error(__N("random_device could not read enough bytes")); return __ret; }
Re: dejagnu version update?
On 16/09/15 17:36, Jeff Law wrote: > On 09/16/2015 10:25 AM, Ramana Radhakrishnan wrote: >> >> >> On 16/09/15 17:14, Mike Stump wrote: >>> On Sep 16, 2015, at 12:29 AM, Andreas Schwab>>> wrote: Mike Stump writes: > The software presently works with 1.4.4 and there aren’t any > changes that require anything newer. SLES 12 has 1.4.4. >>> >>> Would be nice to cover them as well, but their update schedule, 3-4 >>> years, means that their next update is 2018. They didn’t update to >>> a 3 year old stable release of dejagnu for their last OS, meaning >>> they are on a > 7 year update cycle. I love embedded and really >>> long term support cycles (20 years), but, don’t think we should >>> cater to the 20 year cycle just yet. :-) Since 7 is substantially >>> longer than 2, I don’t think we should worry about it. If they had >>> updated at the time, they would have had 3 years of engineering and >>> testing before the release and _had_ 1.5. >>> >> >> Sorry about the obvious (possibly dumb) question. >> >> Can't we just import a copy of dejagnu each year and install it as >> part of the source tree ? I can't imagine installing dejagnu is >> adding a huge amount of time to build and regression test time ? >> Advantage is that everyone is guaranteed to be on the same version. I >> fully expect resistance due to specific issues with specific versions >> of tcl and expect, but if folks aren't aware of this . > That should work -- certainly that's the way we used to do things at > Cygnus. Some of that code may have bitrotted as single tree builds have > fallen out-of-favor through the years. > > As to whether or not its a good idea. I'm torn -- I don't like copying > code from other repos because of the long term maintenance concerns. > > I'd rather just move to 1.5 and get on with things. If some systems > don't have a new enough version, I'm comfortable telling developers on > those platforms that they need to update. 
It's not like every *user* > needs dejagnu, it's just for the testing side of things. > > > jeff I don't see it as a major issue to have your own private build of dejagnu rather than the system supplied one. The only local change you need is to add it to the front of your path before testing. Dejagnu does not heavily depend on system libraries, it is not built directly into GCC is pretty independent on the version of expect that you have on your machine (likely the system version will serve fine). So why don't we just migrate to the latest and greatest version as our standard and be done with these old versions that are lying around? R.
Re: [PATCH] start of rewrite of unordered_{set,map}
Oops, I meant to reply to the lists, not just Geoff ... On 16/09/15 18:35 +0100, Jonathan Wakely wrote: On 16/09/15 10:08 -0700, Geoff Pike wrote: Also, to clarify, I am primarily seeking high-level comments; I am new here and don't want to waste anybody's time. Hi Geoff, This looks very interesting indeed, but we've just gone through one big ABI break earlier this year and my nerves and our users probably can't take another one in the near future! That means *replacing* the current containers is unlikely in the short term, but we could certainly offer alternatives (as e.g. __gnu_cxx::unordered_map instead of std::unordered_map), and could maybe have a configure option that would make them the default. So I don't want to discourage you from working on this, but would have been a lot more positive about the proposed changes 18 months ago :-) P.S. While I have your attention, could I ask you to comment on https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55815 ? I think I tried to reach you about it a while back, via someone else at Google, but failed. That's another case where we have probably missed our chance to change the default (i.e. std::hash) but we could still offer a __gnu_cxx::__sip_hash functor as an extension.
Re: [PATCH, rs6000] Add expansions for min/max vector reductions
On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote: > On Wed, 16 Sep 2015, Alan Lawrence wrote: > > > On 16/09/15 17:10, Bill Schmidt wrote: > > > > > > On Wed, 2015-09-16 at 16:29 +0100, Alan Lawrence wrote: > > > > On 16/09/15 15:28, Bill Schmidt wrote: > > > > > 2015-09-16 Bill Schmidt> > > > > > > > > > * config/rs6000/altivec.md (UNSPEC_REDUC_SMAX, > > > > > UNSPEC_REDUC_SMIN, > > > > > UNSPEC_REDUC_UMAX, UNSPEC_REDUC_UMIN, > > > > > UNSPEC_REDUC_SMAX_SCAL, > > > > > UNSPEC_REDUC_SMIN_SCAL, UNSPEC_REDUC_UMAX_SCAL, > > > > > UNSPEC_REDUC_UMIN_SCAL): New enumerated constants. > > > > > (reduc_smax_v2di): New define_expand. > > > > > (reduc_smax_scal_v2di): Likewise. > > > > > (reduc_smin_v2di): Likewise. > > > > > (reduc_smin_scal_v2di): Likewise. > > > > > (reduc_umax_v2di): Likewise. > > > > > (reduc_umax_scal_v2di): Likewise. > > > > > (reduc_umin_v2di): Likewise. > > > > > (reduc_umin_scal_v2di): Likewise. > > > > > (reduc_smax_v4si): Likewise. > > > > > (reduc_smin_v4si): Likewise. > > > > > (reduc_umax_v4si): Likewise. > > > > > (reduc_umin_v4si): Likewise. > > > > > (reduc_smax_v8hi): Likewise. > > > > > (reduc_smin_v8hi): Likewise. > > > > > (reduc_umax_v8hi): Likewise. > > > > > (reduc_umin_v8hi): Likewise. > > > > > (reduc_smax_v16qi): Likewise. > > > > > (reduc_smin_v16qi): Likewise. > > > > > (reduc_umax_v16qi): Likewise. > > > > > (reduc_umin_v16qi): Likewise. > > > > > (reduc_smax_scal_): Likewise. > > > > > (reduc_smin_scal_): Likewise. > > > > > (reduc_umax_scal_): Likewise. > > > > > (reduc_umin_scal_): Likewise. > > > > > > > > You shouldn't need the non-_scal reductions. Indeed, they shouldn't be > > > > used if > > > > the _scal are present. The non-_scal's were previously defined as > > > > producing a > > > > vector with one element holding the result and the other elements all > > > > zero, and > > > > this was only ever used with a vec_extract immediately after; the _scal > > > > pattern > > > > now includes the vec_extract as well. 
Hence the non-_scal patterns are > > > > deprecated / considered legacy, as per md.texi. > > > > > > Thanks -- I had misread the description of the non-scalar versions, > > > missing the part where the other elements are zero. What I really > > > want/need is an optab defined as computing the maximum value in all > > > elements of the vector. > > > > Yes, indeed. It seems reasonable to me that this would coexist with an optab > > which computes only a single value (i.e. a scalar). > > So just to clarify - you need to reduce the vector with max to a scalar > but want the (same) result in all vector elements? Yes. Alan Hayward's cond-reduction patch is set up to perform a reduction to scalar, followed by a scalar broadcast to get the value into all positions. It happens that our most efficient expansion to reduce to scalar will naturally produce the value in all positions. Using the reduc_smax_scal_ expander, we are required to extract this into a scalar and then perform the broadcast. Those two operations are unnecessary. We can clean up after the fact, but the cost model will still reflect the unnecessary activity. > > > At that point it might be appropriate to change the cond-reduction code to > > generate the reduce-to-vector in all cases, and optabs.c expand it to > > reduce-to-scalar + broadcast if reduce-to-vector was not available. Along > > with > > the (parallel) changes to cost model already proposed, does that cover all > > the > > cases? It does add a new tree code, yes, but I'm feeling that could be > > justified if we go down this route. > > I'd rather have combine recognize an insn that does both (if it > exists). As I understand powerpc doesn't have reduction to scalar > (I think no ISA actually has this, but all ISAs have reduce to > one vector element plus a cheap way of extraction (effectively a subreg)) > but it's reduction already has all result vector elements set to the > same value (as opposed to having some undefined or zero or whatever). 
The extraction is not necessarily so cheap. For floating-point values in a register, we can indeed use a subreg. But in this case we are talking about integers, and an extract must move them from the vector registers to the GPRs. With POWER8, we can do this with a 5-cycle move instruction. Prior to that, we can only do this with a very expensive path through memory, as previous processors had no direct path between the vector and integer registers. > > > However, another point that springs to mind: if you reduce a loop containing > > OR or MUL expressions, the vect_create_epilog_for_reduction reduces these > > using shifts, and I think will also use
Re: [PATCH, fortran] PR 53379 Backtrace on error termination
On Thu, Sep 17, 2015 at 7:08 AM, David Edelsohn wrote: > On Wed, Sep 9, 2015 at 12:12 AM, Ian Lance Taylor wrote: >> Mike Stump writes: >> >>> Not a big issue, but slightly better if (O_CLOEXEC>>32) != 0 is also >>> true. See, if AIX should ever define this to a sensible value, the >>> above would disappear the feature. However, if they did, then this >>> expression should then be false. >> >> Yes, I think this might be even better in code. How about something >> like >> >> /* On some versions of AIX O_CLOEXEC does not fit in int, so use a >> cast to force it. */ >> descriptor = open (filename, (int) (O_RDONLY | O_BINARY | O_CLOEXEC)); >> >> Does that work on AIX? > > Any decision about this patch? This patch is OK if it works. Ian
Re: [RFC] Masking vectorized loops with bound not aligned to VF.
2015-09-16 15:30 GMT+03:00 Richard Biener: > On Mon, 14 Sep 2015, Kirill Yukhin wrote: > >> Hello, >> I'd like to initiate discussion on vectorization of loops which >> boundaries are not aligned to VF. Main target for this optimization >> right now is x86's AVX-512, which features per-element embedded masking >> for all instructions. The main goal for this mail is to agree on overall >> design of the feature. >> >> This approach was presented @ GNU Cauldron 2015 by Ilya Enkovich [1]. >> >> Here's a sketch of the algorithm: >> 1. Add check on basic stmts for masking: possibility to introduce index >> vector and >> corresponding mask >> 2. At the check if statements are vectorizable we additionally check if >> stmts >> need and can be masked and compute masking cost. Result is stored in >> `stmt_vinfo`. >> We are going to mask only mem. accesses, reductions and modify mask >> for already >> masked stmts (mask load, mask store and vect. condition) > > I think you also need to mask divisions (for integer divide by zero) and > want to mask FP ops which may result in NaNs or denormals (because that's > generally to slow down execution a lot in my experience). > > Why not simply mask all stmts? Hi, Statement masking may be not free. Especially if we need to transform mask somehow to do it. It also may be unsupported on a platform (e.g. for AVX-512 not all instructions support masking) but still not be a problem to mask a loop. BTW for AVX-512 masking doesn't boost performance even if we have some special cases like NaNs. We don't consider exceptions in vector code (and it seems to be a case now?) otherwise we would need to mask them also. > >> 3. Make a decision about masking: take computed costs and est. iterations >> count >> into consideration >> 4. Modify prologue/epilogue generation according decision made at >> analysis. Three >> options available: >> a. Use scalar remainder >> b. Use masked remainder. Won't be supported in first version >> c. 
Mask main loop >> 5.Support vectorized loop masking: >> - Create stmts for mask generation >> - Support generation of masked vector code (create generic vector code >> then >> patch it w/ masks) >> - Mask loads/stores/vconds/reductions only >> >> In first version (targeted v6) we're not going to support 4.b and loop >> mask pack/unpack. No `pack/unpack` means that masking will be supported >> only for types w/ the same size as index variable > > This means that if ncopies for any stmt is > 1 masking won't be supported, > right? (you'd need two or more different masks) We don't think it is a very important feature to have in initial version. It can be added later and shouldn't affect overall implementation design much. BTW currently masked loads and stores don't support masks of other sizes and don't do masks pack/unpack. > >> >> [1] - >> https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Vectorization+for+Intel+AVX-512.pdf >> >> What do you think? > > There was the idea some time ago to use single-iteration vector > variants for prologues/epilogues by simply overlapping them with > the vector loop (and either making sure to mask out the overlap > area or make sure the result stays the same). This kind-of is > similar to 4b and thus IMHO it's better to get 4b implemented > rather than trying 4c. So for example > > int a[]; > for (i=0; i < 13; ++i) >a[i] = i; > > would be vectorized (with v4si) as > > for (i=0; i < 13 / 4; ++i) >((v4si *)a)[i] = { ... }; > *(v4si *)(&a[9]) = { ... }; > > where the epilogue store of course would be unaligned. The masked > variant can avoid the data pointer adjustment and instead use a masked > store. > > OTOH it might be that the unaligned scheme is as efficient as the > masked version.
Only the masked version is more trivially correct, > data dependences can make the above idea not work without masking > out stores like for > > for (i=0; i < 13; ++i) >a[i] = a[i+1]; > > obviously the overlapping iterations in the epilogue would > compute bogus values. To avoid this we can merge the result > with the previously stored values (using properly computed masks) > before storing it. > > Basically both 4b and the above idea need to peel a vector > iteration and "modify" it. The same trick can be applied to > prologue loops of course. > > Any chance you can try working on 4b instead? It also feels > like it would need less hacks throughout the vectorizer > (basically post-processing the generated vector loop). > > If 4b is implemented I don't think 4c is worth doing. I agree 4b is a more universal approach and should cover more cases. But we consider 4c is the first step to have 4b. I think significant difference of 4b from a replay you described is that in 4b masked remainder may be used to execute short trip count loops which mean we can execute masked remainder without actually executing main
Re: [Patch, fortran] PR40054 and PR63921 - Implement pointer function assignment - redux
Le 06/09/2015 18:40, Paul Richard Thomas a écrit : It helps to attach the patch :-) Paul On 6 September 2015 at 13:42, Paul Richard Thomaswrote: Dear All, The attached patch more or less implements the assignment of expressions to the result of a pointer function. To wit: my_ptr_fcn (arg1, arg2...) = expr arg1 would usually be the target, pointed to by the function. The patch parses these statements and resolves them into: temp_ptr => my_ptr_fcn (arg1, arg2...) temp_ptr = expr I say more or less implemented because I have ducked one of the headaches here. At the end of the specification block, there is an ambiguity between statement functions and pointer function assignments. I do not even try to resolve this ambiguity and require that there be at least one other type of executable statement before these beasts. This can undoubtedly be fixed but the effort seems to me to be unwarranted at the present time. This version of the patch extends the coverage of allowed rvalues to any legal expression. Also, all the problems with error.c have been dealt with by Manuel's patch. I am grateful to Dominique for reminding me of PR40054 and pointing out PR63921. After a remark of his on #gfortran, I fixed the checking of the standard to pick up all the offending lines with F2003 and earlier. Bootstraps and regtests on FC21/x86_64 - OK for trunk? Hello Paul, I'm mostly concerned about the position where the code rewriting happens. Details below. 
Mikael submit_2.diff Index: gcc/fortran/parse.c === *** gcc/fortran/parse.c (revision 227508) --- gcc/fortran/parse.c (working copy) *** decode_statement (void) *** 356,362 --- 357,371 match (NULL, gfc_match_assignment, ST_ASSIGNMENT); match (NULL, gfc_match_pointer_assignment, ST_POINTER_ASSIGNMENT); + + if (in_specification_block) + { match (NULL, gfc_match_st_function, ST_STATEMENT_FUNCTION); + } + else if (!gfc_notification_std (GFC_STD_F2008)) + { + match (NULL, gfc_match_ptr_fcn_assign, ST_ASSIGNMENT); + } As match exits the function upon success, I think it makes sense to move match (... gfc_match_ptr_fcn_assign ...) out of the else, namely: if (in_specification_block) { /* match statement function */ } /* match pointer fonction assignment */ so that non-ambiguous cases are recognized with gfc_match_ptr_fcn_assign. Non-ambiguous cases are for example the ones where one of the function arguments is a non-variable, or a variable with a subreference, or when there is one keyword argument. 
Example (rejected with unclassifiable statement): program p integer, parameter :: b = 3 integer, target:: a = 2 func(arg=b) = 1 if (a /= 1) call abort func(b + b - 3) = -1 if (a /= -1) call abort contains function func(arg) result(r) integer, pointer :: r integer :: arg if (arg == 3) then r => a else r => null() end if end function func end program p Index: gcc/fortran/resolve.c === *** gcc/fortran/resolve.c (revision 227508) --- gcc/fortran/resolve.c (working copy) *** generate_component_assignments (gfc_code *** 10133,10138 --- 10141,10205 } + /* F2008: Pointer function assignments are of the form: + ptr_fcn (args) = expr +This function breaks these assignments into two statements: + temporary_pointer => ptr_fcn(args) + temporary_pointer = expr */ + + static bool + resolve_ptr_fcn_assign (gfc_code **code, gfc_namespace *ns) + { + gfc_expr *tmp_ptr_expr; + gfc_code *this_code; + gfc_component *comp; + gfc_symbol *s; + + if ((*code)->expr1->expr_type != EXPR_FUNCTION) + return false; + + /* Even if standard does not support this feature, continue to build + the two statements to avoid upsetting frontend_passes.c. */ I don't mind this, but maybe we should return false at the end, when an error has been emitted? + gfc_notify_std (GFC_STD_F2008, "Pointer procedure assignment at " + "%L", &(*code)->loc); + + comp = gfc_get_proc_ptr_comp ((*code)->expr1); + + if (comp) + s = comp->ts.interface; + else + s = (*code)->expr1->symtree->n.sym; + + if (s == NULL || !s->result->attr.pointer) + { + gfc_error ("F2008: The function result at %L must have " +"the pointer attribute.", &(*code)->expr1->where); + /* Return true because we want a break after the call. */ Hum, I would rather not do this if possible. Do we really need the break? + return true; + } + + tmp_ptr_expr = get_temp_from_expr ((*code)->expr2, ns); + + /* get_temp_from_expression is set up for ordinary assignments. To that + end, where array bounds are not known, arrays are made allocatable. 
+ Change the temporary to a pointer
[patch] Remove redundant conditional expressions in <system_error>
This is silly and makes it harder to read: return _M_value != 0 ? true : false; Tested powerpc64le-linux, committed to trunk. commit ca8d1e0e92810ac72b337247c10af1de1de465d4 Author: Jonathan Wakely Date: Thu Sep 17 15:07:24 2015 +0100 Remove redundant conditional expressions in <system_error> * include/std/system_error (error_code::operator bool(), error_condition::operator bool()): Remove redundant conditional expression. diff --git a/libstdc++-v3/include/std/system_error b/libstdc++-v3/include/std/system_error index 92f8af9..cc82bdf 100644 --- a/libstdc++-v3/include/std/system_error +++ b/libstdc++-v3/include/std/system_error @@ -181,7 +181,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return category().message(value()); } explicit operator bool() const noexcept -{ return _M_value != 0 ? true : false; } +{ return _M_value != 0; } // DR 804. private: @@ -257,7 +257,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { return category().message(value()); } explicit operator bool() const noexcept -{ return _M_value != 0 ? true : false; } +{ return _M_value != 0; } // DR 804. private:
Re: [PATCH] Fix PR64078
On Wed, Sep 09, 2015 at 07:48:15PM +0200, Bernd Edlinger wrote: > Hi, > > On Wed, 9 Sep 2015 09:31:33, Jeff Law wrote: > > You could probably make the function static or change its visibility via > > a function attribute (there's a visibility attribute which can take the > > values default, hidden protected or internal). Default visibility > > essentially means the function can be overridden. I think changing it > > to "protected" might work. Note if we do that, we may need some kind of > > target selector on the test since not all targets support the various > > visibility attributes. > > > > Yes, it works both ways: static works, and __attribute__ ((visibility > ("protected"))) works too: > > make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c > --target_board='unix{-fpic,-mcmodel=medium,-fpic\ > -mcmodel=medium,-mcmodel=large,-fpic\ -mcmodel=large}'" > > has all tests passed, but.. > > make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c > --target_board='unix{-fno-inline}'" > > still fails in the same way for all workarounds: inline, static, and > __attribute__ ((visibility ("protected"))). > > Maybe "static" would be preferable? Yeah, I'd go with static if that helps. I'd rather avoid playing games with visibility. Marek
Go compiler patch: issue receive type errors earlier
This patch by Chris Manghane changes the Go frontend to issue type errors earlier for a receive operation. This fixes https://golang.org/issue/12323 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 227834) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -1cb26dc898bda1e85f4dd2ee204adbce792e4813 +e069d4417a692c1261df99fe3323277e1a0193d2 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 227834) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -13502,9 +13502,14 @@ Expression::make_heap_expression(Express Type* Receive_expression::do_type() { + if (this->is_error_expression()) +return Type::make_error_type(); Channel_type* channel_type = this->channel_->type()->channel_type(); if (channel_type == NULL) -return Type::make_error_type(); +{ + this->report_error(_("expected channel")); + return Type::make_error_type(); +} return channel_type->element_type(); } @@ -13516,6 +13521,7 @@ Receive_expression::do_check_types(Gogo* Type* type = this->channel_->type(); if (type->is_error()) { + go_assert(saw_errors()); this->set_is_error(); return; } Index: gcc/go/gofrontend/statements.cc === --- gcc/go/gofrontend/statements.cc (revision 227696) +++ gcc/go/gofrontend/statements.cc (working copy) @@ -3856,7 +3856,10 @@ Switch_statement::do_lower(Gogo*, Named_ if (this->val_ != NULL && (this->val_->is_error_expression() || this->val_->type()->is_error())) -return Statement::make_error_statement(loc); +{ + go_assert(saw_errors()); + return Statement::make_error_statement(loc); +} if (this->val_ != NULL && this->val_->type()->integer_type() != NULL
[RFC][PATCH] Preferred rename register in regrename pass
Hi, We came across a situation for MIPS64 where moves for sign-extension were not converted into a nop because IRA spilled some of the allocnos and assigned a different hard register for the output operand in the move. LRA is not fixing this up as most likely the move was not introduced by LRA itself. I found it hard to fix this in LRA and looked at an alternative solution, where the regrename pass appeared to be the best candidate. The patch below introduces a notion of a preferred rename register and attempts to use the output register for an input register iff the input register dies in an instruction. The preferred register is validated and in the case it fails to be validated, it falls back to the old technique of finding the best rename register. Of course, it has slightly limited scope of use as it's not enabled by default, however, when targeting performance one is likely to enable it via -funroll-loops or -fpeel-loops. I did some experiments with -funroll-loops on x86_64-unknown-linux-gnu and the code size improved by almost 0.4% in the average case. I haven't done extensive performance testing but it appeared SPEC2006 had some minor improvement on average, which could be a real improvement or just noise. On MIPS64 with -funroll-loops, there were a number of cases where the unnecessary moves turned into a nop in CSiBE. MIPS32 also marginally improved but to a lower degree. The patch successfully passed x86_64-unknown-linux-gnu, mips-mti-linux-gnu and mips-img-linux-gnu. I'm not sure if this is something that should be enabled by default for everyone or a target hook should be added. Any other comments/suggestions? Regards, Robert gcc/ * regrename.c (find_preferred_rename_reg): New function. (record_operand_use): Remove assertion. Allocate or resize heads and chains vectors, if necessary. (find_best_rename_reg): Use the new function and validate chosen register. (build_def_use): Don't allocate and initialize space of size 0.
(regrename_finish): Free heads and chains vectors. (regrename_optimize): Pass true to initializing function. * regrename.h (struct operand_rr_info): Replace arrays of heads and chains with vectors. --- gcc/regrename.c | 86 + gcc/regrename.h | 4 +-- 2 files changed, 82 insertions(+), 8 deletions(-) diff --git a/gcc/regrename.c b/gcc/regrename.c index c328c1b..90fee98 100644 --- a/gcc/regrename.c +++ b/gcc/regrename.c @@ -174,6 +174,51 @@ dump_def_use_chain (int from) } } +/* Return a preferred rename register for HEAD. */ + +static int +find_preferred_rename_reg (du_head_p head) +{ + struct du_chain *this_du; + int preferred_reg = -1; + + for (this_du = head->first; this_du; this_du = this_du->next_use) +{ + rtx note; + insn_rr_info *p; + + /* The preferred rename register is an output register iff an input +register dies in an instruction but the candidate must be validated by +check_new_reg_p. */ + for (note = REG_NOTES (this_du->insn); note; note = XEXP (note, 1)) + if (insn_rr.exists() + && REG_NOTE_KIND (note) == REG_DEAD + && REGNO (XEXP (note, 0)) == head->regno + && (p = &insn_rr[INSN_UID (this_du->insn)]) + && p->op_info) + { + int i; + for (i = 0; i < p->op_info->n_chains; i++) + { + struct du_head *next_head = p->op_info->heads[i]; + if (head != next_head) + { + preferred_reg = next_head->regno; + if (dump_file) + fprintf (dump_file, + "Chain %s (%d) has preferred rename register" + " %s for insn %d [%s]\n", + reg_names[head->regno], head->id, + reg_names[preferred_reg], + INSN_UID (this_du->insn), + reg_class_names[this_du->cl]); + } + } + } +} + return preferred_reg; +} + static void free_chain_data (void) { @@ -206,7 +251,16 @@ record_operand_use (struct du_head *head, struct du_chain *this_du) { if (cur_operand == NULL) return; - gcc_assert (cur_operand->n_chains < MAX_REGS_PER_ADDRESS); + + if (!cur_operand->heads.exists ()) +cur_operand->heads.create (0); + if (!cur_operand->chains.exists ()) +cur_operand->chains.create (0); + if
(cur_operand->heads.length () <= (unsigned) cur_operand->n_chains) +cur_operand->heads.safe_grow_cleared (cur_operand->n_chains + 1); + if (cur_operand->chains.length () <= (unsigned) cur_operand->n_chains) +cur_operand->chains.safe_grow_cleared (cur_operand->n_chains + 1); + cur_operand->heads[cur_operand->n_chains] = head; cur_operand->chains[cur_operand->n_chains++] = this_du; } @@ -355,6 +409,7 @@
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 17/09/15 12:16 +0100, Jonathan Wakely wrote: On 16/09/15 17:42 -0600, Martin Sebor wrote: I see now the first exists test will detect symlink loops in the original path. But I'm not convinced there isn't a corner case that's subject to a TOCTOU race condition between the first exists test and the while loop during which a symlink loop can be introduced. Suppose we call the function with /foo/bar as an argument and the path exists and contains no symlinks. result is / and cmpts is set to { foo, bar }. Just as the loop is entered, /foo/bar is replaced with a symlink containing /foo/bar. The loop then proceeds like so: 1. The first iteration removes foo from cmpts and sets result to /foo. cmpts is { bar }. 2. The second iteration removes bar from cmpts, sets result to /foo/bar, determines it's a symlink, reads its contents, sees it's an absolute pathname and replaces result with /. It then inserts the symlink's components { foo, bar } into cmpts. cmpts becomes { foo, bar }. exists(result) succeeds. 3. The next iteration of the loop has the same initial state as the first. But I could have very easily missed something that takes care of this corner case. If I did, sorry for the false alarm! No, you're right. The TS says such filesystem races are undefined: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4099.html#fs.race.behavior but it would be nice to fail gracefully rather than DOS the application. The simplest approach would be to increment a counter every time we follow a symlink, and if it reaches some limit decide something is wrong and fail with ELOOP. I don't see how anything else can be 100% bulletproof, because a truly evil attacker could just keep altering the contents of symlinks so we keep ping-ponging between two or more paths. 
If we keep track of paths we've seen before the attacker could just keep changing the contents to a unique path each time, that initially exists as a file, but by the time we get to is_symlink() it's become a symlink to a new path. So if we use a counter, what's a sane maximum? Is MAXSYMLINKS in <sys/param.h> the value the kernel uses? 20 seems quite low, I was thinking of a much higher number. This patch sets ELOOP after following 40 symlinks. I can also move the exists(result, ec) check to the end, because the is_symlink(result, ec) call will already check for existence on every iteration that adds a component to the result. I've also simplified the error handling (when exists(p, ec) fails it sets ENOENT anyway) and moved !ec into the loop condition, rather than using 'fail(e); break;' on errors. I'm quite happy with this version now. diff --git a/libstdc++-v3/src/filesystem/ops.cc b/libstdc++-v3/src/filesystem/ops.cc index b5c8eb9..95146bf 100644 --- a/libstdc++-v3/src/filesystem/ops.cc +++ b/libstdc++-v3/src/filesystem/ops.cc @@ -98,6 +98,7 @@ fs::canonical(const path& p, const path& base, error_code& ec) { const path pa = absolute(p, base); path result; + #ifdef _GLIBCXX_USE_REALPATH char_ptr buf{ nullptr }; # if _XOPEN_VERSION < 700 @@ -119,18 +120,9 @@ fs::canonical(const path& p, const path& base, error_code& ec) } #endif - auto fail = [&ec, &result](int e) mutable { - if (!ec.value()) - ec.assign(e, std::generic_category()); - result.clear(); - }; - if (!exists(pa, ec)) -{ - fail(ENOENT); - return result; -} - // else we can assume no unresolvable symlink loops +return result; + // else: we know there are (currently) no unresolvable symlink loops result = pa.root_path(); @@ -138,18 +130,17 @@ fs::canonical(const path& p, const path& base, error_code& ec) for (auto& f : pa.relative_path()) cmpts.push_back(f); - while (!cmpts.empty()) + int max_allowed_symlinks = 40; + + while (!cmpts.empty() && !ec) { path f = std::move(cmpts.front()); cmpts.pop_front(); if (f.compare(".") == 0) {
- if (!is_directory(result, ec)) - { - fail(ENOTDIR); - break; - } + if (!is_directory(result, ec) && !ec) + ec.assign(ENOTDIR, std::generic_category()); } else if (f.compare("..") == 0) { @@ -166,27 +157,30 @@ fs::canonical(const path& p, const path& base, error_code& ec) if (is_symlink(result, ec)) { path link = read_symlink(result, ec); - if (!ec.value()) + if (!ec) { - if (link.is_absolute()) + if (--max_allowed_symlinks == 0) + ec.assign(ELOOP, std::generic_category()); + else { - result = link.root_path(); - link = link.relative_path(); + if (link.is_absolute()) + { + result = link.root_path(); + link = link.relative_path(); + } +
Re: [C++] Coding rule enforcement
On 09/16/15 10:23, Jason Merrill wrote: On 09/16/2015 08:02 AM, Nathan Sidwell wrote: + else if (warn_multiple_inheritance) +warning (OPT_Wmultiple_inheritance, + "%qT defined with multiple direct bases", ref); You don't need to guard the warning with a check of the warning flag; warning will only give the warning if the option is enabled. the spelling mistake on the option I used to experiment on didn't help, but I discovered one must have the Var clause in the options file too, which I didn't expect. Anyway, fixed now. ok? nathan 2015-09-17 Nathan Sidwell c-family/ * c.opt (Wmultiple-inheritance, Wvirtual-inheritance, Wtemplates, Wnamespaces): New C++ warnings. cp/ * decl.c (xref_basetypes): Check virtual and/or multiple inheritance warning.. * parser.c (cp_parser_namespace_definition): Check namespaces warning. * pt.c (push_template_decl_real): Check templates warning. * doc/invoke.texi (-Wmultiple-inheritance, -Wvirtual-inheritance, -Wtemplates, -Wnamespaces): Document. testsuite/ * g++.dg/diagnostic/disable.C: New.
Index: gcc/c-family/c.opt === --- gcc/c-family/c.opt (revision 227761) +++ gcc/c-family/c.opt (working copy) @@ -573,6 +573,14 @@ Wmissing-field-initializers C ObjC C++ ObjC++ Var(warn_missing_field_initializers) Warning EnabledBy(Wextra) Warn about missing fields in struct initializers +Wmultiple-inheritance +C++ ObjC++ Var(warn_multiple_inheritance) Warning +Warn on direct multiple inheritance + +Wnamespaces +C++ ObjC++ Var(warn_namespaces) Warning +Warn on namespace definition + Wsized-deallocation C++ ObjC++ Var(warn_sized_deallocation) Warning EnabledBy(Wextra) Warn about missing sized deallocation functions @@ -610,6 +618,10 @@ Wswitch-bool C ObjC C++ ObjC++ Var(warn_switch_bool) Warning Init(1) Warn about switches with boolean controlling expression +Wtemplates +C++ ObjC++ Var(warn_templates) Warning +Warn on primary template declaration + Wmissing-format-attribute C ObjC C++ ObjC++ Alias(Wsuggest-attribute=format) ; @@ -936,6 +948,10 @@ Wvolatile-register-var C ObjC C++ ObjC++ Var(warn_volatile_register_var) Warning LangEnabledBy(C ObjC C++ ObjC++,Wall) Warn when a register variable is declared volatile +Wvirtual-inheritance +C++ ObjC++ Var(warn_virtual_inheritance) Warning +Warn on direct virtual inheritance + Wvirtual-move-assign C++ ObjC++ Var(warn_virtual_move_assign) Warning Init(1) Warn if a virtual base has a non-trivial move assignment operator Index: gcc/cp/decl.c === --- gcc/cp/decl.c (revision 227761) +++ gcc/cp/decl.c (working copy) @@ -12729,6 +12729,7 @@ xref_basetypes (tree ref, tree base_list tree binfo, base_binfo; unsigned max_vbases = 0; /* Maximum direct & indirect virtual bases. */ unsigned max_bases = 0; /* Maximum direct bases. */ + unsigned max_dvbases = 0; /* Maximum direct virtual bases. */ int i; tree default_access; tree igo_prev; /* Track Inheritance Graph Order. 
*/ @@ -12766,12 +12767,13 @@ xref_basetypes (tree ref, tree base_list { max_bases++; if (TREE_TYPE (*basep)) - max_vbases++; + max_dvbases++; if (CLASS_TYPE_P (basetype)) max_vbases += vec_safe_length (CLASSTYPE_VBASECLASSES (basetype)); basep = &TREE_CHAIN (*basep); } } + max_vbases += max_dvbases; TYPE_MARKED_P (ref) = 1; @@ -12814,6 +12816,9 @@ xref_basetypes (tree ref, tree base_list error ("Java class %qT cannot have multiple bases", ref); return false; } + else + warning (OPT_Wmultiple_inheritance, + "%qT defined with multiple direct bases", ref); } if (max_vbases) @@ -12825,6 +12830,9 @@ xref_basetypes (tree ref, tree base_list error ("Java class %qT cannot have virtual bases", ref); return false; } + else if (max_dvbases) + warning (OPT_Wvirtual_inheritance, + "%qT defined with direct virtual base", ref); } for (igo_prev = binfo; base_list; base_list = TREE_CHAIN (base_list)) Index: gcc/cp/parser.c === --- gcc/cp/parser.c (revision 227761) +++ gcc/cp/parser.c (working copy) @@ -16798,6 +16798,8 @@ cp_parser_namespace_definition (cp_parse has_visibility = handle_namespace_attrs (current_namespace, attribs); + warning (OPT_Wnamespaces, "namepace %qD entered", current_namespace); + /* Parse the body of the namespace. */ cp_parser_namespace_body (parser); Index: gcc/cp/pt.c === --- gcc/cp/pt.c (revision 227761) +++ gcc/cp/pt.c (working copy) @@ -5088,6 +5088,8 @@ push_template_decl_real (tree decl, bool if (is_primary) { + warning (OPT_Wtemplates, "template %qD declared", decl); + if (DECL_CLASS_SCOPE_P (decl)) member_template_p = true; if
Re: [RFC][PATCH] Preferred rename register in regrename pass
On 09/17/2015 08:38 AM, Robert Suchanek wrote: Hi, We came across a situation for MIPS64 where moves for sign-extension were not converted into a nop because of IRA spilled some of the allocnos and assigned different hard register for the output operand in the move. LRA is not fixing this up as most likely the move was not introduced by the LRA itself. I found it hard to fix this in LRA and looked at an alternative solution where regrename pass appeared to be the best candidate. Yea, we've never been great at tying the source & destination of sign/zero extensions. The inherently different modes often caused the old allocator (and pre-allocator bits like regmove, and post-allocator bits like reload) to 'give up'. I'm not sure if this is something that should be enabled by default for everyone or a target hook should be added. Any other comments/suggestions? I'll let Bernd comment on the patch itself. But I would say that if you're setting up cases where we can tie the source/dest of an extension together, then it's a good thing. It'll cause more of them to turn into NOPs and it'll make the redundant extension elimination pass more effective as well. This would be something that I'd expect to benefit most architectures (obviously to varying degrees). Jeff
Re: [patch match.pd c c++]: Ignore results of 'shorten_compare' and move missing patterns in match.pd
On 09/17/2015 03:12 AM, Richard Biener wrote: I wonder if we'd do better to first add new match.pd patterns, one at a time, with tests, and evaluating them along the way by looking at the dumps or .s files across many systems. Then when we think we've got the right set, then look at what happens to those dumps/.s files if we make the changes so that shorten_compare really just emits warnings. As the shorten_compare function covers those transformations, I don't see that this is something leading to something useful. As long as we call shorten_compare on folding in FEs, we won't see those patterns in ME that easy. And even more important is, that patterns getting resolved if function gets invoked. It will tell you what will be missed from a code generation standpoint if you disable shorten_compare. And that is the best proxy we have for performance regressions related to disabling shorten_compare. ie, in a local tree, you disable shorten_compare. Then compare the code we generate with and without shorten compare. Reduce to a simple testcase. Resolve the issue with a match.pd pattern (most likely) and repeat the process. In theory at each step the number of things to deeply investigate should be diminishing while at the same time you're building match.pd patterns and tests we can include in the testsuite. And each step is likely a new patch in the patch series -- the final patch of which disables shorten_compare. It's a lot of work, but IMHO, it's necessary to help ensure we don't have code generation regressions. As said in the other reply I successfully used gcc_unreachable ()s in code in intend to remove and then inspect/fix all fallout from that during bootstrap & test ... Yeah, it's a lot of work (sometimes). But it's way easier than to investigate code changes (which may also miss cases as followup transforms may end up causing the same code to be generated). 
That'd be a fine approach as well, though it may not work in this case as shorten_compare would get called prior to the transformations in match.pd. Though I would certainly agree in general that your approach is a good one. jeff
RE: [PATCH] Fix PR64078
Hi, On Thu, 17 Sep 2015 10:39:04, Jeff Law wrote: > > On 09/17/2015 09:00 AM, Marek Polacek wrote: >> On Wed, Sep 09, 2015 at 07:48:15PM +0200, Bernd Edlinger wrote: >>> Hi, >>> >>> On Wed, 9 Sep 2015 09:31:33, Jeff Law wrote: You could probably make the function static or change its visibility via a function attribute (there's a visibility attribute which can take the values default, hidden protected or internal). Default visibility essentially means the function can be overridden. I think changing it to "protected" might work. Note if we do that, we may need some kind of target selector on the test since not all targets support the various visibility attributes. >>> >>> Yes, it works both ways: static works, and __attribute__ ((visibility >>> ("protected"))) works too: >>> >>> make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c >>> --target_board='unix{-fpic,-mcmodel=medium,-fpic\ >>> -mcmodel=medium,-mcmodel=large,-fpic\ -mcmodel=large}'" >>> >>> has all tests passed, but.. >>> >>> make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c >>> --target_board='unix{-fno-inline}'" >>> >>> still fails in the same way for all workarounds: inline, static, and >>> __attribute__ ((visibility ("protected"))). >>> >>> Maybe "static" would be preferable? >> >> Yeah, I'd go with static if that helps. I'd rather avoid playing games >> with visibility. > Static is certainly easier and doesn't rely on targets implementing all > the visibility capabilities. So static is probably the best approach. > That's fine for me too, so is the original patch OK for trunk with s/inline/static/ ? Thanks Bernd.
[AArch64][PATCH 4/5] Use atomic load-operate instructions for fetch-update patterns.
Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with optional memory ordering specifiers. This patch uses the ARMv8.1 atomic load-operate instructions to implement the atomic_fetch_<op> patterns. This patch also updates the implementation of the atomic_<op> patterns, which are treated as versions of the atomic_fetch_<op> which discard the loaded data.

The general form of the code generated for an atomic_fetch_<op>, with destination D, source S, memory address A and memory order MO, depends on whether the operation is directly supported by the instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:

    ld<opc><mo><sz> S, D, [A]

where
    <opc> is one of {add, set, eor}
    <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
    <sz> is one of {'', 'b', 'h'} depending on the data size.

If <op> is SUB, the code generated, with scratch register r, is:

    neg r, S
    ldadd<mo><sz> r, D, [A]

If <op> is AND, the code generated is:

    not r, S
    ldclr<mo><sz> r, D, [A]

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the existing aarch64_split_atomic_op function, to implement the operation using sequences built with the ARMv8 load-exclusive/store-exclusive instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and make check. Also tested for aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

gcc/
2015-09-17  Matthew Wahab

	* config/aarch64/aarch64-protos.h (aarch64_atomic_ldop_supported_p):
	Declare.
	* config/aarch64/aarch64.c (aarch64_atomic_ldop_supported_p): New.
	(enum aarch64_atomic_load_op_code): New.
	(aarch64_emit_atomic_load_op): New.
	(aarch64_gen_atomic_load_op): Update to support load-operate
	patterns.
	* config/aarch64/atomics.md (atomic_<atomic_optab><mode>): Change
	to an expander.
	(aarch64_atomic_<atomic_optab><mode>): New.
	(aarch64_atomic_<atomic_optab><mode>_lse): New.
	(atomic_fetch_<atomic_optab><mode>): Change to an expander.
	(aarch64_atomic_fetch_<atomic_optab><mode>): New.
	(aarch64_atomic_fetch_<atomic_optab><mode>_lse): New.

gcc/testsuite/
2015-09-17  Matthew Wahab

	* gcc.target/aarch64/atomic-inst-ldadd.c: New. 
* gcc.target/aarch64/atomic-inst-ldlogic.c: New. >From c4b8eb6d2ca62c57f4a011e06335b918f603ad2a Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Fri, 7 Aug 2015 17:10:42 +0100 Subject: [PATCH 4/5] Use atomic instructions for fetch-update patterns. Change-Id: I39759f02e61039067ccaabfd52039e4804eddf2f --- gcc/config/aarch64/aarch64-protos.h| 2 + gcc/config/aarch64/aarch64.c | 176 - gcc/config/aarch64/atomics.md | 109 - .../gcc.target/aarch64/atomic-inst-ldadd.c | 58 +++ .../gcc.target/aarch64/atomic-inst-ldlogic.c | 109 + 5 files changed, 444 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldadd.c create mode 100644 gcc/testsuite/gcc.target/aarch64/atomic-inst-ldlogic.c diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index eba4c76..76ebd6f 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -378,6 +378,8 @@ rtx aarch64_load_tp (rtx); void aarch64_expand_compare_and_swap (rtx op[]); void aarch64_split_compare_and_swap (rtx op[]); void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx); + +bool aarch64_atomic_ldop_supported_p (enum rtx_code); void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index dc05c6e..33f9ef3 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11064,6 +11064,33 @@ aarch64_expand_compare_and_swap (rtx operands[]) emit_insn (gen_rtx_SET (bval, x)); } +/* Test whether the target supports using a atomic load-operate instruction. + CODE is the operation and AFTER is TRUE if the data in memory after the + operation should be returned and FALSE if the data before the operation + should be returned. Returns FALSE if the operation isn't supported by the + architecture. 
+ */ + +bool +aarch64_atomic_ldop_supported_p (enum rtx_code code) +{ + if (!TARGET_LSE) +return false; + + switch (code) +{ +case SET: +case AND: +case IOR: +case XOR: +case MINUS: +case PLUS: + return true; +default: + return false; +} +} + /* Emit a barrier, that is appropriate for memory model MODEL, at the end of a sequence implementing an atomic operation. */ @@ -11206,26 +11233,169 @@ aarch64_emit_atomic_swap (machine_mode mode, rtx dst, rtx value, emit_insn (gen (dst, mem, value, model)); } -/* Emit an atomic operation where the architecture supports it. */ +/* Operations supported by
Re: [PATCH] Target hook for disabling the delay slot filler.
On 09/17/2015 03:52 AM, Simon Dardis wrote: The profitability of using an ordinary branch over a delay slot branch depends on how the delay slot is filled. If a delay slot can be filled from an instruction preceding the branch or instructions proceeding that must be executed on both sides then it is profitable to use a delay slot branch. Agreed. It's an over-simplification, but for the purposes of this discussion it's close enough. For cases when instructions are chosen from one side of the branch, the proposed optimization strategy is to not speculatively execute instructions when ordinary branches could be used. Performance-wise this avoids executing instructions which the eager delay filler picked wrongly. Are you trying to say that you have the option as to what kind of branch to use? ie, "ordinary", presumably without a delay slot or one with a delay slot? Is the "ordinary" actually just a nullified delay slot or some form of likely/not likely static hint? Since most branches have a compact form disabling the eager delay filler should be no worse than altering it not to fill delay slots in this case. But what is the compact form at the micro-architectural level? My mips-fu has diminished greatly, but my recollection is the bubble is always there. Is that not the case? fill_eager_delay_slots is most definitely speculative and its profitability is largely dependent on the cost of what insns it finds to fill those delay slots and whether they're from the common or uncommon path. If it is able to find insns from the commonly executed path that don't have a long latency, then the fill is usually profitable (since the pipeline bubble always exists). However, pulling a long latency instruction (say anything that might cache miss or an fdiv/fsqrt) off the slow path and conditionally nullifying it can be *awful*. Everything else is in-between. Jeff
Re: [PATCH] Fix PR64078
On Thu, Sep 17, 2015 at 08:06:48PM +0200, Bernd Edlinger wrote: > Hi, > > On Thu, 17 Sep 2015 10:39:04, Jeff Law wrote: > > > > On 09/17/2015 09:00 AM, Marek Polacek wrote: > >> On Wed, Sep 09, 2015 at 07:48:15PM +0200, Bernd Edlinger wrote: > >>> Hi, > >>> > >>> On Wed, 9 Sep 2015 09:31:33, Jeff Law wrote: > You could probably make the function static or change its visibility via > a function attribute (there's a visibility attribute which can take the > values default, hidden protected or internal). Default visibility > essentially means the function can be overridden. I think changing it > to "protected" might work. Note if we do that, we may need some kind of > target selector on the test since not all targets support the various > visibility attributes. > > >>> > >>> Yes, it works both ways: static works, and __attribute__ ((visibility > >>> ("protected"))) works too: > >>> > >>> make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c > >>> --target_board='unix{-fpic,-mcmodel=medium,-fpic\ > >>> -mcmodel=medium,-mcmodel=large,-fpic\ -mcmodel=large}'" > >>> > >>> has all tests passed, but.. > >>> > >>> make check-gcc-c++ RUNTESTFLAGS="ubsan.exp=object-size-9.c > >>> --target_board='unix{-fno-inline}'" > >>> > >>> still fails in the same way for all workarounds: inline, static, and > >>> __attribute__ ((visibility ("protected"))). > >>> > >>> Maybe "static" would be preferable? > >> > >> Yeah, I'd go with static if that helps. I'd rather avoid playing games > >> with visibility. > > Static is certainly easier and doesn't rely on targets implementing all > > the visibility capabilities. So static is probably the best approach. > > > > That's fine for me too, so is the original patch OK for trunk with > s/inline/static/ ? Yes. Marek
Re: [PATCH, rs6000] Add expansions for min/max vector reductions
On Thu, Sep 17, 2015 at 09:18:42AM -0500, Bill Schmidt wrote: > On Thu, 2015-09-17 at 09:39 +0200, Richard Biener wrote: > > So just to clarify - you need to reduce the vector with max to a scalar > > but want the (same) result in all vector elements? > > Yes. Alan Hayward's cond-reduction patch is set up to perform a > reduction to scalar, followed by a scalar broadcast to get the value > into all positions. It happens that our most efficient expansion to > reduce to scalar will naturally produce the value in all positions. It also is many insns after expand, so relying on combine to combine all that plus the following splat (as Richard suggests below) is not really going to work. If there also are targets where the _scal version is cheaper, maybe we should keep both, and have expand expand to whatever the target supports? Segher
Re: [AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
On Thu, Sep 17, 2015 at 9:54 AM, Matthew Wahab wrote:
> Hello,
>
> ARMv8.1 adds atomic swap and atomic load-operate instructions with
> optional memory ordering specifiers. This patch uses the ARMv8.1
> load-operate instructions to implement the atomic_<op>_fetch patterns.
>
> The approach is to use the atomic load-operate instruction to atomically
> load the data and update memory and then to use the loaded data to
> calculate the value that the instruction would have stored. The
> calculation attempts to mirror the operation of the atomic instruction.
> For example, atomic_and_fetch is implemented with an atomic
> load-bic so the result is also calculated using a BIC instruction.
>
> The general form of the code generated for an atomic_<op>_fetch, with
> destination D, source S, memory address A and memory order MO, depends
> on whether or not the operation is directly supported by the
> instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:
>
>     ld<opc><mo><sz> S, D, [A]
>     <inst> D, D, S
> where
>     <opc> is one of {add, set, eor}
>     <inst> is one of {add, orr, xor}
>     <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
>     <sz> is one of {'', 'b', 'h'} depending on the data size.
>
> If <op> is SUB, the code generated is:
>
>     neg S, S
>     ldadd<mo><sz> S, D, [A]
>     add D, D, S
>
> If <op> is AND, the code generated is:
>
>     not S, S
>     ldclr<mo><sz> S, D, [A]
>     bic D, S, S
>
> Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the
> existing aarch64_split_atomic_op function, to implement the operation
> using sequences built with the ARMv8 load-exclusive/store-exclusive
> instructions.
>
> Tested the series for aarch64-none-linux-gnu with native bootstrap and
> make check. Also tested for aarch64-none-elf with cross-compiled
> check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Are you going to add some builtins for MIN/MAX support too?

Thanks,
Andrew Pinski

>
> Ok for trunk?
> Matthew
>
> 2015-09-17  Matthew Wahab
>
> 	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
> 	Adjust declaration. 
> * config/aarch64/aarch64.c (aarch64_emit_bic): New. > (aarch64_gen_atomic_load_op): Adjust comment. Add parameter > out_result. Update to support update-fetch operations. > * config/aarch64/atomics.md (aarch64_atomic_exchange_lse): > Adjust for change to aarch64_gen_atomic_ldop. > (aarch64_atomic__lse): Likewise. > (aarch64_atomic_fetch__lse): Likewise. > (atomic__fetch): Change to an expander. > (aarch64_atomic__fetch): New. > (aarch64_atomic__fetch_lse): New. > > gcc/testsuite > 2015-09-17 Matthew Wahab > > * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for > update-fetch operations. > * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise. >
[AArch64][PATCH 5/5] Use atomic load-operate instructions for update-fetch patterns.
Hello,

ARMv8.1 adds atomic swap and atomic load-operate instructions with optional memory ordering specifiers. This patch uses the ARMv8.1 load-operate instructions to implement the atomic_<op>_fetch patterns.

The approach is to use the atomic load-operate instruction to atomically load the data and update memory and then to use the loaded data to calculate the value that the instruction would have stored. The calculation attempts to mirror the operation of the atomic instruction. For example, atomic_and_fetch is implemented with an atomic load-bic so the result is also calculated using a BIC instruction.

The general form of the code generated for an atomic_<op>_fetch, with destination D, source S, memory address A and memory order MO, depends on whether or not the operation is directly supported by the instruction. If <op> is one of PLUS, IOR or XOR, the code generated is:

    ld<opc><mo><sz> S, D, [A]
    <inst> D, D, S

where
    <opc> is one of {add, set, eor}
    <inst> is one of {add, orr, xor}
    <mo> is one of {'', 'a', 'l', 'al'} depending on memory order MO.
    <sz> is one of {'', 'b', 'h'} depending on the data size.

If <op> is SUB, the code generated is:

    neg S, S
    ldadd<mo><sz> S, D, [A]
    add D, D, S

If <op> is AND, the code generated is:

    not S, S
    ldclr<mo><sz> S, D, [A]
    bic D, S, S

Any operation not in {PLUS, IOR, XOR, SUB, AND} is passed to the existing aarch64_split_atomic_op function, to implement the operation using sequences built with the ARMv8 load-exclusive/store-exclusive instructions.

Tested the series for aarch64-none-linux-gnu with native bootstrap and make check. Also tested for aarch64-none-elf with cross-compiled check-gcc on an ARMv8.1 emulator with +lse enabled by default.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab

	* config/aarch64/aarch64-protos.h (aarch64_gen_atomic_ldop):
	Adjust declaration.
	* config/aarch64/aarch64.c (aarch64_emit_bic): New.
	(aarch64_gen_atomic_load_op): Adjust comment. Add parameter
	out_result. Update to support update-fetch operations. 
* config/aarch64/atomics.md (aarch64_atomic_exchange_lse): Adjust for change to aarch64_gen_atomic_ldop. (aarch64_atomic__lse): Likewise. (aarch64_atomic_fetch__lse): Likewise. (atomic__fetch): Change to an expander. (aarch64_atomic__fetch): New. (aarch64_atomic__fetch_lse): New. gcc/testsuite 2015-09-17 Matthew Wahab * gcc.target/aarch64/atomic-inst-ldadd.c: Add tests for update-fetch operations. * gcc.target/aarch64/atomic-inst-ldlogic.c: Likewise. >From 577bdb656e451df5ce1c8c651a642c3ac4d7c73b Mon Sep 17 00:00:00 2001 From: Matthew Wahab Date: Mon, 17 Aug 2015 11:27:18 +0100 Subject: [PATCH 5/5] Use atomic instructions for update-fetch patterns. Change-Id: I5eef48586fe904f0d2df8c581fb3c12a4a2d9c78 --- gcc/config/aarch64/aarch64-protos.h| 2 +- gcc/config/aarch64/aarch64.c | 72 +++-- gcc/config/aarch64/atomics.md | 61 ++- .../gcc.target/aarch64/atomic-inst-ldadd.c | 53 ++--- .../gcc.target/aarch64/atomic-inst-ldlogic.c | 118 ++--- 5 files changed, 247 insertions(+), 59 deletions(-) diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 76ebd6f..dd8ebcc 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -380,7 +380,7 @@ void aarch64_split_compare_and_swap (rtx op[]); void aarch64_gen_atomic_cas (rtx, rtx, rtx, rtx, rtx); bool aarch64_atomic_ldop_supported_p (enum rtx_code); -void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx); +void aarch64_gen_atomic_ldop (enum rtx_code, rtx, rtx, rtx, rtx, rtx); void aarch64_split_atomic_op (enum rtx_code, rtx, rtx, rtx, rtx, rtx, rtx); bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 33f9ef3..d95b81f 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -11212,6 +11212,25 @@ aarch64_split_compare_and_swap (rtx operands[]) aarch64_emit_post_barrier (model); } +/* Emit a BIC instruction. 
*/ + +static void +aarch64_emit_bic (machine_mode mode, rtx dst, rtx s1, rtx s2, int shift) +{ + rtx shift_rtx = GEN_INT (shift); + rtx (*gen) (rtx, rtx, rtx, rtx); + + switch (mode) +{ +case SImode: gen = gen_bic_lshrsi3; break; +case DImode: gen = gen_bic_lshrdi3; break; +default: + gcc_unreachable (); +} + + emit_insn (gen (dst, s2, shift_rtx, s1)); +} + /* Emit an atomic swap. */ static void @@ -11306,13 +11325,14 @@ aarch64_emit_atomic_load_op (enum aarch64_atomic_load_op_code code, } /* Emit an atomic load+operate. CODE is the operation. OUT_DATA is the - location to store the data read from memory. MEM is the memory location to -
Re: [PATCH 2/5] completely_scalarize arrays as well as records.
On 15/09/15 08:43, Richard Biener wrote: > > Sorry for chiming in so late... Not at all, TYVM for your help! > TREE_CONSTANT isn't the correct thing to test. You should use > TREE_CODE () == INTEGER_CST instead. Done (in some cases, via tree_fits_shwi_p). > Also you need to handle > NULL_TREE TYPE_MIN/MAX_VALUE as that can happen as well. I've not found any documentation as to what these mean, but from experiment, it seems that a zero-length array has MIN_VALUE 0 and MAX_VALUE null (as well as zero TYPE_SIZE) - so I allow that. In contrast a variable-length array also has zero TYPE_SIZE, but a large range of MIN-MAX, and I think I want to rule those out. Another awkward case is Ada arrays with large offset (e.g. array(2^32...2^32+1) which has only two elements); I don't see either of tree_to_shwi or tree_to_uhwi as necessarily being "better" here, each will handle (some) (rare) cases the other will not, so I've tried to use tree_to_shwi throughout for consistency. Summary: taken advantage of tree_fits_shwi_p, as this includes a check against NULL_TREE and that TREE_CODE () == INTEGER_CST. > + if (DECL_P (elem) && DECL_BIT_FIELD (elem)) > + return false; > > that can't happen (TREE_TYPE (array-type) is never a DECL). Removed. > + int el_size = tree_to_uhwi (elem_size); > + gcc_assert (el_size); > > so you assert on el_size being > 0 later but above you test > only array size ... Good point, thanks. > + tree t_idx = build_int_cst (TYPE_DOMAIN (decl_type), idx); > > use t_idx = size_int (idx); Done. I've also added another test, of scalarizing a structure containing a zero-length array, as earlier attempts accidentally prevented this. Bootstrapped + check-gcc,g++,ada,fortran on ARM and x86_64; Bootstrapped + check-gcc,g++,fortran on AArch64. OK for trunk? Thanks, Alan gcc/ChangeLog: PR tree-optimization/67283 * tree-sra.c (type_consists_of_records_p): Rename to... (scalarizable_type_p): ...this, add case for ARRAY_TYPE. (completely_scalarize_record): Rename to... 
(completely_scalarize): ...this, add ARRAY_TYPE case, move some code to:
(scalarize_elem): New.
(analyze_all_variable_accesses): Follow renamings.

gcc/testsuite/ChangeLog:

	PR tree-optimization/67283
	* gcc.dg/tree-ssa/sra-15.c: New.
	* gcc.dg/tree-ssa/sra-16.c: New.
---
 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c |  37 
 gcc/testsuite/gcc.dg/tree-ssa/sra-16.c |  37 
 gcc/tree-sra.c                         | 165 +++--
 3 files changed, 191 insertions(+), 48 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/sra-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
new file mode 100644
index 000..a22062e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-15.c
@@ -0,0 +1,37 @@
+/* Verify that SRA total scalarization works on records containing arrays. */
+/* { dg-do run } */
+/* { dg-options "-O1 -fdump-tree-release_ssa --param sra-max-scalarization-size-Ospeed=32" } */
+
+extern void abort (void);
+
+struct S
+{
+  char c;
+  unsigned short f[2][2];
+  int i;
+  unsigned short f3, f4;
+};
+
+
+int __attribute__ ((noinline))
+foo (struct S *p)
+{
+  struct S l;
+
+  l = *p;
+  l.i++;
+  l.f[1][0] += 3;
+  *p = l;
+}
+
+int
+main (int argc, char **argv)
+{
+  struct S a = {0, { {5, 7}, {9, 11} }, 4, 0, 0};
+  foo (&a);
+  if (a.i != 5 || a.f[1][0] != 12)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c b/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
new file mode 100644
index 000..fef34c0
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/sra-16.c
@@ -0,0 +1,37 @@
+/* Verify that SRA total scalarization works on records containing arrays. 
*/
+/* { dg-do run } */
+/* { dg-options "-O1 -fdump-tree-release_ssa --param sra-max-scalarization-size-Ospeed=16" } */
+
+extern void abort (void);
+
+struct S
+{
+  long zilch[0];
+  char c;
+  int i;
+  unsigned short f3, f4;
+};
+
+
+int __attribute__ ((noinline))
+foo (struct S *p)
+{
+  struct S l;
+
+  l = *p;
+  l.i++;
+  l.f3++;
+  *p = l;
+}
+
+int
+main (int argc, char **argv)
+{
+  struct S a = { { }, 0, 4, 0, 0};
+  foo (&a);
+  if (a.i != 5 || a.f3 != 1)
+    abort ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "l;" 0 "release_ssa" } } */

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index 8b3a0ad..8247554 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -915,73 +915,142 @@ create_access (tree expr, gimple stmt, bool write)
 }

-/* Return true iff TYPE is a RECORD_TYPE with fields that are either of gimple
-   register types or (recursively) records with only these two kinds of fields.
-   It also returns false if any of these records contains a bit-field. */
+/* Return true iff TYPE is
Re: [RFC] Try vector as a new representation for vector masks
On 09/15/2015 06:52 AM, Ilya Enkovich wrote:
> I made a step forward forcing vector comparisons have a mask (vec<bool>)
> result and disabling bool patterns in case vector comparison is supported by
> target. Several issues were met.
>
> - c/c++ front-ends generate vector comparison with integer vector result. I
> had to make some modifications to use vec_cond instead. Don't know if there
> are other front-ends producing vector comparisons.
> - vector lowering fails to expand vector masks due to mismatch of type and
> mode sizes. I fixed vector type size computation to match mode size and
> added a special handling of mask expand.
> - I disabled canonical type creation for vector mask because we can't layout
> it with VOID mode. I don't know why we may need a canonical type here. But
> get_mask_mode call may be moved into type layout to get it.
> - Expand of vec<bool> constants/constructors requires special handling.
> Common case should require target hooks/optabs to expand vector into required
> mode. But I suppose we want to have a generic code to handle vector of int
> mode case to avoid modification of multiple targets which use default
> vec<bool> modes.
>
> Currently 'make check' shows two types of regression.
> - missed vector expression pattern recognition (MIN, MAX, ABS, VEC_COND).
> This must be due to my front-end changes. Hope it will be easy to fix.
> - missed vectorization. All of them appear due to bool patterns disabling.
> I didn't look into all of them but it seems the main problem is in mixed type
> sizes. With bool patterns and integer vector masks we just put int->(other
> sized int) conversion for masks and it gives us required mask transformation.
> With boolean mask we don't have proper scalar statements to do that. I
> think mask widening/narrowing may be directly supported in masked statements
> vectorization. Going to look into it.
>
> I attach what I currently have for a prototype. It grows bigger so I split
> into several parts. 
The general approach looks good. > +/* By defaults a vector of integers is used as a mask. */ > + > +machine_mode > +default_get_mask_mode (unsigned nunits, unsigned vector_size) > +{ > + unsigned elem_size = vector_size / nunits; > + machine_mode elem_mode > += smallest_mode_for_size (elem_size * BITS_PER_UNIT, MODE_INT); Why these arguments as opposed to passing elem_size? It seems that every hook is going to have to do this division... > +#define VECTOR_MASK_TYPE_P(TYPE) \ > + (TREE_CODE (TYPE) == VECTOR_TYPE \ > + && TREE_CODE (TREE_TYPE (TYPE)) == BOOLEAN_TYPE) Perhaps better as VECTOR_BOOLEAN_TYPE_P, since that's exactly what's being tested? > @@ -3464,10 +3464,10 @@ verify_gimple_comparison (tree type, tree op0, tree > op1) >return true; > } > } > - /* Or an integer vector type with the same size and element count > + /* Or a boolean vector type with the same element count > as the comparison operand types. */ >else if (TREE_CODE (type) == VECTOR_TYPE > -&& TREE_CODE (TREE_TYPE (type)) == INTEGER_TYPE) > +&& TREE_CODE (TREE_TYPE (type)) == BOOLEAN_TYPE) VECTOR_BOOLEAN_TYPE_P. > @@ -122,7 +122,19 @@ tree_vec_extract (gimple_stmt_iterator *gsi, tree type, > tree t, tree bitsize, tree bitpos) > { >if (bitpos) > -return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); > +{ > + if (TREE_CODE (type) == BOOLEAN_TYPE) > + { > + tree itype > + = build_nonstandard_integer_type (tree_to_uhwi (bitsize), 0); > + tree field = gimplify_build3 (gsi, BIT_FIELD_REF, itype, t, > + bitsize, bitpos); > + return gimplify_build2 (gsi, NE_EXPR, type, field, > + build_zero_cst (itype)); > + } > + else > + return gimplify_build3 (gsi, BIT_FIELD_REF, type, t, bitsize, bitpos); > +} >else > return gimplify_build1 (gsi, VIEW_CONVERT_EXPR, type, t); > } So... this is us lowering vector operations on a target that doesn't support them. Which means that we get to decide what value is produced for a comparison? Which means that we can settle on the "normal" -1, correct? 
Which means that we ought not need to extract the entire element and then compare for non-zero, but rather simply extract a single bit from the element, and directly call that a boolean result, correct? I assume you tested all this code with -mno-sse or equivalent arch default? > @@ -1885,7 +1885,9 @@ expand_MASK_LOAD (gcall *stmt) >create_output_operand ([0], target, TYPE_MODE (type)); >create_fixed_operand ([1], mem); >create_input_operand ([2], mask, TYPE_MODE (TREE_TYPE (maskt))); > - expand_insn (optab_handler (maskload_optab, TYPE_MODE (type)), 3, ops); > + expand_insn (convert_optab_handler (maskload_optab, TYPE_MODE (type), > +
[ARM] Add ARMv8.1 command line options.
Hello,

ARMv8.1 is a set of architectural extensions to ARMv8. Support for the ARMv8.1 architecture has been enabled in binutils, using the name "armv8.1-a". This patch adds support to gcc for specifying an ARMv8.1 architecture using the options "-march=armv8.1-a" and "-march=armv8.1-a+crc". It also adds the FPU options "-mfpu=neon-fp-armv8.1" and "-mfpu=crypto-neon-fp-armv8.1", to specify the ARMv8.1 Adv.SIMD instruction set. The changes set the appropriate architecture and fpu options for binutils but don't otherwise change the code generated by gcc.

Tested for arm-none-linux-gnueabihf with native bootstrap and make check.

Ok for trunk?
Matthew

2015-09-17  Matthew Wahab

	* config/arm/arm-arches.def: Add "armv8.1-a" and "armv8.1-a+crc".
	* config/arm/arm-fpus.def: Add "neon-fp-armv8.1" and
	"crypto-neon-fp-armv8.1".
	* config/arm/arm-protos.h (FL2_ARCH8_1): New.
	(FL2_FOR_ARCH8_1A): New.
	* config/arm/arm-tables.opt: Regenerate.
	* config/arm/arm.h (FPU_FL_RDMA): New.
	* doc/invoke.texi (ARM -march): Add "armv8.1-a" and "armv8.1-a+crc".
	(ARM -mfpu): Add "neon-fp-armv8.1" and "crypto-neon-fp-armv8.1". 
diff --git a/gcc/config/arm/arm-arches.def b/gcc/config/arm/arm-arches.def index ddf6c3c..4cf71fd 100644 --- a/gcc/config/arm/arm-arches.def +++ b/gcc/config/arm/arm-arches.def @@ -57,6 +57,8 @@ ARM_ARCH("armv7-m", cortexm3, 7M, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ ARM_ARCH("armv7e-m", cortexm4, 7EM, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH7EM)) ARM_ARCH("armv8-a", cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_FOR_ARCH8A)) ARM_ARCH("armv8-a+crc",cortexa53, 8A, ARM_FSET_MAKE_CPU1 (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A)) +ARM_ARCH("armv8.1-a", cortexa53, 8A, ARM_FSET_MAKE (FL_CO_PROC | FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A)) +ARM_ARCH("armv8.1-a+crc",cortexa53, 8A, ARM_FSET_MAKE (FL_CO_PROC | FL_CRC32 | FL_FOR_ARCH8A, FL2_FOR_ARCH8_1A)) ARM_ARCH("iwmmxt", iwmmxt, 5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT)) ARM_ARCH("iwmmxt2", iwmmxt2,5TE, ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_STRONG | FL_FOR_ARCH5TE | FL_XSCALE | FL_IWMMXT | FL_IWMMXT2)) diff --git a/gcc/config/arm/arm-fpus.def b/gcc/config/arm/arm-fpus.def index efd5896..065fb3d9 100644 --- a/gcc/config/arm/arm-fpus.def +++ b/gcc/config/arm/arm-fpus.def @@ -44,5 +44,9 @@ ARM_FPU("fp-armv8", ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_FP16) ARM_FPU("neon-fp-armv8",ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16) ARM_FPU("crypto-neon-fp-armv8", ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_CRYPTO) +ARM_FPU("neon-fp-armv8.1", + ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA) +ARM_FPU("crypto-neon-fp-armv8.1", + ARM_FP_MODEL_VFP, 8, VFP_REG_D32, FPU_FL_NEON | FPU_FL_FP16 | FPU_FL_RDMA | FPU_FL_CRYPTO) /* Compatibility aliases. 
*/ ARM_FPU("vfp3", ARM_FP_MODEL_VFP, 3, VFP_REG_D32, FPU_FL_NONE) diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 8df312f..e60ad4c 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -387,6 +387,8 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_IWMMXT2(1 << 30) /* "Intel Wireless MMX2 technology". */ #define FL_ARCH6KZ(1 << 31) /* ARMv6KZ architecture. */ +#define FL2_ARCH8_1 (1 << 0) /* Architecture 8.1. */ + /* Flags that only effect tuning, not available instructions. */ #define FL_TUNE (FL_WBUF | FL_VFPV2 | FL_STRONG | FL_LDSCHED \ | FL_CO_PROC) @@ -415,6 +417,7 @@ extern bool arm_is_constant_pool_ref (rtx); #define FL_FOR_ARCH7M (FL_FOR_ARCH7 | FL_THUMB_DIV) #define FL_FOR_ARCH7EM (FL_FOR_ARCH7M | FL_ARCH7EM) #define FL_FOR_ARCH8A (FL_FOR_ARCH7VE | FL_ARCH8) +#define FL2_FOR_ARCH8_1A FL2_ARCH8_1 /* There are too many feature bits to fit in a single word so the set of cpu and fpu capabilities is a structure. A feature set is created and manipulated diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index f7a9d63..274bc46 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -336,6 +336,7 @@ typedef unsigned long arm_fpu_feature_set; #define FPU_FL_NEON (1 << 0) /* NEON instructions. */ #define FPU_FL_FP16 (1 << 1) /* Half-precision. */ #define FPU_FL_CRYPTO (1 << 2) /* Crypto extensions. */ +#define FPU_FL_RDMA (1 << 3) /* ARMv8.1 extensions. */ /* Which floating point model to use. */ enum arm_fp_model diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 99c9685..9f49189 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13267,8 +13267,8 @@ of the @option{-mcpu=} option. Permissible names are: @samp{armv2}, @samp{armv6}, @samp{armv6j}, @samp{armv6t2}, @samp{armv6z}, @samp{armv6kz}, @samp{armv6-m}, @samp{armv7}, @samp{armv7-a}, @samp{armv7-r}, @samp{armv7-m}, @samp{armv7e-m}, -@samp{armv7ve},
Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
On Thu, 2015-09-17 at 13:31 -0600, Jeff Law wrote: > On 09/16/2015 02:34 AM, Richard Biener wrote: > > > > Btw, this looks quite expensive - I'm sure we want to limit the effort > > here a bit. > A limiter is reasonable, though as it's been pointed out this only fires > during error processing, so we probably have more leeway to take time > and see if we can do better error recovery. > > FWIW, I've used this algorithm in totally unrelated projects and while > it seems expensive, it's worked out quite nicely. > > > > > So while the idea might be an improvement to selected cases it can cause > > confusion as well. And if using the suggestion for further parsing it can > > cause worse followup errors (unless we can limit such "fixup" use to the > > cases where we can parse the result without errors). Consider > > > > foo() > > { > >foz = 1; > > } > > > > if we suggest 'foo' instead of foz then we'll get a more confusing followup > > error if we actually use it. > True. This kind of problem is probably inherent in this kind of "I'm > going assume you meant..." error recovery mechanisms. > > And just to be clear, even in a successful recovery scenario, we still > issue an error. The error recovery is just meant to try and give the > user a hint what might have gone wrong and gracefully handle the case > where they just made a minor goof. (nods) > Obviously the idea here is to cut > down on the number of iterations of edit-compile cycle one has to do :-) In my mind it's more about saving the user from having to locate the field they really meant within the corresponding structure declaration (either by grep, or by some cross-referencing tool). A lot of the time I find myself wishing that the compiler had issued a note saying "here's the declaration of the struct in question", which would make it easy for me to go straight there in Emacs. 
I wonder what proportion of our users use a cross-referencing tool or have an IDE that can find this stuff for them, vs those that rely on grep, and if that should mean something for our diagnostics (I tend to just rely on grep). This is rather tangential to this RFE, of course. Dave
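The distance being discussed in this thread is the classic dynamic-programming Levenshtein edit distance. As a rough sketch (this is not GCC's implementation; the function name and two-row formulation are chosen here for illustration), the cost is O(len(a) * len(b)) time and O(len(b)) space, which is also where an effort "limiter" of the kind Richard asks for would hook in, e.g. by skipping candidates whose length difference already exceeds a cap:

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

/* Classic two-row dynamic-programming Levenshtein distance.  A sketch,
   not GCC's code: returns the minimum number of insertions, deletions
   and substitutions needed to turn A into B.  */
unsigned
edit_distance (const std::string &a, const std::string &b)
{
  std::vector<unsigned> prev (b.size () + 1), cur (b.size () + 1);
  for (unsigned j = 0; j <= b.size (); ++j)
    prev[j] = j;			/* distance from "" to b[0..j)  */
  for (unsigned i = 1; i <= a.size (); ++i)
    {
      cur[0] = i;			/* distance from a[0..i) to ""  */
      for (unsigned j = 1; j <= b.size (); ++j)
	cur[j] = std::min ({ prev[j] + 1,			/* deletion  */
			     cur[j - 1] + 1,			/* insertion  */
			     prev[j - 1]
			       + (a[i - 1] != b[j - 1]) });	/* substitution  */
      std::swap (prev, cur);
    }
  return prev[b.size ()];
}
```

With this, the "foz"/"foo" example from Richard's mail has distance 1, which is exactly why the suggestion machinery would offer it.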
Re: [patch] libstdc++/65142 Check read() result in std::random_device.
On Thu, 17 Sep 2015, Jonathan Wakely wrote:
>> Any comments on this version?
> Committed to trunk.

Unfortunately this broke bootstrap on FreeBSD 10.1.

/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc: In member function 'std::random_device::result_type std::random_device::_M_getval()':
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:22: error: 'errno' was not declared in this scope
   else if (e != -1 || errno != EINTR)
                       ^
/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:31: error: 'EINTR' was not declared in this scope
   else if (e != -1 || errno != EINTR)
                                ^
Makefile:545: recipe for target 'random.lo' failed

I probably won't be able to dig in deeper today, but figured this might
already send you on the right path?

Actually...

...how about the patch below?  Bootstraps on i386-unknown-freebsd10.1,
no regressions.

Gerald

2015-09-17  Gerald Pfeifer

	* src/c++11/random.cc: Include .

Index: libstdc++-v3/src/c++11/random.cc
===
--- libstdc++-v3/src/c++11/random.cc	(revision 227883)
+++ libstdc++-v3/src/c++11/random.cc	(working copy)
@@ -31,6 +31,7 @@
 # include
 #endif
 
+#include
 #include
 
 #ifdef _GLIBCXX_HAVE_UNISTD_H
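The failure is simply a missing header: errno and the E* constants such as EINTR are declared in &lt;cerrno&gt;. The retry-on-interrupt pattern the diagnostics point at looks roughly like the sketch below (this is not the actual random.cc code; read_all is an invented helper used only to illustrate the idiom):

```cpp
#include <cassert>
#include <cerrno>	// declares errno and the E* constants such as EINTR
#include <cstdio>

/* Sketch of the retry pattern: keep reading while the call was merely
   interrupted by a signal (EINTR), treat anything else as failure.
   Invented helper, not libstdc++ code.  */
bool
read_all (std::FILE *f, unsigned char *buf, std::size_t n)
{
  while (n > 0)
    {
      errno = 0;
      std::size_t got = std::fread (buf, 1, n, f);
      if (got == 0)
	{
	  if (errno == EINTR)
	    continue;		/* interrupted system call: just retry  */
	  return false;		/* genuine error or end-of-file  */
	}
      buf += got;
      n -= got;
    }
  return true;
}
```

Without the &lt;cerrno&gt; include, both `errno` and `EINTR` in the condition are undeclared, which is exactly the bootstrap error above.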
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On 09/17/2015 10:49 AM, David Malcolm wrote: FWIW, I have a (very messy) implementation of this working for the C frontend, which gives us source ranges on expressions without needing to add any new tree nodes, or add any fields to existing tree structs. The approach I'm using: * ranges are stored directly as fields within cpp_token and c_token (maybe we can ignore cp_token for now) * ranges are stashed in the C FE, both (a) within the "struct c_expr" and (b) within the location_t of each tree expression node as a new field in the adhoc map. Doing it twice may seem slightly redundant, but I think both are needed: (a) doing it within c_expr allows us to support ranges for constants and VAR_DECL etc during parsing, without needing any kind of new tree wrapper node (b) doing it in the ad-hoc map allows the ranges for expressions to survive the parse and be usable in diagnostics later. So this gives us (in the C FE): ranges for everything during parsing, and ranges for expressions afterwards, with no new tree nodes or new fields within tree nodes. I'm working on cleaning it up into a much more minimal set of patches that I hope are reviewable. Hopefully this sounds like a good approach So is the assumption here that the location information is or is not supposed to survive through the gimple optimizers? If I understand what you're doing correctly, I think the location information gets invalidated by const/copy propagations. Though perhaps that's not a major problem because we're typically propagating an SSA_NAME, not a _DECL node. Hmm. jeff
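The data layout David describes can be pictured with a toy model (the type and field names below are invented for illustration, loosely following the patch kit rather than copied from it): each token carries a start/finish pair, and the range of a compound expression is synthesized from the ranges of its outermost operands:

```cpp
#include <cassert>

typedef unsigned int location_t;	/* stand-in for libcpp's location_t  */

/* Toy model of a token/expression range: a pair of locations.  */
struct source_range
{
  location_t m_start;
  location_t m_finish;
};

/* The range of "LHS op RHS" runs from the start of the left operand
   to the end of the right one; this is the kind of synthesis the
   parser would do when building an expression from its operands.  */
source_range
make_binary_op_range (const source_range &lhs, const source_range &rhs)
{
  source_range r;
  r.m_start = lhs.m_start;
  r.m_finish = rhs.m_finish;
  return r;
}
```

Stashing such a pair per token (and per c_expr during parsing) is cheap; the ad-hoc map then lets the expression's range outlive the parse without growing the tree nodes themselves.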
Re: [c++-delayed-folding] code review
Hello Jason,

thanks for your review.  I already addressed most of the issues you
mentioned today.  On some of them I have further comments below ...

2015-09-17 8:18 GMT+02:00 Jason Merrill:
>> @@ -216,6 +216,7 @@ libgcov-driver-tool.o-warn = -Wno-error
>>  libgcov-merge-tool.o-warn = -Wno-error
>>  gimple-match.o-warn = -Wno-unused
>>  generic-match.o-warn = -Wno-unused
>> +insn-modes.o-warn = -Wno-error
>
> This doesn't seem to be needed anymore.

Yes, removed.

>> @@ -397,11 +397,13 @@ convert_to_real (tree type, tree expr)
>>     EXPR must be pointer, integer, discrete (enum, char, or bool), float,
>>     fixed-point or vector; in other cases error is called.
>>
>> +   If DOFOLD is TRZE, we try to simplify newly-created patterns by
>> folding.
>
> "TRUE"

Fixed.  Thanks.

>> +  if (!dofold)
>> +    {
>> +      expr = build1 (CONVERT_EXPR,
>> +		     lang_hooks.types.type_for_size
>> +		       (TYPE_PRECISION (intype), 0),
>> +		     expr);
>> +      return build1 (CONVERT_EXPR, type, expr);
>> +    }
>
> When we're not folding, I don't think we want to do the two-step conversion,
> just the second one.  And we might want to use NOP_EXPR instead of
> CONVERT_EXPR, but I'm not sure about that.

This is indeed a point I was thinking about while writing this code.
The comments below regarding the other introduced patterns relate to
the same reasoning.  What is certain is that we want to remove such
code modification from the front ends themselves, but that means those
conversions then need to be handled elsewhere, later in the middle end.
We already have a separate functional change for shorten_compare, which
gives me headaches, as we don't yet handle all of its cases correctly
in the middle end (a different task, not directly related to delayed
folding, but linked to it).  Therefore I also kept this code-modifying
code for delayed folding; moving it (together with the other patterns
here in convert.c) out into the middle end should be a separate task.

For the code above we need the two-step conversion, as we have to make
sure that we first convert EXPR to the language-specific size-type,
which isn't necessarily the same as TYPE here.  Casting directly to
TYPE might cause side-effects (especially the signed/unsigned
conversion issues I can think of here).

>> @@ -818,10 +828,15 @@ convert_to_integer (tree type, tree expr)
>>	  if (TYPE_UNSIGNED (typex))
>>	    typex = signed_type_for (typex);
>>	}
>> -      return convert (type,
>> -		      fold_build2 (ex_form, typex,
>> -				   convert (typex, arg0),
>> -				   convert (typex, arg1)));
>> +      if (dofold)
>> +	return convert (type,
>> +			fold_build2 (ex_form, typex,
>> +				     convert (typex, arg0),
>> +				     convert (typex, arg1)));
>> +      arg0 = build1 (CONVERT_EXPR, typex, arg0);
>> +      arg1 = build1 (CONVERT_EXPR, typex, arg1);
>> +      expr = build2 (ex_form, typex, arg0, arg1);
>> +      return build1 (CONVERT_EXPR, type, expr);
>
> This code path seems to be for pushing a conversion down into a binary
> expression.  We shouldn't do this at all when we aren't folding.

Yes, but see the comment above.  I agree that such code modifications
are not what we intend to do, but removing them should be part of a
different task.

>> @@ -845,9 +860,14 @@ convert_to_integer (tree type, tree expr)
>>
>>	  if (!TYPE_UNSIGNED (typex))
>>	    typex = unsigned_type_for (typex);
>> +	  if (!dofold)
>> +	    return build1 (CONVERT_EXPR, type,
>> +			   build1 (ex_form, typex,
>> +				   build1 (CONVERT_EXPR, typex,
>> +					   TREE_OPERAND (expr, 0;
>
> Likewise.

Same comment as above.

>> @@ -867,6 +887,14 @@ convert_to_integer (tree type, tree expr)
>>	     the conditional and never loses.  A COND_EXPR may have a throw
>>	     as one operand, which then has void type.  Just leave void
>>	     operands as they are.  */
>> +	  if (!dofold)
>> +	    return build3 (COND_EXPR, type, TREE_OPERAND (expr, 0),
>> +			   VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 1)))
>> +			   ? TREE_OPERAND (expr, 1)
>> +			   : build1 (CONVERT_EXPR, type, TREE_OPERAND (expr, 1)),
>> +			   VOID_TYPE_P (TREE_TYPE (TREE_OPERAND (expr, 2)))
>> +			   ? TREE_OPERAND (expr, 2)
>> +			   : build1 (CONVERT_EXPR, type, TREE_OPERAND (expr, 2)));
>
> Likewise.

Same comment as above.

>> @@ -903,6 +933,10 @@ convert_to_integer (tree type, tree expr)
>>
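The signed/unsigned concern raised here is easy to demonstrate at the source level: converting through an intermediate type of a different signedness is not always equivalent to converting directly, which is why the two steps cannot simply be collapsed. A minimal illustration in plain C++ (nothing GCC-internal; the two function names are invented):

```cpp
#include <cassert>
#include <cstdint>

/* Widening a negative int32_t directly sign-extends; going through an
   unsigned intermediate type of the same width zero-extends instead,
   so the two conversion paths give different values.  */
int64_t
one_step (int32_t x)
{
  return static_cast<int64_t> (x);
}

int64_t
two_step (int32_t x)
{
  return static_cast<int64_t> (static_cast<uint32_t> (x));
}
```

The same hazard applies when the intermediate type is the language's size-type rather than TYPE itself, which is the case convert_to_real has to be careful about.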
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
Jonathan Wakely writes:
> +      p = "/dev/stdin";
> +      if (exists(p))
> +	{
> +	  auto p2 = canonical(p);
> +	  if (is_symlink(p))
> +	    VERIFY( p != p2 );
> +	  else
> +	    VERIFY( p == p2 );
> +	  VERIFY( canonical(p2) == p2 );

This fails if stdin is a pipe, which doesn't have a (real) name, so
realpath fails.

$ echo | ./canonical.exe
terminate called after throwing an instance of 'std::experimental::filesystem::v1::__cxx11::filesystem_error'
  what():  filesystem error: cannot canonicalize: No such file or directory [/dev/stdin]

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
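filesystem::canonical is built on POSIX realpath(), which returns a null pointer and sets errno when the path cannot be fully resolved; that is exactly what happens when /dev/stdin refers to an unnamed pipe. A small sketch of the failure mode (can_canonicalize is an invented wrapper, not library code):

```cpp
#include <cassert>
#include <limits.h>	// PATH_MAX (on systems that define it)
#include <stdlib.h>	// POSIX realpath

/* Invented wrapper: true iff PATH resolves to a canonical name.
   realpath returns NULL (and sets errno, e.g. to ENOENT) on failure,
   which is the condition the test case above trips over.  */
bool
can_canonicalize (const char *path)
{
  char resolved[PATH_MAX];
  return ::realpath (path, resolved) != nullptr;
}
```

So a robust test has to tolerate canonical() throwing when stdin has no name, rather than asserting it always succeeds.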
Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
On 09/16/2015 02:34 AM, Richard Biener wrote: Btw, this looks quite expensive - I'm sure we want to limit the effort here a bit. A limiter is reasonable, though as it's been pointed out this only fires during error processing, so we probably have more leeway to take time and see if we can do better error recovery. FWIW, I've used this algorithm in totally unrelated projects and while it seems expensive, it's worked out quite nicely. So while the idea might be an improvement to selected cases it can cause confusion as well. And if using the suggestion for further parsing it can cause worse followup errors (unless we can limit such "fixup" use to the cases where we can parse the result without errors). Consider foo() { foz = 1; } if we suggest 'foo' instead of foz then we'll get a more confusing followup error if we actually use it. True. This kind of problem is probably inherent in this kind of "I'm going assume you meant..." error recovery mechanisms. And just to be clear, even in a successful recovery scenario, we still issue an error. The error recovery is just meant to try and give the user a hint what might have gone wrong and gracefully handle the case where they just made a minor goof. Obviously the idea here is to cut down on the number of iterations of edit-compile cycle one has to do :-) Jeff
Re: C++ PATCH to implement fold-expressions
Fantastic.  I've wanted to get back to this, but since school started
back up, I haven't had any time to look at it.

Thanks!

Andrew

On Thu, Sep 17, 2015 at 2:04 PM, Jason Merrill wrote:
> Back in January Andrew mostly implemented C++1z fold-expressions
> (https://isocpp.org/files/papers/n4295.html), but the patch broke
> bootstrap.  I've fixed it up now and am committing it to the trunk.
>
> Andrew: The main change I made was to drop tentative parsing in favor
> of scanning the tokens ahead looking for an ellipsis next to a
> fold-operator.  I also tweaked some diagnostics, fixed handling of
> dependent expressions with no TREE_TYPE, and corrected empty fold of
> modops.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.
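For readers who haven't seen N4295's syntax, this is the kind of code the feature enables (user-level C++1z, compilable with this patch or any later compiler in C++1z/C++17 mode):

```cpp
#include <cassert>

/* A binary right fold with an init operand: (ts + ... + 0) expands to
   t1 + (t2 + (... + (tn + 0))), and yields 0 for an empty pack --
   the "empty fold" case Jason mentions correcting.  */
template <typename... Ts>
auto
sum (Ts... ts)
{
  return (ts + ... + 0);
}
```

Without the init operand, an empty `(ts + ...)` would be ill-formed; only `&&`, `||` and `,` have defined empty-fold values.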
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On Thu, 2015-09-17 at 13:14 -0600, Jeff Law wrote: > On 09/17/2015 10:49 AM, David Malcolm wrote: > > > FWIW, I have a (very messy) implementation of this working for the C > > frontend, which gives us source ranges on expressions without needing to > > add any new tree nodes, or add any fields to existing tree structs. > > > > The approach I'm using: > > > > * ranges are stored directly as fields within cpp_token and c_token > > (maybe we can ignore cp_token for now) > > > > * ranges are stashed in the C FE, both (a) within the "struct c_expr" > > and (b) within the location_t of each tree expression node as a new > > field in the adhoc map. > > > > Doing it twice may seem slightly redundant, but I think both are needed: > >(a) doing it within c_expr allows us to support ranges for constants > > and VAR_DECL etc during parsing, without needing any kind of new tree > > wrapper node > >(b) doing it in the ad-hoc map allows the ranges for expressions to > > survive the parse and be usable in diagnostics later. > > > > So this gives us (in the C FE): ranges for everything during parsing, > > and ranges for expressions afterwards, with no new tree nodes or new > > fields within tree nodes. > > > > I'm working on cleaning it up into a much more minimal set of patches > > that I hope are reviewable. > > > > Hopefully this sounds like a good approach > So is the assumption here that the location information is or is not > supposed to survive through the gimple optimizers? To be honest I hadn't given much thought to that stage; my hope is that most of the diagnostics have been issued by the time we're optimizing. In the proposed implementation the range information is "baked in" to the location_t (via the ad-hoc lookaside), so it's carried along wherever the location_t goes, and ought to have the same chances of survival within the gimple optimizers as the existing location information does. 
>If I understand > what you're doing correctly, I think the location information gets > invalidated by const/copy propagations. > > Though perhaps that's not a major problem because we're typically > propagating an SSA_NAME, not a _DECL node. Hmm. Well, if the location_t is being invalidated by an optimization, we're already losing source *point* information: file/line/column. Given that, losing range information as well seems like no great loss. Or am I missing something? (I am attempting to preserve the source_range data when block ptrs are baked in to the ad-hoc locations, if that's what you're referring to) Dave
Re: C++ PATCH to implement fold-expressions
On 09/17/2015 02:04 PM, Jason Merrill wrote:
> Back in January Andrew mostly implemented C++1z fold-expressions
> (https://isocpp.org/files/papers/n4295.html), but the patch broke
> bootstrap.  I've fixed it up now and am committing it to the trunk.
>
> Andrew: The main change I made was to drop tentative parsing in favor
> of scanning the tokens ahead looking for an ellipsis next to a
> fold-operator.  I also tweaked some diagnostics, fixed handling of
> dependent expressions with no TREE_TYPE, and corrected empty fold of
> modops.
>
> Tested x86_64-pc-linux-gnu, applying to trunk.

Now with the patch...

Jason

commit 5946e0026988259a71992e3a653556b76460919c
Author: Jason Merrill
Date:   Wed Sep 16 15:27:10 2015 -0400

	Implement N4295 fold-expressions.

	* cp-tree.def: Add UNARY_LEFT_FOLD_EXPR, UNARY_RIGHT_FOLD_EXPR,
	BINARY_LEFT_FOLD_EXPR, BINARY_RIGHT_FOLD_EXPR.
	* cp-objcp-common.c (cp_common_init_ts): Handle them.
	* cp-tree.h (FOLD_EXPR_CHECK, BINARY_FOLD_EXPR_CHECK, FOLD_EXPR_P)
	(FOLD_EXPR_MODIFY_P, FOLD_EXPR_OP, FOLD_EXPR_PACK, FOLD_EXPR_INIT): New.
	* parser.c (cp_parser_skip_to_closing_parenthesis): Split out...
	(cp_parser_skip_to_closing_parenthesis_1): This function.  Change
	or_comma parameter to or_ttype.
	(cp_parser_fold_operator, cp_parser_fold_expr_p)
	(cp_parser_fold_expression): New.
	(cp_parser_primary_expression): Use them.
	* pt.c (expand_empty_fold, fold_expression, tsubst_fold_expr_pack)
	(tsubst_fold_expr_init, expand_left_fold, tsubst_unary_left_fold)
	(tsubst_binary_left_fold, expand_right_fold)
	(tsubst_unary_right_fold, tsubst_binary_right_fold): New.
	(tsubst_copy): Use them.
	(type_dependent_expression_p): Handle fold-expressions.
	* semantics.c (finish_unary_fold_expr)
	(finish_left_unary_fold_expr, finish_right_unary_fold_expr)
	(finish_binary_fold_expr): New.

diff --git a/gcc/cp/cp-objcp-common.c b/gcc/cp/cp-objcp-common.c
index 808defd..22f063b 100644
--- a/gcc/cp/cp-objcp-common.c
+++ b/gcc/cp/cp-objcp-common.c
@@ -311,6 +311,10 @@ cp_common_init_ts (void)
   MARK_TS_TYPED (CTOR_INITIALIZER);
   MARK_TS_TYPED (ARRAY_NOTATION_REF);
   MARK_TS_TYPED (REQUIRES_EXPR);
+  MARK_TS_TYPED (UNARY_LEFT_FOLD_EXPR);
+  MARK_TS_TYPED (UNARY_RIGHT_FOLD_EXPR);
+  MARK_TS_TYPED (BINARY_LEFT_FOLD_EXPR);
+  MARK_TS_TYPED (BINARY_RIGHT_FOLD_EXPR);
 }
 
 #include "gt-cp-cp-objcp-common.h"

diff --git a/gcc/cp/cp-tree.def b/gcc/cp/cp-tree.def
index 61acf27..7df72c5 100644
--- a/gcc/cp/cp-tree.def
+++ b/gcc/cp/cp-tree.def
@@ -432,6 +432,26 @@ DEFTREECODE (EXPR_PACK_EXPANSION, "expr_pack_expansion", tcc_expression, 3)
    index is a machine integer.  */
 DEFTREECODE (ARGUMENT_PACK_SELECT, "argument_pack_select", tcc_exceptional, 0)
 
+/* Fold expressions allow the expansion of a template argument pack
+   over a binary operator.
+
+   FOLD_EXPR_MOD_P is true when the fold operation is a compound assignment
+   operator.
+
+   FOLD_EXPR_OP is an INTEGER_CST storing the tree code for the folded
+   expression.  Note that when FOLDEXPR_MOD_P is true, the operator is
+   a compound assignment operator for that kind of expression.
+
+   FOLD_EXPR_PACK is an expression containing an unexpanded parameter pack;
+   when expanded, each term becomes an argument of the folded expression.
+
+   In a BINARY_FOLD_EXPRESSION, FOLD_EXPR_INIT is the non-pack argument.  */
+DEFTREECODE (UNARY_LEFT_FOLD_EXPR, "unary_left_fold_expr", tcc_expression, 2)
+DEFTREECODE (UNARY_RIGHT_FOLD_EXPR, "unary_right_fold_expr", tcc_expression, 2)
+DEFTREECODE (BINARY_LEFT_FOLD_EXPR, "binary_left_fold_expr", tcc_expression, 3)
+DEFTREECODE (BINARY_RIGHT_FOLD_EXPR, "binary_right_fold_expr", tcc_expression, 3)
+
+
 /** C++ extensions.  */
 
 /* Represents a trait expression during template expansion.  */

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 8643e08..80d6c4e 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -80,6 +80,7 @@ c-common.h, not after.
       COMPOUND_REQ_NOEXCEPT_P (in COMPOUND_REQ)
       WILDCARD_PACK_P (in WILDCARD_DECL)
       BLOCK_OUTER_CURLY_BRACE_P (in BLOCK)
+      FOLD_EXPR_MODOP_P (*_FOLD_EXPR)
    1: IDENTIFIER_VIRTUAL_P (in IDENTIFIER_NODE)
       TI_PENDING_TEMPLATE_FLAG.
       TEMPLATE_PARMS_FOR_INLINE.
@@ -3249,6 +3250,37 @@ extern void decl_shadowed_for_var_insert (tree, tree);
   TREE_VEC_ELT (ARGUMENT_PACK_ARGS (ARGUMENT_PACK_SELECT_FROM_PACK (NODE)), \
		ARGUMENT_PACK_SELECT_INDEX (NODE))
 
+#define FOLD_EXPR_CHECK(NODE) \
+  TREE_CHECK4 (NODE, UNARY_LEFT_FOLD_EXPR, UNARY_RIGHT_FOLD_EXPR, \
+	       BINARY_LEFT_FOLD_EXPR, BINARY_RIGHT_FOLD_EXPR)
+
+#define BINARY_FOLD_EXPR_CHECK(NODE) \
+  TREE_CHECK2 (NODE, BINARY_LEFT_FOLD_EXPR, BINARY_RIGHT_FOLD_EXPR)
+
+/* True if NODE is UNARY_FOLD_EXPR or a BINARY_FOLD_EXPR */
+#define FOLD_EXPR_P(NODE) \
+  TREE_CODE (NODE) == UNARY_LEFT_FOLD_EXPR \
+    || TREE_CODE (NODE) == UNARY_RIGHT_FOLD_EXPR \
+    || TREE_CODE (NODE) ==
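The patch needs four distinct tree codes because left and right folds group differently, which matters for any non-associative operator. In user-level terms (plain C++1z, not compiler internals):

```cpp
#include <cassert>

/* (... - ts) is a unary *left* fold:  ((t1 - t2) - t3) - ...
   (ts - ...) is a unary *right* fold: t1 - (t2 - (t3 - ...))
   For subtraction the two groupings give different results.  */
template <typename... Ts>
auto
fold_sub_left (Ts... ts)
{
  return (... - ts);
}

template <typename... Ts>
auto
fold_sub_right (Ts... ts)
{
  return (ts - ...);
}
```

The BINARY_* variants additionally carry the non-pack init operand (FOLD_EXPR_INIT in the patch), as in `(0 + ... + ts)`.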
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On 09/15/2015 06:18 AM, Richard Biener wrote: Of course this boils down to "uses" of a VAR_DECL using the shared tree node. On GIMPLE some stmt kinds have separate locations for each operand (PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to wrap such uses to be able to give them distinct locations (can't use sth existing as frontends would need to ignore them in a different way than say NOP_EXPRs or NON_LVALUE_EXPRs). Exactly. Jeff
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On 09/15/2015 06:54 AM, Manuel López-Ibáñez wrote: On 15 September 2015 at 14:18, Richard Bienerwrote: Of course this boils down to "uses" of a VAR_DECL using the shared tree node. On GIMPLE some stmt kinds have separate locations for each operand (PHI nodes), on GENERIC we'd have to invent a no-op expr tree code to wrap such uses to be able to give them distinct locations (can't use sth existing as frontends would need to ignore them in a different way than say NOP_EXPRs or NON_LVALUE_EXPRs). The problem with that approach (besides the waste of memory implied by a whole tree node just to store one location_t) is keeping those wrappers in place while making them transparent for most of the compiler. According to Arnaud, folding made this approach infeasible: https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01222.html The other two alternatives are to store the location of the operands on the expressions themselves or to store them as on-the-side data-structure, but they come with their own drawbacks. I was initially more in favour of the wrapper solution, but after dealing with NOP_EXPRs, having to deal also with LOC_EXPR would be a nightmare (as you say, they will have to be ignored in a different way). The other alternatives seem less invasive and the problems mentioned here https://gcc.gnu.org/ml/gcc-patches/2012-11/msg00164.html do not seem as serious as I thought (passing down the location of the operand is becoming the norm anyway). I suspect any on-the-side data structure to handle this is ultimately doomed to failure. Storing the location info for the operands in the expression means that anything that modifies an operand would have to have access to the expression so that location information could be updated. Ugh. As painful as it will be, the right way is to stop using DECL nodes like this and instead be using another node that isn't shared. That allows atttaching location information. 
David and I kicked this around before he posted his patch and didn't come up with anything better IIRC. These wrapper nodes are definitely going to get in the way of folders and all kinds of things. So it's not something that's going to be easy to add without digging into and modifying a lot of code. I've always considered this a wart, but fixing that wart hasn't seemed worth the effort until recently. Jeff
Re: [PATCH] PR66870 PowerPC64 Enable gold linker with split stack
Here is my updated patch, with the changes suggested by Ian for
gcc/gospec.c and David for gcc/configure.ac.

Bootstrap built and tested on ppc64le, ppc64 multilib.

2015-09-17  Lynn Boger

gcc/
	PR target/66870
	config/rs6000/sysv4.h: Define TARGET_CAN_SPLIT_STACK_64BIT
	config.in: Set up HAVE_GOLD_ALTERNATE_SPLIT_STACK
	configure.ac: Define HAVE_GOLD_ALTERNATE_SPLIT_STACK on Power
	based on gold linker version
	configure: Regenerate
	gcc.c: Add -fuse-ld=gold to STACK_SPLIT_SPEC if
	HAVE_GOLD_ALTERNATE_SPLIT_STACK defined
	go/gospec.c: (lang_specific_driver): Set appropriate split stack
	options for 64 bit compiles based on TARGET_CAN_SPLIT_STACK_64BIT

On 09/15/2015 02:58 PM, Ian Lance Taylor wrote:
> On Tue, Sep 15, 2015 at 11:24 AM, Lynn A. Boger wrote:
>> I need some feedback on whether to enable the gold linker at all for
>> split stack on platforms other than Power in gcc/configure.ac.  I
>> don't know if there are gold linker versions that should be verified
>> for non-Power platforms.  My first patch only enabled it on Power and
>> that is what I think should be done.  Those who would like to use
>> gold with split stack for other platforms can enable it under the
>> appropriate conditions.
> I think that is fine for now.  If you want to enable gold for
> i386/x86_64 with -fsplit-stack, that is also fine.
> Ian

Index: gcc/config/rs6000/sysv4.h
===
--- gcc/config/rs6000/sysv4.h	(revision 227812)
+++ gcc/config/rs6000/sysv4.h	(working copy)
@@ -940,6 +940,14 @@ ncrtn.o%s"
 #undef TARGET_ASAN_SHADOW_OFFSET
 #define TARGET_ASAN_SHADOW_OFFSET rs6000_asan_shadow_offset
 
+/* On ppc64 and ppc64le, split stack is only support for
+   64 bit.  */
+#undef TARGET_CAN_SPLIT_STACK_64BIT
+#if TARGET_GLIBC_MAJOR > 2 \
+  || (TARGET_GLIBC_MAJOR == 2 && TARGET_GLIBC_MINOR >= 18)
+#define TARGET_CAN_SPLIT_STACK_64BIT
+#endif
+
 /* This target uses the sysv4.opt file.  */
 #define TARGET_USES_SYSV4_OPT 1

Index: gcc/config.in
===
--- gcc/config.in	(revision 227812)
+++ gcc/config.in	(working copy)
@@ -1310,6 +1310,12 @@
 #endif
 
+/* Define if the gold linker with split stack is available as a non-default */
+#ifndef USED_FOR_TARGET
+#undef HAVE_GOLD_NON_DEFAULT_SPLIT_STACK
+#endif
+
+
 /* Define if you have the iconv() function.  */
 #ifndef USED_FOR_TARGET
 #undef HAVE_ICONV

Index: gcc/configure.ac
===
--- gcc/configure.ac	(revision 227812)
+++ gcc/configure.ac	(working copy)
@@ -2247,6 +2247,42 @@ if test x$gcc_cv_ld != x; then
 fi
 AC_MSG_RESULT($ld_is_gold)
 
+AC_MSG_CHECKING(gold linker with split stack support as non default)
+# Check to see if default ld is not gold, but gold is
+# available and has support for split stack.  If gcc was configured
+# with gold then no checking is done.
+#
+if test x$ld_is_gold = xno && which ${gcc_cv_ld}.gold >/dev/null 2>&1; then
+
+# For platforms other than powerpc64*, enable as appropriate.
+
+  gold_non_default=no
+  ld_gold=`which ${gcc_cv_ld}.gold`
+# Make sure this gold has minimal split stack support
+  if $ld_gold --help 2>/dev/null | grep split-stack-adjust-size >/dev/null 2>&1; then
+    ld_vers=`$ld_gold --version | sed 1q`
+    gold_vers=`echo $ld_vers | sed -n \
+	  -e 's,^[[^)]]*[[ ]]\([[0-9]][[0-9]]*\.[[0-9]][[0-9]]*[[^)]]*\)) .*$,\1,p'`
+    case $target in
+# check that the gold version contains the complete split stack support
+# on powerpc64 big and little endian
+      powerpc64*-*-*)
+	case "$gold_vers" in
+	  2.25.[[1-9]]*|2.2[[6-9]][[.0-9]]*|2.[[3-9]][[.0-9]]*|[[3-9]].[[.0-9]]*) gold_non_default=yes
+	    ;;
+	  *) gold_non_default=no
+	    ;;
+	esac
+	;;
+    esac
+  fi
+  if test $gold_non_default = yes; then
+    AC_DEFINE(HAVE_GOLD_NON_DEFAULT_SPLIT_STACK, 1,
+	      [Define if the gold linker supports split stack and is available as a non-default])
+  fi
+fi
+AC_MSG_RESULT($gold_non_default)
+
 ORIGINAL_LD_FOR_TARGET=$gcc_cv_ld
 AC_SUBST(ORIGINAL_LD_FOR_TARGET)
 
 case "$ORIGINAL_LD_FOR_TARGET" in

Index: gcc/gcc.c
===
--- gcc/gcc.c	(revision 227812)
+++ gcc/gcc.c	(working copy)
@@ -666,7 +666,11 @@ proper position among the other output files.  */
    libgcc.  This is not yet a real spec, though it could become one;
    it is currently just stuffed into LINK_SPEC.  FIXME: This wrapping
    only works with GNU ld and gold.  */
+#ifdef HAVE_GOLD_NON_DEFAULT_SPLIT_STACK
+#define STACK_SPLIT_SPEC " %{fsplit-stack: -fuse-ld=gold --wrap=pthread_create}"
+#else
 #define STACK_SPLIT_SPEC " %{fsplit-stack: --wrap=pthread_create}"
+#endif
 
 #ifndef LIBASAN_SPEC
 #define STATIC_LIBASAN_LIBS \
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On 09/15/2015 04:33 AM, Richard Biener wrote:
> On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek wrote:
>> On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote:
>>>> diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h
>>>> index 760467c..c7558a0 100644
>>>> --- a/gcc/cp/parser.h
>>>> +++ b/gcc/cp/parser.h
>>>> @@ -61,6 +61,8 @@ struct GTY (()) cp_token {
>>>>    BOOL_BITFIELD purged_p : 1;
>>>>    /* The location at which this token was found.  */
>>>>    location_t location;
>>>> +  /* The source range at which this token was found.  */
>>>> +  source_range range;
>>> Is it just me or does location now feel somewhat redundant with range?
>>> Can't we compress that somehow?
>> For a token I'd expect it is redundant, I don't see how it would be
>> useful for a single preprocessing token to have more than start and end
>> locations.  But generally, for expressions, 3 locations make sense.
>> If you have
>>   abc + def
>>       ^
>> then having a range is useful.  In any case, I'm surprised that the
>> ranges aren't encoded in location_t (the data structures behind it,
>> where we already stick also the BLOCK pointer).
> Probably lack of encoding space ... I suppose upping location_t to
> 64 bits could solve some of that (with its own drawback of increasing
> the size of core structures).

If we're going to 64 bits, then we might consider making it a pointer so
that we don't have to spend so much time building up an encoding scheme.

Jeff
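A 64-bit location_t could carry a whole range by simple bit-packing. A toy sketch of that trade-off (an invented encoding for illustration only, not a proposal for the actual libcpp representation, which also has to encode file/line/column and the ad-hoc BLOCK data):

```cpp
#include <cassert>
#include <cstdint>

/* Pack a start/finish location pair into one 64-bit value and back.
   Invented encoding: high 32 bits = start, low 32 bits = finish.  */
uint64_t
pack_range (uint32_t start, uint32_t finish)
{
  return (static_cast<uint64_t> (start) << 32) | finish;
}

uint32_t
range_start (uint64_t packed)
{
  return static_cast<uint32_t> (packed >> 32);
}

uint32_t
range_finish (uint64_t packed)
{
  return static_cast<uint32_t> (packed);
}
```

The drawback Richard notes is exactly this: every tree and gimple statement carrying such a value grows, which is why the alternative of a side lookaside (the ad-hoc map) is attractive.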
Re: [PATCH] Convert SPARC to LRA
From: Eric Botcazou
Date: Sat, 12 Sep 2015 16:04:09 +0200

> You need to update https://gcc.gnu.org/backends.html

Done.
[wwwdocs] PATCH for Re: Advertisement in the GCC mirrors list, again
On Tue, 15 Sep 2015, niXman wrote:
> mirrors.webhostinggeeks.com/gcc/

This is a little disappointing, though I am inclined to consider it a
genuine mistake/migration.

Addressed like this.

Gerald

Index: mirrors.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/mirrors.html,v
retrieving revision 1.231
diff -u -r1.231 mirrors.html
--- mirrors.html	9 Sep 2015 16:57:26 -	1.231
+++ mirrors.html	17 Sep 2015 19:06:29 -
@@ -38,7 +38,6 @@
 Hungary, Budapest: http://robotlab.itk.ppke.hu/gcc/;>robotlab.itk.ppke.hu, thanks to Adam Rak (neurhlp at gmail.com)
 Japan: ftp://ftp.dti.ad.jp/pub/lang/gcc/;>ftp.dti.ad.jp, thanks to IWAIZAKO Takahiro (ftp-admin at dti.ad.jp)
 Japan: http://ftp.tsukuba.wide.ad.jp/software/gcc/;>ftp.tsukuba.wide.ad.jp, thanks to Kohei Takahashi (tsukuba-ftp-servers at tsukuba.wide.ad.jp)
-Latvia, Riga: http://mirrors.webhostinggeeks.com/gcc/;>mirrors.webhostinggeeks.com/gcc/, thanks to Igor (whg.igp at gmail.com)
 The Netherlands, Amsterdam: http://mirror2.babylon.network/gcc/;>http://mirror2.babylon.network/gcc/ | ftp://mirror2.babylon.network/gcc/;>ftp://mirror2.babylon.network/gcc/ |
Re: debug mode symbols cleanup
On 15/09/2015 23:57, Jonathan Wakely wrote: > On 14/09/15 20:26 +0200, François Dumont wrote: >> On 08/09/2015 22:47, François Dumont wrote: >>> On 07/09/2015 13:03, Jonathan Wakely wrote: On 05/09/15 22:53 +0200, François Dumont wrote: >I remember Paolo saying once that we were not guarantiing any abi > compatibility for debug mode. I haven't found any section for > unversioned symbols in gnu.ver so I simply uncomment the global > export. There is no section, because all exported symbols are versioned. It's OK if objects compiled with Debug Mode using one version of GCC don't link to objects compiled with Debug Mode using a different version of GCC, but you can't change the exported symbols in the DSO. Your changelog doesn't include the changes to config/abi/pre/gnu.ver, but those changes are not OK anyway, they fail the abi-check: FAIL: libstdc++-abi/abi_check === libstdc++ Summary === # of unexpected failures1 >>> Sorry, I though policy regarding debug mode symbols was even more >>> relax. >>> It is not so here is another patch that doesn"t break abi checks. >>> >>> I eventually made all methods that should not be used deprecated, they >>> were normally not used explicitely anyway. Their implementation is now >>> empty. I just needed to add a symbol for the not const _M_message >>> method >>> which is the correct signature. >>> >>> François >>> >> I eventually considered doing it without impacting exported symbols. I >> just kept the const qualifier on _M_messages and introduced a const_cast >> in the implementation. >> >> Is it ok to commit with this version ? > > The changes look OK now (assuming it passes 'make check-abi') but what > does it actually do? Mostly clean code, I have been upset in the past to be forced to introduce new print methods to _Error_formatter or _Parameter type while those methods were only used through a call to _M_error. 
Now we can change anything in the code used to output the diagnostic
without also having to touch _Error_formatter.  I also disliked the
mutable fields of _Error_formatter; they often indicate a design flaw,
which was the case here.

> Does it significantly improve the output?

Not significantly.  I initially started working on this because debug
messages were being wrapped in a bad way.  The message used to be cut
on any character that was not alphanumeric; now we cut only on spaces.
I would like to improve the rendering of types (they do not respect the
maximum line length that can be specified by users), but that's not a
top priority, of course.

Committed.

François
Re: [PATCH, PR 57195] Allow mode iterators inside angle brackets
Michael Collison writes: > On 09/14/2015 02:34 AM, Richard Sandiford wrote: >> Michael Collison writes: >>> Here is a modified patch that takes your comments into account. Breaking >>> on depth == 0 with '>' does not work due to the code looking for whitespace. >> What goes wrong? Just to make sure we're talking about the same thing, >> I meant that in: >> >> (match_operand:FOO> ... >> >> the name should be "FOO" and you should get an error on ">" when parsing >> the text after the name, just like you would for: >> >> (match_operand:FOO] ... > > When I try breaking on '>' with a nesting depth of 0, all such examples fail. I meant break when depth == 0 before the decrement that's associated with '>'. I think that problem would only occur if you broke _after_ decrementing the depth. >> It's not a big deal though, so... >> >>> 2015-08-25 Michael Collison >>> >>> PR other/57195 >>> * read-md.c (read_name): Allow mode iterators inside angle >>> brackets in rtl expressions. >> OK, thanks. > Meaning okay to check the patch in? Yeah, it's OK to check in. Richard
Re: [PATCH, RFC] Implement N4230, Nested namespace definition
On 09/16/2015 07:55 AM, Ville Voutilainen wrote: This is the first stab, I haven't written the tests yet. Feedback would be most welcome; should I put this code into a separate function? Is the minor code duplication with the regular namespace definition ok? I think I'd prefer to keep it in the same function, but avoid the code duplication. Jason
Re: [PATCH WIP] Use Levenshtein distance for various misspellings in C frontend v2
On 17 September 2015 at 21:57, David Malcolm wrote: > In my mind it's more about saving the user from having to locate the > field they really meant within the corresponding structure declaration > (either by grep, or by some cross-referencing tool). I think it is more than that. After a long coding session, one can start to wonder why the compiler cannot find type_of_unknwon_predicate or firstColourInColumn (ah! it was type_of_unknown_predicate and firstColorInColumn!). Or when we extend this to options (PR67613), why do I get error: unrecognized command line option '-Weffic++' when I just read it in the manual! Cheers, Manuel.
Re: [PATCH] Convert SPARC to LRA
> > You need to update https://gcc.gnu.org/backends.html > > Done. Nice work! Did you keep the 64-bit promotion in PROMOTE_MODE or...? -- Eric Botcazou
RE: [patch] libstdc++/65142 Check read() result in std::random_device.
> -Original Message- > From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > ow...@gcc.gnu.org] On Behalf Of Jonathan Wakely > Sent: Thursday, September 17, 2015 5:28 PM > To: Gerald Pfeifer > Cc: libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org > Subject: Re: [patch] libstdc++/65142 Check read() result in > std::random_device. > > On 17/09/15 22:21 +0200, Gerald Pfeifer wrote: > >On Thu, 17 Sep 2015, Jonathan Wakely wrote: > >>> Any comments on this version? > >> Committed to trunk. > > > >Unfortunately this broke bootstrap on FreeBSD 10.1. > > > >/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc: In > member function 'std::random_device::result_type > std::random_device::_M_getval()': > >/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:22: > >error: 'errno' was not declared in this scope > > else if (e != -1 || errno != EINTR) > > ^ > >/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:31: > >error: 'EINTR' was not declared in this scope > > else if (e != -1 || errno != EINTR) > > ^ > >Makefile:545: recipe for target 'random.lo' failed > > > >I probably won't be able to dig in deeper today, but figured this might > >already send you on the right path? > > > >Actually... > > > >...how about he patch below? Bootstraps on i386-unknown-freebsd10.1, > >no regressions. > > Sorry about that, I've committed your patch. > > >Gerald > > > > I'm still seeing errors for a build of the mips-sde-elf target with these patches. 
Errors are:
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc: In function 'void {anonymous}::print_word({anonymous}::PrintContext&, const char*, std::ptrdiff_t)':
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:573:10: error: 'stderr' was not declared in this scope
 fprintf(stderr, "\n");
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:573:22: error: 'fprintf' was not declared in this scope
 fprintf(stderr, "\n");
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:596:14: error: 'stderr' was not declared in this scope
 fprintf(stderr, "%s", spacing);
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:596:35: error: 'fprintf' was not declared in this scope
 fprintf(stderr, "%s", spacing);
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:600:24: error: 'stderr' was not declared in this scope
 int written = fprintf(stderr, "%s", word);
/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/src/c++11/debug.cc:600:42: error: 'fprintf' was not declared in this scope
 int written = fprintf(stderr, "%s", word);
Thanks, Catherine
Re: [patch] libstdc++/65142 Check read() result in std::random_device.
On 17/09/15 22:21 +0200, Gerald Pfeifer wrote: On Thu, 17 Sep 2015, Jonathan Wakely wrote: Any comments on this version? Committed to trunk. Unfortunately this broke bootstrap on FreeBSD 10.1. /scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc: In member function 'std::random_device::result_type std::random_device::_M_getval()': /scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:22: error: 'errno' was not declared in this scope else if (e != -1 || errno != EINTR) ^ /scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc:144:31: error: 'EINTR' was not declared in this scope else if (e != -1 || errno != EINTR) ^ Makefile:545: recipe for target 'random.lo' failed I probably won't be able to dig in deeper today, but figured this might already send you on the right path? Actually... ...how about the patch below? Bootstraps on i386-unknown-freebsd10.1, no regressions. Sorry about that, I've committed your patch. Gerald 2015-09-17 Gerald Pfeifer * src/c++11/random.cc: Include . Index: libstdc++-v3/src/c++11/random.cc === --- libstdc++-v3/src/c++11/random.cc (revision 227883) +++ libstdc++-v3/src/c++11/random.cc (working copy) @@ -31,6 +31,7 @@ # include #endif +#include #include #ifdef _GLIBCXX_HAVE_UNISTD_H
Re: [patch] libstdc++/67173 Fix filesystem::canonical for Solaris 10.
On 17/09/15 21:25 +0200, Andreas Schwab wrote: Jonathan Wakely writes: + p = "/dev/stdin"; + if (exists(p)) +{ + auto p2 = canonical(p); + if (is_symlink(p)) +VERIFY( p != p2 ); + else +VERIFY( p == p2 ); + VERIFY( canonical(p2) == p2 ); This fails if stdin is a pipe, which doesn't have a (real) name, so realpath fails. $ echo | ./canonical.exe terminate called after throwing an instance of 'std::experimental::filesystem::v1::__cxx11::filesystem_error' what(): filesystem error: cannot canonicalize: No such file or directory [/dev/stdin] Ah, of course, the symlink exists but doesn't point to a real file. Thanks for the explanation. I'll re-add tests for symlinks when I come up with a proper method for testing the Filesystem code.
Re: [PATCH, RFC] Implement N4230, Nested namespace definition
On 17 September 2015 at 23:11, Jason Merrillwrote: > On 09/16/2015 07:55 AM, Ville Voutilainen wrote: >> >> This is the first stab, I haven't written the tests yet. Feedback would be >> most welcome; should I put this code into a separate function? Is the >> minor >> code duplication with the regular namespace definition ok? > > > I think I'd prefer to keep it in the same function, but avoid the code > duplication. Ok. Tested on Linux-PPC64. This patch doesn't handle attributes yet, it looks to me as if gcc doesn't support namespace attributes in the location that the standard grammar puts them into. I had to adjust a couple of point-of-declaration error/warning locations in the testsuite, but those seem reasonable to me, biased as I may be towards keeping this patch simpler than keeping those locations where they were would likely require. Nathan, this patch touches areas close to ones that your "Coding rule enforcement" patch does, please be aware of potential merge conflicts. /cp 2015-09-18 Ville Voutilainen Implement nested namespace definitions. * parser.c (cp_parser_namespace_definition): Grok nested namespace definitions. /testsuite 2015-09-18 Ville Voutilainen Implement nested namespace definitions. * g++.dg/cpp1z/nested-namespace-def.C: New. * g++.dg/lookup/name-clash5.C: Adjust. * g++.dg/lookup/name-clash6.C: Likewise. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 4f424b6..9fee310 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -16953,6 +16953,8 @@ cp_parser_namespace_definition (cp_parser* parser) tree identifier, attribs; bool has_visibility; bool is_inline; + cp_token* token; + int nested_definition_count = 0; cp_ensure_no_omp_declare_simd (parser); if (cp_lexer_next_token_is_keyword (parser->lexer, RID_INLINE)) @@ -16965,7 +16967,7 @@ cp_parser_namespace_definition (cp_parser* parser) is_inline = false; /* Look for the `namespace' keyword. 
*/ - cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE); + token = cp_parser_require_keyword (parser, RID_NAMESPACE, RT_NAMESPACE); /* Get the name of the namespace. We do not attempt to distinguish between an original-namespace-definition and an @@ -16979,11 +16981,32 @@ cp_parser_namespace_definition (cp_parser* parser) /* Parse any specified attributes. */ attribs = cp_parser_attributes_opt (parser); - /* Look for the `{' to start the namespace. */ - cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE); /* Start the namespace. */ push_namespace (identifier); + /* Parse any nested namespace definition. */ + if (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE)) +{ + if (is_inline) +error_at (token->location, "a nested % definition cannot be inline"); + while (cp_lexer_next_token_is (parser->lexer, CPP_SCOPE)) +{ + cp_lexer_consume_token (parser->lexer); + if (cp_lexer_next_token_is (parser->lexer, CPP_NAME)) +identifier = cp_parser_identifier (parser); + else +{ + cp_parser_error (parser, "nested identifier required"); + break; +} + ++nested_definition_count; + push_namespace (identifier); +} +} + + /* Look for the `{' to validate starting the namespace. */ + cp_parser_require (parser, CPP_OPEN_BRACE, RT_OPEN_BRACE); + /* "inline namespace" is equivalent to a stub namespace definition followed by a strong using directive. */ if (is_inline) @@ -17007,6 +17030,10 @@ cp_parser_namespace_definition (cp_parser* parser) if (has_visibility) pop_visibility (1); + /* Finish the nested namespace definitions. */ + while (nested_definition_count--) +pop_namespace (); + /* Finish the namespace. */ pop_namespace (); /* Look for the final `}'. 
*/ diff --git a/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def.C b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def.C new file mode 100644 index 000..da35835 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/nested-namespace-def.C @@ -0,0 +1,14 @@ +// { dg-options "-std=c++1z" } + +namespace A::B::C +{ + struct X {}; + namespace T::U::V { struct Y {}; }; +}; + +A::B::C::X x; +A::B::C::T::U::V::Y y; + +inline namespace D::E {}; // { dg-error "cannot be inline" } + +namespace F::G:: {}; // { dg-error "nested identifier required" } diff --git a/gcc/testsuite/g++.dg/lookup/name-clash5.C b/gcc/testsuite/g++.dg/lookup/name-clash5.C index 74595c2..9673bb9 100644 --- a/gcc/testsuite/g++.dg/lookup/name-clash5.C +++ b/gcc/testsuite/g++.dg/lookup/name-clash5.C @@ -6,8 +6,8 @@ // "[Note: a namespace name or a class template name must be unique in its // declarative region (7.3.2, clause 14). ]" -namespace N -{ // { dg-message "previous declaration" } +namespace N // { dg-message "previous declaration" } +{ }
Fwd: vector lightweight debug mode
-- Forwarded message -- From: Christopher Jefferson Date: 17 September 2015 at 18:59 Subject: Re: vector lightweight debug mode To: Jonathan Wakely On 16 September 2015 at 21:29, Jonathan Wakely wrote: > On 16/09/15 21:37 +0200, François Dumont wrote: > @@ -1051,6 +1071,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER iterator insert(const_iterator __position, size_type __n, const value_type& __x) { +__glibcxx_assert(__position >= cbegin() && __position <= cend()); difference_type __offset = __position - cbegin(); _M_fill_insert(begin() + __offset, __n, __x); return begin() + __offset; >>> >>> >>> This is undefined behaviour, so I'd rather not add this check (I know >>> it's on the google branch, but it's still undefined behaviour). >> >> >> Why ? Because of the >= operator usage ? Is the attached patch better ? >> < and == operators are well defined for a random access iterator, no ? > > > No, because it is undefined to compare iterators that belong to > different containers, or to compare pointers that point to different > arrays. While that's true, on the other hand it's defined behaviour when the assert passes, and in the case where the thing it's trying to check fails, we are off into undefined-land anyway. A defined check would be to check if __offset is < 0 or > size(). Once again if it's false we are undefined, but the assert line itself is then defined behaviour.
Re: [RFC][PATCH] Preferred rename register in regrename pass
> I'll let Bernd comment on the patch itself. But I would say that if > you're setting up cases where we can tie the source/dest of an extension > together, then it's a good thing. It'll cause more of them to turn into > NOPs and it'll make the redundant extension elimination pass more > effective as well. Not if you do it in regrename though, as it's run very late in the pipeline. If you want to make REE more effective, this would need to be done during RA. -- Eric Botcazou
Re: [PATCH] Convert SPARC to LRA
From: Eric Botcazou Date: Thu, 17 Sep 2015 23:13:21 +0200 > Did you keep the 64-bit promotion in PROMOTE_MODE or...? Yes, I had to; the compiler aborts() if I make it use SImode on 64-bit. I'll take a closer look soon.
Re: [PATCH 07/22] Implement token range tracking within libcpp and C/C++ FEs
On Wed, 2015-09-16 at 16:21 -0400, David Malcolm wrote: > On Tue, 2015-09-15 at 12:48 +0200, Jakub Jelinek wrote: > > On Tue, Sep 15, 2015 at 12:33:58PM +0200, Richard Biener wrote: > > > On Tue, Sep 15, 2015 at 12:20 PM, Jakub Jelinek wrote: > > > > On Tue, Sep 15, 2015 at 12:14:22PM +0200, Richard Biener wrote: > > > >> > diff --git a/gcc/cp/parser.h b/gcc/cp/parser.h > > > >> > index 760467c..c7558a0 100644 > > > >> > --- a/gcc/cp/parser.h > > > >> > +++ b/gcc/cp/parser.h > > > >> > @@ -61,6 +61,8 @@ struct GTY (()) cp_token { > > > >> >BOOL_BITFIELD purged_p : 1; > > > >> >/* The location at which this token was found. */ > > > >> >location_t location; > > > >> > + /* The source range at which this token was found. */ > > > >> > + source_range range; > > > >> > > > >> Is it just me or does location now feel somewhat redundant with range? > > > >> Can't we > > > >> compress that somehow? > > > > > > > > For a token I'd expect it is redundant, I don't see how it would be > > > > useful > > > > for a single preprocessing token to have more than start and end > > > > locations. > > > > But generally, for expressions, 3 locations make sense. > > > > If you have > > > > abc + def > > > > ^ > > > > then having a range is useful. In any case, I'm surprised that the > > > > ranges aren't encoded in > > > > location_t (the data structures behind it, where we already stick also > > > > BLOCK pointer). > > > > > > Probably lack of encoding space ... I suppose upping location_t to > > > 64bits could solve > > > some of that (with its own drawback on increasing size of core > > > structures). 
> > > > What I had in mind was just add > > source_location start, end; > > to location_adhoc_data struct and use !IS_ADHOC_LOC locations to represent > > just plain locations without block and without range (including the cases > > where the range has both start and end equal to the locus) and IS_ADHOC_LOC > > locations for the cases where either we have non-NULL block, or we have > > some other range, or both. But I haven't spent any time on that, so just > > wondering if such an encoding has been considered. > > I've been attempting to implement that. > > Am I right in thinking that the ad-hoc locations never get purged? i.e. > that once we've registered an ad-hoc location, then is that slot within > location_adhoc_data_map is forever associated with that (locus, block) > pair? [or in the proposed model, the (locus, src_range, block) > triple?]. > > If so, it may make more sense to put the ranges into ad-hoc locations, > but only *after tokenization*: in this approach, the src_range would be > a field within the tokens (like in patch 07/22), in the hope that the > tokens are short-lived (which AIUI is the case for libcpp and C, though > not for C++), presumably also killing the "location" field within > tokens. We then stuff the range into the location_t when building trees > (maybe putting a src_range into c_expr to further delay populating > location_adhoc_data_map). > > That way we avoid bloating the location_adhoc_data_map during lexing, > whilst preserving the range information, and we can stuff the ranges > into the 32-bit location_t within tree/gimple etc (albeit paying a cost > within the location_adhoc_data_map). > > Thoughts? Hope this sounds sane. > Dave FWIW, I have a (very messy) implementation of this working for the C frontend, which gives us source ranges on expressions without needing to add any new tree nodes, or add any fields to existing tree structs. 
The approach I'm using:
* ranges are stored directly as fields within cpp_token and c_token (maybe we can ignore cp_token for now)
* ranges are stashed in the C FE, both (a) within the "struct c_expr" and (b) within the location_t of each tree expression node, as a new field in the ad-hoc map.
Doing it twice may seem slightly redundant, but I think both are needed: (a) doing it within c_expr allows us to support ranges for constants and VAR_DECL etc. during parsing, without needing any kind of new tree wrapper node; (b) doing it in the ad-hoc map allows the ranges for expressions to survive the parse and be usable in diagnostics later. So this gives us (in the C FE): ranges for everything during parsing, and ranges for expressions afterwards, with no new tree nodes or new fields within tree nodes. I'm working on cleaning it up into a much more minimal set of patches that I hope are reviewable. Hopefully this sounds like a good approach. I've also been looking at ways to mitigate bloat of the ad-hoc map, by using some extra bits of location_t for representing short ranges directly. Dave
C++ PATCH to implement fold-expressions
Back in January Andrew mostly implemented C++1z fold-expressions (https://isocpp.org/files/papers/n4295.html), but the patch broke bootstrap. I've fixed it up now and am committing it to the trunk. Andrew: The main change I made was to drop tentative parsing in favor of scanning the tokens ahead looking for an ellipsis next to a fold-operator. I also tweaked some diagnostics, fixed handling of dependent expressions with no TREE_TYPE, and corrected empty fold of modops. Tested x86_64-pc-linux-gnu, applying to trunk.
RE: [patch] libstdc++/65142 Check read() result in std::random_device.
> -Original Message- > From: Jonathan Wakely [mailto:jwak...@redhat.com] > Sent: Thursday, September 17, 2015 6:54 PM > To: Moore, Catherine; fdum...@gcc.gnu.org > Cc: Gerald Pfeifer; libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org > Subject: Re: [patch] libstdc++/65142 Check read() result in > std::random_device. > > On 17/09/15 22:32 +, Moore, Catherine wrote: > > > > > >> -Original Message- > >> From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- > >> ow...@gcc.gnu.org] On Behalf Of Jonathan Wakely > >> Sent: Thursday, September 17, 2015 5:28 PM > >> To: Gerald Pfeifer > >> Cc: libstd...@gcc.gnu.org; gcc-patches@gcc.gnu.org > >> Subject: Re: [patch] libstdc++/65142 Check read() result in > >> std::random_device. > >> > >> On 17/09/15 22:21 +0200, Gerald Pfeifer wrote: > >> >On Thu, 17 Sep 2015, Jonathan Wakely wrote: > >> >>> Any comments on this version? > >> >> Committed to trunk. > >> > > >> >Unfortunately this broke bootstrap on FreeBSD 10.1. > >> > > >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++-v3/src/c++11/random.cc: In > >> member function 'std::random_device::result_type > >> std::random_device::_M_getval()': > >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++- > v3/src/c++11/random.cc:144:22: > >> >error: 'errno' was not declared in this scope > >> > else if (e != -1 || errno != EINTR) > >> > ^ > >> >/scratch/tmp/gerald/gcc-HEAD/libstdc++- > v3/src/c++11/random.cc:144:31: > >> >error: 'EINTR' was not declared in this scope > >> > else if (e != -1 || errno != EINTR) > >> > ^ > >> >Makefile:545: recipe for target 'random.lo' failed > >> > > >> >I probably won't be able to dig in deeper today, but figured this > >> >might already send you on the right path? > >> > > >> >Actually... > >> > > >> >...how about he patch below? Bootstraps on > >> >i386-unknown-freebsd10.1, no regressions. > >> > >> Sorry about that, I've committed your patch. > >> > >> >Gerald > >> > > >> > > >I'm still seeing errors for a build of the mips-sde-elf target with these > patches. 
> > > >Errors are: > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc: In function 'void > {anonymous}::print_word({anonymous > >}::PrintContext&, const char*, std::ptrdiff_t)': > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:573:10: error: 'stderr' was not declared in this scop > >e > > fprintf(stderr, "\n"); > > ^ > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:573:22: error: 'fprintf' was not declared in this sco > >pe > > fprintf(stderr, "\n"); > > ^ > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:596:14: error: 'stderr' was not declared in this scop e > > fprintf(stderr, "%s", spacing); > > ^ > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:596:35: error: 'fprintf' was not declared in this sco pe > > fprintf(stderr, "%s", spacing); > > ^ > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:600:24: error: 'stderr' was not declared in this > >scope > > int written = fprintf(stderr, "%s", word); > >^ > >/scratch/cmoore/virgin-upstream-elf-lite/src/gcc-trunk-5/libstdc++-v3/s > >rc/c++11/debug.cc:600:42: error: 'fprintf' was not declared in this > >scop e > > int written = fprintf(stderr, "%s", word); > > That's a different problem, due to https://gcc.gnu.org/r227885 > > François, could you take a look please? > I've now committed this patch to solve this problem (pre-approved by Jonathan). 2015-09-17 Catherine Moore* src/c++11/debug.cc: Include . Index: src/c++11/debug.cc === --- src/c++11/debug.cc (revision 227887) +++ src/c++11/debug.cc (working copy) @@ -32,6 +32,7 @@ #include #include +#include #include // for std::min #include // for _Hash_impl