Re: [PATCH] add option to emit more array bounds warnigs
Jeff Law : > On 01/13/15 17:40, Martin Uecker wrote: > > Jeff Law : > >> On 01/13/15 10:34, Martin Uecker wrote: > >>> Mon, 12 Jan 2015 11:00:44 -0700 > >>> Jeff Law : > On 11/11/14 23:13, Martin Uecker wrote: > > > > ... > > > Has this patch been bootstrapped and regression tested, if so on what > platform. > >>> > >>> x86_64-unknown-linux-gnu > >> Approved. Please install on the trunk. Sorry about the delays. > > > > I don't have write access ;-( > I fixed up the ChangeLog entries and installed the patch for you. Thank you, Jeff! > If you plan to contribute regularly, you should go ahead and apply for > write access to the repository so that you'll be able to commit your own > patches once they're approved. I put a request in with you as sponsor (hope this is ok). > You'll also need to make sure you have an assignment on file with the > FSF.That patch was pretty small (the testcase was larger than the > patch itself, which I always like :-) so I didn't request an assignment. > Further submissions likely will require an assignment. I already have an assignment on file. Martin
Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1
On 01/13/15 11:55, Eric Botcazou wrote: (1) we have a non-paradoxical subreg; (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full hard registers (no padding or unused upper bits); (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number of bytes (X) in each constituent hard register; (4) the offset is a multiple of X, i.e. the data we're accessing is aligned to a register boundary; and (5) endianness is regular (no differences between words and bytes, or between registers and memory) OK, that's a nice translation of the new code. :-) It seems to me that the patch wants to extend the support of generic subregs to modes whose sizes are not multiple of each other, which is a requirement of the existing code, but does that in a very specific case for the sake of the ARM port without saying where all the above restrictions come from. Basically we're lifting the restriction that the the sizes are multiples of each other. The requirements above are the set where we know it will work. They are target independent, but happen to match what the ARM needs. The certainly do short circuit the meat of the function, that's the whole point, there's this set of conditions under which we know this will work and when they hold, we bypass. Now one could argue that instead of bypassing we should put the code to handle this situation further down. I'd be leery of doing that just from a complexity standpoint. But one could also argue that short circuiting like the patch does adds complexity as well and may be a bit kludgy. Maybe the way forward here is for someone to try and integrate this support in the main part of the code and see how it looks. Then we can pick one. The downside is since this probably isn't a regression that work would need to happen quickly to make it into gcc-5. Which leads to another option, get the release managers to sign off on the kludge after gcc-5 branches and only install the kludge on the gcc-5 branch and insisting the other solution go in for gcc-6 and beyond. Not sure if they'd do that, but it's a discussion that could happen. jeff
Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
"Maciej W. Rozycki" writes: > On Tue, 13 Jan 2015, Matthew Fortune wrote: > >> > >> I have tested this for both mips and micromips, and the tests now >> > >> pass successfully. >> > >> The ChangeLog and patch are below. >> > > >> > > Hmm, instead of trying to avoid testing microMIPS code generation >> > > just to satisfy the test suite I'd rather see the test cases updated >> > > so that LWM/SWM register ranges are expected and accepted whenever >> > > microMIPS code is produced. These scan patterns can be made >> > conditional. >> > >> > FWIW I think Andrew's patch is correct. If we want to test microMIPS >> > output against micromips-specific regexps, we should add a separate test >> > that forces micromips, so that it gets tested regardless of people's >> > RUNTESTFLAGS. Doing that shouldn't hold up Andrew's patch though. > > Taking care that the default compilation mode does not conflict (e.g. > MIPS16, incompatible) and taking any exceptions into account (e.g. n64, > unsupported) I presume, right? mips.exp sorts that out for you. Adding "-mmicromips" or "(-micromips)" to dg-options forces (or at least is supposed to force) the overall flags to be compatible with microMIPS. The aim of mips.exp is avoid skipping tests whereever possible. If someone runs the testsuite with -mips16 and we have a -micromips test, it's better to remove -mips16 for that test than to skip the test entirely. >> I was going to suggest a follow up patch to add copies of the three tests >> as Richard suggests. I haven't yet done a micromips run of the testsuite >> to check for any other issues like this but I suspect problems are limited >> to the tests that I recently added. > > Please always try to test changes reasonably, i.e. at least o32, > o32/MIPS16, o32/microMIPS, n32, n64, and then Linux and ELF if applicable, > plus any options that may be relevant, unless it is absolutely clear > ABI/ISA variations do not matter for a change proposed. TBH this seems a bit much. On the one hand it's more testing than you'd get for almost any other target, but on the other it leaves out important differences like MIPS I vs MIPS II vs MIPS 32, MIPS III vs MIPS IV vs MIPS64, r1 vs. r2 vs. r6, Octeon vs. Loongson vs. vanilla, DSP vs. no DSP, etc. I think we just have to accept that there are so many possible combinations that we can't test everything that's potentially relevant. I think it's more useful to be flexible than prescribe a particular list. Having everyone test the same multilib combinations on the same target isn't necessarily a good thing anyway. Diversity in testing (between developers) is useful too. Thanks, Richard
Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1
On 01/10/15 06:05, Richard Sandiford wrote: Sorry for the slow response. Jeff has approved the patch in the meantime, but I didn't want to go ahead and apply it while there was still disagreement... Thanks. I didn't realize there was a disagreement when I approved. Let's continue to hash this out a bit in the hopes that we can all get to a place where we're comfortable with the final change, whatever it happens to me. jeff
Re: [PATCH] Correct target selector in -mfentry tests
On 01/13/15 14:27, H.J. Lu wrote: -fprofile -mfentry works with PIE if gcrt1.o is compiled with -fPIC. A glibc has been filed, PR 17836, and a glibc patch has been submitted. OK for trunk? Thanks. H.J. -- * gcc.target/i386/fentry-override.c: Properly place {} in target selector. Remove nonpic. * gcc.target/i386/fentry.c: Likewise. Does this change the pass/fail result of the test on a system without an updated glibc? jeff
Re: [PATCH] PR59448 - Promote consume to acquire
On 01/13/15 15:56, Andrew MacLeod wrote: On 01/13/2015 02:06 PM, Andrew MacLeod wrote: On 01/13/2015 01:38 PM, Torvald Riegel wrote: On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote: On 01/13/2015 09:59 AM, Richard Biener wrote: On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod wrote: Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 Basically we can generate incorrect code for an atomic consume operation in some circumstances. The general feeling seems to be that we should simply promote all consume operations to an acquire operation until there is a better definition/understanding of the consume model and how GCC can track it. I proposed a simple patch in the PR, and I have not seen or heard of any dissenting opinion. We should get this in before the end of stage 3 I think. The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. This new patch simply makes the adjustment after any errors are checked on the originally specified model. It bootstraps on x86_64-unknown-linux-gnu and passes all regression testing. I also built an aarch64 compiler and it appears to issue the LDAR as specified in the PR, but anyone with a vested interest really ought to check it out with a real build to be sure. OK for trunk? Why not patch get_memmodel? (not sure if that catches all cases) Richard. That was the original patch. The issue is that it promotes consume to acquire before any error checking gets to look at the model, so then we allow illegal specification of consume. (It actually triggers a failure in the testsuite) (This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c) The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). It would be surprising also because for other atomic read-modify-write operations (eg, fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_excahnge I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that. I imagine it was probably in a previous incarnation of the standard... Most of this was actually implemented based on very early draft standards years and years ago and never revised. It wasnt put in by me unless the standard at some point said had such wording. The current standard appears to make no mention of the situation. It seems that it should be safe to move back to the original patch, and remove that error test for using consume on an exchange... Andrew Here's the original patch along with the lien removed from the testcase. x86_64-unknown-linux-gnu bootstraps, no regressions, and so forth. OK for trunk? -ENOPATCH However, I can get it from the BZ and it's OK assuming you also fixup the one testcase we've discussed on this thread. Jeff
Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics
> On 09/12/14 08:17, Yangfei (Felix) wrote: > >> On 28 November 2014 at 09:23, Yangfei (Felix) > wrote: > >>> Hi, > >>>This patch converts vpmaxX & vpminX intrinsics to use builtin > >>> functions > >> instead of the previous inline assembly syntax. > >>>Regtested with aarch64-linux-gnu on QEMU. Also passed the > >>> glorious > >> testsuite of Christophe Lyon. > >>>OK for the trunk? > >> > >> Hi Felix, We know from experience that the advsimd intrinsics tend > >> to be fragile for big endian and in general it is fairly easy to > >> break the big endian case. For these advsimd improvements that you > >> are working on (that we very much appreciate) it is important to run > >> both little endian and big endian regressions. > >> > >> Thanks > >> /Marcus > > > > > > Okay. Any plan for the advsimd big-endian improvement? > > I rebased this patch over Alan Lawrance's patch: > > https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html > > No regressions for aarch64_be-linux-gnu target too. OK for the thunk? > > > > > > Index: gcc/ChangeLog > > > = > == > > --- gcc/ChangeLog (revision 218464) > > +++ gcc/ChangeLog (working copy) > > @@ -1,3 +1,18 @@ > > +2014-12-09 Felix Yang > > + > > + * config/aarch64/aarch64-simd.md > (aarch64_p): New > > + pattern. > > + * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp, > > + uminp, smax_nanp, smin_nanp): New builtins. > > + * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32, > > + vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, > vpmaxq_s32, > > + vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, > vpmaxq_f64, > > + vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, > vpmaxnmq_f64, > > + vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, > vpmin_u8, > > + vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, > vpminq_u8, > > + vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, > vpminqd_f64, > > + vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, > > + vpminnmqd_f64, > > + > > > > __extension__ static __inline float32x2_t __attribute__ > > ((__always_inline__)) > > Index: gcc/config/aarch64/aarch64-simd.md > > > = > == > > --- gcc/config/aarch64/aarch64-simd.md (revision 218464) > > +++ gcc/config/aarch64/aarch64-simd.md (working copy) > > @@ -1017,6 +1017,28 @@ > > DONE; > > }) > > > > +;; Pairwise Integer Max/Min operations. > > +(define_insn "aarch64_p" > > + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w") > > + (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 > "register_operand" "w") > > +(match_operand:VDQ_BHSI 2 "register_operand" > "w")] > > + MAXMINV))] > > + "TARGET_SIMD" > > + "p\t%0., %1., %2." > > + [(set_attr "type" "neon_minmax")] > > +) > > + > > Hi Felix, > > Sorry for the delay in getting back to you on this. > > If you've rolled aarch64_reduc__internalv2si into the above > pattern, do you still need it? For all its call points, just point them to > aarch64_p? > > Thanks, > Tejas. > Hello Tejas, I didn't do this yet. Currently the aarch64_reduc__internalv2si is only called by reduc__scal_. I find it kind of trouble to handle this due to the use of iterators in the caller pattern. Are you going to rework this part?
Re: [PATCH] add option to emit more array bounds warnigs
On 01/13/15 17:40, Martin Uecker wrote: Jeff Law : On 01/13/15 10:34, Martin Uecker wrote: Mon, 12 Jan 2015 11:00:44 -0700 Jeff Law : On 11/11/14 23:13, Martin Uecker wrote: ... Has this patch been bootstrapped and regression tested, if so on what platform. x86_64-unknown-linux-gnu Approved. Please install on the trunk. Sorry about the delays. I don't have write access ;-( I fixed up the ChangeLog entries and installed the patch for you. If you plan to contribute regularly, you should go ahead and apply for write access to the repository so that you'll be able to commit your own patches once they're approved. You'll also need to make sure you have an assignment on file with the FSF.That patch was pretty small (the testcase was larger than the patch itself, which I always like :-) so I didn't request an assignment. Further submissions likely will require an assignment. Thanks, jeff
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On 01/13/15 09:18, Jakub Jelinek wrote: Hi! My PR60663 fix unfortunately stopped CSE of all inline-asms, even when they e.g. only have the clobbers added by default. This patch attempts to restore the old behavior, with the exceptions: 1) as always, asm volatile is not CSEd 2) inline-asm with multiple outputs are not CSEd 3) on request from Richard (which Segher on IRC argues against), "memory" clobber also prevents CSE; this can be removed by removing the int j, lim = XVECLEN (x, 0); and loop below it 4) inline-asm with clobbers is never copied into an insn that wasn't inline-asm before, so if there are clobbers, we allow CSEing of e.g. two same inline-asms, but only by reusing results of one of those Bootstrapped/regtested on x86_64-linux and i686-linux, tested also with arm cross after reverting the PR60663 arm cost fix. Ok for trunk this way, or with 3) removed? 2015-01-13 Jakub Jelinek PR rtl-optimization/63637 PR rtl-optimization/60663 * cse.c (merge_equiv_classes): Set new_elt->cost to MAX_COST if elt->cost is MAX_COST for ASM_OPERANDS. (find_sets_in_insn): Fix up comment typo. (cse_insn): Don't set src_volatile for all non-volatile ASM_OPERANDS in PARALLELs, but just those with multiple outputs or with "memory" clobber. Set elt->cost to MAX_COST for ASM_OPERANDS in PARALLEL. Set src_elt->cost to MAX_COST if new_src is ASM_OPERANDS and elt->cost is MAX_COST. * gcc.dg/pr63637-1.c: New test. * gcc.dg/pr63637-2.c: New test. * gcc.dg/pr63637-3.c: New test. * gcc.dg/pr63637-4.c: New test. * gcc.dg/pr63637-5.c: New test. * gcc.dg/pr63637-6.c: New test. * gcc.target/i386/pr63637-1.c: New test. * gcc.target/i386/pr63637-2.c: New test. * gcc.target/i386/pr63637-3.c: New test. * gcc.target/i386/pr63637-4.c: New test. * gcc.target/i386/pr63637-5.c: New test. * gcc.target/i386/pr63637-6.c: New test. --- gcc/cse.c.jj2015-01-09 21:59:44.0 +0100 +++ gcc/cse.c 2015-01-13 13:26:23.391216064 +0100 @@ -1792,6 +1792,8 @@ merge_equiv_classes (struct table_elt *c } new_elt = insert (exp, class1, hash, mode); new_elt->in_memory = hash_arg_in_memory; + if (GET_CODE (exp) == ASM_OPERANDS && elt->cost == MAX_COST) + new_elt->cost = MAX_COST; } } } @@ -4258,7 +4260,7 @@ find_sets_in_insn (rtx_insn *insn, struc { int i, lim = XVECLEN (x, 0); - /* Go over the epressions of the PARALLEL in forward order, to + /* Go over the expressions of the PARALLEL in forward order, to put them in the same order in the SETS array. */ for (i = 0; i < lim; i++) { @@ -4634,12 +4636,27 @@ cse_insn (rtx_insn *insn) && REGNO (dest) >= FIRST_PSEUDO_REGISTER) sets[i].src_volatile = 1; - /* Also do not record result of a non-volatile inline asm with -more than one result or with clobbers, we do not want CSE to -break the inline asm apart. */ else if (GET_CODE (src) == ASM_OPERANDS && GET_CODE (x) == PARALLEL) - sets[i].src_volatile = 1; + { + /* Do not record result of a non-volatile inline asm with +more than one result. */ + if (n_sets > 1) + sets[i].src_volatile = 1; + + int j, lim = XVECLEN (x, 0); + for (j = 0; j < lim; j++) + { + rtx y = XVECEXP (x, 0, j); + /* And do not record result of a non-volatile inline asm +with "memory" clobber. */ + if (GET_CODE (y) == CLOBBER && MEM_P (XEXP (y, 0))) Can you please add a comment here which references the full form of the "memory" tag. (clobber (mem:BLK (scratch))). If we ever have to look at this again (say perhaps to break out the read anything vs write anything into separate tags :-) it'll save considerable time and angst trying to track all this stuff down. The tests you've got are a step forward, but there's obviously a lot more we could do. For example testing DSE around ASMs without and without a memory "clobber", testing CSE of unrelated memory references around an ASM without and without a memory clobber come to mind. You don't have to add them to get approval, but if you were to take the time to cobble them together it'd be hugely appreciated. Given the discussion with Segher, let's give him a chance to chime in on tonight's messages before we make a final decision. jeff
[PATCH, nds32] Committed: Remove some features that are not available yet in nds32 port of GNU binutils package.
Hi, all, The nds32 target supports two features, fp-as-gp and ex9, designed for code size optimizations. They are majorly performed by linker so that compiler is merely to give some hints or directives with -mforce-fp-as-gp, -mforbid-fp-as-gp, and -mex9 options. However, those two features are not available yet in the current nds32 port of GNU binutils package. For consistency concern, I think it would be better to remove them from gcc trunk as well for now. Committed as Rev. 219576: https://gcc.gnu.org/r219576 gcc/ChangeLog 2015-01-14 Chung-Ju Wu * config/nds32/nds32.opt (mforce-fp-as-gp): Remove. (mforbid-fp-as-gp): Remove. (mex9): Remove. * config/nds32/nds32-fp-as-gp.c (nds32_have_prologue_p): Remove. (nds32_symbol_load_store_p): Remove. (nds32_fp_as_gp_check_available): Clean up implementation. * config/nds32/nds32.h (LINK_SPEC): Remove -mforce-as-gp and -mex9 cases. * config/nds32/nds32.c (nds32_asm_file_start): No need to consider fp-as-gp and ex9 cases. Best regards, jasonwucj 0010-Remove-some-features-that-are-not-available-yet-in-n.patch Description: Binary data
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On 01/13/15 17:03, Segher Boessenkool wrote: On Tue, Jan 13, 2015 at 03:17:08PM -0700, Jeff Law wrote: And finally there is the case of non-volatile asm with "memory" clobber with no memory stores in between the two - the posted (safer) patch will not allow to CSE the two, while in theory we could CSE them into just one asm. I think we have to assume that CSEing them is wrong. The first may set something in memory that is read by the second. Thoughts? I agree with pretty much everything you say in the thread, except for this idea that a memory clobber reads memory. No clobber reads anything. The commit that introduced the memory clobber concept, 426b38c9 (svn 1207), by rms, has as only comment /* `memory', don't cache memory across asm */ RMS botched this and you can see it in that the scheduler was not updated at the same time. The scheduler absolutely must track if an ASM does a memory read of an arbitrary location. I'd have to dig deeper to see when this got fixed, but it was clearly botched. Many years later another pass which needs to precisely track such things came along, namely DSE. The code in DSE is actually easier to grok. First, if you look at the ASM handling in cfgexpand.c you'll find: if (j == -4) /* `memory', don't cache memory across asm */ { XVECEXP (body, 0, i++) = gen_rtx_CLOBBER (VOIDmode, gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode))); continue; } So we generate (CLOBBER (MEM:BLK (SCRATCH))) when we see "memory" in the "clobber" list of an ASM. If you then look at dse.c we have this in record_store: /* At this point we know mem is a mem. */ if (GET_MODE (mem) == BLKmode) { if (GET_CODE (XEXP (mem, 0)) == SCRATCH) { if (dump_file && (dump_flags & TDF_DETAILS)) fprintf (dump_file, " adding wild read for (clobber (mem:BLK (scratch))\n"); add_wild_read (bb_info); insn_info->cannot_delete = true; return 0; } Which says very precisely that we treat (CLOBBER (MEM:BLK (SCRATCH))) as potentially *reading* any location. If you trace through how the scheduler builds dependencies, paying particular attention to alias.c you'll see that (CLOBBER (MEM:BLK (SCRATCH))) is treated as both a read and a write of an arbitrary location. It's unfortunate that RMS put the "memory" tag in the "clobber" list. But he really wasn't a compiler junkie and didn't realize the right thing to do was to have a memory tag in both the inputs and [output|clobber] section to represent a read of an arbitrary location and a write to an arbitrary location independently. But it is what it is at this point and we have to treat "memory" appearing in the "clobber" list as an arbitrary memory read and an arbitrary memory write. Jeff
Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian
On 01/13/15 15:42, Joseph Myers wrote: On Tue, 13 Jan 2015, Jeff Law wrote: In many ways having the compiler or assembler spitting out an error here is preferable to silently compiling the code. That would also help explain why As usual, an error is incorrect in such a case that only has undefined behavior at runtime (but it may be compiled into an abort if the behavior is unconditionally undefined, and the abort doesn't replace anything before the undefined behavior that might have stopped the undefined behavior from occurring). You are, of course, correct. We can't error here, but we can generate a conditional warning. jeff
Patch ping...
Hi, I would like to ping the patch to fix divergence between a type and its main variant introduced by C++ FE. https://gcc.gnu.org/ml/gcc-patches/2014-12/msg01202.html Honza
Drop workaround for old binutils linker plugin bug
Hi, this workaround actually triggers bug in quite recent golds, so it seems to be good motivation to finally drop it. The bug is long fixed. Bootstrapped/regtested x86_64-linux, will commit it shortly. Honza * tree-profile.c (init_ic_make_global_vars): Drop workaround for bintuils bug 14342. (init_ic_make_global_vars): Likewise. (gimple_init_edge_profiler): Likewise. (gimple_gen_ic_func_profiler): Likewise. Index: tree-profile.c === --- tree-profile.c (revision 219571) +++ tree-profile.c (working copy) @@ -105,30 +105,15 @@ init_ic_make_global_vars (void) ptr_void = build_pointer_type (void_type_node); - /* Workaround for binutils bug 14342. Once it is fixed, remove lto path. */ - if (flag_lto) -{ - ic_void_ptr_var - = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier ("__gcov_indirect_call_callee_ltopriv"), - ptr_void); - TREE_PUBLIC (ic_void_ptr_var) = 1; - DECL_COMMON (ic_void_ptr_var) = 1; - DECL_VISIBILITY (ic_void_ptr_var) = VISIBILITY_HIDDEN; - DECL_VISIBILITY_SPECIFIED (ic_void_ptr_var) = true; -} - else -{ - ic_void_ptr_var - = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier ( - (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? - "__gcov_indirect_call_topn_callee" : - "__gcov_indirect_call_callee")), - ptr_void); - TREE_PUBLIC (ic_void_ptr_var) = 1; - DECL_EXTERNAL (ic_void_ptr_var) = 1; -} + ic_void_ptr_var += build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier ( + (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? + "__gcov_indirect_call_topn_callee" : + "__gcov_indirect_call_callee")), + ptr_void); + TREE_PUBLIC (ic_void_ptr_var) = 1; + DECL_EXTERNAL (ic_void_ptr_var) = 1; TREE_STATIC (ic_void_ptr_var) = 1; DECL_ARTIFICIAL (ic_void_ptr_var) = 1; DECL_INITIAL (ic_void_ptr_var) = NULL; @@ -138,30 +123,16 @@ init_ic_make_global_vars (void) varpool_node::finalize_decl (ic_void_ptr_var); gcov_type_ptr = build_pointer_type (get_gcov_type ()); - /* Workaround for binutils bug 14342. Once it is fixed, remove lto path. */ - if (flag_lto) -{ - ic_gcov_type_ptr_var - = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier ("__gcov_indirect_call_counters_ltopriv"), - gcov_type_ptr); - TREE_PUBLIC (ic_gcov_type_ptr_var) = 1; - DECL_COMMON (ic_gcov_type_ptr_var) = 1; - DECL_VISIBILITY (ic_gcov_type_ptr_var) = VISIBILITY_HIDDEN; - DECL_VISIBILITY_SPECIFIED (ic_gcov_type_ptr_var) = true; -} - else -{ - ic_gcov_type_ptr_var - = build_decl (UNKNOWN_LOCATION, VAR_DECL, - get_identifier ( - (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? - "__gcov_indirect_call_topn_counters" : - "__gcov_indirect_call_counters")), - gcov_type_ptr); - TREE_PUBLIC (ic_gcov_type_ptr_var) = 1; - DECL_EXTERNAL (ic_gcov_type_ptr_var) = 1; -} + + ic_gcov_type_ptr_var += build_decl (UNKNOWN_LOCATION, VAR_DECL, + get_identifier ( + (PARAM_VALUE (PARAM_INDIR_CALL_TOPN_PROFILE) ? + "__gcov_indirect_call_topn_counters" : + "__gcov_indirect_call_counters")), + gcov_type_ptr); + TREE_PUBLIC (ic_gcov_type_ptr_var) = 1; + DECL_EXTERNAL (ic_gcov_type_ptr_var) = 1; TREE_STATIC (ic_gcov_type_ptr_var) = 1; DECL_ARTIFICIAL (ic_gcov_type_ptr_var) = 1; DECL_INITIAL (ic_gcov_type_ptr_var) = NULL; @@ -230,33 +201,18 @@ gimple_init_edge_profiler (void) init_ic_make_global_vars (); - /* Workaround for binutils bug 14342. Once it is fixed, remove lto path. */ - if (flag_lto) -{ - /* void (*) (gcov_type, void *) */ - ic_profiler_fn_type - = build_function_type_list (void_type_node, - gcov_type_ptr, gcov_type_node, - ptr_void, ptr_void, - NULL_TREE); - tree_indirect_call_profiler_fn - = build_fn_decl ("__gcov_indirect_call_profiler", -ic_profiler_fn_type); -} - else -{ - /* void (*) (gcov_type, void *) */ - ic_profiler_fn_type - = build_function_type_list (void_type_node, - gcov_type_node, - ptr_void, -
[PATCH, AArch64] Fix abitest for ilp32
Hi, Please find attached the patch that fixes abitest for ilp32. "testfunc_ptr" is a 32bit pointer in ILP32 but is being loaded as 64bit. Hence some of the func-ret testcases FAIL's for ILP32. Please review the patch and let us know if its okay? Regression tested on aarch64-elf. Thanks, Naveen gcc/testsuite 2015-01-15 Andrew Pinski Naveen H.S * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Load testfunc_ptr as 32bit for ILP32 and 64bit for LP64.--- gcc/testsuite/ChangeLog 2015-01-14 10:00:59.524914610 +0530 +++ gcc/testsuite/ChangeLog 2015-01-14 10:21:20.928932740 +0530 @@ -1,3 +1,9 @@ +2015-01-15 Andrew Pinski + Naveen H.S + + * gcc.target/aarch64/aapcs64/abitest.S (LABEL_TEST_FUNC_RETURN): Load + testfunc_ptr as 32bit for ILP32 and 64bit for LP64. + 2015-01-13 David Malcolm * jit.dg/harness.h (set_up_logging): Move string concatenation --- gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S 2015-01-14 09:37:46.368893934 +0530 +++ gcc/testsuite/gcc.target/aarch64/aapcs64/abitest.S 2015-01-14 10:13:08.456925431 +0530 @@ -2,6 +2,13 @@ .global myfunc .type dumpregs,%function .type myfunc,%function + +#ifdef __LP64__ +#define PTR_REG(n) x##n +#else +#define PTR_REG(n) w##n +#endif + dumpregs: myfunc: mov x16, sp @@ -48,7 +55,7 @@ myfunc: LABEL_TEST_FUNC_RETURN: adrp x9, testfunc_ptr add x9, x9, :lo12:testfunc_ptr - ldr x9, [x9, #0] + ldr PTR_REG(9), [x9, #0] blr x9// function return value test adrp x9, saved_return_address add x9, x9, :lo12:saved_return_address
[PATCH] Fix PR c++/16160
This patch fixes the above PR where it was reported that the C++ frontend does not reject the malformed class declaration struct X<5>; Instead of rejecting it, the FE treats this declaration as if it were a forward declaration of a template specialization, i.e. as if it were written template<> struct X<5>; First off, the FE should reject the declaration because it is malformed (not 100% sure, though). Second, since the user probably intended to have written an explicit template instantiation (as in the PR), the FE should suggest adding "template" before such a declaration, that is the declaration struct X<5>; // error + suggest adding "template" This patch does both these things along with adding error messages + suggestions for struct X<5> { }; // error + suggest adding "template <>" and template struct X<5> { }; // error + suggest replacing with "template <>" Bootstrap and regtesting in progress. Does this patch look OK for trunk? gcc/cp/ChangeLog: PR c++/16160 * parser.c (cp_parser_class_head): Identify and reject malformed template-id declarations and definitions. --- gcc/cp/parser.c | 53 +++- gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C | 2 +- gcc/testsuite/g++.dg/ext/attrib9.C | 2 +- gcc/testsuite/g++.dg/template/crash54.C | 2 +- gcc/testsuite/g++.dg/template/error55.C | 11 +++ 5 files changed, 53 insertions(+), 17 deletions(-) create mode 100644 gcc/testsuite/g++.dg/template/error55.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index 3290dfa..f6dc004 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -20264,6 +20264,34 @@ cp_parser_class_head (cp_parser* parser, } virt_specifiers = cp_parser_virt_specifier_seq_opt (parser); + /* Make sure a top-level template-id declaration or definition is preceded + by "template" or "template <>". */ + if (template_id_p + && at_namespace_scope_p () + && parser->num_template_parameter_lists == 0 + && !processing_explicit_instantiation) +{ + if (cp_parser_next_token_starts_class_definition_p (parser)) + { + error_at (type_start_token->location, + "an explicit specialization must be preceded by " + "%%>"); + invalid_explicit_specialization_p = true; + /* Try to recover gracefully by taking the same action that would +have been taken by cp_parser_explicit_specialization. */ + ++parser->num_template_parameter_lists; + begin_specialization (); + } + else if (cp_parser_declares_only_class_p (parser)) + { + error_at (type_start_token->location, + "an explicit instantiation must be preceded by " + "%"); + type = error_mark_node; + goto out; + } +} + /* If it's not a `:' or a `{' then we can't really be looking at a class-head, since a class-head only appears as part of a class-specifier. We have to detect this situation before calling @@ -20275,6 +20303,16 @@ cp_parser_class_head (cp_parser* parser, goto out; } + if (processing_explicit_instantiation) +{ + error_at (type_start_token->location, + "an explicit instantiation may not have a definition"); + inform (type_start_token->location, + "use %%> to define an explicit specialization"); + type = error_mark_node; + goto out; +} + /* At this point, we're going ahead with the class-specifier, even if some other problem occurs. */ cp_parser_commit_to_tentative_parse (parser); @@ -20346,20 +20384,7 @@ cp_parser_class_head (cp_parser* parser, num_templates = 0; } } - /* An explicit-specialization must be preceded by "template <>". If - it is not, try to recover gracefully. */ - if (at_namespace_scope_p () - && parser->num_template_parameter_lists == 0 - && template_id_p) -{ - error_at (type_start_token->location, - "an explicit specialization must be preceded by %%>"); - invalid_explicit_specialization_p = true; - /* Take the same action that would have been taken by -cp_parser_explicit_specialization. */ - ++parser->num_template_parameter_lists; - begin_specialization (); -} + /* There must be no "return" statements between this point and the end of this function; set "type "to the correct return value and use "goto done;" to return. */ diff --git a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C index 3dc51ee..4957ba1 100644 --- a/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C +++ b/gcc/testsuite/g++.dg/cpp0x/gen-attrs-9.C @@ -9,4 +9,4 @@ enum [[gnu::unused]] e; // { dg-warning "already defined" } struct [[gnu::unused]] B *p; // { dg-warning "attributes" } template struct A { }; -struct [[gnu::unused]] A; // { dg-warning "att
PR 64481 (bootstrap miscompare)
Hi, in December I conditoinally disabled expensive sanity checking in inliner. This triggeres bootstrap miscompare because caches are getting out of sync. This patch fixes the problem found by sanity check - the node growth cache was removed from use in badness calculation by Richard a while ago, but the cache itself remained while the updating logic was dropped. This of course leads to somewhat randomish results. The other problem fixed is that in some cases we forget to walk through aliases to get into the callee. Bootstrapped/regtested x86_64-linux, comitted. PR ipa/64481 * ipa-inline-analysis.c (node_growth_cache): Remove. (initialize_growth_caches): Do not initialize it. (free_growth_caches): Do not free it. (do_estimate_growth): Rename to ... (estimate_growth): ... this one; drop growth cache code. (growth_likely_positive): Always go the heuristics way. * ipa-inline.c (can_inline_edge_p): Walk through aliases. (reset_edge_caches): Do not reset node growth. (heap_edge_removal_hook): Do not maintain cache. (inline_small_functions): Likewise; strenghten sanity check. (ipa_inline): Do not maintain caches. * ipa-inline.h (node_growth_cache): Remove. (do_estimate_growth): Remove to ... (estimate_growth): this one; remove inline version. (reset_node_growth_cache): Remove. Index: ipa-inline-analysis.c === --- ipa-inline-analysis.c (revision 219571) +++ ipa-inline-analysis.c (working copy) @@ -167,7 +167,6 @@ function_summary *inl vec inline_edge_summary_vec; /* Cached node/edge growths. */ -vec node_growth_cache; vec edge_growth_cache; /* Edge predicates goes here. */ @@ -1341,8 +1340,6 @@ initialize_growth_caches (void) { if (symtab->edges_max_uid) edge_growth_cache.safe_grow_cleared (symtab->edges_max_uid); - if (symtab->cgraph_max_uid) -node_growth_cache.safe_grow_cleared (symtab->cgraph_max_uid); } @@ -1352,7 +1349,6 @@ void free_growth_caches (void) { edge_growth_cache.release (); - node_growth_cache.release (); } @@ -3931,7 +3927,7 @@ do_estimate_growth_1 (struct cgraph_node /* Estimate the growth caused by inlining NODE into all callees. */ int -do_estimate_growth (struct cgraph_node *node) +estimate_growth (struct cgraph_node *node) { struct growth_data d = { node, 0, false }; struct inline_summary *info = inline_summaries->get (node); @@ -3960,12 +3956,6 @@ do_estimate_growth (struct cgraph_node * + 50) / 100; } - if (node_growth_cache.exists ()) -{ - if ((int) node_growth_cache.length () <= node->uid) - node_growth_cache.safe_grow_cleared (symtab->cgraph_max_uid); - node_growth_cache[node->uid] = d.growth + (d.growth >= 0); -} return d.growth; } @@ -3979,7 +3969,6 @@ bool growth_likely_positive (struct cgraph_node *node, int edge_growth ATTRIBUTE_UNUSED) { int max_callers; - int ret; struct cgraph_edge *e; gcc_checking_assert (edge_growth > 0); @@ -3999,10 +3988,6 @@ growth_likely_positive (struct cgraph_no || !node->can_remove_if_no_direct_calls_p ()) return true; - /* If there is cached value, just go ahead. */ - if ((int)node_growth_cache.length () > node->uid - && (ret = node_growth_cache[node->uid])) -return ret > 0; if (!node->will_be_removed_from_program_if_no_direct_calls_p () && (!DECL_COMDAT (node->decl) || !node->can_remove_if_no_direct_calls_p ())) Index: ipa-inline.c === --- ipa-inline.c(revision 219571) +++ ipa-inline.c(working copy) @@ -388,11 +388,11 @@ can_inline_edge_p (struct cgraph_edge *e else if (caller_tree != callee_tree) { if (((opt_for_fn (e->caller->decl, optimize) - > opt_for_fn (e->callee->decl, optimize)) + > opt_for_fn (callee->decl, optimize)) || (opt_for_fn (e->caller->decl, optimize_size) - != opt_for_fn (e->callee->decl, optimize_size))) + != opt_for_fn (callee->decl, optimize_size))) /* gcc.dg/pr43564.c. Look at forced inline even in -O0. */ - && !DECL_DISREGARD_INLINE_LIMITS (e->callee->decl)) + && !DECL_DISREGARD_INLINE_LIMITS (callee->decl)) { e->inline_failed = CIF_OPTIMIZATION_MISMATCH; inlinable = false; @@ -1095,9 +1095,6 @@ reset_edge_caches (struct cgraph_node *n if (where->global.inlined_to) where = where->global.inlined_to; - /* WHERE body size has changed, the cached growth is invalid. */ - reset_node_growth_cache (where); - for (edge = where->callers; edge; edge = edge->next_caller) if (edge->inline_failed) reset_edge_growth_cache (edge); @@ -1428,8 +1425,6 @@ add_new_edges_to_heap (edge_heap_t *heap static void heap_edge_removal_hook (struc
Re: [PATCH 0/6, nds32] Committed: Have -mcmodel option to support varied code model of programs.
2015-01-14 6:22 GMT+08:00 Joseph Myers : > On Tue, 13 Jan 2015, Chung-Ju Wu wrote: > >> To fix this issue, we are going to use -mcmodel=X options, which probably >> gives more flexibility to support varied code model on code generation. >> The -mgp-direct option now becomes meaningless and can be discarded. > > If you add or remove command-line options, you need to update invoke.texi > accordingly. > Thanks for your reminder. I will propose another patch to update documentation as soon as possible. Best regards, jasonwucj > -- > Joseph S. Myers > jos...@codesourcery.com
Re: Housekeeping work in backends.html
Eric Botcazou writes: > Some ports are missing (lm32, moxie, nios2, nvptx, rl78, rx) so the relevant > maintainers are CCed (see 6.3.9 Anatomy of a Target Back End in the doc). I think I got this right | Characteristics Target | HMSLQNFICBD lqrcpfgmbdiates ---+ moxie | F g ds AG
Re: [PATCH] add option to emit more array bounds warnigs
Jeff Law : > On 01/13/15 10:34, Martin Uecker wrote: > > Mon, 12 Jan 2015 11:00:44 -0700 > > Jeff Law : > >> On 11/11/14 23:13, Martin Uecker wrote: ... > >> Has this patch been bootstrapped and regression tested, if so on what > >> platform. > > > > x86_64-unknown-linux-gnu > Approved. Please install on the trunk. Sorry about the delays. I don't have write access ;-( Martin
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On Tue, Jan 13, 2015 at 03:17:08PM -0700, Jeff Law wrote: > >And finally there is the case of non-volatile asm with "memory" clobber > >with > >no memory stores in between the two - the posted (safer) patch will not > >allow to CSE the two, while in theory we could CSE them into just one asm. > I think we have to assume that CSEing them is wrong. The first may set > something in memory that is read by the second. > > Thoughts? I agree with pretty much everything you say in the thread, except for this idea that a memory clobber reads memory. No clobber reads anything. The commit that introduced the memory clobber concept, 426b38c9 (svn 1207), by rms, has as only comment /* `memory', don't cache memory across asm */ Segher
RE: [MIPS] Re-enable ABI->ISA inference
> -Original Message- > From: Matthew Fortune [mailto:matthew.fort...@imgtec.com] > Sent: Monday, January 05, 2015 6:09 PM > To: Moore, Catherine > Cc: 'gcc-patches@gcc.gnu.org' (gcc-patches@gcc.gnu.org) > Subject: [MIPS] Re-enable ABI->ISA inference > > The R6 patch introduced MIPS_ISA_LEVEL_SPEC into DRIVER_SELF_SPECS for > all configurations. One part of MIPS_ISA_LEVEL_SPEC is however > incompatible with those configurations which infer an ISA from an ABI > without specifically setting the default ISAs using --with-arch-[32|64]. > > I.e. a generic mips-linux (--enable-targets=all) and mips64-linux would fail > to > build the n32/n64 multilibs as -mips1 would be introduced by > DRIVER_SELF_SPECS. > > I have therefore split MIPS_ISA_LEVEL_SPEC into two. One part is suitable > for all confgurations and one part is only suitable for configurations that > infer > an ABI from an ISA (these tend to be cross-compiler vendor configurations) > > I have built and checked the driver generated options for all relevant > configurations and everything appears to work. > > Let me know if you can see any problems with this? > > Thanks, > Matthew > > gcc/ > > * config/mips/mips.h (MIPS_ISA_LEVEL_SPEC): Only infer an ISA > level from an ARCH; do not inject the default. > (MIPS_DEFAULT_ISA_LEVEL_SPEC): New macro split out from > MIPS_ISA_LEVEL_SPEC. > (MIPS_ISA_NAN2008_SPEC): Update comment. > (BASE_DRIVER_SELF_SPECS): Likewise. > * config/mips/elfoabi.h (DRIVER_SELF_SPECS): Add > MIPS_DEFAULT_ISA_LEVEL_SPEC. > * config/mips/mti-elf.h (DRIVER_SELF_SPECS): Likewise. > * config/mips/mti-linux.h (DRIVER_SELF_SPECS): Likewise. > * config/mips/sde.h (DRIVER_SELF_SPECS): Likewise. > --- This looks OK.
Re: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
On Tue, Jan 13, 2015 at 5:03 AM, H.J. Lu wrote: > On Mon, Jan 12, 2015 at 11:50:41PM +, Joseph Myers wrote: >> On Mon, 12 Jan 2015, H.J. Lu wrote: >> >> > +if test x$enable_default_pie = xyes; then >> > + AC_MSG_CHECKING(if $target supports default PIE) >> > + enable_default_pie=no >> > + case $target in >> > +i?86*-*-linux* | x86_64*-*-linux*) >> > + saved_LDFLAGS="$LDFLAGS" >> > + saved_CFLAGS="$CFLAGS" >> > + CFLAGS="$CFLAGS -fPIE" >> > + LDFLAGS="$LDFLAGS -fPIE -pie" >> > + AC_TRY_LINK(,,[enable_default_pie=yes],) >> > + LDFLAGS="$saved_LDFLAGS" >> > + CFLAGS="$saved_CFLAGS" >> > + ;; >> > +*) >> > + ;; >> > +esac >> >> There should not be any such hardcoding of targets here without concrete >> evidence that the targets for which this sets enable_default_pie=no really >> cannot support PIE. In particular, there is no reason at all for this to >> be architecture-specific; all GNU/Linux architectures should support PIE. >> >> I believe AC_TRY_LINK here will test for the host, whereas what you want >> to know is what's supported for the target (but it's not possible to run >> link tests for the target at this point; the compiler for the target >> hasn't even been built). >> >> So: just presume that if the user passes --enable-default-pie then they >> know what they are doing, and don't try to override their choice. >> >> > diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi >> > index c9e3bf1..89fc305 100644 >> > --- a/gcc/doc/install.texi >> > +++ b/gcc/doc/install.texi >> > @@ -1583,6 +1583,10 @@ not be built. >> > Specify that the run-time libraries for stack smashing protection >> > should not be built. >> > >> > +@item --enable-default-pie >> > +Turn on @option{-fPIE} and @option{-pie} by default if supported. >> > +Currently supported targets are i?86-*-linux* and x86-64-*-linux*. >> >> The "if supported" and target list can then be removed here. >> > > Here is the updated patch. To support --enable-default-pie, each target > must update STARTFILE_SPEC to support PIE_SPEC and NO_PIE_SPEC. I can > provide STARTFILE_SPEC patch if needed. > > Thanks. > > > H.J. > --- > gcc/ > > 2015-01-12 Magnus Granberg > H.J. Lu > > * Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@. > (LINKER): Add @NO_PIE_FLAG@. > (libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for > --enable-default-pie. > * common.opt (fPIE): Initialize to -1. > (fpie): Likewise. > (static): Add "RejectNegative Negative(shared)". > (no-pie): New option. > (pie): Replace "Negative(shared)" with "Negative(no-pie)". > * configure.ac: Add --enable-default-pie. > (NO_PIE_CFLAGS): New. Check if -fno-PIE works. AC_SUBST. > (NO_PIE_FLAG): New. Check if -no-pie works. AC_SUBST. > * defaults.h (DEFAULT_FLAG_PIE): New. Default PIE to -fPIE. > * gcc.c (NO_PIE_SPEC): New. > (PIE_SPEC): Likewise. > (LD_PIE_SPEC): Likewise. > (LINK_PIE_SPEC): Handle -no-pie. Use PIE_SPEC and LD_PIE_SPEC. > * opts.c (DEFAULT_FLAG_PIE): New. Set to 0 if ENABLE_DEFAULT_PIE > is undefined. > (finish_options): Update opts->x_flag_pie if it is -1. > * config/gnu-user.h (FVTABLE_VERIFY_SPEC): New. > (GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC. Use > NO_PIE_SPEC and NO_PIE_SPEC if ENABLE_DEFAULT_PIE is defined. > (GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC. > * doc/install.texi: Document --enable-default-pie. > * doc/invoke.texi: Document -no-pie. > * config.in: Regenerated. > * configure: Likewise. > > gcc/ada/ > > 2015-01-12 H.J. Lu > > * gcc-interface/Makefile.in (TOOLS_LIBS): Add @NO_PIE_FLAG@. > > libgcc/ > > 2015-01-12 H.J. Lu > > * Makefile.in (CRTSTUFF_CFLAGS): Add $(NO_PIE_CFLAGS). > This is the updated patch. I fixed the -r regression. LTO tests pass now. -- H.J. gcc/ 2015-01-12 Magnus Granberg H.J. Lu * Makefile.in (COMPILER): Add @NO_PIE_CFLAGS@. (LINKER): Add @NO_PIE_FLAG@. (libgcc.mvars): Set NO_PIE_CFLAGS to -fno-PIE for --enable-default-pie. * common.opt (fPIE): Initialize to -1. (fpie): Likewise. (no-pie): New option. (pie): Replace "Negative(shared)" with "Negative(no-pie)". * configure.ac: Add --enable-default-pie. (NO_PIE_CFLAGS): New. Check if -fno-PIE works. AC_SUBST. (NO_PIE_FLAG): New. Check if -no-pie works. AC_SUBST. * defaults.h (DEFAULT_FLAG_PIE): New. Default PIE to -fPIE. * gcc.c (NO_PIE_SPEC): New. (PIE_SPEC): Likewise. (LD_PIE_SPEC): Likewise. (LINK_PIE_SPEC): Handle -no-pie. Use PIE_SPEC and LD_PIE_SPEC. * opts.c (DEFAULT_FLAG_PIE): New. Set to 0 if ENABLE_DEFAULT_PIE is undefined. (finish_options): Update opts->x_flag_pie if it is -1. * config/gnu-user.h (FVTABLE_VERIFY_SPEC): New. (GNU_USER_TARGET_STARTFILE_SPEC): Use FVTABLE_VERIFY_SPEC. Use P
Re: [PATCH] PR59448 - Promote consume to acquire
On 01/13/2015 02:06 PM, Andrew MacLeod wrote: On 01/13/2015 01:38 PM, Torvald Riegel wrote: On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote: On 01/13/2015 09:59 AM, Richard Biener wrote: On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod wrote: Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 Basically we can generate incorrect code for an atomic consume operation in some circumstances. The general feeling seems to be that we should simply promote all consume operations to an acquire operation until there is a better definition/understanding of the consume model and how GCC can track it. I proposed a simple patch in the PR, and I have not seen or heard of any dissenting opinion. We should get this in before the end of stage 3 I think. The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. This new patch simply makes the adjustment after any errors are checked on the originally specified model. It bootstraps on x86_64-unknown-linux-gnu and passes all regression testing. I also built an aarch64 compiler and it appears to issue the LDAR as specified in the PR, but anyone with a vested interest really ought to check it out with a real build to be sure. OK for trunk? Why not patch get_memmodel? (not sure if that catches all cases) Richard. That was the original patch. The issue is that it promotes consume to acquire before any error checking gets to look at the model, so then we allow illegal specification of consume. (It actually triggers a failure in the testsuite) (This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c) The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). It would be surprising also because for other atomic read-modify-write operations (eg, fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_excahnge I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that. I imagine it was probably in a previous incarnation of the standard... Most of this was actually implemented based on very early draft standards years and years ago and never revised. It wasnt put in by me unless the standard at some point said had such wording. The current standard appears to make no mention of the situation. It seems that it should be safe to move back to the original patch, and remove that error test for using consume on an exchange... Andrew Here's the original patch along with the lien removed from the testcase. x86_64-unknown-linux-gnu bootstraps, no regressions, and so forth. OK for trunk? Andrew
Re: Housekeeping work in backends.html
> H - you could argue a hardware implementation does not exist as it's a > virtual target, but it can obviously be compiled down to run on > hardware that does exist. Agreed. > Q - "registers" are typed and you can declare 64 bit registers, so > probably this is true OK. > f - Not even sure what this is about. It only defines a small epilogue > pattern. I think this means that the port still uses asm prologue/epilogue. That's not the case here, so 'f' should probably be omitted and 'g' be added. As a matter of fact, there are no ports left with 'f'. > a - Port uses neither LRA nor reload. Might be a new characteristic > (along with several others). > > The closest string is probably > > SQCqfbde Thanks, I have installed "SQCqgbde" based on the discussion above. -- Eric Botcazou
Re: RFC: Two minor optimization patterns
On Tue, Jan 13, 2015 at 2:41 PM, Rasmus Villemoes wrote: > [My first attempt at submitting a patch for gcc, so please forgive me > if I'm not following the right protocol.] There are a few things missing. For one, a testcase or two for the added optimizations. > > Sometimes rounding a variable to the next even integer is written x += x > & 1. This usually means using an extra register (and hence at least an > extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The > first pattern below tries to do this transformation. > > While playing with various ways of rounding down, I noticed that gcc > already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply > x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this > is not handled, so I included the second pattern below. > > I've tested the below in the sense that gcc compiles and that trivial > test cases get compiled as expected. The other thing you missed is a changelog entry for the change you did. Also you mentioned you tested the patch below but did not mention which target you tested it on and you should run the full GCC testsuite. https://gcc.gnu.org/contribute.html is a good page to start with how to handle most of the items above. https://gcc.gnu.org/wiki/HowToPrepareATestcase is a good page on how to write the testcase for testing the added optimization. Thanks, Andrew Pinski > > Rasmus > > > > diff --git gcc/match.pd gcc/match.pd > index 81c4ee6..04a0bc4 100644 > --- gcc/match.pd > +++ gcc/match.pd > @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3. If not see > (abs tree_expr_nonnegative_p@0) > @0) > > +/* x + (x & 1) -> (x + 1) & ~1 */ > +(simplify > + (plus @0 (bit_and @0 integer_onep@1)) > + (bit_and (plus @0 @1) (bit_not @1))) > + > +/* x | ~(x | y) -> x | ~y */ > +(simplify > + (bit_ior @0 (bit_not (bit_ior @0 @1))) > + (bit_ior @0 (bit_not @1))) > + > > /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)) > when profitable.
Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian
On Tue, 13 Jan 2015, Jeff Law wrote: > In many ways having the compiler or assembler spitting out an error here is > preferable to silently compiling the code. That would also help explain why As usual, an error is incorrect in such a case that only has undefined behavior at runtime (but it may be compiled into an abort if the behavior is unconditionally undefined, and the abort doesn't replace anything before the undefined behavior that might have stopped the undefined behavior from occurring). -- Joseph S. Myers jos...@codesourcery.com
RFC: Two minor optimization patterns
[My first attempt at submitting a patch for gcc, so please forgive me if I'm not following the right protocol.] Sometimes rounding a variable to the next even integer is written x += x & 1. This usually means using an extra register (and hence at least an extra mov instruction) compared to the equivalent x = (x + 1) & ~1. The first pattern below tries to do this transformation. While playing with various ways of rounding down, I noticed that gcc already optimizes all of x-(x&3), x^(x&3) and x&~(x&3) to simply x&~3. In fact, x&~(x&y) is rewritten as x&~y. However, the dual of this is not handled, so I included the second pattern below. I've tested the below in the sense that gcc compiles and that trivial test cases get compiled as expected. Rasmus diff --git gcc/match.pd gcc/match.pd index 81c4ee6..04a0bc4 100644 --- gcc/match.pd +++ gcc/match.pd @@ -262,6 +262,16 @@ along with GCC; see the file COPYING3. If not see (abs tree_expr_nonnegative_p@0) @0) +/* x + (x & 1) -> (x + 1) & ~1 */ +(simplify + (plus @0 (bit_and @0 integer_onep@1)) + (bit_and (plus @0 @1) (bit_not @1))) + +/* x | ~(x | y) -> x | ~y */ +(simplify + (bit_ior @0 (bit_not (bit_ior @0 @1))) + (bit_ior @0 (bit_not @1))) + /* Try to fold (type) X op CST -> (type) (X op ((type-x) CST)) when profitable.
Re: [PATCH] PR59448 - Promote consume to acquire
On Tue, 13 Jan 2015, Andrew MacLeod wrote: > It seems that it should be safe to move back to the original patch, and remove > that error test for using consume on an exchange... I don't think there should be any such errors, for any of the atomic built-in functions, only warnings (with the model converted to MEMMODEL_SEQ_CST if not valid, just like a non-constant model). This is just like any other invalid function argument of a suitable type: undefined behavior only if the call is executed. http://www.open-std.org/jtc1/sc22/wg14/12553 -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote: > On 01/13/15 09:38, Segher Boessenkool wrote: > >On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote: > >>3) on request from Richard (which Segher on IRC argues against), "memory" > >>clobber also prevents CSE; > > > >As extend.texi used to say: > > > >" > >If your assembler instructions access memory in an unpredictable > >fashion, add @samp{memory} to the list of clobbered registers. This > >causes GCC to not keep memory values cached in registers across the > >assembler instruction and not optimize stores or loads to that memory. > >You also should add the @code{volatile} keyword if the memory > >affected is not listed in the inputs or outputs of the @code{asm}, as > >the @samp{memory} clobber does not count as a side-effect of the > >@code{asm}. > >" > > > >so a "memory" clobber in a non-volatile asm should not prevent CSE. > My reading of that paragraph is somewhat different. It seems so. I read that as "GCC can delete a memory clobber if it wants to" (just like it can delete any other clobber when it wants to). The only difference between ASM_OPERANDS and any other RTL is that recog is useless for ASM_OPERANDS, it cannot tell you if after you modify the construct you are left with something valid; so the only thing the compiler can change about an asm is to delete it whole. So unlike most RTL, where the compiler is free to remove a clobber if what is left is valid RTL, the only way to delete a clobber from an asm is to delete the whole asm. > The key here is the memory clobber affects optimization of instructions > around the asm while the volatile specifier affects the optimization of > the ASM itself. Those are roughly the effects, yes. Writing unspecified stuff to unspecified memory is a pretty heavy hammer ;-) > A memory clobber must inhibit CSE of memory references on either side of > the asm because the asm must be assumed to read or write memory in > unpredictable ways. I don't see how that follows. The asm itself can be CSEd; its clobber then disappears in a puff of smoke. > The volatile specifier tells the compiler that the asm itself must be > preserved, even if dataflow shows the outputs as not used. Yes. Segher
Re: [PATCH 0/6, nds32] Committed: Have -mcmodel option to support varied code model of programs.
On Tue, 13 Jan 2015, Chung-Ju Wu wrote: > To fix this issue, we are going to use -mcmodel=X options, which probably > gives more flexibility to support varied code model on code generation. > The -mgp-direct option now becomes meaningless and can be discarded. If you add or remove command-line options, you need to update invoke.texi accordingly. -- Joseph S. Myers jos...@codesourcery.com
[PATCH, committed] jit: New API entrypoint: gcc_jit_context_dump_reproducer_to_file
When debugging libgccjit or client code, the recipe for reproducing a bug can be very awkward. For example, consider client code, linked against many libraries, which parses some source file into some internal representation, and then walks this IR, calling into libgccjit. If this encounters a bug (e.g. an ICE deep inside gcc) then reproducing it would require the client code and its dependencies (e.g. data). This patch adds an API entrypoint: gcc_jit_context_dump_reproducer_to_file which will write out C code for a simple executable linked only to libgccjit that performs the equivalent calls into libgccjit, without needing the client code and its data. This may be useful when debugging the library or client code, for reducing a complicated recipe for reproducing a bug into a simpler form. To give us test coverage, the patch also updates the testsuite, so that most testcase now call this entrypoint, and jit.exp then verifies that .c reproducers were indeed written, and verifies that they compile. It doesn't yet verify that they run. Doing so would be almost a quine; in theory we could verify that each reproducer can output its own source code, diffing the results. Implementing that is left for another day (or as an exercise to the reader...). This patch takes jit.sum from 7362 to 7494 passes. Committed to trunk as r219564. gcc/jit/ChangeLog: * docs/cp/topics/contexts.rst (Debugging): Add gccjit::context::dump_reproducer_to_file. * docs/internals/index.rst (Design notes): New section, discussing input validation and gcc_jit_context_dump_reproducer_to_file. * docs/topics/contexts.rst (Debugging): Add gcc_jit_context_dump_reproducer_to_file. * docs/_build/texinfo/libgccjit.texi: Regenerate. * jit-common.h (gcc::jit::dump::get_context): New accessor. * jit-recording.c: Include "hash-map.h". Within namespace ::gcc::jit... (dump::write): Flush each line. (dump::make_location): Pass false for new param "created_by_user". (class allocator): New class. (allocator::~allocator): New function. (allocator::xstrdup_printf): New function. (allocator::xstrdup_printf_va): New function. (class reproducer): New subclass of dump. (reproducer::reproducer): New function. (reproducer::write_params): New function. (reproducer::write_args): New function. (reproducer::make_identifier): New function. (reproducer::make_tmp_identifier): New function. (reproducer::get_identifier): New pair of functions. (reproducer::get_identifier_as_rvalue): New function. (reproducer::get_identifier_as_lvalue): New function. (reproducer::get_identifier_as_type): New function. (reproducer::xstrdup_printf): New function. (recording::context::context): Initialize m_toplevel_ctxt. (recording::context::new_location): Add param created_by_user. (str_option_reproducer_strings): New table of strings. (int_option_reproducer_strings): Likewise. (bool_option_reproducer_strings): Likewise. (get_type_enum_strings): Likewise. (names_of_function_kinds): Likewise. (global_kind_reproducer_strings): Likewise. (unary_op_reproducer_strings): Likewise. (binary_op_reproducer_strings): Likewise. (comparison_reproducer_strings): Likewise. Within namespace ::gcc::jit::recording::... (context::dump_reproducer_to_file): New function. (string::write_reproducer): Likewise. (location::write_reproducer): Likewise. (type::access_as_type): Likewise. (memento_of_get_type::write_reproducer): Likewise. (memento_of_get_pointer::write_reproducer): Likewise. (memento_of_get_const::write_reproducer): Likewise. (memento_of_get_volatile::write_reproducer): Likewise. (array_type::write_reproducer): Likewise. (function_type::write_reproducer): Likewise. (function_type::write_deferred_reproducer): Likewise. (field::write_reproducer): Likewise. (struct_::access_as_type): Likewise. (struct_::write_reproducer): Likewise. (union_::write_reproducer): Likewise. (fields::write_reproducer): Likewise. (rvalue::access_as_rvalue): Likewise. (lvalue::access_as_rvalue): Likewise. (lvalue::access_as_lvalue): Likewise. (param::access_as_rvalue): Likewise. (param::access_as_lvalue): Likewise. (param::write_reproducer): Likewise. (function::write_reproducer): Likewise. (block::write_reproducer): Likewise. (global::write_reproducer): Likewise. (memento_of_new_rvalue_from_const ::write_reproducer): Likewise. (memento_of_new_rvalue_from_const ::write_reproducer): Likewise. (memento_of_new_rvalue_from_const ::write_reproducer): Likewise. (
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On 01/13/15 13:13, Jakub Jelinek wrote: On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote: On 01/13/15 09:38, Segher Boessenkool wrote: On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote: 3) on request from Richard (which Segher on IRC argues against), "memory" clobber also prevents CSE; As extend.texi used to say: " If your assembler instructions access memory in an unpredictable fashion, add @samp{memory} to the list of clobbered registers. This causes GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory. You also should add the @code{volatile} keyword if the memory affected is not listed in the inputs or outputs of the @code{asm}, as the @samp{memory} clobber does not count as a side-effect of the @code{asm}. " so a "memory" clobber in a non-volatile asm should not prevent CSE. My reading of that paragraph is somewhat different. The key here is the memory clobber affects optimization of instructions around the asm while the volatile specifier affects the optimization of the ASM itself. A memory clobber must inhibit CSE of memory references on either side of the asm because the asm must be assumed to read or write memory in unpredictable ways. The volatile specifier tells the compiler that the asm itself must be preserved, even if dataflow shows the outputs as not used. That is not necessarily in conflict. Possibly not :-) This stuff isn't trivial and as well meaning as folks trying to update the docs have been, it's possible subtle issues like these have been missed, even in the review process. I know that for me it's easier to reason about code changes like this than their associated documentation :-) My reading of Jeff's comment is that in int a; int foo (void) { int b, c, d, e; b = a; asm ("..." : "=r" (c) : : "memory"); d = a; asm ("..." : "=r" (e) : : "memory"); return b + d + 2 * (c + e); } we are not allowed to CSE d = a; into d = b; Precisely. At least that's how I read things and it makes sense to the part of my brain that used to split time between kernel & GCC development in a previous life. In effect the "memory" clobber is an aggregation of the read, write, clobbers dataflow for memory (and imprecise as it hits all memory). . CSE invalidate_from_clobbers should ensure that already, even when we don't do anything special about "memory" clobber in the patch. OK. Another thing is if there is a store in between the two non-volatile asms with "memory" clobber, here I'm not sure if with the alternate patch we'd treat the "memory" clobber as use of everything previously stored into memory (in this regard the posted version is safe). I woudln't be terribly surprised if DSE isn't safe in this regard. I don't recall CSE doing any kind of dead store elimination so it wouldn't likely care that the memory clobber implies a read as well. And finally there is the case of non-volatile asm with "memory" clobber with no memory stores in between the two - the posted (safer) patch will not allow to CSE the two, while in theory we could CSE them into just one asm. I think we have to assume that CSEing them is wrong. The first may set something in memory that is read by the second. Thoughts? Jeff
Re: [PATCH/expand] PR64011 Adjust bitsize when partial overflow happen for big-endian
On 01/09/15 06:39, Jiong Wang wrote: the bug testcase is === typedef short U __attribute__((may_alias, aligned (1))); struct S { _Complex float d __attribute__((aligned (8))); }; void bar(struct S); void f5 (int x) { struct S s; ((U *)((char *) &s.d + 1))[3] = x; bar (s); } So I'm going to focus on that assignment statement. Doesn't that write outside the bounds of "s"?If I'm reading everything correctly we have an object that is 8 bytes (a complex float). The assignment is a 2 byte write starting at byte 7 in the object. ISTM that writes one byte beyond the underlying object, at which point we're in undefined behaviour territory. In many ways having the compiler or assembler spitting out an error here is preferable to silently compiling the code. That would also help explain why we haven't seen this on other big endian targets with rich bitfield support (PA and H8 come to mind) -- it only arises in cases of undefined behaviour AFAICT. What I do not like is the ICE or unrecognizable insn error. The odds that someone running into those messages is going to realize it's because the input source is bogus is pretty small. I'm really tempted here to use the conditional you want to add to store_bit_field_using_insv and when it triggers issue an error/warning instead of or in addition to truncating the size of the assignment. Thoughts? jeff
[PATCH] Correct target selector in -mfentry tests
-fprofile -mfentry works with PIE if gcrt1.o is compiled with -fPIC. A glibc has been filed, PR 17836, and a glibc patch has been submitted. OK for trunk? Thanks. H.J. -- * gcc.target/i386/fentry-override.c: Properly place {} in target selector. Remove nonpic. * gcc.target/i386/fentry.c: Likewise. --- gcc/testsuite/gcc.target/i386/fentry-override.c | 2 +- gcc/testsuite/gcc.target/i386/fentry.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/testsuite/gcc.target/i386/fentry-override.c b/gcc/testsuite/gcc.target/i386/fentry-override.c index 0464454..4dd87a8 100644 --- a/gcc/testsuite/gcc.target/i386/fentry-override.c +++ b/gcc/testsuite/gcc.target/i386/fentry-override.c @@ -1,5 +1,5 @@ /* Test -mfentry override */ -/* { dg-do compile { target { *-*-linux* } && { nonpic || ! { ia32 } } } } */ +/* { dg-do compile { target { { *-*-linux* } && { ! { ia32 } } } } } */ /* { dg-options "-mfentry" } */ /* { dg-final { scan-assembler-not "__fentry__" } } */ /* Origin: Andi Kleen */ diff --git a/gcc/testsuite/gcc.target/i386/fentry.c b/gcc/testsuite/gcc.target/i386/fentry.c index d0d70c6..3548dd7 100644 --- a/gcc/testsuite/gcc.target/i386/fentry.c +++ b/gcc/testsuite/gcc.target/i386/fentry.c @@ -1,5 +1,5 @@ /* Test -mfentry */ -/* { dg-do compile { target { *-*-linux* } && { nonpic || ! { ia32 } } } } */ +/* { dg-do compile { target { { *-*-linux* } && { ! { ia32 } } } } } */ /* { dg-options "-fprofile -mfentry" } */ /* { dg-final { scan-assembler "__fentry__" } } */ /* Origin: Andi Kleen */ -- 1.9.3
[committted] PATCH: Add dg-require-profiling to gcc.dg/aru-2.c
Hi, I checked in this patch to add dg-require-profiling to gcc.dg/aru-2.c as an obvious fix. H.J. --- Index: ChangeLog === --- ChangeLog (revision 219560) +++ ChangeLog (working copy) @@ -1,5 +1,9 @@ 2015-01-13 H.J. Lu + * gcc.dg/aru-2.c: Add dg-require-profiling. + +2015-01-13 H.J. Lu + * lib/target-supports.exp (check_profiling_available): Check if -pg links. Index: gcc.dg/aru-2.c === --- gcc.dg/aru-2.c (revision 219560) +++ gcc.dg/aru-2.c (working copy) @@ -1,4 +1,5 @@ /* { dg-do run } */ +/* { dg-require-profiling "-pg" } */ /* { dg-options "-O2 -pg" } */ static int __attribute__((noinline))
[Patch, Fortran, F03] PR 58023: ICE on invalid with bad PPC declaration
Hi all, the attached patch fixes an ICE-on-invalid problem with procedure-pointer components by making sure that we continue resolving all components of a derived type, even after an error is thrown. Regtested on x86_64-unknown-linux-gnu. Ok for trunk? Cheers, Janus 2015-01-13 Janus Weil PR fortran/58023 * resolve.c (resolve_fl_derived0): Continue resolving next component after error. 2015-01-13 Janus Weil PR fortran/58023 * gfortran.dg/proc_ptr_comp_43.f90: New. Index: gcc/fortran/resolve.c === --- gcc/fortran/resolve.c (Revision 219552) +++ gcc/fortran/resolve.c (Arbeitskopie) @@ -12389,7 +12389,7 @@ resolve_fl_derived0 (gfc_symbol *sym) { gfc_error ("Coarray component %qs at %L must be allocatable with " "deferred shape", c->name, &c->loc); - return false; + continue; } /* F2008, C443. */ @@ -12398,7 +12398,7 @@ resolve_fl_derived0 (gfc_symbol *sym) { gfc_error ("Component %qs at %L of TYPE(C_PTR) or TYPE(C_FUNPTR) " "shall not be a coarray", c->name, &c->loc); - return false; + continue; } /* F2008, C444. */ @@ -12409,7 +12409,7 @@ resolve_fl_derived0 (gfc_symbol *sym) gfc_error ("Component %qs at %L with coarray component " "shall be a nonpointer, nonallocatable scalar", c->name, &c->loc); - return false; + continue; } /* F2008, C448. */ @@ -12417,7 +12417,7 @@ resolve_fl_derived0 (gfc_symbol *sym) { gfc_error ("Component %qs at %L has the CONTIGUOUS attribute but " "is not an array pointer", c->name, &c->loc); - return false; + continue; } if (c->attr.proc_pointer && c->ts.interface) @@ -12427,7 +12427,7 @@ resolve_fl_derived0 (gfc_symbol *sym) if (!sym->attr.vtype && !check_proc_interface (ifc, &c->loc)) { c->tb->error = 1; - return false; + continue; } if (ifc->attr.if_source || ifc->attr.intrinsic) @@ -12471,7 +12471,10 @@ resolve_fl_derived0 (gfc_symbol *sym) gfc_charlen *cl = gfc_new_charlen (sym->ns, ifc->ts.u.cl); if (cl->length && !cl->resolved && !gfc_resolve_expr (cl->length)) - return false; + { + c->tb->error = 1; + continue; + } c->ts.u.cl = cl; } } @@ -12514,7 +12517,7 @@ resolve_fl_derived0 (gfc_symbol *sym) "at %L has no argument %qs", c->name, c->tb->pass_arg, &c->loc, c->tb->pass_arg); c->tb->error = 1; - return false; + continue; } } else @@ -12528,7 +12531,7 @@ resolve_fl_derived0 (gfc_symbol *sym) "must have at least one argument", c->name, &c->loc); c->tb->error = 1; - return false; + continue; } me_arg = c->ts.interface->formal->sym; } @@ -12544,7 +12547,7 @@ resolve_fl_derived0 (gfc_symbol *sym) " the derived type %qs", me_arg->name, c->name, me_arg->name, &c->loc, sym->name); c->tb->error = 1; - return false; + continue; } /* Check for C453. */ @@ -12554,7 +12557,7 @@ resolve_fl_derived0 (gfc_symbol *sym) "must be scalar", me_arg->name, c->name, me_arg->name, &c->loc); c->tb->error = 1; - return false; + continue; } if (me_arg->attr.pointer) @@ -12563,7 +12566,7 @@ resolve_fl_derived0 (gfc_symbol *sym) "may not have the POINTER attribute", me_arg->name, c->name, me_arg->name, &c->loc); c->tb->error = 1; - return false; + continue; } if (me_arg->attr.allocatable) @@ -12572,12 +12575,15 @@ resolve_fl_derived0 (gfc_symbol *sym) "may not be ALLOCATABLE", me_arg->name, c->name, me_arg->name, &c->loc); c->tb->error = 1; - return false; + continue; } if (gfc_type_is_extensible (sym) && me_arg->ts.type != BT_CLASS) - gfc_error ("Non-polymorphic passed-object dummy argument of %qs" - " at %L", c->name, &c->loc); + { + gfc_error ("Non-polymorphic passed-object dummy argument of %qs" +" at %L", c-
Re: PR54442 build_qualified_type produces a non-canonical type
On 01/13/2015 01:46 PM, Paolo Carlini wrote: In fact, I noticed today that this is a 4.8/4.9 Regression too. Shall I try to apply the patchlet to 4_9-branch too and, if testing passes, commit there and close the bug? OK. Jason
Re: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR
On 01/13/15 13:47, Martin Liška wrote: Hello. As pointed out in the following PR64307, IPA ICF is missing support for aforementioned TREE types. Apart from that, the patch also fixes BIT_FIELD_REF, which was broken. Tested on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin 0001-IPA-ICF-handle-IMAGPART_EXPR-and-REALPART_EXPR.patch From 036a27ec12ad556160cb92a86fd99ed02b66ed1a Mon Sep 17 00:00:00 2001 From: mliska Date: Tue, 13 Jan 2015 18:52:22 +0100 Subject: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR. gcc/testsuite/ChangeLog: 2015-01-13 Martin Liska * gcc.dg/ipa/pr64307.c: New test. gcc/ChangeLog: 2015-01-13 Martin Liska * ipa-icf-gimple.c (func_checker::compare_operand): Add support for IMAGPART_EXPR and REALPART_EXPR and fix BIT_FIELD_REF comparison. OK. Jeff
Re: [testsuite] PATCH: Check if -pg available
On 01/13/15 13:42, H.J. Lu wrote: since check_profiling_available result is cached. It is wrong to use the cached result from -m32 for -m64. Here is the updated patch. Tested with RUNTESTFLAGS="--target_board='unix{-m32,-m64}'. OK for trunk? Yes, the updated patch is OK. jeff
Re: [PATCH][ARM] FreeBSD ARM support, EABI, v3
On 13.01.15 11:25, Ramana Radhakrishnan wrote: On Thu, Jan 8, 2015 at 8:51 PM, Andreas Tobler wrote: On 08.01.15 17:27, Richard Earnshaw wrote: On 29/12/14 18:44, Andreas Tobler wrote: All, here is the third attempt to support ARM with FreeBSD. In the meantime we found another issue in the unwinder where I had to adapt some stuff. The unwind_phase2_forced function in libgcc calls a stop_fn function. This stop_fn is in FreeBSD's libthr implementation and is called thread_unwind_stop. This thread_unwind_stop is a generic function used on all FreeBSD archs. The issue is now that this thread_unwind_stop expects a double int for the exception_class, like on every other arch. For ARM EABI this exception_class is an array of char which is passed in one register as pointer vs. two registers for a double int. To solve this issue we defined the exception_class as double integer for FreeBSD. My apologies for the slow response, some other work and then holidays intervened. Np, the only issue which made me hurry was the stage 4 entering this week. From my understanding of the ABI document the implementation is currently as mandated by the ABI. Also this isn't a part of the ABI that's available for the platform (here FreeBSD to manipulate and change as per it's wishes). ARM EHABI is special for software, making FreeBSD more "special" for ARM appears to be counter intuitive from my point of view. A number of exception unwinding libraries. for e.g. libobjc , libstdc++ all use this implementation of exception_class. Therefore this creates a divergence for the FreeBSD port which is different from everything else. I expect that a number of language run time support libraries that supported the ARM EHABI would be using such an implementation, therefore you need to fix every single implementation of this in every unwinder that supports the ARM EHABI which I expect to have been ported to in a number of libraries already. (I already see this in libobjc and libstdc++ in the GCC tree) Grr ;) I didn't want to hear this answer, but I expected it somehow. My proposal was the least effort for me. The other way round is going to be very hard. Maybe impossible. It is not only FreeBSD which is affected but also llvm and friends. They use for exception_class uint64_t. I have to take a picture how the effort is and if it would be possible to do such a change in FreeBSD and more important in llvm etc. I would rather fix the thread_unwind_stop implementation in libthr for ARM EHABI rather than make this change. It wouldn't be a 'fix' but more a wrapper I think. This adaptation reduced the failure count in libstdc++ by about 40 fails. I build and test this port on a regular basis and I post the results to the usual place. Thanks for doing this. I'm really glad that FreeBSD is finally moving to EABI. Thanks for the review and the feedback. Gruss, Andreas
C++ PATCH for c++/64356 and libstdc++/58777 (constexpr pointer arithmetic)
In this testcase we are iterating through an array which is a local variable in a constexpr function. So its address is not constant, but we can still do arithmetic on it to get to the address of an element and then pull out the value of the element; we shouldn't reject the evaluation as non-constant unless the non-constant address is part of the value of the complete expression. This patch also fixes libstdc++/58777, another issue with giving up on an intermediate non-constant address even though the final value is constant. Tested x86_64-pc-linux-gnu, applying to trunk. commit 10760eb60bfd8f5de38ff1194718851a3d0f9073 Author: Jason Merrill Date: Tue Jan 13 07:45:03 2015 -0500 PR c++/64356 PR libstdc++/58777 * constexpr.c (cxx_eval_binary_expression): Don't VERIFY_CONSTANT pointer expressions. (cxx_eval_increment_expression): Likewise. diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c index 650250b..1432506 100644 --- a/gcc/cp/constexpr.c +++ b/gcc/cp/constexpr.c @@ -1616,10 +1616,15 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t, tree lhs, rhs; lhs = cxx_eval_constant_expression (ctx, orig_lhs, /*lval*/false, non_constant_p, overflow_p); - VERIFY_CONSTANT (lhs); + /* Don't VERIFY_CONSTANT if this might be dealing with a pointer to + a local array in a constexpr function. */ + bool ptr = POINTER_TYPE_P (TREE_TYPE (lhs)); + if (!ptr) +VERIFY_CONSTANT (lhs); rhs = cxx_eval_constant_expression (ctx, orig_rhs, /*lval*/false, non_constant_p, overflow_p); - VERIFY_CONSTANT (rhs); + if (!ptr) +VERIFY_CONSTANT (lhs); location_t loc = EXPR_LOCATION (t); enum tree_code code = TREE_CODE (t); @@ -1634,7 +1639,8 @@ cxx_eval_binary_expression (const constexpr_ctx *ctx, tree t, } else if (cxx_eval_check_shift_p (loc, ctx, code, type, lhs, rhs)) *non_constant_p = true; - VERIFY_CONSTANT (r); + if (!ptr) +VERIFY_CONSTANT (lhs); return r; } @@ -2704,7 +2710,11 @@ cxx_eval_increment_expression (const constexpr_ctx *ctx, tree t, tree val = rvalue (op); val = cxx_eval_constant_expression (ctx, val, false, non_constant_p, overflow_p); - VERIFY_CONSTANT (val); + /* Don't VERIFY_CONSTANT if this might be dealing with a pointer to + a local array in a constexpr function. */ + bool ptr = POINTER_TYPE_P (TREE_TYPE (val)); + if (!ptr) +VERIFY_CONSTANT (val); /* The modified value. */ bool inc = (code == PREINCREMENT_EXPR || code == POSTINCREMENT_EXPR); @@ -2719,7 +2729,8 @@ cxx_eval_increment_expression (const constexpr_ctx *ctx, tree t, } else mod = fold_build2 (inc ? PLUS_EXPR : MINUS_EXPR, type, val, offset); - VERIFY_CONSTANT (mod); + if (!ptr) +VERIFY_CONSTANT (mod); /* Storing the modified value. */ tree store = build2 (MODIFY_EXPR, type, op, mod); diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C new file mode 100644 index 000..fd6143b --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-local2.C @@ -0,0 +1,22 @@ +// PR c++/64356 +// { dg-do compile { target c++14 } } + +typedef unsigned long size_t; + +template +constexpr size_t f(const char (&x)[N]) { + size_t s = 0; + for(size_t c : x) +s += c; + return s; +} + +template +constexpr size_t g(const char (&x)[N]) { + char y[N] = {0}; + for(size_t i = 0; i < N; ++i) +y[i] = x[i]; + return f(y); +} + +constexpr auto x = g(__DATE__); diff --git a/gcc/testsuite/g++.dg/cpp1y/pr63996.C b/gcc/testsuite/g++.dg/cpp1y/pr63996.C index d0bf9b5..8f66cdc 100644 --- a/gcc/testsuite/g++.dg/cpp1y/pr63996.C +++ b/gcc/testsuite/g++.dg/cpp1y/pr63996.C @@ -6,5 +6,5 @@ foo (int i) int a[i] = { }; // { dg-error "forbids variable length" } } -constexpr int j = foo (1); // { dg-error "is not a constant expression" } +constexpr int j = foo (1); // { dg-error "flows off the end" } diff --git a/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc b/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc index f3b4388..d57cf5c 100644 --- a/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc +++ b/libstdc++-v3/testsuite/experimental/optional/constexpr/make_optional.cc @@ -1,7 +1,4 @@ // { dg-options "-std=gnu++14" } -// XFAIL pending resolution of PR libstdc++/58777 -// { dg-do compile { xfail *-*-* } } -// { dg-excess-errors "" } // Copyright (C) 2013-2015 Free Software Foundation, Inc. // diff --git a/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc b/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc index 20c307d..803927b 100644 --- a/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc +++ b/libstdc++-v3/testsuite/experimental/optional/constexpr/observers/2.cc @@ -1,7 +1,4 @@ // { dg-options "-std=gnu++14" } -// XFAIL pending resolution of PR libstdc++/58777 -// { dg-do compile { xfail *-*-*
Re: [arm][patch] fix arm_neon_ok check on !arm_arch7
On 12/01/15 13:50, Ramana Radhakrishnan wrote: In principle ok, but I'd like a comment in there explaining why we've done this. Can you also post under what configurations these have been tested ? Is this better? I tested it by running the vect.exp tests with a variety of -mcpu flags. Andrew 2015-01-13 Andrew Stubbs gcc/testsuite/ * lib/target-supports.exp (check_effective_target_arm_neon_ok_nocache): Don't try to test Neon on ARM architures before v7. Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 219058) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2565,6 +2565,11 @@ if { [check_no_compiler_messages_nocache arm_neon_ok object { #include "arm_neon.h" int dummy; + /* Avoid the case where a test adds -mfpu=neon, but the toolchain is + configured for -mcpu=arm926ej-s, for example. */ + #if __ARM_ARCH < 7 + #error Architecture too old for NEON. + #endif } "$flags"] } { set et_arm_neon_flags $flags return 1
Fix ICE in inline_small_functions
Hi, this patch fixes ICE in inline_small_functions checking consistency of the growth cache. When removing speculation the cache needs to be reset because the frequencies of edges change. Bootstrapped/regtested x86_64-linux, comitted. Honza PR ipa/64565 * g++.dg/torture/pr64565.C: New testcase. * ipa-inline.c (inline_small_functions): Update callee keys after resolving speculation (inline_small_functions): Always check monotonicity of the queue. Index: testsuite/g++.dg/torture/pr64565.C === --- testsuite/g++.dg/torture/pr64565.C (revision 0) +++ testsuite/g++.dg/torture/pr64565.C (revision 0) @@ -0,0 +1,89 @@ +/* { dg-do compile } */ +typedef enum +{ + NS_OK +} nsresult; +struct A +{ + static int kIID; +}; +class B +{ +}; +class C +{ +public: + C (B p1) { m_fn1 (p1, A::kIID); } + void m_fn1 (B, int); +}; +class D; +class F +{ +public: + F (int); +}; +class G +{ +public: + D *operator[](int); +}; +class H +{ + virtual nsresult m_fn2 (); + +public: + void m_fn3 (); +}; +class J : H +{ + G mQueries; + int mLiveUpdate; + nsresult m_fn2 (); +}; +class D +{ +public: + nsresult m_fn4 (int); + void m_fn5 (int); +}; +class I +{ +public: + static I * + m_fn6 () + { +B __trans_tmp_3; +if (!gHistoryService) + C serv = __trans_tmp_3; + } + void m_fn7 (); + static I *gHistoryService; +}; +D *Refresh___trans_tmp_2; +D Refresh___trans_tmp_6, Refresh___trans_tmp_5; +int Refresh_hasDomain; +nsresult +J::m_fn2 () +{ + m_fn3 (); + I history = *I::m_fn6 (); + switch (mLiveUpdate) +{ +case 1: + { +mQueries[0]; +F query = 0; +if (Refresh_hasDomain) + return NS_OK; + } +case 0: + { +Refresh___trans_tmp_2 = mQueries[0]; +F query = Refresh___trans_tmp_5.m_fn4 (0); +history.m_fn7 (); +Refresh___trans_tmp_6.m_fn5 (0); + } +case 3: + m_fn2 (); +} +} Index: ipa-inline.c === --- ipa-inline.c(revision 219452) +++ ipa-inline.c(working copy) @@ -1626,6 +1626,8 @@ inline_small_functions (void) reset_edge_caches (where); update_caller_keys (&edge_heap, where, updated_nodes, NULL); + update_callee_keys (&edge_heap, where, + updated_nodes); bitmap_clear (updated_nodes); } } @@ -1661,7 +1663,7 @@ inline_small_functions (void) /* Disable checking for profile because roundoff errors may cause slight deviations in the order. */ gcc_assert (max_count || cached_badness == current_badness); - gcc_assert (max_count || current_badness >= badness); + gcc_assert (current_badness >= badness); #else current_badness = edge_badness (edge, false); #endif
[PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR
Hello. As pointed out in the following PR64307, IPA ICF is missing support for aforementioned TREE types. Apart from that, the patch also fixes BIT_FIELD_REF, which was broken. Tested on x86_64-linux-pc and no regression has been seen. Ready for trunk? Thanks, Martin >From 036a27ec12ad556160cb92a86fd99ed02b66ed1a Mon Sep 17 00:00:00 2001 From: mliska Date: Tue, 13 Jan 2015 18:52:22 +0100 Subject: [PATCH] IPA ICF: handle IMAGPART_EXPR and REALPART_EXPR. gcc/testsuite/ChangeLog: 2015-01-13 Martin Liska * gcc.dg/ipa/pr64307.c: New test. gcc/ChangeLog: 2015-01-13 Martin Liska * ipa-icf-gimple.c (func_checker::compare_operand): Add support for IMAGPART_EXPR and REALPART_EXPR and fix BIT_FIELD_REF comparison. --- gcc/ipa-icf-gimple.c | 14 +- gcc/testsuite/gcc.dg/ipa/pr64307.c | 32 2 files changed, 45 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/gcc.dg/ipa/pr64307.c diff --git a/gcc/ipa-icf-gimple.c b/gcc/ipa-icf-gimple.c index ed3cdf5..05c2a23 100644 --- a/gcc/ipa-icf-gimple.c +++ b/gcc/ipa-icf-gimple.c @@ -448,6 +448,8 @@ func_checker::compare_operand (tree t1, tree t2) return return_with_debug (ret); } +case IMAGPART_EXPR: +case REALPART_EXPR: case ADDR_EXPR: { x1 = TREE_OPERAND (t1, 0); @@ -458,7 +460,17 @@ func_checker::compare_operand (tree t1, tree t2) } case BIT_FIELD_REF: { - ret = compare_decl (t1, t2); + x1 = TREE_OPERAND (t1, 0); + x2 = TREE_OPERAND (t2, 0); + y1 = TREE_OPERAND (t1, 1); + y2 = TREE_OPERAND (t2, 1); + z1 = TREE_OPERAND (t1, 2); + z2 = TREE_OPERAND (t2, 2); + + ret = compare_operand (x1, x2) + && compare_cst_or_decl (y1, y2) + && compare_cst_or_decl (z1, z2); + return return_with_debug (ret); } case SSA_NAME: diff --git a/gcc/testsuite/gcc.dg/ipa/pr64307.c b/gcc/testsuite/gcc.dg/ipa/pr64307.c new file mode 100644 index 000..e8c1b02 --- /dev/null +++ b/gcc/testsuite/gcc.dg/ipa/pr64307.c @@ -0,0 +1,32 @@ +/* { dg-do compile } */ +/* { dg-options "-O0 -fipa-icf -fdump-ipa-icf" } */ + +#include + +typedef _Complex float COMPLEX_FLOAT; + +__attribute__ ((noinline)) +static float real_part(COMPLEX_FLOAT a) +{ + return *(float*)(&a); +} + +__attribute__ ((noinline)) +static float real_part_2(COMPLEX_FLOAT a) +{ + return ((float*)(&a))[0]; +} + +int main() +{ + COMPLEX_FLOAT f = 1.0f + _Complex_I; + + float r1 = real_part(f); + float r2 = real_part(f); + + return r1 - r2; +} + +/* { dg-final { scan-ipa-dump "Semantic equality hit:real_part_2->real_part" "icf" } } */ +/* { dg-final { scan-ipa-dump "Equal symbols: 1" "icf" } } */ +/* { dg-final { cleanup-ipa-dump "icf" } } */ -- 2.1.2
C++ PATCH for c++/64514 (zero-length fixed parameter pack)
The code I added for dealing with fixed parameter packs wasn't dealing properly with zero-length packs: if there were any elements we would see the pack expansion argument and punt, but we need to do that in the zero-length case as well. Tested x86_64-pc-linux-gnu, applying to trunk and 4.9. commit 8fa00de341538399cd675d50a6b1cb86bda39159 Author: Jason Merrill Date: Mon Jan 12 22:00:26 2015 -0500 PR c++/64514 * pt.c (coerce_template_parameter_pack): Return NULL for a zero-length fixed parameter pack with a pack expansion arg. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 3ac93db..55871e5 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -6825,6 +6825,9 @@ coerce_template_parameter_pack (tree parms, if (invalid_nontype_parm_type_p (t, complain)) return error_mark_node; } + /* We don't know how many args we have yet, just + use the unconverted ones for now. */ + return NULL_TREE; } packed_args = make_tree_vec (TREE_VEC_LENGTH (packed_parms)); diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic165.C b/gcc/testsuite/g++.dg/cpp0x/variadic165.C new file mode 100644 index 000..862931f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/variadic165.C @@ -0,0 +1,17 @@ +// PR c++/64514 +// { dg-do compile { target c++11 } } + +template +struct Functor +{ +template +struct Inner +{}; +}; + +template struct Functor<>::Inner<>; + +int main() +{ + +}
Re: [testsuite] PATCH: Check if -pg available
On Tue, Jan 13, 2015 at 11:27 AM, Jeff Law wrote: > On 01/13/15 05:54, H.J. Lu wrote: >> >> On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote: >>> >>> On 01/12/15 14:51, Magnus Granberg wrote: måndag 12 januari 2015 12.11.17 skrev H.J. Lu: > > On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law wrote: >> >> On 01/12/15 12:59, H.J. Lu wrote: >>> >>> I don't know if -pg will work PIE on any targets. For Linux/x86 >>> the choices of crt1.o are >>> >>> %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} >>> >>> -shared, -pg and -pie are mutually exclusive. Those crt1 files are >>> only crt1 files provided by glibc. You can't even try -pg -pie on >>> Linux without changing glibc. >> >> >> You're totally missing the point. What I care about is *why*. >> With -pg it use gcrt1.o object file and that file is not compile with -fPIC. When you build a shared lib on x86_64 all the objects files need to be buiit with -fPIC else you get a error like that one abow and it is the same problems when you build bin with -fPIE and linke with -pie. Glibc do not provide one that is compile with -fPIC >>> >>> Is there some reason why glibc could not provide gcrt1.o compiled with >>> -fPIC? >>> >>> >> >> Here is a patch to check if -pg is available. If -pg doesn't link, >> profiling isn't available. OK for trunk? > > OK with a suitable ChangeLog entry. > Unfortunately it doesn't work with make check-gcc RUNTESTFLAGS="--target_board='unix{-m32,-m64}' since check_profiling_available result is cached. It is wrong to use the cached result from -m32 for -m64. Here is the updated patch. Tested with RUNTESTFLAGS="--target_board='unix{-m32,-m64}'. OK for trunk? Thanks. -- H.J. * lib/target-supports.exp (check_profiling_available): Check if -pg links. --- gcc/testsuite/lib/target-supports.exp | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 0ac9646..61bff53 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -549,7 +549,16 @@ proc check_profiling_available { test_what } { } } -return $profiling_available_saved +# -pg link test result can't be cached since it may change between +# runs. +set profiling_working $profiling_available_saved +if { $profiling_available_saved == 1 + && ![check_no_compiler_messages_nocache profiling executable { + int main() { return 0; } } "-pg"] } { + set profiling_working 0 +} + +return $profiling_working } # Check to see if a target is "freestanding". This is as per the definition -- 1.9.3
C++ PATCH for c++/64520 (ICE with std::initializer_list)
The special rules for deduction of std::initializer_list don't support a pack expansion, but we shouldn't crash. Tested x86_64-pc-linux-gnu, applying to trunk. commit a1607c38aaf6e04c2a601ee78dca67984e179986 Author: Jason Merrill Date: Mon Jan 12 13:27:20 2015 -0500 PR c++/64520 * pt.c (unify): Don't try to deduce to std::initializer_list. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index d8652fb..3ac93db 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -17854,7 +17854,13 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, if (TREE_CODE (parm) == ARRAY_TYPE) elttype = TREE_TYPE (parm); else - elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0); + { + elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0); + /* Deduction is defined in terms of a single type, so just punt + on the (bizarre) std::initializer_list. */ + if (PACK_EXPANSION_P (elttype)) + return unify_success (explain_p); + } FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg), i, elt) { diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist91.C b/gcc/testsuite/g++.dg/cpp0x/initlist91.C new file mode 100644 index 000..1387557 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/initlist91.C @@ -0,0 +1,8 @@ +// PR c++/64520 +// { dg-do compile { target c++11 } } + +#include +struct A { + template A(std::initializer_list); +}; +A a { 0 }; // { dg-error "" }
Re: flatten expr.h (version 2)
On 13 January 2015 at 22:02, Prathamesh Kulkarni wrote: > On 13 January 2015 at 16:06, Prathamesh Kulkarni > wrote: >> On 13 January 2015 at 15:34, Richard Biener wrote: >>> On Sun, 11 Jan 2015, Prathamesh Kulkarni wrote: >>> Hi, This is a revamped expr.h flattening flattening patch rebased on tree.h and tree-core.h flattening patch (r219402). It depends upon the following patch to get committed. https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00565.html Changes: * Removed all includes except tree-core.h. Put includes required by expr.h in a comment. * Moved stmt.c, expmed.c prototypes to stmt.h, expmed.h respectively. * Adjusted generator programs: genemit.c, gengtype.c, genopinit.c, genoutput.c. * Did not put includes in gcc-plugin.h since expr.h cannot be included by plugins (putting them broke building a file in c-family/ since expr.h is not allowed in front-ends) * Affects java front-end (expr.h is allowed in java front-end). Bootstrapped and tested on x86_64-unknown-linux-gnu with languages: all,go,ada,jit Built on all targets in config-list.mk with languages: all, go. OK to commit ? >>> >>> diff --git a/gcc/expr.c b/gcc/expr.c >>> index fc22862..824541e 100644 >>> --- a/gcc/expr.c >>> +++ b/gcc/expr.c >>> @@ -41,11 +41,17 @@ along with GCC; see the file COPYING3. If not see >>> #include "regs.h" >>> #include "hard-reg-set.h" >>> #include "except.h" >>> -#include "input.h" >>> #include "function.h" >>> #include "insn-config.h" >>> #include "insn-attr.h" >>> /* Include expr.h after insn-config.h so we get HAVE_conditional_move. >>> */ >>> +#include "hashtab.h" >>> +#include "emit-rtl.h" >>> +#include "expmed.h" >>> +#include "stmt.h" >>> +#include "statistics.h" >>> +#include "real.h" >>> +#include "fixed-value.h" >>> #include "expr.h" >>> >>> Please move the comment to the proper place >> ah, my flattening tool doesn't look at comments. I will move the >> comment before expr.h include, thanks. >>> >>> diff --git a/gcc/expr.h b/gcc/expr.h >>> index a7638b8..f1be8dc 100644 >>> --- a/gcc/expr.h >>> +++ b/gcc/expr.h >>> @@ -20,7 +20,8 @@ along with GCC; see the file COPYING3. If not see >>> #ifndef GCC_EXPR_H >>> #define GCC_EXPR_H >>> >>> -/* For inhibit_defer_pop */ >>> +/* expr.h required includes */ >>> +#if 0 >>> #include "hashtab.h" >>> #include "hash-set.h" >>> #include "vec.h" >>> @@ -29,15 +30,17 @@ along with GCC; see the file COPYING3. If not see >>> #include "hard-reg-set.h" >>> #include "input.h" >>> #include "function.h" >>> -/* For XEXP, GEN_INT, rtx_code */ >>> #include "rtl.h" >>> -/* For optimize_size */ >>> #include "flags.h" >>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop, >>> - ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ >>> #include "tree-core.h" >>> -/* For GET_MODE_BITSIZE, word_mode */ >>> #include "insn-config.h" >>> +#include "alias.h" >>> +#include "emit-rtl.h" >>> +#include "expmed.h" >>> +#include "stmt.h" >>> +#endif >>> >>> Err, please remove the #if 0 section >> I kept it because if something breaks later (hopefully not!), it will >> be easier to fix. >> I will remove it. >>> >>> + >>> +#include "tree-core.h" >>> >>> Why? The original comment says >>> >>> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop, >>> - ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ >>> >>> but all those are declared in tree.h. Which means the files including >>> expr.h must already include tree.h. >>> >>> If that's not the reason we need to include tree-core.h from expr.c >>> please add a comment explaining why. >> bt-load.c fails to compile because it includes expr.h but does not >> include tree.h >> I will place tree.h include in all files that include expr.h and rebuild. > This is not going to work, since tree.h is now flattened. Shall also > require including all headers required by > tree.h in all files that include expr.h. Could we retain tree-core.h > in expr.h for now ? > Or should I insert tree.h (along with tree.h required includes) in all > files that include expr.h ? I am including tree.h along with required includes in all files that include expr.h. This removes all includes from expr.h. Thanks, Prathamesh > > Thanks, > Prathamesh >>> >>> -/* Definitions from emit-rtl.c */ >>> -#include "emit-rtl.h" >>> - >>> /* Return a memory reference like MEMREF, but with its mode widened to >>> MODE and adjusted by OFFSET. */ >>> extern rtx widen_memory_access (rtx, machine_mode, HOST_WIDE_INT); >>> >>> err - functions defined in emit-rtl.c should be declared in emit-rtl.h. >>> Please fix that first. expr.h should _only_ contain prototypes >>> for stuff defined in expr.c. >> oops, missed it :( >>> >>> Andrew did a good job with this, first cleaning up a header moving >>> declarations to proper places and only after that flattening it. >>> >>> The rest of the patch looks good t
RE: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
On Tue, 13 Jan 2015, Matthew Fortune wrote: > > >> I have tested this for both mips and micromips, and the tests now > > >> pass successfully. > > >> The ChangeLog and patch are below. > > > > > > Hmm, instead of trying to avoid testing microMIPS code generation > > > just to satisfy the test suite I'd rather see the test cases updated > > > so that LWM/SWM register ranges are expected and accepted whenever > > > microMIPS code is produced. These scan patterns can be made > > conditional. > > > > FWIW I think Andrew's patch is correct. If we want to test microMIPS > > output against micromips-specific regexps, we should add a separate test > > that forces micromips, so that it gets tested regardless of people's > > RUNTESTFLAGS. Doing that shouldn't hold up Andrew's patch though. Taking care that the default compilation mode does not conflict (e.g. MIPS16, incompatible) and taking any exceptions into account (e.g. n64, unsupported) I presume, right? > > Whereever possible gcc.target/mips should not have conditional dg- > > finals. OK, if that's been the policy. > I was going to suggest a follow up patch to add copies of the three tests > as Richard suggests. I haven't yet done a micromips run of the testsuite > to check for any other issues like this but I suspect problems are limited > to the tests that I recently added. Please always try to test changes reasonably, i.e. at least o32, o32/MIPS16, o32/microMIPS, n32, n64, and then Linux and ELF if applicable, plus any options that may be relevant, unless it is absolutely clear ABI/ISA variations do not matter for a change proposed. Running regression tests is just processing time after all (cheap!), you don't have to spend your brain cycles on it (expensive!). Maciej
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On Tue, Jan 13, 2015 at 12:45:27PM -0700, Jeff Law wrote: > On 01/13/15 09:38, Segher Boessenkool wrote: > >On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote: > >>3) on request from Richard (which Segher on IRC argues against), "memory" > >>clobber also prevents CSE; > > > >As extend.texi used to say: > > > >" > >If your assembler instructions access memory in an unpredictable > >fashion, add @samp{memory} to the list of clobbered registers. This > >causes GCC to not keep memory values cached in registers across the > >assembler instruction and not optimize stores or loads to that memory. > >You also should add the @code{volatile} keyword if the memory > >affected is not listed in the inputs or outputs of the @code{asm}, as > >the @samp{memory} clobber does not count as a side-effect of the > >@code{asm}. > >" > > > >so a "memory" clobber in a non-volatile asm should not prevent CSE. > My reading of that paragraph is somewhat different. > > The key here is the memory clobber affects optimization of instructions > around the asm while the volatile specifier affects the optimization of the > ASM itself. > > A memory clobber must inhibit CSE of memory references on either side of the > asm because the asm must be assumed to read or write memory in unpredictable > ways. > > The volatile specifier tells the compiler that the asm itself must be > preserved, even if dataflow shows the outputs as not used. That is not necessarily in conflict. My reading of Jeff's comment is that in int a; int foo (void) { int b, c, d, e; b = a; asm ("..." : "=r" (c) : : "memory"); d = a; asm ("..." : "=r" (e) : : "memory"); return b + d + 2 * (c + e); } we are not allowed to CSE d = a; into d = b;. CSE invalidate_from_clobbers should ensure that already, even when we don't do anything special about "memory" clobber in the patch. Another thing is if there is a store in between the two non-volatile asms with "memory" clobber, here I'm not sure if with the alternate patch we'd treat the "memory" clobber as use of everything previously stored into memory (in this regard the posted version is safe). And finally there is the case of non-volatile asm with "memory" clobber with no memory stores in between the two - the posted (safer) patch will not allow to CSE the two, while in theory we could CSE them into just one asm. Jakub
RE: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
Richard Sandiford writes: > "Maciej W. Rozycki" writes: > > On Tue, 13 Jan 2015, Andrew Bennett wrote: > > > >> The call-saved-{4-6}.c tests in the mips testsuite fail for > micromips. > >> The reason is > >> that micromips uses the swm and lwm instructions to save/restore the > >> call-saved registers rather than using the sw and lw instructions. > >> The swm and lwm instructions only list the range of registers to use > >> ie. $16-$25 and hence some of the scan-assembler patterns fail. This > >> fix adds the NO_COMPRESSION attribute to the foo function to force > >> the tests to always compile as mips. > >> > >> I have tested this for both mips and micromips, and the tests now > >> pass successfully. > >> The ChangeLog and patch are below. > > > > Hmm, instead of trying to avoid testing microMIPS code generation > > just to satisfy the test suite I'd rather see the test cases updated > > so that LWM/SWM register ranges are expected and accepted whenever > > microMIPS code is produced. These scan patterns can be made > conditional. > > FWIW I think Andrew's patch is correct. If we want to test microMIPS > output against micromips-specific regexps, we should add a separate test > that forces micromips, so that it gets tested regardless of people's > RUNTESTFLAGS. Doing that shouldn't hold up Andrew's patch though. > > Whereever possible gcc.target/mips should not have conditional dg- > finals. I was going to suggest a follow up patch to add copies of the three tests as Richard suggests. I haven't yet done a micromips run of the testsuite to check for any other issues like this but I suspect problems are limited to the tests that I recently added. I certainly agree that we shouldn't just ignore micromips expected output given it is pretty easy to test. Please go ahead and commit this patch so we clean up the test results for GCC 5 in case you (or anyone else) doesn't get to submitting the extra test cases before we hit stage 4. Thanks, Matthew
Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
"Maciej W. Rozycki" writes: > On Tue, 13 Jan 2015, Andrew Bennett wrote: > >> The call-saved-{4-6}.c tests in the mips testsuite fail for micromips. >> The reason is >> that micromips uses the swm and lwm instructions to save/restore the >> call-saved registers >> rather than using the sw and lw instructions. The swm and lwm >> instructions only list >> the range of registers to use ie. $16-$25 and hence some of the >> scan-assembler >> patterns fail. This fix adds the NO_COMPRESSION attribute to the foo >> function to >> force the tests to always compile as mips. >> >> I have tested this for both mips and micromips, and the tests now pass >> successfully. >> The ChangeLog and patch are below. > > Hmm, instead of trying to avoid testing microMIPS code generation just to > satisfy the test suite I'd rather see the test cases updated so that > LWM/SWM register ranges are expected and accepted whenever microMIPS code > is produced. These scan patterns can be made conditional. FWIW I think Andrew's patch is correct. If we want to test microMIPS output against micromips-specific regexps, we should add a separate test that forces micromips, so that it gets tested regardless of people's RUNTESTFLAGS. Doing that shouldn't hold up Andrew's patch though. Whereever possible gcc.target/mips should not have conditional dg-finals. Thanks, Richard
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On 01/13/15 09:38, Segher Boessenkool wrote: On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote: 3) on request from Richard (which Segher on IRC argues against), "memory" clobber also prevents CSE; As extend.texi used to say: " If your assembler instructions access memory in an unpredictable fashion, add @samp{memory} to the list of clobbered registers. This causes GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory. You also should add the @code{volatile} keyword if the memory affected is not listed in the inputs or outputs of the @code{asm}, as the @samp{memory} clobber does not count as a side-effect of the @code{asm}. " so a "memory" clobber in a non-volatile asm should not prevent CSE. My reading of that paragraph is somewhat different. The key here is the memory clobber affects optimization of instructions around the asm while the volatile specifier affects the optimization of the ASM itself. A memory clobber must inhibit CSE of memory references on either side of the asm because the asm must be assumed to read or write memory in unpredictable ways. The volatile specifier tells the compiler that the asm itself must be preserved, even if dataflow shows the outputs as not used. eff
Re: [testsuite] PATCH: Check if -pg available
On 01/13/15 05:54, H.J. Lu wrote: On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote: On 01/12/15 14:51, Magnus Granberg wrote: måndag 12 januari 2015 12.11.17 skrev H.J. Lu: On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law wrote: On 01/12/15 12:59, H.J. Lu wrote: I don't know if -pg will work PIE on any targets. For Linux/x86 the choices of crt1.o are %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} -shared, -pg and -pie are mutually exclusive. Those crt1 files are only crt1 files provided by glibc. You can't even try -pg -pie on Linux without changing glibc. You're totally missing the point. What I care about is *why*. With -pg it use gcrt1.o object file and that file is not compile with -fPIC. When you build a shared lib on x86_64 all the objects files need to be buiit with -fPIC else you get a error like that one abow and it is the same problems when you build bin with -fPIE and linke with -pie. Glibc do not provide one that is compile with -fPIC Is there some reason why glibc could not provide gcrt1.o compiled with -fPIC? Here is a patch to check if -pg is available. If -pg doesn't link, profiling isn't available. OK for trunk? OK with a suitable ChangeLog entry. jeff
Re: [testsuite] PATCH: Add check_effective_target_pie
On 01/13/15 05:52, H.J. Lu wrote: On Mon, Jan 12, 2015 at 03:04:20PM -0700, Jeff Law wrote: On 01/12/15 14:51, Magnus Granberg wrote: måndag 12 januari 2015 12.11.17 skrev H.J. Lu: On Mon, Jan 12, 2015 at 12:03 PM, Jeff Law wrote: On 01/12/15 12:59, H.J. Lu wrote: I don't know if -pg will work PIE on any targets. For Linux/x86 the choices of crt1.o are %{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} -shared, -pg and -pie are mutually exclusive. Those crt1 files are only crt1 files provided by glibc. You can't even try -pg -pie on Linux without changing glibc. You're totally missing the point. What I care about is *why*. With -pg it use gcrt1.o object file and that file is not compile with -fPIC. When you build a shared lib on x86_64 all the objects files need to be buiit with -fPIC else you get a error like that one abow and it is the same problems when you build bin with -fPIE and linke with -pie. Glibc do not provide one that is compile with -fPIC Is there some reason why glibc could not provide gcrt1.o compiled with -fPIC? That is a good question. We can compile gcrt1.o with -fPIC and it will work with both -pg and -pg -pie. I will open a glibc bug. Thanks for getting the bug opened, there's a reasonable chance that we'll have the gcrt1.o we want in the not too distant future. Here is the updated patch without the check_profiling_available change. OK for trunk? Thanks. H.J. --- Subject: [PATCH 1/5] Add check_effective_target_pie Hi, This patch adds check_effective_target_pie to check if the current multilib generates PIE by default. Thanks. H.J. --- 2015-01-11 H.J. Lu * gcc.target/i386/pie.c: New test. * lib/target-supports.exp (check_effective_target_pie): New. OK. Jeff
Re: [PATCH] Fix ICE with -fgnu-tm and pragma ivdep (PR middle-end/64391)
On 01/13/15 09:28, Marek Polacek wrote: We ICE on this testcase, because the usage of #pragma GCC ivdep pulls in the ANNOTATE internal functions which don't have underlying fndecls, hence we segv on a NULL_TREE. This patch makes get_attrs_for be prepared for such a scenario. The callers of get_attrs_for already check for NULL_TREE. I don't think internal fns can have transaction_* attributes anyway. While at it, I did some cleanups. Bootstrapped/regtested on {ppc64,x86_64}-linux, ok for trunk? 2015-01-13 Marek Polacek PR middle-end/64391 * trans-mem.c (get_attrs_for): Return NULL_TREE if X is NULL_TREE. * gcc.dg/tm/pr64391.c: New test. OK. I looked briefly at perhaps catching this earlier in the call chain, but your approach looks best to me. Thanks, Jeff
Re: [PATCH] Fix REE for vector modes (PR rtl-optimization/64286, take 2)
On 01/13/15 09:11, Jakub Jelinek wrote: On Mon, Jan 12, 2015 at 02:29:53PM -0700, Jeff Law wrote: On 01/12/15 12:59, Jakub Jelinek wrote: Hi! As mentioned in the PR, giving up for all vector mode extensions is unnecessary, but unlike scalar integer extensions, where the low part of the extended value is the original value, for vectors this is not true, thus the old value is lost. Which means we can perform REE, but only if all uses of the definition are the same (code+mode) extension. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-01-12 Jakub Jelinek PR rtl-optimization/64286 * ree.c (add_removable_extension): Don't add vector mode extensions if all uses of the source register aren't the same vector extensions. * gcc.target/i386/avx2-pr64286.c: New test. Does it make sense to remove your change for 59754 in combine_reaching_defs? Shouldn't this patch handle that case as well? You're right, this patch handles that too. New patch, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-01-13 Jakub Jelinek PR rtl-optimization/64286 * ree.c (combine_reaching_defs): Move part of comment earlier, remove !SCALAR_INT_MODE_P check. (add_removable_extension): Don't add vector mode extensions if all uses of the source register aren't the same vector extensions. * gcc.target/i386/avx2-pr64286.c: New test. OK. Thanks for taking care of this. I can't seem to find time for doing any real debugging or bugfixing. jeff
Re: [PATCH] PR59448 - Promote consume to acquire
On 01/13/2015 01:38 PM, Torvald Riegel wrote: On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote: On 01/13/2015 09:59 AM, Richard Biener wrote: On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod wrote: Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 Basically we can generate incorrect code for an atomic consume operation in some circumstances. The general feeling seems to be that we should simply promote all consume operations to an acquire operation until there is a better definition/understanding of the consume model and how GCC can track it. I proposed a simple patch in the PR, and I have not seen or heard of any dissenting opinion. We should get this in before the end of stage 3 I think. The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. This new patch simply makes the adjustment after any errors are checked on the originally specified model. It bootstraps on x86_64-unknown-linux-gnu and passes all regression testing. I also built an aarch64 compiler and it appears to issue the LDAR as specified in the PR, but anyone with a vested interest really ought to check it out with a real build to be sure. OK for trunk? Why not patch get_memmodel? (not sure if that catches all cases) Richard. That was the original patch. The issue is that it promotes consume to acquire before any error checking gets to look at the model, so then we allow illegal specification of consume. (It actually triggers a failure in the testsuite) (This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c) The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). It would be surprising also because for other atomic read-modify-write operations (eg, fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_excahnge I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that. I imagine it was probably in a previous incarnation of the standard... Most of this was actually implemented based on very early draft standards years and years ago and never revised. It wasnt put in by me unless the standard at some point said had such wording. The current standard appears to make no mention of the situation. It seems that it should be safe to move back to the original patch, and remove that error test for using consume on an exchange... Andrew
Re: [PATCH] add option to emit more array bounds warnigs
On 01/13/15 10:34, Martin Uecker wrote: Mon, 12 Jan 2015 11:00:44 -0700 Jeff Law : On 11/11/14 23:13, Martin Uecker wrote: ... * gcc/tree-vrp.c (check_array_ref): Emit more warnings for warn_array_bounds >= 2. * gcc/testsuite/gcc.dg/Warray-bounds-11.c: New test-case. * gcc/c-family/c.opt: New option -Warray-bounds=. * gcc/common.opt: New option -Warray-bounds=. * gcc/doc/invoke.texi: Document new option. Has this patch been bootstrapped and regression tested, if so on what platform. x86_64-unknown-linux-gnu Approved. Please install on the trunk. Sorry about the delays. Thanks, Jeff
Re: [PATCH] Fix for PR64081 in RTL loop unroller
On 01/13/15 11:01, Zamyatin, Igor wrote: Is it really sufficient here to verify that all the defs are on latch predecessors, what about the case where there is a predecessor without a def. How do you guarantee domination in that case? ISTM that given the structure for the code you're writing that you'd want to verify that in the event of multiple definitions that all of them appear on immediate predecessors of the latch *and* that each immediate predecessor has a definition. Yes, do you think it's better to check exactly immediate predecessors? I'd use the same structure that you have in iv_get_reaching_def. If there was a reasonable way to factor that test into a single function and call it from both places that would be even better. Jeff
Re: [PATCH] PR59448 - Promote consume to acquire
On 01/13/15 11:38, Torvald Riegel wrote: On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote: On 01/13/2015 09:59 AM, Richard Biener wrote: On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod wrote: Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 Basically we can generate incorrect code for an atomic consume operation in some circumstances. The general feeling seems to be that we should simply promote all consume operations to an acquire operation until there is a better definition/understanding of the consume model and how GCC can track it. I proposed a simple patch in the PR, and I have not seen or heard of any dissenting opinion. We should get this in before the end of stage 3 I think. The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. This new patch simply makes the adjustment after any errors are checked on the originally specified model. It bootstraps on x86_64-unknown-linux-gnu and passes all regression testing. I also built an aarch64 compiler and it appears to issue the LDAR as specified in the PR, but anyone with a vested interest really ought to check it out with a real build to be sure. OK for trunk? Why not patch get_memmodel? (not sure if that catches all cases) Richard. That was the original patch. The issue is that it promotes consume to acquire before any error checking gets to look at the model, so then we allow illegal specification of consume. (It actually triggers a failure in the testsuite) (This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c) The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). It would be surprising also because for other atomic read-modify-write operations (eg, fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_excahnge I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that. AFAICT that test has been there since the initial commit of sync-mem-invalid.c (which was later renamed to atomic-invalid). In fact, that was the only test initially in sync-mem-invalid.c commit 64d1dbf10e3f08305f4a8569e27fc2224f9074d2 Author: amacleod Date: Thu Jun 23 13:09:31 2011 + Basica tests for __sync_mem_exchange and framework for further additions. * lib/target-support.exp (check_effective_target_sync_int_128, check_effective_target_sync_long_long): Check whether the target supports 64 and 128 bit __sync builtins. * gcc.dg/sync-mem.h: New. Common code to check memory model __syncs. * gcc.dg/sync-mem-1.c: New. Check char size. * gcc.dg/sync-mem-2.c: New. Check short size. * gcc.dg/sync-mem-3.c: New. Check int size. * gcc.dg/sync-mem-4.c: New. Check long long. * gcc.dg/sync-mem-5.c: New. Check 128 bit. * gcc.dg/sync-mem-invalid.c: New. Check invalid memory modes. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/cxx-mem-model@175331 138bc75d-0d04-0410-961f-82ee72b054a4 Mostly hoping this refreshes Andrew's memory and he can provide some insight on why we test this particular combination and consider it invalid. I was kind of hoping that we'd track this down to something like a particular target didn't support this capability with the old sync builtins and we carried it into the atomics when we made that switch. I don't have a vested interest in either approach. I just want to see us DTRT. jeff
Re: [PATCH][rtlanal.c][BE][1/2] Fix vector load/stores to not use ld1/st1
> Sorry for the slow response. Jeff has approved the patch in the > meantime, but I didn't want to go ahead and apply it while there > was still disagreement... I still think that it isn't appropriate to short-circuit the main computation as the patch does, but I don't want to block it after Jeff's approval. > (1) we have a non-paradoxical subreg; > (2) both (reg:ymode xregno) and (reg:xmode xregno) occupy full > hard registers (no padding or unused upper bits); > (3) (reg:ymode xregno) and (reg:xmode xregno) store the same number > of bytes (X) in each constituent hard register; > (4) the offset is a multiple of X, i.e. the data we're accessing > is aligned to a register boundary; and > (5) endianness is regular (no differences between words and bytes, > or between registers and memory) OK, that's a nice translation of the new code. :-) It seems to me that the patch wants to extend the support of generic subregs to modes whose sizes are not multiple of each other, which is a requirement of the existing code, but does that in a very specific case for the sake of the ARM port without saying where all the above restrictions come from. -- Eric Botcazou
Re: PR54442 build_qualified_type produces a non-canonical type
Hi, On 06/09/2014 04:46 PM, Jason Merrill wrote: On 06/09/2014 10:32 AM, Marc Glisse wrote: On Mon, 9 Jun 2014, Jason Merrill wrote: On 06/09/2014 10:18 AM, Marc Glisse wrote: I doubt the patch can be wrong, but it may be that this is a situation that is not supposed to happen and should be fixed elsewhere? Seems likely. What is the difference between the type returned from build_qualified_type (TYPE_CANONICAL and it's TYPE_CANONICAL? I would expect them to be the same. throws >> (in what build_qualified_type returns) I guess that makes sense, given that the exception specification isn't really part of the type. The patch is OK. In fact, I noticed today that this is a 4.8/4.9 Regression too. Shall I try to apply the patchlet to 4_9-branch too and, if testing passes, commit there and close the bug? Thanks, Paolo.
Re: [PATCH] PR59448 - Promote consume to acquire
On Tue, 2015-01-13 at 10:11 -0500, Andrew MacLeod wrote: > On 01/13/2015 09:59 AM, Richard Biener wrote: > > On Tue, Jan 13, 2015 at 3:56 PM, Andrew MacLeod wrote: > >> Lengthy discussion : https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448 > >> > >> Basically we can generate incorrect code for an atomic consume operation in > >> some circumstances. The general feeling seems to be that we should simply > >> promote all consume operations to an acquire operation until there is a > >> better definition/understanding of the consume model and how GCC can track > >> it. > >> > >> I proposed a simple patch in the PR, and I have not seen or heard of any > >> dissenting opinion. We should get this in before the end of stage 3 I > >> think. > >> > >> The problem with the patch in the PR is the memory model is immediately > >> promoted from consume to acquire. This happens *before* any of the > >> memmodel checks are made. If a consume is illegally specified (such as in > >> a > >> compare_exchange), it gets promoted to acquire and the compiler doesn't > >> report the error because it never sees the consume. > >> > >> This new patch simply makes the adjustment after any errors are checked on > >> the originally specified model. It bootstraps on x86_64-unknown-linux-gnu > >> and passes all regression testing. > >> I also built an aarch64 compiler and it appears to issue the LDAR as > >> specified in the PR, but anyone with a vested interest really ought to > >> check > >> it out with a real build to be sure. > >> > >> OK for trunk? > > Why not patch get_memmodel? (not sure if that catches all cases) > > > > Richard. > > > > > That was the original patch. > > The issue is that it promotes consume to acquire before any error > checking gets to look at the model, so then we allow illegal > specification of consume. (It actually triggers a failure in the testsuite) (This is this test: gcc/testsuite/gcc.dg/atomic-invalid.c) The documentation of the atomic builtins also disallows mo_consume on atomic_exchange. However, I don't see any such requirement in C11 or C++14 (and I'd be surprised to see it in C++11). It would be surprising also because for other atomic read-modify-write operations (eg, fetch_add), we don't make such a requirement in the builtins docs -- and atomic_exchange is just a read-modify-write with a noop, basically. Does anyone remember why this requirement for no consume on exchange was added, or sees a reason to keep it? If not, I think we should drop it. This would solve the testsuite failure for Andrew. Dropping it would prevent GCC from checking the consume-on-success / acquire-on-failure case for compare_excahnge I mentioned previously, but I think that this is pretty harmless. I could imagine that, for some reason, either backends or libatomic do not implement consume on atomic_exchange just because the docs disallowed it -- but I haven't checked that.
Re: [PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
On Tue, 13 Jan 2015, Andrew Bennett wrote: > The call-saved-{4-6}.c tests in the mips testsuite fail for micromips. The > reason is > that micromips uses the swm and lwm instructions to save/restore the > call-saved registers > rather than using the sw and lw instructions. The swm and lwm instructions > only list > the range of registers to use ie. $16-$25 and hence some of the > scan-assembler > patterns fail. This fix adds the NO_COMPRESSION attribute to the foo > function to > force the tests to always compile as mips. > > I have tested this for both mips and micromips, and the tests now pass > successfully. > The ChangeLog and patch are below. Hmm, instead of trying to avoid testing microMIPS code generation just to satisfy the test suite I'd rather see the test cases updated so that LWM/SWM register ranges are expected and accepted whenever microMIPS code is produced. These scan patterns can be made conditional. Maciej
[PATCH][AArch64 Intrinsics] Replace temporary assembler for vst1_lane
Nowadays, just storing the (bigendian-corrected) vector element to the address, generates exactly the same assembler for all cases except {float,int,uint}64x1_t, where st1 {v0.d}[0], [x0] becomes str d0, [x0] This is not a problem, and the change will be much better for optimization through the midend, as well as making use of previous improvements in error reporting. Also move the /* vst1q */ comment, which was a couple intrinsics too late. gcc/ChangeLog: * config/aarch64/arm_neon.h (vst1_lane_f32, vst1_lane_f64, vst1_lane_p8, vst1_lane_p16, vst1_lane_s8, vst1_lane_s16, vst1_lane_s32, vst1_lane_s64, vst1_lane_u8, vst1_lane_u16, vst1_lane_u32, vst1_lane_u64, vst1q_lane_f32, vst1q_lane_f64, vst1q_lane_p8, vst1q_lane_p16, vst1q_lane_s8, vst1q_lane_s16, vst1q_lane_s32, vst1q_lane_s64, vst1q_lane_u8, vst1q_lane_u16, vst1q_lane_u32, vst1q_lane_u64): Reimplement with pointer dereference and __aarch64_vget_lane_any. Cross-tested check-gcc on aarch64-none-elf and aarch64_be-none-elf. Ok for trunk? Cheers, Alancommit 926aec661699e52f617f16068075ef0242a43609 Author: Alan Lawrence Date: Thu Dec 11 17:29:54 2014 + Replace temporary inline assembler for vst1_lane, move /* vst1q */ comment. Note for (float|u?int)64x1 vectors, st1 {v0.d}[0], [x0] becomes str d0, [x0] diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index 3d1bcd5..980490f 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -10304,272 +10304,6 @@ vrsqrtss_f32 (float32_t a, float32_t b) result; \ }) -#define vst1_lane_f32(a, b, c) \ - __extension__ \ -({ \ - float32x2_t b_ = (b);\ - float32_t * a_ = (a);\ - __asm__ ("st1 {%1.s}[%2],[%0]" \ -: \ -: "r"(a_), "w"(b_), "i"(c) \ -: "memory");\ - }) - -#define vst1_lane_f64(a, b, c) \ - __extension__ \ -({ \ - float64x1_t b_ = (b);\ - float64_t * a_ = (a);\ - __asm__ ("st1 {%1.d}[%2],[%0]" \ -: \ -: "r"(a_), "w"(b_), "i"(c) \ -: "memory");\ - }) - -#define vst1_lane_p8(a, b, c) \ - __extension__ \ -({ \ - poly8x8_t b_ = (b); \ - poly8_t * a_ = (a); \ - __asm__ ("st1 {%1.b}[%2],[%0]" \ -: \ -: "r"(a_), "w"(b_), "i"(c) \ -: "memory");\ - }) - -#define vst1_lane_p16(a, b, c) \ - __extension__ \ -({ \ - poly16x4_t b_ = (b); \ - poly16_t * a_ = (a); \ - __asm__ ("st1 {%1.h}[%2],[%0]" \ -: \ -: "r"(a_), "w"(b_), "i"(c) \ -: "memory");\ - }) - -#define vst1_lane_s8(a, b, c) \ - __extension__ \ -({ \ - int8x8_t b_ = (b); \ - int8_t * a_ = (a); \ - __asm__ ("st1 {%1.b}[%2],[%0]" \ -:
RE: [PATCH] Fix for PR64081 in RTL loop unroller
> > Is it really sufficient here to verify that all the defs are on latch > predecessors, > what about the case where there is a predecessor without a def. How do > you guarantee domination in that case? > > ISTM that given the structure for the code you're writing that you'd want to > verify that in the event of multiple definitions that all of them appear on > immediate predecessors of the latch *and* that each immediate > predecessor has a definition. Yes, do you think it's better to check exactly immediate predecessors? > > - > > - if (!just_once_each_iteration_p (current_loop, DF_REF_BB (adef))) > > - return false; > > + { > > + def_num++; > > + if (!(def_pred_latch = def_pred_latch_p (adef)) > > + || !rtx_equal_p( PATTERN (DF_REF_INSN (single_rd)), > > Whitespace nit here. Whitespace goes before the open paren for the > function call, not after. Thanks for catching this! > > > > @@ -351,10 +384,10 @@ latch_dominating_def (rtx reg, df_ref *def) > > static enum iv_grd_result > > iv_get_reaching_def (rtx_insn *insn, rtx reg, df_ref *def) > > And in this routine, you appear to do both checks. ie, each def is on an > immediate predecessor and each immediate predecessor has a def. Is there > some reason why iv_get_reaching_def has the stronger check while > latch_dominating_def does not? Looks like I was sure that latch_dominating_def always goes after iv_get_reaching_def but now I see it is not true. Will add another check in latch_dominating_def. Thanks, Igor > > jeff
Re: shift/extract SHIFT_COUNT_TRUNCATED combine bug
On Tue, Jan 13, 2015 at 10:51:27AM +0100, Richard Biener wrote: > IMHO SHIFT_COUNT_TRUNCATED should be removed and instead > backends should provide shift patterns with a (and:QI ...) for the > shift amount which simply will omit that operation if suitable. Note that that catches less though, e.g. in int f(int x, int n) { return x << ((2*n) & 31); } without SHIFT_COUNT_TRUNCATED it will try to match an AND with 30, not with 31. Segher
Re: [PATCH] add option to emit more array bounds warnigs
Mon, 12 Jan 2015 11:00:44 -0700 Jeff Law : > On 11/11/14 23:13, Martin Uecker wrote: ... > > > > > > * gcc/tree-vrp.c (check_array_ref): Emit more warnings > > for warn_array_bounds >= 2. > > * gcc/testsuite/gcc.dg/Warray-bounds-11.c: New test-case. > > * gcc/c-family/c.opt: New option -Warray-bounds=. > > * gcc/common.opt: New option -Warray-bounds=. > > * gcc/doc/invoke.texi: Document new option. > Has this patch been bootstrapped and regression tested, if so on what > platform. x86_64-unknown-linux-gnu > Given the new warnings (as implemented by the patch) are not enabled by > default, I'm inclined to approve once Martin verifies things via > bootstrap and regression test. Thank you, Martin
[PATCH] Allow MIPS call-saved-{4-6}.c tests to correctly run for micromips
Hi, The call-saved-{4-6}.c tests in the mips testsuite fail for micromips. The reason is that micromips uses the swm and lwm instructions to save/restore the call-saved registers rather than using the sw and lw instructions. The swm and lwm instructions only list the range of registers to use ie. $16-$25 and hence some of the scan-assembler patterns fail. This fix adds the NO_COMPRESSION attribute to the foo function to force the tests to always compile as mips. I have tested this for both mips and micromips, and the tests now pass successfully. The ChangeLog and patch are below. Ok to commit? Many thanks, Andrew testsuite/ * gcc.target/mips/call-saved-4.c: Add NO_COMPRESSION attribute. * gcc.target/mips/call-saved-5.c: Likewise. * gcc.target/mips/call-saved-6.c: Likewise. diff --git a/gcc/testsuite/gcc.target/mips/call-saved-4.c b/gcc/testsuite/gcc.target/mips/call-saved-4.c index 846ea32..92881c4 100644 --- a/gcc/testsuite/gcc.target/mips/call-saved-4.c +++ b/gcc/testsuite/gcc.target/mips/call-saved-4.c @@ -3,7 +3,7 @@ void bar (void); -void +NOCOMPRESSION void foo (int x) { __builtin_unwind_init (); diff --git a/gcc/testsuite/gcc.target/mips/call-saved-5.c b/gcc/testsuite/gcc.target/mips/call-saved-5.c index 2937b31..152b28f 100644 --- a/gcc/testsuite/gcc.target/mips/call-saved-5.c +++ b/gcc/testsuite/gcc.target/mips/call-saved-5.c @@ -3,7 +3,7 @@ void bar (void); -void +NOCOMPRESSION void foo (int x) { __builtin_unwind_init (); diff --git a/gcc/testsuite/gcc.target/mips/call-saved-6.c b/gcc/testsuite/gcc.target/mips/call-saved-6.c index 0d1a4c8..a384d4a 100644 --- a/gcc/testsuite/gcc.target/mips/call-saved-6.c +++ b/gcc/testsuite/gcc.target/mips/call-saved-6.c @@ -3,7 +3,7 @@ void bar (void); -void +NOCOMPRESSION void foo (int x) { __builtin_unwind_init ();
Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics
On 09/12/14 08:17, Yangfei (Felix) wrote: On 28 November 2014 at 09:23, Yangfei (Felix) wrote: Hi, This patch converts vpmaxX & vpminX intrinsics to use builtin functions instead of the previous inline assembly syntax. Regtested with aarch64-linux-gnu on QEMU. Also passed the glorious testsuite of Christophe Lyon. OK for the trunk? Hi Felix, We know from experience that the advsimd intrinsics tend to be fragile for big endian and in general it is fairly easy to break the big endian case. For these advsimd improvements that you are working on (that we very much appreciate) it is important to run both little endian and big endian regressions. Thanks /Marcus Okay. Any plan for the advsimd big-endian improvement? I rebased this patch over Alan Lawrance's patch: https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html No regressions for aarch64_be-linux-gnu target too. OK for the thunk? Index: gcc/ChangeLog === --- gcc/ChangeLog (revision 218464) +++ gcc/ChangeLog (working copy) @@ -1,3 +1,18 @@ +2014-12-09 Felix Yang + + * config/aarch64/aarch64-simd.md (aarch64_p): New + pattern. + * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp, + uminp, smax_nanp, smin_nanp): New builtins. + * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32, + vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32, + vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64, + vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64, + vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8, + vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8, + vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64, + vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64, + __extension__ static __inline float32x2_t __attribute__ ((__always_inline__)) Index: gcc/config/aarch64/aarch64-simd.md === --- gcc/config/aarch64/aarch64-simd.md (revision 218464) +++ gcc/config/aarch64/aarch64-simd.md (working copy) @@ -1017,6 +1017,28 @@ DONE; }) +;; Pairwise Integer Max/Min operations. +(define_insn "aarch64_p" + [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w") + (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w") +(match_operand:VDQ_BHSI 2 "register_operand" "w")] + MAXMINV))] + "TARGET_SIMD" + "p\t%0., %1., %2." + [(set_attr "type" "neon_minmax")] +) + Hi Felix, Sorry for the delay in getting back to you on this. If you've rolled aarch64_reduc__internalv2si into the above pattern, do you still need it? For all its call points, just point them to aarch64_p? Thanks, Tejas.
Re: [committed] Update copyright years, part 2
On Tue, Jan 13, 2015 at 05:06:35PM +, Richard Sandiford wrote: > Jakub Jelinek writes: > > Patch too large to attach uncompressed, this > > has been created with update-copyright.py --this-year. > > Note, I had to temporarily move away gcc/jit/docs/conf.py, > > the python script dies on that and leaves almost all files unchanged. > > Thanks for doing the update. Is the patch below OK to fix the JIT thing? > After this change, update-copyright.py --this-year seems to update > gcc/jit correctly (including the texinfo files). > > Richard > > > contrib/ > * update-copyright.py (Copyright.__init__): Add a regexp for > "copyright = u'". > (Copyright.update_copyright): Don't add a space before the year > in that case. Ok, thanks. > --- contrib/update-copyright.py 2014-08-05 10:29:02.695491816 +0100 > +++ contrib/update-copyright.py 2015-01-13 14:13:43.500812967 + > @@ -183,6 +183,7 @@ class Copyright: > '|[Cc]opyright\s+%s' > '|[Cc]opyright\s+©' > '|[Cc]opyright\s+@copyright{}' > +'|copyright = u\'' > '|@set\s+copyright[\w-]+)' > > # 2: the years. Include the whitespace in the year, so that > @@ -363,7 +364,8 @@ class Copyright: > return (False, orig_line, next_line) > > line = (line[:match.start (2)] > -+ ' ' + canon_form + self.separator > ++ ('' if intro.startswith ('copyright = ') else ' ') > ++ canon_form + self.separator > + line[match.end (2):]) > > # Use the standard (C) form. Jakub
Re: [committed] Update copyright years, part 2
Jakub Jelinek writes: > Patch too large to attach uncompressed, this > has been created with update-copyright.py --this-year. > Note, I had to temporarily move away gcc/jit/docs/conf.py, > the python script dies on that and leaves almost all files unchanged. Thanks for doing the update. Is the patch below OK to fix the JIT thing? After this change, update-copyright.py --this-year seems to update gcc/jit correctly (including the texinfo files). Richard contrib/ * update-copyright.py (Copyright.__init__): Add a regexp for "copyright = u'". (Copyright.update_copyright): Don't add a space before the year in that case. Index: contrib/update-copyright.py === --- contrib/update-copyright.py 2014-08-05 10:29:02.695491816 +0100 +++ contrib/update-copyright.py 2015-01-13 14:13:43.500812967 + @@ -183,6 +183,7 @@ class Copyright: '|[Cc]opyright\s+%s' '|[Cc]opyright\s+©' '|[Cc]opyright\s+@copyright{}' +'|copyright = u\'' '|@set\s+copyright[\w-]+)' # 2: the years. Include the whitespace in the year, so that @@ -363,7 +364,8 @@ class Copyright: return (False, orig_line, next_line) line = (line[:match.start (2)] -+ ' ' + canon_form + self.separator ++ ('' if intro.startswith ('copyright = ') else ' ') ++ canon_form + self.separator + line[match.end (2):]) # Use the standard (C) form.
RE: Open Issues in the TSAN Runtime
On Tue, 13 Jan 2015 17:22:23, Jakub Jelinek wrote: > > > So do you mean following? I've bootstrapped/regtested on x86_64-linux and > i686-linux, but haven't tried any special testcases on it. If it works for > you, I'll commit it. > OK. Thanks Bernd. > 2015-01-13 Jakub Jelinek > > * sanitizer_common/sanitizer_deadlock_detector.h: Cherry pick > upstream r224518 and r224519. > * tsan/tsan_rtl_thread.cc: Cherry pick upstream r224702 and > r224834. > > --- libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h.jj 2014-05-22 > 10:18:14.893626024 +0200 > +++ libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h 2015-01-13 > 14:07:40.096414924 +0100 > @@ -48,6 +48,8 @@ class DeadlockDetectorTLS { > if (epoch_ == current_epoch) return; > bv_.clear(); > epoch_ = current_epoch; > + n_recursive_locks = 0; > + n_all_locks_ = 0; > } > > uptr getEpoch() const { return epoch_; } > @@ -81,7 +83,8 @@ class DeadlockDetectorTLS { > } > } > // Printf("remLock: %zx %zx\n", lock_id, epoch_); > - CHECK(bv_.clearBit(lock_id)); > + if (!bv_.clearBit(lock_id)) > + return; // probably addLock happened before flush > if (n_all_locks_) { > for (sptr i = n_all_locks_ - 1; i>= 0; i--) { > if (all_locks_with_contexts_[i].lock == static_cast(lock_id)) { > @@ -173,6 +176,7 @@ class DeadlockDetector { > recycled_nodes_.clear(); > available_nodes_.setAll(); > g_.clear(); > + n_edges_ = 0; > return getAvailableNode(data); > } > > --- libsanitizer/tsan/tsan_rtl_thread.cc.jj 2014-09-24 11:08:03.824028080 > +0200 > +++ libsanitizer/tsan/tsan_rtl_thread.cc 2015-01-13 14:08:06.167954292 +0100 > @@ -109,12 +109,13 @@ void ThreadContext::OnStarted(void *arg) > thr->dd_pt = ctx->dd->CreatePhysicalThread(); > thr->dd_lt = ctx->dd->CreateLogicalThread(unique_id); > } > + thr->fast_state.SetHistorySize(flags()->history_size); > + // Commit switch to the new part of the trace. > + // TraceAddEvent will reset stack0/mset0 in the new part for us. > + TraceAddEvent(thr, thr->fast_state, EventTypeMop, 0); > + > thr->fast_synch_epoch = epoch0; > AcquireImpl(thr, 0, &sync); > - thr->fast_state.SetHistorySize(flags()->history_size); > - const uptr trace = (epoch0 / kTracePartSize) % TraceParts(); > - Trace *thr_trace = ThreadTrace(thr->tid); > - thr_trace->headers[trace].epoch0 = epoch0; > StatInc(thr, StatSyncAcquire); > sync.Reset(&thr->clock_cache); > DPrintf("#%d: ThreadStart epoch=%zu stk_addr=%zx stk_size=%zx " > > > Jakub
Re: [PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
On Tue, Jan 13, 2015 at 05:18:19PM +0100, Jakub Jelinek wrote: > 3) on request from Richard (which Segher on IRC argues against), "memory" >clobber also prevents CSE; As extend.texi used to say: " If your assembler instructions access memory in an unpredictable fashion, add @samp{memory} to the list of clobbered registers. This causes GCC to not keep memory values cached in registers across the assembler instruction and not optimize stores or loads to that memory. You also should add the @code{volatile} keyword if the memory affected is not listed in the inputs or outputs of the @code{asm}, as the @samp{memory} clobber does not count as a side-effect of the @code{asm}. " so a "memory" clobber in a non-volatile asm should not prevent CSE. Segher
Re: shift/extract SHIFT_COUNT_TRUNCATED combine bug
On 01/13/15 02:51, Richard Biener wrote: On a SHIFT_COUNT_TRUNCATED target, I don't think it's ever OK to widen a shift, variable or constant. In the case of a variable shift, we could easily have eliminated the masking code before or during combine. For a constant shift amount we could have adjusted the constant (see SHIFT_COUNT_TRUNCATED in cse.c) I think it's just an oversight and it has simply never bit us before. IMHO SHIFT_COUNT_TRUNCATED should be removed and instead backends should provide shift patterns with a (and:QI ...) for the shift amount which simply will omit that operation if suitable. Perhaps. I'm certainly not wed to concept of SHIFT_COUNT_TRUNCATED. I don't see that getting addressed in the gcc-5 timeframe. aarch64, alpha, epiphany, iq2000, lm32, m32r, mep, microblaze, mips, mn103, nds32, pa, sparc, stormy16, tilepro, v850 and xtensa are the current SHIFT_COUNT_TRUNCATED targets. Jeff
[patch] update function comments for lto_symtab_encoder_encode_*
Hi Richard. I'm chasing my tail here looking at an LTO + debug problem, and for the life of me I can't figure out how all this partition business affects a symbol's `analyzed' bit. Anyways... the documentation for all these functions is wrong. Can you look at this patch and tell me if it makes sense? I feel a bit uneasy committing under the obvious rule, since I don't entirely understand the partitioning thing. Would anyone mind me fixing this on mainline? It's just a comment fix. Also, since you seem to understand all this best, can you suggest some better wording for the lto_encoder_entry comments? /* Entry of LTO symtab encoder. */ struct lto_encoder_entry { symtab_node *node; /* Is the node in this partition (i.e. ltrans of this partition will be responsible for outputting it)? */ unsigned int in_partition:1; /* Do we encode body in this partition? */ unsigned int body:1; /* Do we encode initializer in this partition? For example the readonly variable initializers are encoded to aid constant folding even if they are not in the partition. */ unsigned int initializer:1; }; Whenever I get to the LTO part of this project, I promise to start documenting things better. This whole thing is a mystery. Thanks. Aldy diff --git a/gcc/ChangeLog b/gcc/ChangeLog index bddb9a8..68f8042 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2015-01-13 Aldy Hernandez + + * lto-cgraph: Update function comments for + lto_symtab_encoder_encode_*. + 2014-12-20 Martin Uecker * doc/invoke.texi: Document -Wdiscarded-array-qualifiers. diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c index cf92892..0b9d945 100644 --- a/gcc/lto-cgraph.c +++ b/gcc/lto-cgraph.c @@ -187,7 +187,7 @@ lto_symtab_encoder_delete_node (lto_symtab_encoder_t encoder, } -/* Return TRUE if we should encode initializer of NODE (if any). */ +/* Return TRUE if we should encode the body of NODE (if any). */ bool lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t encoder, @@ -197,7 +197,7 @@ lto_symtab_encoder_encode_body_p (lto_symtab_encoder_t encoder, return encoder->nodes[index].body; } -/* Return TRUE if we should encode body of NODE (if any). */ +/* Specify that we encode the body of NODE in this partition. */ static void lto_set_symtab_encoder_encode_body (lto_symtab_encoder_t encoder, @@ -220,7 +220,7 @@ lto_symtab_encoder_encode_initializer_p (lto_symtab_encoder_t encoder, return encoder->nodes[index].initializer; } -/* Return TRUE if we should encode initializer of NODE (if any). */ +/* Specify that we should encode initializer of NODE (if any). */ static void lto_set_symtab_encoder_encode_initializer (lto_symtab_encoder_t encoder, @@ -230,7 +230,7 @@ lto_set_symtab_encoder_encode_initializer (lto_symtab_encoder_t encoder, encoder->nodes[index].initializer = true; } -/* Return TRUE if we should encode initializer of NODE (if any). */ +/* Return TRUE if NODE is in this partition. */ bool lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder, @@ -242,7 +242,7 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder, return encoder->nodes[index].in_partition; } -/* Return TRUE if we should encode body of NODE (if any). */ +/* Specify that NODE is in this partition. */ void lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
Re: flatten expr.h (version 2)
On 13 January 2015 at 16:06, Prathamesh Kulkarni wrote: > On 13 January 2015 at 15:34, Richard Biener wrote: >> On Sun, 11 Jan 2015, Prathamesh Kulkarni wrote: >> >>> Hi, >>> This is a revamped expr.h flattening flattening patch rebased on >>> tree.h and tree-core.h flattening patch (r219402). >>> It depends upon the following patch to get committed. >>> https://gcc.gnu.org/ml/gcc-patches/2015-01/msg00565.html >>> >>> Changes: >>> * Removed all includes except tree-core.h. Put includes required by >>> expr.h in a comment. >>> * Moved stmt.c, expmed.c prototypes to stmt.h, expmed.h respectively. >>> * Adjusted generator programs: genemit.c, gengtype.c, genopinit.c, >>> genoutput.c. >>> * Did not put includes in gcc-plugin.h since expr.h cannot be included >>> by plugins >>> (putting them broke building a file in c-family/ since expr.h is not >>> allowed in front-ends) >>> * Affects java front-end (expr.h is allowed in java front-end). >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu with languages: >>> all,go,ada,jit >>> Built on all targets in config-list.mk with languages: all, go. >>> OK to commit ? >> >> diff --git a/gcc/expr.c b/gcc/expr.c >> index fc22862..824541e 100644 >> --- a/gcc/expr.c >> +++ b/gcc/expr.c >> @@ -41,11 +41,17 @@ along with GCC; see the file COPYING3. If not see >> #include "regs.h" >> #include "hard-reg-set.h" >> #include "except.h" >> -#include "input.h" >> #include "function.h" >> #include "insn-config.h" >> #include "insn-attr.h" >> /* Include expr.h after insn-config.h so we get HAVE_conditional_move. >> */ >> +#include "hashtab.h" >> +#include "emit-rtl.h" >> +#include "expmed.h" >> +#include "stmt.h" >> +#include "statistics.h" >> +#include "real.h" >> +#include "fixed-value.h" >> #include "expr.h" >> >> Please move the comment to the proper place > ah, my flattening tool doesn't look at comments. I will move the > comment before expr.h include, thanks. >> >> diff --git a/gcc/expr.h b/gcc/expr.h >> index a7638b8..f1be8dc 100644 >> --- a/gcc/expr.h >> +++ b/gcc/expr.h >> @@ -20,7 +20,8 @@ along with GCC; see the file COPYING3. If not see >> #ifndef GCC_EXPR_H >> #define GCC_EXPR_H >> >> -/* For inhibit_defer_pop */ >> +/* expr.h required includes */ >> +#if 0 >> #include "hashtab.h" >> #include "hash-set.h" >> #include "vec.h" >> @@ -29,15 +30,17 @@ along with GCC; see the file COPYING3. If not see >> #include "hard-reg-set.h" >> #include "input.h" >> #include "function.h" >> -/* For XEXP, GEN_INT, rtx_code */ >> #include "rtl.h" >> -/* For optimize_size */ >> #include "flags.h" >> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop, >> - ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ >> #include "tree-core.h" >> -/* For GET_MODE_BITSIZE, word_mode */ >> #include "insn-config.h" >> +#include "alias.h" >> +#include "emit-rtl.h" >> +#include "expmed.h" >> +#include "stmt.h" >> +#endif >> >> Err, please remove the #if 0 section > I kept it because if something breaks later (hopefully not!), it will > be easier to fix. > I will remove it. >> >> + >> +#include "tree-core.h" >> >> Why? The original comment says >> >> -/* For tree_fits_[su]hwi_p, tree_to_[su]hwi, fold_convert, size_binop, >> - ssize_int, TREE_CODE, TYPE_SIZE, int_size_in_bytes,*/ >> >> but all those are declared in tree.h. Which means the files including >> expr.h must already include tree.h. >> >> If that's not the reason we need to include tree-core.h from expr.c >> please add a comment explaining why. > bt-load.c fails to compile because it includes expr.h but does not > include tree.h > I will place tree.h include in all files that include expr.h and rebuild. This is not going to work, since tree.h is now flattened. Shall also require including all headers required by tree.h in all files that include expr.h. Could we retain tree-core.h in expr.h for now ? Or should I insert tree.h (along with tree.h required includes) in all files that include expr.h ? Thanks, Prathamesh >> >> -/* Definitions from emit-rtl.c */ >> -#include "emit-rtl.h" >> - >> /* Return a memory reference like MEMREF, but with its mode widened to >> MODE and adjusted by OFFSET. */ >> extern rtx widen_memory_access (rtx, machine_mode, HOST_WIDE_INT); >> >> err - functions defined in emit-rtl.c should be declared in emit-rtl.h. >> Please fix that first. expr.h should _only_ contain prototypes >> for stuff defined in expr.c. > oops, missed it :( >> >> Andrew did a good job with this, first cleaning up a header moving >> declarations to proper places and only after that flattening it. >> >> The rest of the patch looks good to me but expr.h isn't in a good >> shape after it. > I will work on it and send patch with suggested changes by tomorrow. > > Thanks, > Prathamesh >> >> Thanks, >> Richard.
Re: [PATCH] Don't set TREE_READONLY on dummy args with VALUE attr (PR fortran/64528)
Jakub Jelinek wrote: > With VALUE attr, the PARM_DECLs hold the values and thus are (usually) not > read-only, therefore telling the middle-end they are read-only leads to > invalid IL. > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for > trunk? OK. Thanks for the patch. I haven't checked whether it also applies to 4.8/4.9, if so, the patch is also okay for those branches. Tobias
[PATCH] Fix ICE with -fgnu-tm and pragma ivdep (PR middle-end/64391)
We ICE on this testcase, because the usage of #pragma GCC ivdep pulls in the ANNOTATE internal functions which don't have underlying fndecls, hence we segv on a NULL_TREE. This patch makes get_attrs_for be prepared for such a scenario. The callers of get_attrs_for already check for NULL_TREE. I don't think internal fns can have transaction_* attributes anyway. While at it, I did some cleanups. Bootstrapped/regtested on {ppc64,x86_64}-linux, ok for trunk? 2015-01-13 Marek Polacek PR middle-end/64391 * trans-mem.c (get_attrs_for): Return NULL_TREE if X is NULL_TREE. * gcc.dg/tm/pr64391.c: New test. diff --git gcc/testsuite/gcc.dg/tm/pr64391.c gcc/testsuite/gcc.dg/tm/pr64391.c index e69de29..235118a 100644 --- gcc/testsuite/gcc.dg/tm/pr64391.c +++ gcc/testsuite/gcc.dg/tm/pr64391.c @@ -0,0 +1,10 @@ +/* PR middle-end/64391 */ +/* { dg-do compile } */ +/* { dg-options "-fgnu-tm" } */ + +void +foo (void) +{ +#pragma GCC ivdep + while (1); +} diff --git gcc/trans-mem.c gcc/trans-mem.c index b449760..21fa497 100644 --- gcc/trans-mem.c +++ gcc/trans-mem.c @@ -183,6 +183,9 @@ static void *expand_regions (struct tm_region *, static tree get_attrs_for (const_tree x) { + if (x == NULL_TREE) +return NULL_TREE; + switch (TREE_CODE (x)) { case FUNCTION_DECL: @@ -191,16 +194,16 @@ get_attrs_for (const_tree x) default: if (TYPE_P (x)) - return NULL; + return NULL_TREE; x = TREE_TYPE (x); if (TREE_CODE (x) != POINTER_TYPE) - return NULL; + return NULL_TREE; /* FALLTHRU */ case POINTER_TYPE: x = TREE_TYPE (x); if (TREE_CODE (x) != FUNCTION_TYPE && TREE_CODE (x) != METHOD_TYPE) - return NULL; + return NULL_TREE; /* FALLTHRU */ case FUNCTION_TYPE: Marek
[PATCH] Don't set TREE_READONLY on dummy args with VALUE attr (PR fortran/64528)
Hi! With VALUE attr, the PARM_DECLs hold the values and thus are (usually) not read-only, therefore telling the middle-end they are read-only leads to invalid IL. Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-01-13 Jakub Jelinek PR fortran/64528 * trans-decl.c (create_function_arglist): Don't set TREE_READONLY on dummy args with VALUE attribute. * gfortran.dg/pr64528.f90: New test. --- gcc/fortran/trans-decl.c.jj 2015-01-09 21:59:47.0 +0100 +++ gcc/fortran/trans-decl.c2015-01-13 14:24:22.342682352 +0100 @@ -2327,8 +2327,9 @@ create_function_arglist (gfc_symbol * sy /* Fill in arg stuff. */ DECL_CONTEXT (parm) = fndecl; DECL_ARG_TYPE (parm) = TREE_VALUE (typelist); - /* All implementation args are read-only. */ - TREE_READONLY (parm) = 1; + /* All implementation args except for VALUE are read-only. */ + if (!f->sym->attr.value) + TREE_READONLY (parm) = 1; if (POINTER_TYPE_P (type) && (!f->sym->attr.proc_pointer && f->sym->attr.flavor != FL_PROCEDURE)) --- gcc/testsuite/gfortran.dg/pr64528.f90.jj2015-01-13 14:27:13.475650977 +0100 +++ gcc/testsuite/gfortran.dg/pr64528.f90 2015-01-13 14:26:46.0 +0100 @@ -0,0 +1,20 @@ +! PR fortran/64528 +! { dg-do compile } +! { dg-options "-O -fno-tree-dce -fno-tree-ccp" } + +program pr64528 + interface + subroutine foo(x) + integer, value :: x + end subroutine foo + end interface + integer :: x + x = 10 + call foo(x) + if(x .ne. 10) then + endif +end program pr64528 +subroutine foo(x) + integer, value :: x + x = 11 +end subroutine foo Jakub
Re: [PATCH] IPA ICF: add comparison for target and optimization nodes
> Hello. > > Following patch adds support for target and optimization nodes comparison, > which is > based on Honza's newly added infrastructure. > > Apart from that, there's a small hunk that corrects formatting and removes > unnecessary > call to a comparison function. > > Hope it can be applied as one patch. > > Tested on x86_64-linux-pc without any new regression introduction. > > Ready for trunk? OK, thanks! Honza > > Thank you, > Martin > >From 393eaa47c8aef9a91a1c635016f23ca2f5aa25e4 Mon Sep 17 00:00:00 2001 > From: mliska > Date: Tue, 6 Jan 2015 15:06:18 +0100 > Subject: [PATCH] IPA ICF: target and optimization flags comparison. > > gcc/ChangeLog: > > 2015-01-06 Martin Liska > > * cgraphunit.c (cgraph_node::create_wrapper): Fix level of indentation. > * ipa-icf.c (sem_function::equals_private): Add support for target and > (sem_item_optimizer::merge_classes): Remove redundant function > comparison. > optimization flags comparison. > * tree.h (target_opts_for_fn): New function. > --- > gcc/cgraphunit.c | 52 ++-- > gcc/ipa-icf.c| 44 +++- > gcc/tree.h | 10 ++ > 3 files changed, 79 insertions(+), 27 deletions(-) > > diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c > index c8c8562..81246e2 100644 > --- a/gcc/cgraphunit.c > +++ b/gcc/cgraphunit.c > @@ -2385,40 +2385,40 @@ cgraphunit_c_finalize (void) > void > cgraph_node::create_wrapper (cgraph_node *target) > { > -/* Preserve DECL_RESULT so we get right by reference flag. */ > -tree decl_result = DECL_RESULT (decl); > + /* Preserve DECL_RESULT so we get right by reference flag. */ > + tree decl_result = DECL_RESULT (decl); > > -/* Remove the function's body but keep arguments to be reused > - for thunk. */ > -release_body (true); > -reset (); > + /* Remove the function's body but keep arguments to be reused > + for thunk. */ > + release_body (true); > + reset (); > > -DECL_RESULT (decl) = decl_result; > -DECL_INITIAL (decl) = NULL; > -allocate_struct_function (decl, false); > -set_cfun (NULL); > + DECL_RESULT (decl) = decl_result; > + DECL_INITIAL (decl) = NULL; > + allocate_struct_function (decl, false); > + set_cfun (NULL); > > -/* Turn alias into thunk and expand it into GIMPLE representation. */ > -definition = true; > -thunk.thunk_p = true; > -thunk.this_adjusting = false; > + /* Turn alias into thunk and expand it into GIMPLE representation. */ > + definition = true; > + thunk.thunk_p = true; > + thunk.this_adjusting = false; > > -cgraph_edge *e = create_edge (target, NULL, 0, CGRAPH_FREQ_BASE); > + cgraph_edge *e = create_edge (target, NULL, 0, CGRAPH_FREQ_BASE); > > -tree arguments = DECL_ARGUMENTS (decl); > + tree arguments = DECL_ARGUMENTS (decl); > > -while (arguments) > - { > - TREE_ADDRESSABLE (arguments) = false; > - arguments = TREE_CHAIN (arguments); > - } > + while (arguments) > +{ > + TREE_ADDRESSABLE (arguments) = false; > + arguments = TREE_CHAIN (arguments); > +} > > -expand_thunk (false, true); > -e->call_stmt_cannot_inline_p = true; > + expand_thunk (false, true); > + e->call_stmt_cannot_inline_p = true; > > -/* Inline summary set-up. */ > -analyze (); > -inline_analyze_function (this); > + /* Inline summary set-up. */ > + analyze (); > + inline_analyze_function (this); > } > > #include "gt-cgraphunit.h" > diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c > index c7ba75a..28158b3 100644 > --- a/gcc/ipa-icf.c > +++ b/gcc/ipa-icf.c > @@ -427,6 +427,49 @@ sem_function::equals_private (sem_item *item, >if (!equals_wpa (item, ignored_nodes)) > return false; > > + /* Checking function TARGET and OPTIMIZATION flags. */ > + cl_target_option *tar1 = target_opts_for_fn (decl); > + cl_target_option *tar2 = target_opts_for_fn (m_compared_func->decl); > + > + if (tar1 != NULL || tar2 != NULL) > +{ > + if (!cl_target_option_eq (tar1, tar2)) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fprintf (dump_file, "Source target flags\n"); > + cl_target_option_print (dump_file, 2, tar1); > + fprintf (dump_file, "Target target flags\n"); > + cl_target_option_print (dump_file, 2, tar2); > + } > + > + return return_false_with_msg ("Target flags are different"); > + } > +} > + else if (tar1 != NULL || tar2 != NULL) > +return return_false_with_msg ("Target flags are different"); > + > + cl_optimization *opt1 = opts_for_fn (decl); > + cl_optimization *opt2 = opts_for_fn (m_compared_func->decl); > + > + if (opt1 != NULL && opt2 != NULL) > +{ > + if (memcmp (opt1, opt2, sizeof(cl_optimization))) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + { > + fpr
Re: Open Issues in the TSAN Runtime
On Mon, Jan 12, 2015 at 09:55:15PM +0100, Jakub Jelinek wrote: > > specific changes from there. > > Yes, I'll try to cherry-pick those tomorrow. > > > I am especially interested in fixing these two issues, but there may be > > other important improvements too: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64350 : TSAN fails after > > stress-testing for a while > > > > was fixed by > > > > http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_deadlock_detector.h?r1=224518&r2=224517&pathrev=224518 > > http://llvm.org/viewvc/llvm-project/compiler-rt/trunk/lib/sanitizer_common/sanitizer_deadlock_detector.h?r1=224519&r2=224518&pathrev=224519 > > > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63251 : tsan: corrupted shadow > > stack > > > > was fixed by > > > > http://llvm.org/viewvc/llvm-project?view=revision&revision=224702 > > http://llvm.org/viewvc/llvm-project?view=revision&revision=224834 So do you mean following? I've bootstrapped/regtested on x86_64-linux and i686-linux, but haven't tried any special testcases on it. If it works for you, I'll commit it. 2015-01-13 Jakub Jelinek * sanitizer_common/sanitizer_deadlock_detector.h: Cherry pick upstream r224518 and r224519. * tsan/tsan_rtl_thread.cc: Cherry pick upstream r224702 and r224834. --- libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h.jj 2014-05-22 10:18:14.893626024 +0200 +++ libsanitizer/sanitizer_common/sanitizer_deadlock_detector.h 2015-01-13 14:07:40.096414924 +0100 @@ -48,6 +48,8 @@ class DeadlockDetectorTLS { if (epoch_ == current_epoch) return; bv_.clear(); epoch_ = current_epoch; +n_recursive_locks = 0; +n_all_locks_ = 0; } uptr getEpoch() const { return epoch_; } @@ -81,7 +83,8 @@ class DeadlockDetectorTLS { } } // Printf("remLock: %zx %zx\n", lock_id, epoch_); -CHECK(bv_.clearBit(lock_id)); +if (!bv_.clearBit(lock_id)) + return; // probably addLock happened before flush if (n_all_locks_) { for (sptr i = n_all_locks_ - 1; i >= 0; i--) { if (all_locks_with_contexts_[i].lock == static_cast(lock_id)) { @@ -173,6 +176,7 @@ class DeadlockDetector { recycled_nodes_.clear(); available_nodes_.setAll(); g_.clear(); +n_edges_ = 0; return getAvailableNode(data); } --- libsanitizer/tsan/tsan_rtl_thread.cc.jj 2014-09-24 11:08:03.824028080 +0200 +++ libsanitizer/tsan/tsan_rtl_thread.cc2015-01-13 14:08:06.167954292 +0100 @@ -109,12 +109,13 @@ void ThreadContext::OnStarted(void *arg) thr->dd_pt = ctx->dd->CreatePhysicalThread(); thr->dd_lt = ctx->dd->CreateLogicalThread(unique_id); } + thr->fast_state.SetHistorySize(flags()->history_size); + // Commit switch to the new part of the trace. + // TraceAddEvent will reset stack0/mset0 in the new part for us. + TraceAddEvent(thr, thr->fast_state, EventTypeMop, 0); + thr->fast_synch_epoch = epoch0; AcquireImpl(thr, 0, &sync); - thr->fast_state.SetHistorySize(flags()->history_size); - const uptr trace = (epoch0 / kTracePartSize) % TraceParts(); - Trace *thr_trace = ThreadTrace(thr->tid); - thr_trace->headers[trace].epoch0 = epoch0; StatInc(thr, StatSyncAcquire); sync.Reset(&thr->clock_cache); DPrintf("#%d: ThreadStart epoch=%zu stk_addr=%zx stk_size=%zx " Jakub
[PATCH] Reenable CSE of non-volatile inline asm (PR rtl-optimization/63637)
Hi! My PR60663 fix unfortunately stopped CSE of all inline-asms, even when they e.g. only have the clobbers added by default. This patch attempts to restore the old behavior, with the exceptions: 1) as always, asm volatile is not CSEd 2) inline-asm with multiple outputs are not CSEd 3) on request from Richard (which Segher on IRC argues against), "memory" clobber also prevents CSE; this can be removed by removing the int j, lim = XVECLEN (x, 0); and loop below it 4) inline-asm with clobbers is never copied into an insn that wasn't inline-asm before, so if there are clobbers, we allow CSEing of e.g. two same inline-asms, but only by reusing results of one of those Bootstrapped/regtested on x86_64-linux and i686-linux, tested also with arm cross after reverting the PR60663 arm cost fix. Ok for trunk this way, or with 3) removed? 2015-01-13 Jakub Jelinek PR rtl-optimization/63637 PR rtl-optimization/60663 * cse.c (merge_equiv_classes): Set new_elt->cost to MAX_COST if elt->cost is MAX_COST for ASM_OPERANDS. (find_sets_in_insn): Fix up comment typo. (cse_insn): Don't set src_volatile for all non-volatile ASM_OPERANDS in PARALLELs, but just those with multiple outputs or with "memory" clobber. Set elt->cost to MAX_COST for ASM_OPERANDS in PARALLEL. Set src_elt->cost to MAX_COST if new_src is ASM_OPERANDS and elt->cost is MAX_COST. * gcc.dg/pr63637-1.c: New test. * gcc.dg/pr63637-2.c: New test. * gcc.dg/pr63637-3.c: New test. * gcc.dg/pr63637-4.c: New test. * gcc.dg/pr63637-5.c: New test. * gcc.dg/pr63637-6.c: New test. * gcc.target/i386/pr63637-1.c: New test. * gcc.target/i386/pr63637-2.c: New test. * gcc.target/i386/pr63637-3.c: New test. * gcc.target/i386/pr63637-4.c: New test. * gcc.target/i386/pr63637-5.c: New test. * gcc.target/i386/pr63637-6.c: New test. --- gcc/cse.c.jj2015-01-09 21:59:44.0 +0100 +++ gcc/cse.c 2015-01-13 13:26:23.391216064 +0100 @@ -1792,6 +1792,8 @@ merge_equiv_classes (struct table_elt *c } new_elt = insert (exp, class1, hash, mode); new_elt->in_memory = hash_arg_in_memory; + if (GET_CODE (exp) == ASM_OPERANDS && elt->cost == MAX_COST) + new_elt->cost = MAX_COST; } } } @@ -4258,7 +4260,7 @@ find_sets_in_insn (rtx_insn *insn, struc { int i, lim = XVECLEN (x, 0); - /* Go over the epressions of the PARALLEL in forward order, to + /* Go over the expressions of the PARALLEL in forward order, to put them in the same order in the SETS array. */ for (i = 0; i < lim; i++) { @@ -4634,12 +4636,27 @@ cse_insn (rtx_insn *insn) && REGNO (dest) >= FIRST_PSEUDO_REGISTER) sets[i].src_volatile = 1; - /* Also do not record result of a non-volatile inline asm with -more than one result or with clobbers, we do not want CSE to -break the inline asm apart. */ else if (GET_CODE (src) == ASM_OPERANDS && GET_CODE (x) == PARALLEL) - sets[i].src_volatile = 1; + { + /* Do not record result of a non-volatile inline asm with +more than one result. */ + if (n_sets > 1) + sets[i].src_volatile = 1; + + int j, lim = XVECLEN (x, 0); + for (j = 0; j < lim; j++) + { + rtx y = XVECEXP (x, 0, j); + /* And do not record result of a non-volatile inline asm +with "memory" clobber. */ + if (GET_CODE (y) == CLOBBER && MEM_P (XEXP (y, 0))) + { + sets[i].src_volatile = 1; + break; + } + } + } #if 0 /* It is no longer clear why we used to do this, but it doesn't @@ -5230,8 +5247,8 @@ cse_insn (rtx_insn *insn) ; /* Look for a substitution that makes a valid insn. */ - else if (validate_unshare_change -(insn, &SET_SRC (sets[i].rtl), trial, 0)) + else if (validate_unshare_change (insn, &SET_SRC (sets[i].rtl), + trial, 0)) { rtx new_rtx = canon_reg (SET_SRC (sets[i].rtl), insn); @@ -5593,6 +5610,12 @@ cse_insn (rtx_insn *insn) } elt = insert (src, classp, sets[i].src_hash, mode); elt->in_memory = sets[i].src_in_memory; + /* If inline asm has any clobbers, ensure we only reuse + existing inline asms and never try to put the ASM_OPERANDS + into an insn that isn't inline asm. */ + if (GET_CODE (src) == ASM_OPERANDS + && GET_CODE (x) == PARALLEL) + elt->cost = MAX_COST; sets[i].src_elt = classp = elt; } if (sets[i].src_cons
[PATCH] Fix REE for vector modes (PR rtl-optimization/64286, take 2)
On Mon, Jan 12, 2015 at 02:29:53PM -0700, Jeff Law wrote: > On 01/12/15 12:59, Jakub Jelinek wrote: > >Hi! > > > >As mentioned in the PR, giving up for all vector mode extensions > >is unnecessary, but unlike scalar integer extensions, where the low part > >of the extended value is the original value, for vectors this is not true, > >thus the old value is lost. Which means we can perform REE, but only if > >all uses of the definition are the same (code+mode) extension. > > > >Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > > >2015-01-12 Jakub Jelinek > > > > PR rtl-optimization/64286 > > * ree.c (add_removable_extension): Don't add vector mode > > extensions if all uses of the source register aren't the same > > vector extensions. > > > > * gcc.target/i386/avx2-pr64286.c: New test. > Does it make sense to remove your change for 59754 in combine_reaching_defs? > Shouldn't this patch handle that case as well? You're right, this patch handles that too. New patch, bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2015-01-13 Jakub Jelinek PR rtl-optimization/64286 * ree.c (combine_reaching_defs): Move part of comment earlier, remove !SCALAR_INT_MODE_P check. (add_removable_extension): Don't add vector mode extensions if all uses of the source register aren't the same vector extensions. * gcc.target/i386/avx2-pr64286.c: New test. --- gcc/ree.c.jj2015-01-12 21:29:07.023060045 +0100 +++ gcc/ree.c 2015-01-13 09:43:32.158449885 +0100 @@ -783,6 +783,17 @@ combine_reaching_defs (ext_cand *cand, c != REGNO (get_extended_src_reg (SET_SRC (PATTERN (cand->insn); if (copy_needed) { + /* Considering transformation of +(set (reg1) (expression)) +... +(set (reg2) (any_extend (reg1))) + +into + +(set (reg2) (any_extend (expression))) +(set (reg1) (reg2)) +... */ + /* In theory we could handle more than one reaching def, it just makes the code to update the insn stream more complex. */ if (state->defs_list.length () != 1) @@ -798,18 +809,6 @@ combine_reaching_defs (ext_cand *cand, c if (state->modified[INSN_UID (cand->insn)].kind != EXT_MODIFIED_NONE) return false; - /* Transformation of -(set (reg1) (expression)) -(set (reg2) (any_extend (reg1))) -into -(set (reg2) (any_extend (expression))) -(set (reg1) (reg2)) -is only valid for scalar integral modes, as it relies on the low -subreg of reg1 to have the value of (expression), which is not true -e.g. for vector modes. */ - if (!SCALAR_INT_MODE_P (GET_MODE (SET_DEST (PATTERN (cand->insn) - return false; - machine_mode dst_mode = GET_MODE (SET_DEST (PATTERN (cand->insn))); rtx src_reg = get_extended_src_reg (SET_SRC (PATTERN (cand->insn))); @@ -1027,6 +1026,7 @@ add_removable_extension (const_rtx expr, different extension. FIXME: this obviously can be improved. */ for (def = defs; def; def = def->next) if ((idx = def_map[INSN_UID (DF_REF_INSN (def->ref))]) + && idx != -1U && (cand = &(*insn_list)[idx - 1]) && cand->code != code) { @@ -1038,6 +1038,57 @@ add_removable_extension (const_rtx expr, } return; } + /* For vector mode extensions, ensure that all uses of the + XEXP (src, 0) register are the same extension (both code + and to which mode), as unlike integral extensions lowpart + subreg of the sign/zero extended register are not equal + to the original register, so we have to change all uses or + none. */ + else if (VECTOR_MODE_P (GET_MODE (XEXP (src, 0 + { + if (idx == 0) + { + struct df_link *ref_chain, *ref_link; + + ref_chain = DF_REF_CHAIN (def->ref); + for (ref_link = ref_chain; ref_link; ref_link = ref_link->next) + { + if (ref_link->ref == NULL + || DF_REF_INSN_INFO (ref_link->ref) == NULL) + { + idx = -1U; + break; + } + rtx_insn *use_insn = DF_REF_INSN (ref_link->ref); + const_rtx use_set; + if (use_insn == insn || DEBUG_INSN_P (use_insn)) + continue; + if (!(use_set = single_set (use_insn)) + || !REG_P (SET_DEST (use_set)) + || GET_MODE (SET_DEST (use_set)) != GET_MODE (dest) + || GET_CODE (SET_SRC (use_set)) != code + || !rtx_equal_p (XEXP (SET_SRC (use_set), 0), +XEXP (src, 0))) + { +
Re: [ARM]Make CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO return 2.
On 13/01/15 15:58, Renlin Li wrote: > Hi all, > > This patch update CLZ_DEFINED_VALUE_AT_ZERO and > CTZ_DEFINED_VALUE_AT_ZERO to make them return 2 in > arm back-end. > > Here are the explanations from GCC documentation: > > CLZ_DEFINED_VALUE_AT_ZERO (mode, value) > CTZ_DEFINED_VALUE_AT_ZERO (mode, value) > A C expression that indicates whether the architecture defines a value > for @code{clz} or @code{ctz} with a zero operand. > A result of 0 indicates the value is undefined. > If the value is defined for only the RTL expression, the macro should > evaluate to 1; if the value applies also to the corresponding optab > entry (which is normally the case if it expands directly into > the corresponding RTL), then the macro should evaluate to 2. > In the cases where the value is defined, @var{value} should be set to > this value. > > arm-none-eabi has been test on the model, no new issue. > Okay for trunk? > > gcc/ChangeLog: > > 2015-01-13 Renlin Li > > * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO): Return 2. > (CTZ_DEFINED_VALUE_AT_ZERO): Ditto. > > OK. R.
[ARM]Make CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO return 2.
Hi all, This patch update CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO to make them return 2 in arm back-end. Here are the explanations from GCC documentation: CLZ_DEFINED_VALUE_AT_ZERO (mode, value) CTZ_DEFINED_VALUE_AT_ZERO (mode, value) A C expression that indicates whether the architecture defines a value for @code{clz} or @code{ctz} with a zero operand. A result of 0 indicates the value is undefined. If the value is defined for only the RTL expression, the macro should evaluate to 1; if the value applies also to the corresponding optab entry (which is normally the case if it expands directly into the corresponding RTL), then the macro should evaluate to 2. In the cases where the value is defined, @var{value} should be set to this value. arm-none-eabi has been test on the model, no new issue. Okay for trunk? gcc/ChangeLog: 2015-01-13 Renlin Li * config/arm/arm.h (CLZ_DEFINED_VALUE_AT_ZERO): Return 2. (CTZ_DEFINED_VALUE_AT_ZERO): Ditto. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index d850982..83c9c33 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -2145,9 +2145,9 @@ extern int making_const_table; : reverse_condition (code)) #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ - ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE)) + ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2) #define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \ - ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE)) + ((VALUE) = GET_MODE_UNIT_BITSIZE (MODE), 2) #define CC_STATUS_INIT \ do { cfun->machine->thumb1_cc_insn = NULL_RTX; } while (0)
Re: PATCH][ARM][4.8]Backport "Fix definition of __ARM_SIZEOF_WCHAR_T"
On 13/01/15 15:53, Renlin Li wrote: > Hi all, > > This is a backport patch for > https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213864 > > arm-none-eabi regression test has been done, no new issues. > Okay for branch 4.8? > > gcc/ChangeLog > Fix PR target/61413 > Backport from mainline. > > 2014-08-12 Ramana Radhakrishnan > > PR target/61413 > * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition > of __ARM_SIZEOF_WCHAR_T. > > > > backport.patch > > > diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h > index c0f2184..2eb 100644 > --- a/gcc/config/arm/arm.h > +++ b/gcc/config/arm/arm.h > @@ -74,8 +74,8 @@ extern char arm_arch_name[]; > builtin_define_with_int_value ( \ > "__ARM_SIZEOF_MINIMAL_ENUM", \ > flag_short_enums ? 1 : 4);\ > - builtin_define_with_int_value ( \ > - "__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE); \ > + builtin_define_type_sizeof ("__ARM_SIZEOF_WCHAR_T", \ > + wchar_type_node); \ > if (TARGET_ARM_ARCH_PROFILE)\ > builtin_define_with_int_value ( \ > "__ARM_ARCH_PROFILE", TARGET_ARM_ARCH_PROFILE); \ > OK.
PATCH][ARM][4.8]Backport "Fix definition of __ARM_SIZEOF_WCHAR_T"
Hi all, This is a backport patch for https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=213864 arm-none-eabi regression test has been done, no new issues. Okay for branch 4.8? gcc/ChangeLog Fix PR target/61413 Backport from mainline. 2014-08-12 Ramana Radhakrishnan PR target/61413 * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix definition of __ARM_SIZEOF_WCHAR_T. diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h index c0f2184..2eb 100644 --- a/gcc/config/arm/arm.h +++ b/gcc/config/arm/arm.h @@ -74,8 +74,8 @@ extern char arm_arch_name[]; builtin_define_with_int_value (\ "__ARM_SIZEOF_MINIMAL_ENUM",\ flag_short_enums ? 1 : 4);\ - builtin_define_with_int_value (\ - "__ARM_SIZEOF_WCHAR_T", WCHAR_TYPE_SIZE); \ + builtin_define_type_sizeof ("__ARM_SIZEOF_WCHAR_T", \ +wchar_type_node); \ if (TARGET_ARM_ARCH_PROFILE)\ builtin_define_with_int_value ( \ "__ARM_ARCH_PROFILE", TARGET_ARM_ARCH_PROFILE); \
Re: libffi is broken for x32
On 01/13/2015 07:35 AM, H.J. Lu wrote: > Can I apply it to GCC trunk? Please. r~
[[ARM/AArch64][testsuite] 27/36] Add vmull_n tests.
* gcc.target/aarch64/advsimd-intrinsics/vmull_n.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c new file mode 100644 index 000..df28a94 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmull_n.c @@ -0,0 +1,61 @@ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +/* Expected results. */ +VECT_VAR_DECL(expected,int,32,4) [] = { 0x11000, 0x11000, 0x11000, 0x11000 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x22000, 0x22000 }; +VECT_VAR_DECL(expected,uint,32,4) [] = { 0x33000, 0x33000, 0x33000, 0x33000 }; +VECT_VAR_DECL(expected,uint,64,2) [] = { 0x44000, 0x44000 }; + +#define INSN_NAME vmull +#define TEST_MSG "VMULL_N" +void exec_vmull_n (void) +{ + int i; + + /* vector_res = vmull_n(vector,val), then store the result. */ +#define TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L) \ + VECT_VAR(vector_res, T1, W2, N) =\ +INSN##_n_##T2##W(VECT_VAR(vector, T1, W, N), \ +L);\ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), VECT_VAR(vector_res, T1, W2, N)) + +#define TEST_VMULL_N(INSN, T1, T2, W, W2, N, L)\ + TEST_VMULL_N1(INSN, T1, T2, W, W2, N, L) + + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector, uint, 16, 4); + DECL_VARIABLE(vector, uint, 32, 2); + + DECL_VARIABLE(vector_res, int, 32, 4); + DECL_VARIABLE(vector_res, int, 64, 2); + DECL_VARIABLE(vector_res, uint, 32, 4); + DECL_VARIABLE(vector_res, uint, 64, 2); + + clean_results (); + + /* Initialize vector. */ + VDUP(vector, , int, s, 16, 4, 0x1000); + VDUP(vector, , int, s, 32, 2, 0x1000); + VDUP(vector, , uint, u, 16, 4, 0x1000); + VDUP(vector, , uint, u, 32, 2, 0x1000); + + /* Choose multiplier arbitrarily. */ + TEST_VMULL_N(INSN_NAME, int, s, 16, 32, 4, 0x11); + TEST_VMULL_N(INSN_NAME, int, s, 32, 64, 2, 0x22); + TEST_VMULL_N(INSN_NAME, uint, u, 16, 32, 4, 0x33); + TEST_VMULL_N(INSN_NAME, uint, u, 32, 64, 2, 0x44); + + CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, ""); + CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, ""); + CHECK(TEST_MSG, uint, 32, 4, PRIx32, expected, ""); + CHECK(TEST_MSG, uint, 64, 2, PRIx64, expected, ""); +} + +int main (void) +{ + exec_vmull_n (); + return 0; +} -- 2.1.0
Re: libffi is broken for x32
On Mon, Jan 12, 2015 at 5:13 PM, Richard Henderson wrote: > On 01/12/2015 04:57 PM, H.J. Lu wrote: >> The problem is my x86_64-*-linux-gnux32 patch >> >> https://gcc.gnu.org/ml/gcc-patches/2012-08/msg01083.html >> >> was never accepted upstream. Can I apply it to config.guess >> in GCC? > > Ah. Hmm. Perhaps the configure.host patch would be better after all. > Can I apply it to GCC trunk? Thanks. -- H.J.
Re: [PATCH] PR59448 - Promote consume to acquire
On 01/13/2015 10:20 AM, Torvald Riegel wrote: On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote: The problem with the patch in the PR is the memory model is immediately promoted from consume to acquire. This happens *before* any of the memmodel checks are made. If a consume is illegally specified (such as in a compare_exchange), it gets promoted to acquire and the compiler doesn't report the error because it never sees the consume. The only issue I can think of in compare_exchange is if the program specifies memory_order_consume for the success path but memory_order_acquire for the failure path, which is disallowed by the standard. However, I don't see a reason why the standard's requirement is anything but a performance check in our particular case. The only case we prevent the compiler from reporting is a consume-on-success / acquire-on-failure combination. But we upgrade the former to acquire, so we can't even cause libatomic (or similar) to issue too weak barriers due to libatomic relying on the standard's requirement. Thus, if there's no easy way to upgrade to acquire after the sanity checks, I think this isn't critical enough to hold up the patch from being committed. memory_order_consume is clearly a feature for experts. The error was actually in exchange... not compare_exchange like I wrote. and causes a testsuite error that specifically tests for an illegal consume. Andrew
Re: [patch RFA libffi SH] Fix configure error for sh4-unknown-linux-gnu
On 01/12/2015 09:38 PM, Kaz Kojima wrote: > 2015-01-13 Kaz Kojima > > * configure.host: Remove extra brackets for sh. Ok, thanks. r~
[[ARM/AArch64][testsuite] 08/36] Add vtrn tests. Refactor vzup and vzip tests.
* gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vtrn.c: New file. * gcc.target/aarch64/advsimd-intrinsics/vuzp.c: Use code from vshuffle.inc. * gcc.target/aarch64/advsimd-intrinsics/vzip.c: Use code from vshuffle.inc. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc new file mode 100644 index 000..928f338 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vshuffle.inc @@ -0,0 +1,139 @@ +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* In this case, output variables are arrays of vectors. */ +#define DECL_VSHUFFLE(T1, W, N) \ + VECT_ARRAY_TYPE(T1, W, N, 2) VECT_ARRAY_VAR(result_vec, T1, W, N, 2); \ + VECT_VAR_DECL(result_bis, T1, W, N)[2 * N] + + /* We need to use a temporary result buffer (result_bis), because + the one used for other tests is not large enough. A subset of the + result data is moved from result_bis to result, and it is this + subset which is used to check the actual behaviour. The next + macro enables to move another chunk of data from result_bis to + result. */ +#define TEST_VSHUFFLE(INSN, Q, T1, T2, W, N) \ + VECT_ARRAY_VAR(result_vec, T1, W, N, 2) =\ +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N)); \ + vst2##Q##_##T2##W(VECT_VAR(result_bis, T1, W, N),\ + VECT_ARRAY_VAR(result_vec, T1, W, N, 2)); \ + memcpy(VECT_VAR(result, T1, W, N), VECT_VAR(result_bis, T1, W, N), \ +sizeof(VECT_VAR(result, T1, W, N))); + + /* Overwrite "result" with the contents of "result_bis"[X]. */ +#define TEST_EXTRA_CHUNK(T1, W, N, X) \ + memcpy(VECT_VAR(result, T1, W, N), &(VECT_VAR(result_bis, T1, W, N)[X*N]), \ +sizeof(VECT_VAR(result, T1, W, N))); + + DECL_VARIABLE_ALL_VARIANTS(vector1); + DECL_VARIABLE_ALL_VARIANTS(vector2); + + /* We don't need 64 bits variants. */ +#define DECL_ALL_VSHUFFLE()\ + DECL_VSHUFFLE(int, 8, 8);\ + DECL_VSHUFFLE(int, 16, 4); \ + DECL_VSHUFFLE(int, 32, 2); \ + DECL_VSHUFFLE(uint, 8, 8); \ + DECL_VSHUFFLE(uint, 16, 4); \ + DECL_VSHUFFLE(uint, 32, 2); \ + DECL_VSHUFFLE(poly, 8, 8); \ + DECL_VSHUFFLE(poly, 16, 4); \ + DECL_VSHUFFLE(float, 32, 2); \ + DECL_VSHUFFLE(int, 8, 16); \ + DECL_VSHUFFLE(int, 16, 8); \ + DECL_VSHUFFLE(int, 32, 4); \ + DECL_VSHUFFLE(uint, 8, 16); \ + DECL_VSHUFFLE(uint, 16, 8); \ + DECL_VSHUFFLE(uint, 32, 4); \ + DECL_VSHUFFLE(poly, 8, 16); \ + DECL_VSHUFFLE(poly, 16, 8); \ + DECL_VSHUFFLE(float, 32, 4) + + DECL_ALL_VSHUFFLE(); + + /* Initialize input "vector" from "buffer". */ + TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector1, buffer); + VLOAD(vector1, buffer, , float, f, 32, 2); + VLOAD(vector1, buffer, q, float, f, 32, 4); + + /* Choose arbitrary initialization values. */ + VDUP(vector2, , int, s, 8, 8, 0x11); + VDUP(vector2, , int, s, 16, 4, 0x22); + VDUP(vector2, , int, s, 32, 2, 0x33); + VDUP(vector2, , uint, u, 8, 8, 0x55); + VDUP(vector2, , uint, u, 16, 4, 0x66); + VDUP(vector2, , uint, u, 32, 2, 0x77); + VDUP(vector2, , poly, p, 8, 8, 0x55); + VDUP(vector2, , poly, p, 16, 4, 0x66); + VDUP(vector2, , float, f, 32, 2, 33.6f); + + VDUP(vector2, q, int, s, 8, 16, 0x11); + VDUP(vector2, q, int, s, 16, 8, 0x22); + VDUP(vector2, q, int, s, 32, 4, 0x33); + VDUP(vector2, q, uint, u, 8, 16, 0x55); + VDUP(vector2, q, uint, u, 16, 8, 0x66); + VDUP(vector2, q, uint, u, 32, 4, 0x77); + VDUP(vector2, q, poly, p, 8, 16, 0x55); + VDUP(vector2, q, poly, p, 16, 8, 0x66); + VDUP(vector2, q, float, f, 32, 4, 33.8f); + +#define TEST_ALL_VSHUFFLE(INSN)\ + TEST_VSHUFFLE(INSN, , int, s, 8, 8); \ + TEST_VSHUFFLE(INSN, , int, s, 16, 4);\ + TEST_VSHUFFLE(INSN, , int, s, 32, 2);\ + TEST_VSHUFFLE(INSN, , uint, u, 8, 8);\ + TEST_VSHUFFLE(INSN, , uint, u, 16, 4); \ + TEST_VSHUFFLE(INSN, , uint, u, 32, 2); \ + TEST_VSHUFFLE(INSN, , poly, p, 8, 8);\ + TEST_VSHUFFLE(INSN, , poly, p, 16, 4);
[[ARM/AArch64][testsuite] 06/36] Add vmla and vmls tests.
* gcc.target/aarch64/advsimd-intrinsics/vmlX.inc: New file. * gcc.target/aarch64/advsimd-intrinsics/vmla.c: New file. * gcc.target/aarch64/advsimd-intrinsics/vmls.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc new file mode 100644 index 000..1c8f1be --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmlX.inc @@ -0,0 +1,110 @@ +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ +#define DECL_VMLX(T, W, N) \ + DECL_VARIABLE(vector1, T, W, N); \ + DECL_VARIABLE(vector2, T, W, N); \ + DECL_VARIABLE(vector3, T, W, N); \ + DECL_VARIABLE(vector_res, T, W, N) + + /* vector_res = vmla(vector, vector3, vector4), + then store the result. */ +#define TEST_VMLX1(INSN, Q, T1, T2, W, N) \ + VECT_VAR(vector_res, T1, W, N) = \ +INSN##Q##_##T2##W(VECT_VAR(vector1, T1, W, N), \ + VECT_VAR(vector2, T1, W, N), \ + VECT_VAR(vector3, T1, W, N)); \ + vst1##Q##_##T2##W(VECT_VAR(result, T1, W, N), \ + VECT_VAR(vector_res, T1, W, N)) + +#define TEST_VMLX(INSN, Q, T1, T2, W, N) \ + TEST_VMLX1(INSN, Q, T1, T2, W, N) + + DECL_VMLX(int, 8, 8); + DECL_VMLX(int, 16, 4); + DECL_VMLX(int, 32, 2); + DECL_VMLX(uint, 8, 8); + DECL_VMLX(uint, 16, 4); + DECL_VMLX(uint, 32, 2); + DECL_VMLX(float, 32, 2); + DECL_VMLX(int, 8, 16); + DECL_VMLX(int, 16, 8); + DECL_VMLX(int, 32, 4); + DECL_VMLX(uint, 8, 16); + DECL_VMLX(uint, 16, 8); + DECL_VMLX(uint, 32, 4); + DECL_VMLX(float, 32, 4); + + clean_results (); + + VLOAD(vector1, buffer, , int, s, 8, 8); + VLOAD(vector1, buffer, , int, s, 16, 4); + VLOAD(vector1, buffer, , int, s, 32, 2); + VLOAD(vector1, buffer, , uint, u, 8, 8); + VLOAD(vector1, buffer, , uint, u, 16, 4); + VLOAD(vector1, buffer, , uint, u, 32, 2); + VLOAD(vector1, buffer, , float, f, 32, 2); + VLOAD(vector1, buffer, q, int, s, 8, 16); + VLOAD(vector1, buffer, q, int, s, 16, 8); + VLOAD(vector1, buffer, q, int, s, 32, 4); + VLOAD(vector1, buffer, q, uint, u, 8, 16); + VLOAD(vector1, buffer, q, uint, u, 16, 8); + VLOAD(vector1, buffer, q, uint, u, 32, 4); + VLOAD(vector1, buffer, q, float, f, 32, 4); + + VDUP(vector2, , int, s, 8, 8, 0x11); + VDUP(vector2, , int, s, 16, 4, 0x22); + VDUP(vector2, , int, s, 32, 2, 0x33); + VDUP(vector2, , uint, u, 8, 8, 0x44); + VDUP(vector2, , uint, u, 16, 4, 0x55); + VDUP(vector2, , uint, u, 32, 2, 0x66); + VDUP(vector2, , float, f, 32, 2, 33.1f); + VDUP(vector2, q, int, s, 8, 16, 0x77); + VDUP(vector2, q, int, s, 16, 8, 0x88); + VDUP(vector2, q, int, s, 32, 4, 0x99); + VDUP(vector2, q, uint, u, 8, 16, 0xAA); + VDUP(vector2, q, uint, u, 16, 8, 0xBB); + VDUP(vector2, q, uint, u, 32, 4, 0xCC); + VDUP(vector2, q, float, f, 32, 4, 99.2f); + + VDUP(vector3, , int, s, 8, 8, 0xFF); + VDUP(vector3, , int, s, 16, 4, 0xEE); + VDUP(vector3, , int, s, 32, 2, 0xDD); + VDUP(vector3, , uint, u, 8, 8, 0xCC); + VDUP(vector3, , uint, u, 16, 4, 0xBB); + VDUP(vector3, , uint, u, 32, 2, 0xAA); + VDUP(vector3, , float, f, 32, 2, 10.23f); + VDUP(vector3, q, int, s, 8, 16, 0x99); + VDUP(vector3, q, int, s, 16, 8, 0x88); + VDUP(vector3, q, int, s, 32, 4, 0x77); + VDUP(vector3, q, uint, u, 8, 16, 0x66); + VDUP(vector3, q, uint, u, 16, 8, 0x55); + VDUP(vector3, q, uint, u, 32, 4, 0x44); + VDUP(vector3, q, float, f, 32, 4, 77.8f); + + TEST_VMLX(INSN_NAME, , int, s, 8, 8); + TEST_VMLX(INSN_NAME, , int, s, 16, 4); + TEST_VMLX(INSN_NAME, , int, s, 32, 2); + TEST_VMLX(INSN_NAME, , uint, u, 8, 8); + TEST_VMLX(INSN_NAME, , uint, u, 16, 4); + TEST_VMLX(INSN_NAME, , uint, u, 32, 2); + TEST_VMLX(INSN_NAME, , float, f, 32, 2); + TEST_VMLX(INSN_NAME, q, int, s, 8, 16); + TEST_VMLX(INSN_NAME, q, int, s, 16, 8); + TEST_VMLX(INSN_NAME, q, int, s, 32, 4); + TEST_VMLX(INSN_NAME, q, uint, u, 8, 16); + TEST_VMLX(INSN_NAME, q, uint, u, 16, 8); + TEST_VMLX(INSN_NAME, q, uint, u, 32, 4); + TEST_VMLX(INSN_NAME, q, float, f, 32, 4); + + CHECK_RESULTS (TEST_MSG, ""); +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + return 0; +} + diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c new file mode 100644 index 000..e3da60c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vmla.c @@ -0,0 +1,50 @@ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +#define INSN_NAME vmla +#define TEST_MSG "VMLA" + +/* Expected results. */ +VECT_VAR_DECL(expected,int,8,8) [] = { 0xdf, 0xe0, 0xe1, 0xe2, + 0xe3, 0xe4, 0xe5, 0xe6 }; +VECT_VAR_DECL(exp
Re: [PATCH] PR59448 - Promote consume to acquire
On Tue, 2015-01-13 at 09:56 -0500, Andrew MacLeod wrote: > The problem with the patch in the PR is the memory model is immediately > promoted from consume to acquire. This happens *before* any of the > memmodel checks are made. If a consume is illegally specified (such as > in a compare_exchange), it gets promoted to acquire and the compiler > doesn't report the error because it never sees the consume. The only issue I can think of in compare_exchange is if the program specifies memory_order_consume for the success path but memory_order_acquire for the failure path, which is disallowed by the standard. However, I don't see a reason why the standard's requirement is anything but a performance check in our particular case. The only case we prevent the compiler from reporting is a consume-on-success / acquire-on-failure combination. But we upgrade the former to acquire, so we can't even cause libatomic (or similar) to issue too weak barriers due to libatomic relying on the standard's requirement. Thus, if there's no easy way to upgrade to acquire after the sanity checks, I think this isn't critical enough to hold up the patch from being committed. memory_order_consume is clearly a feature for experts.
[[ARM/AArch64][testsuite] 35/36] Add vqdmull_lane tests.
* gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c new file mode 100644 index 000..12f2a6b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull_lane.c @@ -0,0 +1,94 @@ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +/* Expected values of cumulative_saturation flag. */ +int VECT_VAR(expected_cumulative_sat,int,16,4) = 0; +int VECT_VAR(expected_cumulative_sat,int,32,2) = 0; + +/* Expected results. */ +VECT_VAR_DECL(expected,int,32,4) [] = { 0x8000, 0x8000, 0x8000, 0x8000 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x4000, 0x4000 }; + +/* Expected values of cumulative_saturation flag when saturation + occurs. */ +int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1; +int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1; + +/* Expected results when saturation occurs. */ +VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff, +0x7fff, 0x7fff }; +VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff, +0x7fff }; + +#define INSN_NAME vqdmull +#define TEST_MSG "VQDMULL_LANE" + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + int i; + + /* vector_res = vqdmull_lane(vector,vector2,lane), then store the result. */ +#define TEST_VQDMULL_LANE2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \ + Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \ + VECT_VAR(vector_res, T1, W2, N) =\ +INSN##_lane_##T2##W(VECT_VAR(vector, T1, W, N),\ + VECT_VAR(vector2, T1, W, N),\ + L); \ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), \ +VECT_VAR(vector_res, T1, W2, N)); \ + CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + + /* Two auxliary macros are necessary to expand INSN. */ +#define TEST_VQDMULL_LANE1(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_VQDMULL_LANE2(INSN, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) + +#define TEST_VQDMULL_LANE(T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_VQDMULL_LANE1(INSN_NAME, T1, T2, W, W2, N, L, EXPECTED_CUMULATIVE_SAT, CMT) + + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector2, int, 16, 4); + DECL_VARIABLE(vector2, int, 32, 2); + + DECL_VARIABLE(vector_res, int, 32, 4); + DECL_VARIABLE(vector_res, int, 64, 2); + + clean_results (); + + /* Initialize vector. */ + VDUP(vector, , int, s, 16, 4, 0x1000); + VDUP(vector, , int, s, 32, 2, 0x1000); + + /* Initialize vector2. */ + VDUP(vector2, , int, s, 16, 4, 0x4); + VDUP(vector2, , int, s, 32, 2, 0x2); + + /* Choose lane arbitrarily. */ + TEST_VQDMULL_LANE(int, s, 16, 32, 4, 2, expected_cumulative_sat, ""); + TEST_VQDMULL_LANE(int, s, 32, 64, 2, 1, expected_cumulative_sat, ""); + + CHECK(TEST_MSG, int, 32, 4, PRIx32, expected, ""); + CHECK(TEST_MSG, int, 64, 2, PRIx64, expected, ""); + + VDUP(vector, , int, s, 16, 4, 0x8000); + VDUP(vector2, , int, s, 16, 4, 0x8000); + VDUP(vector, , int, s, 32, 2, 0x8000); + VDUP(vector2, , int, s, 32, 2, 0x8000); + +#define TEST_MSG2 "with saturation" + TEST_VQDMULL_LANE(int, s, 16, 32, 4, 2, expected_cumulative_sat2, TEST_MSG2); + TEST_VQDMULL_LANE(int, s, 32, 64, 2, 1, expected_cumulative_sat2, TEST_MSG2); + + CHECK(TEST_MSG, int, 32, 4, PRIx32, expected2, TEST_MSG2); + CHECK(TEST_MSG, int, 64, 2, PRIx64, expected2, TEST_MSG2); +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + return 0; +} -- 2.1.0
[[ARM/AArch64][testsuite] 34/36] Add vqdmull tests.
* gcc.target/aarch64/advsimd-intrinsics/vqdmull.c: New file. diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c new file mode 100644 index 000..e71a624 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vqdmull.c @@ -0,0 +1,86 @@ +#include +#include "arm-neon-ref.h" +#include "compute-ref-data.h" + +/* Expected values of cumulative_saturation flag. */ +int VECT_VAR(expected_cumulative_sat,int,16,4) = 0; +int VECT_VAR(expected_cumulative_sat,int,32,2) = 0; + +/* Expected results. */ +VECT_VAR_DECL(expected,int,32,4) [] = { 0x200, 0x1c2, 0x188, 0x152 }; +VECT_VAR_DECL(expected,int,64,2) [] = { 0x200, 0x1c2 }; + +/* Expected values of cumulative_saturation flag when saturation + occurs. */ +int VECT_VAR(expected_cumulative_sat2,int,16,4) = 1; +int VECT_VAR(expected_cumulative_sat2,int,32,2) = 1; + +/* Expected results when saturation occurs. */ +VECT_VAR_DECL(expected2,int,32,4) [] = { 0x7fff, 0x7fff, +0x7fff, 0x7fff }; +VECT_VAR_DECL(expected2,int,64,2) [] = { 0x7fff, +0x7fff }; + +#define INSN_NAME vqdmull +#define TEST_MSG "VQDMULL" + +#define FNNAME1(NAME) exec_ ## NAME +#define FNNAME(NAME) FNNAME1(NAME) + +void FNNAME (INSN_NAME) (void) +{ + /* Basic test: y=vqdmull(x,x), then store the result. */ +#define TEST_VQDMULL2(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + Set_Neon_Cumulative_Sat(0, VECT_VAR(vector_res, T1, W2, N)); \ + VECT_VAR(vector_res, T1, W2, N) =\ +INSN##_##T2##W(VECT_VAR(vector, T1, W, N), \ + VECT_VAR(vector2, T1, W, N));\ + vst1q_##T2##W2(VECT_VAR(result, T1, W2, N), \ +VECT_VAR(vector_res, T1, W2, N)); \ + CHECK_CUMULATIVE_SAT(TEST_MSG, T1, W, N, EXPECTED_CUMULATIVE_SAT, CMT) + + /* Two auxliary macros are necessary to expand INSN. */ +#define TEST_VQDMULL1(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_VQDMULL2(INSN, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) + +#define TEST_VQDMULL(T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) \ + TEST_VQDMULL1(INSN_NAME, T1, T2, W, W2, N, EXPECTED_CUMULATIVE_SAT, CMT) + + DECL_VARIABLE(vector, int, 16, 4); + DECL_VARIABLE(vector, int, 32, 2); + DECL_VARIABLE(vector2, int, 16, 4); + DECL_VARIABLE(vector2, int, 32, 2); + DECL_VARIABLE(vector_res, int, 32, 4); + DECL_VARIABLE(vector_res, int, 64, 2); + + clean_results (); + + VLOAD(vector, buffer, , int, s, 16, 4); + VLOAD(vector, buffer, , int, s, 32, 2); + VLOAD(vector2, buffer, , int, s, 16, 4); + VLOAD(vector2, buffer, , int, s, 32, 2); + + TEST_VQDMULL(int, s, 16, 32, 4, expected_cumulative_sat, ""); + TEST_VQDMULL(int, s, 32, 64, 2, expected_cumulative_sat, ""); + + CHECK (TEST_MSG, int, 32, 4, PRIx16, expected, ""); + CHECK (TEST_MSG, int, 64, 2, PRIx32, expected, ""); + + VDUP(vector, , int, s, 16, 4, 0x8000); + VDUP(vector2, , int, s, 16, 4, 0x8000); + VDUP(vector, , int, s, 32, 2, 0x8000); + VDUP(vector2, , int, s, 32, 2, 0x8000); + +#define TEST_MSG2 "with saturation" + TEST_VQDMULL(int, s, 16, 32, 4, expected_cumulative_sat2, TEST_MSG2); + TEST_VQDMULL(int, s, 32, 64, 2, expected_cumulative_sat2, TEST_MSG2); + + CHECK (TEST_MSG, int, 32, 4, PRIx16, expected2, TEST_MSG2); + CHECK (TEST_MSG, int, 64, 2, PRIx32, expected2, TEST_MSG2); +} + +int main (void) +{ + FNNAME (INSN_NAME) (); + return 0; +} -- 2.1.0