[PATCH,i386] Enable FMA4 for AMD bdver3
Hi The below patch enables FMA4 for AMD bdver3 architectures. make -k check passes. Is it OK for upstream? Regards Ganesh diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fb5b267..cbb5311 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-16 Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com + + * config/i386/i386.c (ix86_option_override_internal): Enable FMA4 + for AMD bdver3. + 2013-10-16 Hans-Peter Nilsson h...@axis.com * config/cris/t-elfmulti (MULTILIB_OPTIONS, MULTILIB_DIRNAMES) diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index b5796db..c24ce36 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -3104,7 +3104,7 @@ ix86_option_override_internal (bool main_args_p, {bdver3, PROCESSOR_BDVER3, CPU_BDVER3, PTA_64BIT | PTA_MMX | PTA_SSE | PTA_SSE2 | PTA_SSE3 | PTA_SSE4A | PTA_CX16 | PTA_ABM | PTA_SSSE3 | PTA_SSE4_1 - | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX + | PTA_SSE4_2 | PTA_AES | PTA_PCLMUL | PTA_AVX | PTA_FMA4 | PTA_XOP | PTA_LWP | PTA_BMI | PTA_TBM | PTA_F16C | PTA_FMA | PTA_PRFCHW | PTA_FXSR | PTA_XSAVE | PTA_XSAVEOPT | PTA_FSGSBASE},
Re: [PATCH,i386] Enable FMA4 for AMD bdver3
On Wed, Oct 16, 2013 at 8:28 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The below patch enables FMA4 for AMD bdver3 architectures. make -k check passes. +2013-10-16 Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com + + * config/i386/i386.c (ix86_option_override_internal): Enable FMA4 + for AMD bdver3. + OK for mainline and 4.8 branch (for 4.8.2 if approved by Jakub, otherwise please wait for branch to open). Thanks, Uros.
Re: [PATCH,i386] Enable FMA4 for AMD bdver3
On Wed, Oct 16, 2013 at 09:00:58AM +0200, Uros Bizjak wrote: On Wed, Oct 16, 2013 at 8:28 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The below patch enables FMA4 for AMD bdver3 architectures. make -k check passes. +2013-10-16 Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com + + * config/i386/i386.c (ix86_option_override_internal): Enable FMA4 + for AMD bdver3. + OK for mainline and 4.8 branch (for 4.8.2 if approved by Jakub, otherwise please wait for branch to open). 4.8.2 is already rolling, so too late for that. Jakub
Re: libgo patch committed: Don't clobber context when catching signal
Hello! This patch to libgo fixes a dumb bug in which catching a signal via the os/signal package could clobber the saved split-stack context, leading to disaster if the signal arrived at the wrong time. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Will commit to 4.8 branch when it reopens. Attached patch is re-enables bootstrap on targets without split stack due to unused variable warning. The patch was tested on alphaev68-linux-gnu [1]. [1] http://gcc.gnu.org/ml/gcc-testresults/2013-10/msg01200.html Uros. Index: runtime/go-signal.c === --- runtime/go-signal.c (revision 203610) +++ runtime/go-signal.c (working copy) @@ -399,7 +399,9 @@ { G *gp; M *mp; +#ifdef USING_SPLIT_STACK void *stack_context[10]; +#endif /* We are now running on the stack registered via sigaltstack. (Actually there is a small span of time between runtime_siginit
[buildrobot] Minor fallout of: Fix PR 57756
On Wed, 2013-10-16 16:24:12 +1030, Alan Modra amo...@gmail.com wrote: On Tue, Oct 15, 2013 at 02:45:23PM -0700, Sriraman Tallam wrote: I committed this patch after making the above change. /src/gcc-virgin/gcc/config/rs6000/rs6000.c: At global scope: /src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion from ‘void (*)(cl_target_option*)’ to ‘void (*)(cl_target_option*, gcc_options*)’ [-fpermissive] /src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion from ‘void (*)(cl_target_option*)’ to ‘void (*)(gcc_options*, cl_target_option*)’ [-fpermissive] Not quite... Also causes another build failure seen only on some targets (m68k, mips, ppc, s390 and sparc), but not all, which build robot[1] found: m68k-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19881 mips-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19892 mips64-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19920 mips64el-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19879 mipsel-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19924 powerpc-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19903 powerpc64le-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19917 powerpcle-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19884 s390-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19889 s390x-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19913 sparc-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19957 sparc64-linux http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=19898 They're all similar, like this: g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. -I/home/jbglaw/repos/gcc/gcc/../include -I/home/jbglaw/repos/gcc/gcc/../libcpp/include -I/home/jbglaw/repos/gcc/gcc/../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o sparc.o -MT sparc.o -MMD -MP -MF ./.deps/sparc.TPo /home/jbglaw/repos/gcc/gcc/config/sparc/sparc.c /home/jbglaw/repos/gcc/gcc/config/sparc/sparc.c: In function ‘void sparc_option_override()’: /home/jbglaw/repos/gcc/gcc/config/sparc/sparc.c:1228: error: ‘target_flags_explicit’ was not declared in this scope /home/jbglaw/repos/gcc/gcc/config/sparc/sparc.c:1322: error: ‘target_flags_explicit’ was not declared in this scope It's always a missing declaration of `target_flags_explicit', so this should be quite easy to fix. MfG, JBG [1] http://toolchain.lug-owl.de/buildbot/ http://toolchain.lug-owl.de/buildbot/timeline.php -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: Fortschritt bedeutet, einen Schritt so zu machen, the second : daß man den nächsten auch noch machen kann. signature.asc Description: Digital signature
Re: [PATCH] Fix PR54702 and LTO bootstrap (for me)
On Tue, 15 Oct 2013, Mike Stump wrote: On Oct 15, 2013, at 12:45 AM, Richard Biener rguent...@suse.de wrote: On Mon, 14 Oct 2013, Mike Stump wrote: On Sep 25, 2012, at 8:00 AM, Richard Guenther rguent...@suse.de wrote: 2012-09-25 Richard Guenther rguent...@suse.de PR lto/54625 * lto-symtab.c (lto_symtab_merge_cgraph_nodes_1): Do not merge cgraph nodes for builtins. * gcc.dg/lto/pr54702_0.c: New testcase. * gcc.dg/lto/pr54702_1.c: Likewise. * gcc.dg/lto/pr54625-1_0.c: Likewise. * gcc.dg/lto/pr54625-1_1.C: Likewise. * gcc.dg/lto/pr54625-2_0.c: Likewise. * gcc.dg/lto/pr54625-2_1.C: Likewise. xgcc: error: /home/mrs/wg/gcc/gcc/testsuite/gcc.dg/lto/pr54625-2_1.C: C++ compiler not installed on this system FAIL: gcc.dg/lto/pr54625-2 c_lto_pr54625-2_1.o assemble, -O2 -flto -fuse-linker-plugin shouldn't C++ testcases go into the C++ testsuite? Well ... this is a mixed C / C++ testcase (see the other file participating, gcc.dg/lto/pr54625-2_0.c). The C testsuite is the only one where the driver switches languages based on the file ending. That is an interesting inquiry, but, doesn't lead to the error. Not sure exactly why you bring it up: From a random C++ test suite run: spawn /home/mrs/work2/gcc-port/gcc/xgcc -B/home/mrs/work2/gcc-port/gcc/ -fno-diagnostics-show-caret -O0 -flto -flto-partition=none -fuse-linker-plugin -DNO_TRAMPOLINES -c -DNO_TRAMPOLINES -o cp_lto_20100603-1_1.o /home/mrs/work2/gcc/gcc/testsuite/g++.dg/lto/20100603-1_1.c here, we see that xgcc is used to build a C part of a mix language testcase in the C++ test suite. If you follow that style, is there some reason why it won't just work as you want? Hmm, it seems at least the fortran frontend knows how to mix fortran and C. And the fortran/c test cases are in the fortran test suite. So if you really did not build the C++ compiler then we need, Yup, no C++ compiler. at least for lto.exp, a way to say dg-require-c++-frontend in the main file (thats the _0 one). Why do that? Btw, can we end up with the C frontend not built? I'v never contemplated that idea? but no, c is automatically added in order I suppose, to build the language independent runtime (libgcc) with. I suppose it's fine to move the testcase to the C++ lto testsuite if you verify it will use the C compiler for the C parts. Thanks, Richard.
Re: [PATCH] tree_code_name wrapper
On Tue, Oct 15, 2013 at 5:11 PM, Paulo Matos pma...@broadcom.com wrote: -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Jakub Jelinek Sent: 15 October 2013 15:45 To: Paulo Matos Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] tree_code_name wrapper The ChangeLog is still wrong: OK, sorry. If this is ok now I will open a bottle of champagne. Thank you all for your time reviewing this. Ok to submit? Ok. Thanks. Richard. 2013-10-15 Paulo Matos pma...@broadcom.com gcc/ * tree-core.h (tree_code_name): Remove. * tree.h (get_tree_code_name): New prototype. * tree.c (tree_code_name): Make static. (get_tree_code_name): New function. (dump_tree_statistics, tree_check_failed, tree_not_check_failed, tree_class_check_failed, tree_range_check_failed, tree_not_class_check_failed, omp_clause_check_failed, tree_contains_struct_check_failed, tree_operand_check_failed): Use new wrapper get_tree_code_name instead of calling tree_code_name directly. * tree-vrp.c (dump_asserts_for): Likewise. * tree-dump.c (dequeue_and_dump): Likewise. * tree-pretty-print.c (do_niy, dump_generic_node): Likewise. * tree-pretty-print.h (pp_unsupported_tree): Likewise. * lto-streamer-out.c (lto_write_tree, DFS_write_tree): Likewise. * tree-ssa-dom.c (print_expr_hash_elt): Likewise. * gimple-pretty-print.c (dump_unary_rhs, dump_binary_rhs, dump_ternary_rhs, dump_gimple_assign, dump_gimple_cond, dump_gimple_omp_for): Likewise. * tree-vect-data-refs.c (vect_create_data_ref_ptr): Likewise. * tree-ssa-pre.c (print_pre_expr): Likewise. * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Likewise. * print-tree.c (print_node_brief, print_node): Likewise. * gimple.c (gimple_check_failed): Likewise. * lto-streamer.c (lto_tag_name, print_lto_report): Likewise. * config/frv/frv.c (frv_init_cumulative_args): Likewise. * config/mep/mep.c (mep_validate_vliw): Likewise. * config/iq2000/iq2000.c (init_cumulative_args): Likewise. * config/rs6000/rs6000.c (init_cumulative_args): Likewise. gcc/cp/ * error.c (code_to_string): Use new wrapper get_tree_code_name. * cxx-pretty-print.c (pp_cxx_assignment_operator): Likewise. * pt.c (tsubst): Likewise. * semantics.c (cxx_eval_constant_expression, potential_constant_expression_1): Likewise. * mangle.c (MANGLE_TRACE_TREE, dump_substitution_candidates, add_substitution, find_substitution): Likewise. -- Paulo Matos
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote: On 10/15/13 10:53, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. You got the job! ;) Richard. jeff
RE: [PATCH] Fix PR58682
-Original Message- From: Mike Stump [mailto:mikest...@comcast.net] Sent: 14 October 2013 20:46 To: Kyrill Tkachov Cc: Paulo Matos; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix PR58682 In this case, I've reviewed the license I suspect you have, and I suspect it does not have a grant of enough rights to allow it to be contributed to gcc. Assuming we cannot get the testcase into GCC, can we get the patch into trunk? If you try it locally you can definitely see the ICE and that the patch fixes it. -- Paulo Matos
Re: [Patch, Fortran] PR58652 - accept CLASS(*) as argument to CLASS(*)
Dear Tobias, Your patch is fine for trunk. Thanks Paul On 16 October 2013 00:38, Tobias Burnus bur...@net-b.de wrote: As the test case (see also PR) showed, gfortran was rejecting: subroutine list_move_alloc(self,item) class(list_node),intent(inout) :: self class(*),intent(inout),allocatable :: item ... class(*), allocatable :: expr ... call ast%move_alloc(expr) with the bogus message: call ast%move_alloc(expr) 1 Error: Actual argument to 'item' at (1) must have the same declared type The attached patch now also accepts passing CLASS(*) to CLASS(*). Built and currently regtesting on x86-64-gnu-linux (when successful:) OK for the trunk? Tobias -- The knack of flying is learning how to throw yourself at the ground and miss. --Hitchhikers Guide to the Galaxy
[PATCH] Enhance ifcombine to recover non short circuit branches
Hi, The patch enhances ifcombine pass to recover some non short circuit branches. Basically, it will do the following transformation: Case 1: if (cond1) if (cond2) == if (cond1 cond2) Case 2: if (cond1) goto L1 if (cond2) goto L1 == if (cond1 || cond2) goto L1 Case 3: if (cond1) goto L1 else goto L2 L1: if (cond2) goto L2 == if (invert (cond1) || cond2) goto L2 Case 4: if (cond1) goto L1 if (cond2) goto L2 L1: == if (invert (cond1) cond2) goto L2 Bootstrap on X86-64 and ARM. Two new FAILs in regression tests: gcc.dg/uninit-pred-8_b.c gcc.dg/uninit-pred-9_b.c uninit pass should be enhanced to handle more complex conditions. Will submit a bug to track it and fix it later. Is it OK for trunk? Thanks! -Zhenqiang ChangeLog: 2013-10-16 Zhenqiang Chen zhenqiang.c...@linaro.org * fold-const.c (simple_operand_p_2): Make it global. * tree.h (simple_operand_p_2): Declare it. * tree-ssa-ifcombine.c: Include rtl.h and tm_p.h. (bb_has_overhead_p, generate_condition_node, ifcombine_ccmp): New functions. (ifcombine_fold_ifandif): New function, extracted from ifcombine_ifandif. (ifcombine_ifandif): Call ifcombine_ccmp. (tree_ssa_ifcombine_bb): Skip optimized bb. testsuite/ChangeLog 2013-10-16 Zhenqiang Chen zhenqiang.c...@linaro.org * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case. * gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case. * gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated. ifcombine.patch Description: Binary data
Re: [PATCH] Move effective target check after other directives in c-c++-common/cpp/openmp-define-3.c
Hi! On Mon, 14 Oct 2013 11:26:21 +0100, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: This minuscule patch moves the dg-require-effective-target fopenmp line after the other directives in the c-c++-common/cpp/openmp-define-3.c test. Otherwise dejagnu still proceeds to add -fopenmp to the options which fails on bare metal arm and aarch64 targets. Thanks for reporting and fixing this -- I had not been aware of the ordering requirements of dg-* directives. From gcc/doc/sourcebuild.texi: @item @{ dg-require-effective-target @var{keyword} [@{ @var{selector} @}] @} Skip the test if [...] This directive must appear after any @code{dg-do} directive in the test and before any @code{dg-additional-sources} directive. [...] --- a/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c +++ b/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c @@ -1,6 +1,6 @@ -/* { dg-require-effective-target fopenmp } */ /* { dg-options -fopenmp } */ /* { dg-do preprocess } */ +/* { dg-require-effective-target fopenmp } */ The other testsuite files that I added in r203417 should also be adapted accordingly. Do you want to do that, or want me to do it? Grüße, Thomas pgp8dkUCJT8ZE.pgp Description: PGP signature
Re: RFC: Add of type-demotion pass
On Wed, Oct 16, 2013 at 7:59 AM, Jeff Law l...@redhat.com wrote: On 07/08/13 11:45, Kai Tietz wrote: Hello, These passes are implementing type-demotion (type sinking into statements) for some gimple statements. I limitted inital implementation to the set of multiply, addition, substraction, and binary-and/or/xor. Additional this pass adds some rules to sink type-cast sequences - eg. (int) (short) x; with char as type of x. This special handing in this pass of such type-sequence simplification is necessary to avoid too complex cast-sequences by type-unsigned conversion used by this pass to avoid undefined overflow behaviour. I will sent separate patch with some test-cases to demonstrate and verify operation of this new optimization. Just one sample I will cite here to demonstrate operation of type-demotion pass. So I think we want to come back to this patch and make some decisions about how to go forward. I've been reviewing all the discussions back through last year and I think the single biggest thing we need to realize is that there's nothing in this patch that *couldn't* be handled by tree-ssa-forwprop with a suitable amount of following the use-def chains through casts in various transformations and determining when those casts can be safely ignored. However, I don't think extending tree-ssa-forwprop in this way is wise. As I see it, the value in this patch is it allows us to avoid that nonsense by essentially reformulating this a problem of moving and reformulating the casts in the IL. I see two primary effects of type sinking. Note it was called type demotion ;) First and probably the most important in my mind is by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly. I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion applied). You need some kind of range information to do this, thus either integrate it into VRP (there is already code that does this there) or use range information from VRP which we now preserve. The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps. This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't type demotion but also can be promotion). Kai, you mentioned you had more tests, but I never saw those posted. I pulled together a handful of tests from the PRs discussion threads. Some may be duplicates of yours (mine are attached). If you could post yours as well, it'd be helpful. Not all of mine are helped by this patch, but at least two are. There was some question about what to do with PROMOTE_MODE based targets. On those targets there's a lot of value in getting something into word mode and doing everything in word mode. I think that can/should be a separate issue -- ISTM that ought to be handled fairly late in the tree optimizers. I don't see a strong argument for gating this patch on catering to PROMOTE_MODE. In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion. This is because the truncations (sext and zext) PROMOTE_MODE introduced are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro). Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well. I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already? AFAIK we have many places that optimize a series of conversions (that's a combine-like problem), I'd call that conversion contraction. We do have a set of foldings in the C family frontends, in fold and in forwprop that try to un-do the effect of C integer promotions (that's also a combine-like problem). We have no pass that tries to promote or demote the types of variables with using a data-flow approach (VRP comes closest, but the transform is again pattern-matching, thus combine-like). I do not object to adding this kind of pass, but I suggest to look at the targets desires when implementing it - which eventually means to honor PROMOTE_MODE (be careful about pass placement here - you want
[C++ Patch] PR 58724
Hi, we ICE on the C++11 version of the GNU visibility attribute on a namespace. I suppose we want to accept it, then just use get_attribute_name instead of TREE_PURPOSE. The bug report also asked for an example in the docs of attribute on a namespace declaration. Tested x86_64-linux. Thanks, Paolo. /// 2013-10-16 Paolo Carlini paolo.carl...@oracle.com PR c++/58724 * doc/extend.texi [visibility (visibility_type)]: Add example about visibility attribute on namespace declaration. /cp 2013-10-16 Paolo Carlini paolo.carl...@oracle.com PR c++/58724 * name-lookup.c (handle_namespace_attrs): Use get_attribute_name. /testsuite 2013-10-16 Paolo Carlini paolo.carl...@oracle.com PR c++/58724 * g++.dg/cpp0x/gen-attrs-56.C: New. Index: cp/name-lookup.c === --- cp/name-lookup.c(revision 203691) +++ cp/name-lookup.c(working copy) @@ -3571,7 +3571,7 @@ handle_namespace_attrs (tree ns, tree attributes) for (d = attributes; d; d = TREE_CHAIN (d)) { - tree name = TREE_PURPOSE (d); + tree name = get_attribute_name (d); tree args = TREE_VALUE (d); if (is_attribute_p (visibility, name)) Index: doc/extend.texi === --- doc/extend.texi (revision 203691) +++ doc/extend.texi (working copy) @@ -4220,6 +4220,12 @@ the One Definition Rule; for example, it is usuall an inline method as hidden without marking the whole class as hidden. A C++ namespace declaration can also have the visibility attribute. + +@smallexample +namespace nspace1 __attribute__ ((visibility (protected))) +@{ /* @r{Do something.} */; @} +@end smallexample + This attribute applies only to the particular namespace body, not to other definitions of the same namespace; it is equivalent to using @samp{#pragma GCC visibility} before and after the namespace Index: testsuite/g++.dg/cpp0x/gen-attrs-56.C === --- testsuite/g++.dg/cpp0x/gen-attrs-56.C (revision 0) +++ testsuite/g++.dg/cpp0x/gen-attrs-56.C (working copy) @@ -0,0 +1,5 @@ +// PR c++/58724 +// { dg-do compile { target c++11 } } + +namespace foo __attribute__((visibility(default))) {} +namespace bar [[gnu::visibility(default)]] {}
Re: [PATCH] Move effective target check after other directives in c-c++-common/cpp/openmp-define-3.c
Hi Thomas, --- a/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c +++ b/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c @@ -1,6 +1,6 @@ -/* { dg-require-effective-target fopenmp } */ /* { dg-options -fopenmp } */ /* { dg-do preprocess } */ +/* { dg-require-effective-target fopenmp } */ The other testsuite files that I added in r203417 should also be adapted accordingly. Do you want to do that, or want me to do it? I'll leave the honors to you :) Kyrill Grüße, Thomas
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
PIng? On 19 September 2013 18:21, Charles Baylis charles.bay...@linaro.org wrote: Hi Here is an updated version. Changelog: * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with hardfloat, and test is not thumb-specific * gcc,target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI by requiring arm_thumb1_ok * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache): don't force -mfloat-abi=soft when building for hardfloat target On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote: On 15/08/13 15:10, Charles Baylis wrote: Hi The attached patch fixes some tests which fail when testing gcc for a arm-none-linux-gnueabihf target because they do not expect to be built with a hard float ABI. The change in target-supports.exp fixes arm-fp16-ops-5.c and arm-fp16-ops-6.c. Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause any other tests to break. Comments? This is my first patch, so please point out anything wrong. 2013-08-15 Charles Baylis charles.bay...@linaro.org * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * target-supports.exp: don't force -mfloat-abi=soft when building for hardfloat target hf-fixes.txt Index: gcc/testsuite/gcc.dg/builtin-apply2.c === --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726) +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-skip-if Variadic funcs have all args on stack. Normal funcs have args in registers. { aarch64*-*-* avr-*-* } { * } { } } */ /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* } { -mfloat-abi=hard } { } } */ +/* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-gnueabihf } { * } { -mfloat-abi=soft* } } */ As you've noticed, basing the test's behaviour on the config variant doesn't work reliably. The builtin-apply2 test really should be skipped if the current test variant is not soft-float. We already have check_effective_target_arm_hf_eabi in target-supports.exp that checks whether __ARM_PCS_VFP is defined during a compilation. So can replace both arm related lines in builtin-apply2 with /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* arm_hf_eabi} { * } { } } */ /* PR target/12503 */ /* Origin: pierre.nguyen-tu...@asim.lip6.fr */ Index: gcc/testsuite/gcc.dg/tls/pr42894.c === --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726) +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy) @@ -1,6 +1,7 @@ /* PR target/42894 */ /* { dg-do compile } */ /* { dg-options -march=armv5te -mthumb { target arm*-*-* } } */ +/* { dg-options -march=armv5te -mthumb -mfloat-abi=soft { target arm*-*-*hf } } */ /* { dg-require-effective-target tls } */ Although the original PR was for Thumb1, this is a generic test. I'm not convinced that on ARM it should try to force thumb1. Removing the original dg-options line should solve the problem and we then get better multi-lib testing as well. extern __thread int t; Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c === --- gcc/testsuite/gcc.target/arm/thumb-ltu.c (revision 201726) +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if incompatible options { arm*-*-* } { -march=* } { -march=armv6 -march=armv6j -march=armv6z } } */ -/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 } */ +/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft } */ This won't work if there's an explict -mfloat-abi={softfp,hard} on the multilib options. Probably the best thing to do here is to skip the test if arm_thumb1_ok is not true. void f(unsigned a, unsigned b, unsigned c, unsigned d) { Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 201726) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2445,6 +2445,11 @@ # Must generate floating-point instructions. return 0 } +if [check-flags [list { *-*-gnueabihf } { * } { } ]] { +# Use existing float-abi and force an fpu which supports fp16 This should use arm_hf_eabi as described above. + set et_arm_fp16_flags -mfpu=vfpv4 + return 1; +} if [check-flags
RE: [PATCH,i386] Enable FMA4 for AMD bdver3
4.8.2 is already rolling, so too late for that. Is 4.8 branch (gcc/branches/gcc-4_8-branch) open? If yes, shall I commit these changes? Regards Ganesh -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: Wednesday, October 16, 2013 12:41 PM To: Uros Bizjak Cc: Gopalasubramanian, Ganesh; gcc-patches@gcc.gnu.org Subject: Re: [PATCH,i386] Enable FMA4 for AMD bdver3 On Wed, Oct 16, 2013 at 09:00:58AM +0200, Uros Bizjak wrote: On Wed, Oct 16, 2013 at 8:28 AM, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote: The below patch enables FMA4 for AMD bdver3 architectures. make -k check passes. +2013-10-16 Ganesh Gopalasubramanian +ganesh.gopalasubraman...@amd.com + + * config/i386/i386.c (ix86_option_override_internal): Enable FMA4 + for AMD bdver3. + OK for mainline and 4.8 branch (for 4.8.2 if approved by Jakub, otherwise please wait for branch to open). 4.8.2 is already rolling, so too late for that. Jakub
FW: MAX_PATH problems with mingw gcc
Hi, Resending filename-normalize patch to correct list gcc-patches@gcc.gnu.org. All context, please, see below. Regards Vladimir Simonov -Original Message- From: Vladimir Simonov Sent: Wednesday, October 16, 2013 12:52 PM To: 'Jiri Dobry'; 'gcc-h...@gcc.gnu.org' Subject: RE: MAX_PATH problems with mingw gcc Hi all, Sending new version of filename-normalize patch because people asking to make it public. This version also fixes problems with gcov support of Linux binaries crosscompiled on Windows. Problem: If you do crossbuild with gcov support, binaries refer gcda files like C:\Myproject\Mydir\Myobj.gcda. After run on Linux files with names like C:\Myproject\Mydir\Myobj.gcda are created. No any directory tree. The last chunk of new patch fixes this. Best regards Vladimir Simonov -Original Message- From: Simonov, Vladimir Sent: Monday, May 27, 2013 8:15 PM To: gcc-h...@gcc.gnu.org Subject: MAX_PATH problems with mingw gcc Hi all, Constructions #include ../../some_module/some_header.h are quite popular in our project. They lead to funny for Linux user problem which becomes real headache when we build the project by mingw based cross gcc. Several levels of such include may easy exceed Windows MAX_PATH limitation for file I/O functions (260 symbols). For example g:/w/project1/product1/bundle/console/mv_system/centralized/dialogs/selectors/../../../../machines/dialogs/selectors/../../../components/machines/browsers/tree_based_group_browser/../../../hierarchy_model_browser/hierarchy_model_browser.h:37:61: fatal error: ../../remote/async_invokers/client/async_action.h: No such file or directory #include ../../remote/async_invokers/client/async_action.h As you see gcc tries to open g:/w/project1/product1/bundle/console/mv_system/centralized/dialogs/selectors/../../../../machines/dialogs/selectors/../../../components/machines/browsers/tree_based_group_browser/../../../hierarchy_model_browser/../../remote/async_invokers/client/async_action.h which length is 263 symbols. IMO the best solution would be to switch on Unicode version of file IO functions in libiberty but it is quite radical change. Attaching patch which works for me many years and illustrates another approach. Its main idea is to simplify paths in some key places working with filenames (libcpp, parse_include in directives.c and find_file_in_dir in files.c). With this patch code above compiles OK. There are many pro and contras, i.e. it adds additional, probably unnecessary work on Linux time but makes filenames shorter, it affects libiberty which is shared between gcc/binutils/gdb, but it may be useful in other packages, etc., etc. Anyway I'll be glad to hear any comments and may be to see some fix for above problem in gcc code. Best regards Vladimir Simonov gcc-4.8.1-filename-normalize.patch Description: gcc-4.8.1-filename-normalize.patch
Re: [PATCH] Move effective target check after other directives in c-c++-common/cpp/openmp-define-3.c
Hi! On Wed, 16 Oct 2013 10:54:48 +0100, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: --- a/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c +++ b/gcc/testsuite/c-c++-common/cpp/openmp-define-3.c @@ -1,6 +1,6 @@ -/* { dg-require-effective-target fopenmp } */ /* { dg-options -fopenmp } */ /* { dg-do preprocess } */ +/* { dg-require-effective-target fopenmp } */ The other testsuite files that I added in r203417 should also be adapted accordingly. Do you want to do that, or want me to do it? I'll leave the honors to you :) Sure. ;-) Pushed to trunk, r203697: gcc/testsuite/ * c-c++-common/cpp/openmp-define-1.c: Move dg-require-effective-target fopenmp after dg-do directive. * c-c++-common/cpp/openmp-define-2.c: Likewise. * gfortran.dg/openmp-define-1.f90: Likewise. * gfortran.dg/openmp-define-2.f90: Likewise. * gfortran.dg/openmp-define-3.f90: Likewise. Completion of r203588, fix-up for r203417. diff --git gcc/testsuite/c-c++-common/cpp/openmp-define-1.c gcc/testsuite/c-c++-common/cpp/openmp-define-1.c index 0e7943a..c537922 100644 --- gcc/testsuite/c-c++-common/cpp/openmp-define-1.c +++ gcc/testsuite/c-c++-common/cpp/openmp-define-1.c @@ -1,5 +1,5 @@ -/* { dg-require-effective-target fopenmp } */ /* { dg-do preprocess } */ +/* { dg-require-effective-target fopenmp } */ #ifdef _OPENMP # error _OPENMP defined diff --git gcc/testsuite/c-c++-common/cpp/openmp-define-2.c gcc/testsuite/c-c++-common/cpp/openmp-define-2.c index f89..8823e29 100644 --- gcc/testsuite/c-c++-common/cpp/openmp-define-2.c +++ gcc/testsuite/c-c++-common/cpp/openmp-define-2.c @@ -1,6 +1,6 @@ -/* { dg-require-effective-target fopenmp } */ /* { dg-options -fno-openmp } */ /* { dg-do preprocess } */ +/* { dg-require-effective-target fopenmp } */ #ifdef _OPENMP # error _OPENMP defined diff --git gcc/testsuite/gfortran.dg/openmp-define-1.f90 gcc/testsuite/gfortran.dg/openmp-define-1.f90 index d4ef1d1..4e1c076 100644 --- gcc/testsuite/gfortran.dg/openmp-define-1.f90 +++ gcc/testsuite/gfortran.dg/openmp-define-1.f90 @@ -1,6 +1,6 @@ -! { dg-require-effective-target fopenmp } ! { dg-options -cpp } ! { dg-do preprocess } +! { dg-require-effective-target fopenmp } #ifdef _OPENMP # error _OPENMP defined diff --git gcc/testsuite/gfortran.dg/openmp-define-2.f90 gcc/testsuite/gfortran.dg/openmp-define-2.f90 index 651dae7..cd167ea 100644 --- gcc/testsuite/gfortran.dg/openmp-define-2.f90 +++ gcc/testsuite/gfortran.dg/openmp-define-2.f90 @@ -1,6 +1,6 @@ -! { dg-require-effective-target fopenmp } ! { dg-options -cpp -fno-openmp } ! { dg-do preprocess } +! { dg-require-effective-target fopenmp } #ifdef _OPENMP # error _OPENMP defined diff --git gcc/testsuite/gfortran.dg/openmp-define-3.f90 gcc/testsuite/gfortran.dg/openmp-define-3.f90 index dfc45b1..3d55986 100644 --- gcc/testsuite/gfortran.dg/openmp-define-3.f90 +++ gcc/testsuite/gfortran.dg/openmp-define-3.f90 @@ -1,6 +1,6 @@ -! { dg-require-effective-target fopenmp } ! { dg-options -cpp -fopenmp } ! { dg-do preprocess } +! { dg-require-effective-target fopenmp } #ifndef _OPENMP # error _OPENMP not defined Grüße, Thomas pgp0HJQRNg51x.pgp Description: PGP signature
[C PATCH] Warn for _Alignas in an array declarator (PR c/58267)
This PR is about _Alignas in contexts like char x[_Alignas (int) 20]; C grammar does not allow _Alignas to be in an array declarator, yet we don't issue any warning or an error. This patch implements a pedwarn for this. I'm however highly unsure whether we want pedwarn, error, or normal warning for this - currently we just DTRT and ignore the _Alignas altogether. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-16 Marek Polacek pola...@redhat.com PR c/58267 c/ * c-parser.c (c_parser_direct_declarator_inner): Record whether _Alignas has been seen. * c-decl.c (build_array_declarator): Add alignas_p parameter. Warn if _Alignas occurs in an array declarator. * c-tree.h (build_array_declarator): Adjust declaration. testsuite/ * gcc.dg/c1x-align-5.c: New test. --- gcc/c/c-tree.h.mp 2013-10-15 19:36:09.548670833 +0200 +++ gcc/c/c-tree.h 2013-10-15 19:36:25.003727398 +0200 @@ -480,8 +480,8 @@ extern void c_init_decl_processing (void extern void c_print_identifier (FILE *, tree, int); extern int quals_from_declspecs (const struct c_declspecs *); extern struct c_declarator *build_array_declarator (location_t, tree, - struct c_declspecs *, - bool, bool); + struct c_declspecs *, + bool, bool, bool); extern tree build_enumerator (location_t, location_t, struct c_enum_contents *, tree, tree); extern tree check_for_loop_decls (location_t, bool); --- gcc/c/c-parser.c.mp 2013-10-15 19:33:47.706150200 +0200 +++ gcc/c/c-parser.c2013-10-15 19:35:57.557627164 +0200 @@ -3096,8 +3096,10 @@ c_parser_direct_declarator_inner (c_pars struct c_declspecs *quals_attrs = build_null_declspecs (); bool static_seen; bool star_seen; + bool alignas_seen; tree dimen; c_parser_consume_token (parser); + alignas_seen = c_parser_next_token_is_keyword (parser, RID_ALIGNAS); c_parser_declspecs (parser, quals_attrs, false, false, true, cla_prefer_id); static_seen = c_parser_next_token_is_keyword (parser, RID_STATIC); if (static_seen) @@ -3169,7 +3171,8 @@ c_parser_direct_declarator_inner (c_pars if (dimen) mark_exp_read (dimen); declarator = build_array_declarator (brace_loc, dimen, quals_attrs, - static_seen, star_seen); + static_seen, star_seen, + alignas_seen); if (declarator == NULL) return NULL; inner = set_array_declarator_inner (declarator, inner); --- gcc/c/c-decl.c.mp 2013-10-15 19:35:30.172525717 +0200 +++ gcc/c/c-decl.c 2013-10-16 11:05:06.046383071 +0200 @@ -3866,7 +3866,7 @@ quals_from_declspecs (const struct c_dec struct c_declarator * build_array_declarator (location_t loc, tree expr, struct c_declspecs *quals, bool static_p, - bool vla_unspec_p) + bool vla_unspec_p, bool alignas_p) { struct c_declarator *declarator = XOBNEW (parser_obstack, struct c_declarator); @@ -3908,6 +3908,9 @@ build_array_declarator (location_t loc, } current_scope-had_vla_unspec = true; } + if (alignas_p) +pedwarn (loc, OPT_Wpedantic, %_Alignas% meaningless in array +declarator); return declarator; } --- gcc/testsuite/gcc.dg/c1x-align-5.c.mp 2013-10-16 11:11:16.432690963 +0200 +++ gcc/testsuite/gcc.dg/c1x-align-5.c 2013-10-16 11:15:09.269514593 +0200 @@ -0,0 +1,17 @@ +/* Test C1X alignment support. Test invalid code. */ +/* { dg-do compile } */ +/* { dg-options -std=c1x -pedantic-errors } */ + +static int +foo (int a[_Alignas (0) 10]) /* { dg-error meaningless in array declarator } */ +{ + return a[0]; +} + +int +main () +{ + int a[_Alignas (int) 10];/* { dg-error meaningless in array declarator } */ + foo (a); + return 0; +} Marek
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Wed, 16 Oct 2013, Zhenqiang Chen wrote: The patch enhances ifcombine pass to recover some non short circuit branches. Basically, it will do the following transformation: Case 1: if (cond1) if (cond2) == if (cond1 cond2) Hello, it might be more clear if you wrote (cond1 cond2) so it can't be confused with a short-circuit? Using branch probabilities to guide the decision would be nice, don't know how hard that would be though. -- Marc Glisse
Re: [PATCH] tree_code_name wrapper
Hi, bootstrap is currently broken, I'm going to test and commit the below if everything goes well. Thanks, Paolo. /// Index: pt.c === --- pt.c(revision 203698) +++ pt.c(working copy) @@ -12102,7 +12102,7 @@ tsubst (tree t, tree args, tsubst_flags_t complain gcc_unreachable (); default: - sorry (use of %qs in template, get_tree_code_name ((int) code)); + sorry (use of %qs in template, get_tree_code_name (code)); return error_mark_node; } }
[COMMITTED] LRA enabling on ARM
Hi, after the discussion in the thread below I've committed the enabling of LRA on ARM through the new option -mlra. Notice that we still rely on reload until the regression has been resolved. http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00908.html Thanks, Yvan 2013-10-16 Yvan Roux yvan.r...@linaro.org * config/arm/arm.opt (mlra): New option. * config/arm/arm.c (arm_lra_p): New function. (TARGET_LRA_P): Define. arm-lra.diff Description: Binary data
[PATCH, rs6000] Fix vsx_unpack expansions for endianness
For vector unpack operations, the meaning of high and low is reversed for little endian. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. This fixes one test case for little endian (gcc.dg/vect/vect-122.c). Ok for trunk? Thanks, Bill 2013-10-16 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for endianness. (vec_unpacks_lo_v4sf): Likewise. (vec_unpacks_float_hi_v4si): Likewise. (vec_unpacks_float_lo_v4si): Likewise. (vec_unpacku_float_hi_v4si): Likewise. (vec_unpacku_float_lo_v4si): Likewise. Index: gcc/config/rs6000/vector.md === --- gcc/config/rs6000/vector.md (revision 203508) +++ gcc/config/rs6000/vector.md (working copy) @@ -872,7 +872,7 @@ { rtx reg = gen_reg_rtx (V4SFmode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); DONE; }) @@ -884,7 +884,7 @@ { rtx reg = gen_reg_rtx (V4SFmode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvspdp (operands[0], reg)); DONE; }) @@ -896,7 +896,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); DONE; }) @@ -908,7 +908,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvsxwdp (operands[0], reg)); DONE; }) @@ -920,7 +920,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], true); + rs6000_expand_interleave (reg, operands[1], operands[1], BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); DONE; }) @@ -932,7 +932,7 @@ { rtx reg = gen_reg_rtx (V4SImode); - rs6000_expand_interleave (reg, operands[1], operands[1], false); + rs6000_expand_interleave (reg, operands[1], operands[1], !BYTES_BIG_ENDIAN); emit_insn (gen_vsx_xvcvuxwdp (operands[0], reg)); DONE; })
Re: [C PATCH] Warn for _Alignas in an array declarator (PR c/58267)
On Wed, 16 Oct 2013, Marek Polacek wrote: This PR is about _Alignas in contexts like char x[_Alignas (int) 20]; C grammar does not allow _Alignas to be in an array declarator, yet we don't issue any warning or an error. This patch implements a pedwarn for this. I'm however highly unsure whether we want pedwarn, error, or normal warning for this - currently we just DTRT and ignore the _Alignas altogether. I think this should be an error. I see this as a problem with how the parsing of array declarators uses c_parser_declspecs to parse a sequence of qualifiers and attributes, with scspec_ok=false and typespec_ok=false to ensure those kinds of specifiers can't occur in the list, but without any measure to exclude alignment specifiers. That is, I think c_parser_declspecs should gain an argument specifying whether alignment specifiers are OK, and the relevant calls from c_parser_direct_declarator_inner would pass false for that argument, and uses of alignment specifiers here would result in parse errors. (Incidentally, the comments in c-parser.c that are supposed to give the grammar accepted are missing the syntax of array-declarator.) --- gcc/testsuite/gcc.dg/c1x-align-5.c.mp 2013-10-16 11:11:16.432690963 +0200 +++ gcc/testsuite/gcc.dg/c1x-align-5.c2013-10-16 11:15:09.269514593 +0200 @@ -0,0 +1,17 @@ +/* Test C1X alignment support. Test invalid code. */ +/* { dg-do compile } */ +/* { dg-options -std=c1x -pedantic-errors } */ + +static int +foo (int a[_Alignas (0) 10]) /* { dg-error meaningless in array declarator } */ +{ + return a[0]; +} + +int +main () +{ + int a[_Alignas (int) 10]; /* { dg-error meaningless in array declarator } */ + foo (a); + return 0; +} I think this should also be tested after static (given that there are two separate calls to c_parser_declspecs involved), as well as in an abstract declarator. -- Joseph S. Myers jos...@codesourcery.com
Re: [PING] [PATCH] PR58143/58227 wrong code at -O3
Hi, On Tue, 15 Oct 2013, Richard Biener wrote: On Sun, Oct 6, 2013 at 2:22 PM, Bernd Edlinger bernd.edlin...@hotmail.de wrote: How I should proceed with this patch, is it OK? The latest version was posted at: http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html I have now attached a quick draft of a patch that rewrites the moved stmts No, you haven't :) Ciao, Michael.
RE: [PATCH] Portable Volatility Warning
Hi Richard, well I think I have now a solution for both of your comments on the initial version of the portable volatility warning patch. Furthermore I have integrated Sandra's comments. Therefore I think it might be worth another try, if you don't mind. Technically this patch is not dependent on Sandra's patch any more. However it may be useful to use -Wportable-volatility together with -fno-strict-volatile-bitmaps, to get correct code, together with a warning where the device registers are accessed in possible wrong way. As we know the -fstrict-volatile-bitmaps generates wrong code most of the time, but the warning does not cover all cases where -fstrict-volatile-bitmaps generates wrong code without Sandra's patch. On Tue, 3 Sep 2013 10:53:10, Richard Biener wrote: Just a quick first note: @@ -3470,6 +3470,13 @@ optimize_bit_field_compare (location_t l || offset != 0 || TREE_CODE (linner) == PLACEHOLDER_EXPR) return 0; + /* Do not try to optimize on volatiles, as it would defeat the + -Wportable-volatility warning later: + If the COMPONENT_REF is replaced by a BIT_FIELD_REF we loose + information and thus the warning cannot be generated correctly. */ + if (warn_portable_volatility lvolatilep) + return 0; changing code-generation with a warning option looks wrong to me. Note that I completely removed this code once (it also generates strange BIT_FIELD_REFs) but re-instantiated it on request because of lost optimizations. I just emit the warning, when the COMPONENT_REF is about to be replaced, but leave the code generation flow intact. Similar for @@ -609,11 +609,14 @@ layout_decl (tree decl, unsigned int kno occupying a complete byte or bytes on proper boundary, and not -fstrict-volatile-bitfields. If the latter is set, we unfortunately can't check TREE_THIS_VOLATILE, as a cast - may make a volatile object later. */ + may make a volatile object later. + Handle -Wportable-volatility analogous, as we would loose + information and defeat the warning later. */ if (TYPE_SIZE (type) != 0 TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT - flag_strict_volatile_bitfields = 0) + flag_strict_volatile_bitfields = 0 + warn_portable_volatility == 0) { enum machine_mode xmode = mode_for_size_tree (DECL_SIZE (decl), MODE_INT, 1); I also wonder about we unfortunately can't check TREE_THIS_VOLATILE, as a cast may make a volatile object later. - does that mean that AAPCS requires struct X { int x : 8; char c; }; void foo (struct X *p) { ((volatile struct X *)p)-x = 1; } to be treated as volatile bitfield access? Similar, is volatile struct X x; x.x = 1; a volatile bitifled access? I think the warning can be completely implemented inside struct-layout.c for example in finish_bitfield_representative (if you pass that the first field in the group, too). Of course that is at the expense of warning for struct declarations rather than actual code differences (the struct may not be used) Richard. Bootstrapped and regression-tested on x86_64-linux-gnu. OK for trunk? Thanks Bernd.2013-10-16 Bernd Edlinger bernd.edlin...@hotmail.de Implement -Wportable-volatility warning to warn about code that accesses volatile structure members for which different ABI specifications exist. * expr.c (check_portable_volatility): New function. (expand_assignment): Call check_portable_volatility. (expand_real_expr_1): Likewise. * expr.h (check_portable_volatility): New function. * fold-const.c (optimize_bit_field_compare): Call check_portable_volatility. Removed if-statement, because condition flag_strict_volatile_bitfields 0 is always false. * c-family/c.opt: Add -Wportable-volatility option. * doc/invoke.texi: Add documentation about -Wportable-volatility. testsuite: c-c++-common/ * Wportable-volatility-1.c: New testcase. * Wportable-volatility-2.c: New testcase. patch-portable-volatility.diff Description: Binary data
Re: [patch] (un)fix loop unswitching.
Bah, bounced due to html or something. My latest email client upgrade must have lost the 'plain text only' attribute.. On 10/16/2013 08:02 AM, Andrew MacLeod wrote: More or less the same patch that I put in bugzilla http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58697 I've bootstrapped it and run it through regressions. Jakubs test case fails again like it did before my changes, so it no longer masks the problem. Loop unswitching was calling estimate_loop_iterations_int, which for various annoying reasons ended up in cfgloop and called get_estimate_loop_iterations, so it wasn't doing the calculations. Hopefully its clearer now. The end result now is that cfgloop.c has 4 accessor routines which just retrieve info: get_estimated_loop_iterations, get_estimated_loop_iterations_int, get_max_loop_iterations, and get_max_loop_iterations_int. tree-ssa-loop-niter.c has 4 routines which create the information first, if they can, then retrieve the information. estimated_loop_iterations, estimated_loop_iterations_int, max_loop_iterations, and max_loop_iterations_int. all the cfg* routines call the 'get_' variety, and all the tree-ssa routines call the non 'get_' variety to calculate things if need be. Bootstraps on x86_64-unknown-linux-gnu and no regressions. OK? I presume Jakub's test case isn't in the testsuite anywhere then? I presume it would go in along with a fix to the original problem Andrew * cfgloop.c (get_estimated_loop_iterations_int): Rename from estimated_loop_iterations_int. (max_stmt_executions_int): Call get_max_loop_iterations_int. (get_max_loop_iterations_int): New. HWINT version of get_max_loop_iterations. * cfgloop.h: Add prototypes. * loop-iv.c (find_simple_exit): call get_estimated_loop_iterations_int. * loop-unroll.c (decide_peel_once_rolling): Call get_estimated_loop_iterations_int. * tree-ssa-loop-niter.c (estimated_loop_iterations_int): Add back. * tree-ssa-loop-niter.h: Tweak prototypes. Index: cfgloop.c === *** cfgloop.c (revision 203625) --- cfgloop.c (working copy) *** record_niter_bound (struct loop *loop, d *** 1815,1826 loop-nb_iterations_estimate = loop-nb_iterations_upper_bound; } ! /* Similar to estimated_loop_iterations, but returns the estimate only if it fits to HOST_WIDE_INT. If this is not the case, or the estimate on the number of iterations of LOOP could not be derived, returns -1. */ HOST_WIDE_INT ! estimated_loop_iterations_int (struct loop *loop) { double_int nit; HOST_WIDE_INT hwi_nit; --- 1815,1826 loop-nb_iterations_estimate = loop-nb_iterations_upper_bound; } ! /* Similar to get_estimated_loop_iterations, but returns the estimate only if it fits to HOST_WIDE_INT. If this is not the case, or the estimate on the number of iterations of LOOP could not be derived, returns -1. */ HOST_WIDE_INT ! get_estimated_loop_iterations_int (struct loop *loop) { double_int nit; HOST_WIDE_INT hwi_nit; *** estimated_loop_iterations_int (struct lo *** 1842,1848 HOST_WIDE_INT max_stmt_executions_int (struct loop *loop) { ! HOST_WIDE_INT nit = max_loop_iterations_int (loop); HOST_WIDE_INT snit; if (nit == -1) --- 1842,1848 HOST_WIDE_INT max_stmt_executions_int (struct loop *loop) { ! HOST_WIDE_INT nit = get_max_loop_iterations_int (loop); HOST_WIDE_INT snit; if (nit == -1) *** get_max_loop_iterations (struct loop *lo *** 1891,1893 --- 1891,1915 *nit = loop-nb_iterations_upper_bound; return true; } + + /* Similar to get_max_loop_iterations, but returns the estimate only +if it fits to HOST_WIDE_INT. If this is not the case, or the estimate +on the number of iterations of LOOP could not be derived, returns -1. */ + + HOST_WIDE_INT + get_max_loop_iterations_int (struct loop *loop) + { + double_int nit; + HOST_WIDE_INT hwi_nit; + + if (!get_max_loop_iterations (loop, nit)) + return -1; + + if (!nit.fits_shwi ()) + return -1; + hwi_nit = nit.to_shwi (); + + return hwi_nit 0 ? -1 : hwi_nit; + } + + Index: cfgloop.h === *** cfgloop.h (revision 203625) --- cfgloop.h (working copy) *** loop_outermost (struct loop *loop) *** 740,747 } extern void record_niter_bound (struct loop *, double_int, bool, bool); ! extern HOST_WIDE_INT estimated_loop_iterations_int (struct loop *); ! extern HOST_WIDE_INT max_loop_iterations_int (struct loop *); extern bool get_estimated_loop_iterations (struct loop *loop, double_int *nit); extern bool get_max_loop_iterations (struct loop *loop, double_int *nit); --- 740,747 } extern void record_niter_bound (struct loop *, double_int, bool, bool); ! extern HOST_WIDE_INT get_estimated_loop_iterations_int (struct loop *); ! extern
Re: Minor ChangeLog goof
On Tue, Oct 15, 2013 at 09:53:22AM -0600, Jeff Law wrote: Martin put his entry in the wrong ChangeLog. (toplevel instead of inside the gcc subdirectory). Ouch, sorry and thanks for spotting and fixing it. Martin Fixed in the obvious way. diff --git a/ChangeLog b/ChangeLog index 63c6cd8..0d3c199 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,9 +1,3 @@ -2013-10-15 Martin Jambor mjam...@suse.cz - - * ipa-utils.h (ipa_edge_within_scc): Declare. - * ipa-cp.c (edge_within_scc): Moved... - * ipa-utils.c (ipa_edge_within_scc): ...here. Updated all callers. - 2013-01-10 Joern Rennecke joern.renne...@embecosm.com Import from savannah.gnu.org: diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 4142dd2..09e2494 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -753,6 +753,11 @@ (avx512f_getmantmode): Ditto. (avx512f_rndscalemode): Fix formatting. +2013-10-15 Martin Jambor mjam...@suse.cz + + * ipa-utils.h (ipa_edge_within_scc): Declare. + * ipa-cp.c (edge_within_scc): Moved... + * ipa-utils.c (ipa_edge_within_scc): ...here. Updated all callers. 2013-10-15 Alexander Ivchenko alexander.ivche...@intel.com Maxim Kuznetsov maxim.kuznet...@intel.com
Re: Patch: Add #pragma ivdep support to the ME and C FE
Hi, Just one question. You describe the pragma in the doco patch as: +This pragma tells the compiler that the immediately following @code{for} +loop can be executed in any loop index order without affecting the result. +The pragma aids optimization and in particular vectorization as the +compiler can then assume a vectorization safelen of infinity. I'm not a specialist, but I was always told that the 'original' meaning of ivdep (which I believe was introduced by Cray), was that the compiler could assume that there are only forward dependencies in the loop, but not that it can be executed in any order. The Intel docs give this example: void ignore_vec_dep(int *a, int k, int c, int m) { #pragma ivdep for (int i = 0; i m; i++) a[i] = a[i + k] * c; } Given your description, this loop wouldn't be a candidate for ivdep, as reversing the loop index order changes the semantics. I believe that the way you interpret it (ie. setting vectorization safe length to INT_MAX) is correct with respect to this other definition, though. Is there any normative definition for such language extensions ? Oh, and are there any plans to maintain this information in some way till the back-end? Software pipelining could be another huge winner for that kind of dependency analysis simplification. Thanks, Fred On 10 October 2013 10:45, Tobias Burnus bur...@net-b.de wrote: Richard: Could you review the Gimple part? Thanks! Joseph: Could you have a look at the C part? Thanks! Jakub Jelinek wrote: + if (loop-latch) + FOR_EACH_EDGE (e, ei, loop-latch-succs) + { + if (e-dest-flags BB_RTL) + break; I'd prefer Richard to review this (and probably Joseph the C FE part). You can't really have loop-latch in GIMPLE and the successors in RTL, so perhaps you can check that in the if (loop-latch) check already. GIMPLE_COND must be the last in the bb, can't be followed by debug stmts, so you can safely use just gsi_last_bb (e-dest) instead. I have done the two modifications you suggested, fixed the formatting glitches. And completed it by another all-language bootstrap and regtesting. (Including also the separately posted Fortran [1] and the C++ [2] patches.) OK for the trunk? [1] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00517.html [2] http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00656.html +#define SET_ANNOTATE_EXPR_ID(NODE, ID) \ + (ANNOTATE_EXPR_CHECK(NODE)-base.u.version = ID) Likewise. Shouldn't it be = (ID) ? Probably less relevant for assignments, but for safety, it cannot harm. Tobias
RE: [PATCH] tree_code_name wrapper
-Original Message- From: Paolo Carlini [mailto:paolo.carl...@oracle.com] Sent: 16 October 2013 12:54 To: Richard Biener; Paulo Matos Cc: Jakub Jelinek; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] tree_code_name wrapper Hi, bootstrap is currently broken, I'm going to test and commit the below if everything goes well. Sorry all, it might be because I did --enable-languages=c and that was in the C++ part, even though with g++ being used to bootstrap I would have expected it to break the build anyway. Thanks Paolo for fixing it. -- Paulo Matos
Re: libgo patch committed: Don't clobber context when catching signal
On Wed, Oct 16, 2013 at 12:24 AM, Uros Bizjak ubiz...@gmail.com wrote: This patch to libgo fixes a dumb bug in which catching a signal via the os/signal package could clobber the saved split-stack context, leading to disaster if the signal arrived at the wrong time. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Will commit to 4.8 branch when it reopens. Attached patch is re-enables bootstrap on targets without split stack due to unused variable warning. The patch was tested on alphaev68-linux-gnu [1]. Whoops, sorry about that. Committed to mainline. Ian
website patch committed: 4.8.2 supports Go 1.1.2
I committed this patch to the website to document that the 4.8.2 release includes full support for Go 1.1.2. Ian Index: gcc-4.8/changes.html === RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.8/changes.html,v retrieving revision 1.122 diff -u -r1.122 changes.html --- gcc-4.8/changes.html 16 Oct 2013 06:57:54 - 1.122 +++ gcc-4.8/changes.html 16 Oct 2013 13:46:57 - @@ -441,6 +441,8 @@ h3 id=goGo/h3 ul +liGCC 4.8.2 provides a complete implementation of the Go 1.1.2 + release./li liGCC 4.8.0 and 4.8.1 implement a preliminary version of the Go 1.1 release. The library support is not quite complete./li liGo has been tested on GNU/Linux and Solaris platforms for
[wide-int] int_traits tree
Speaking in terms of a patch: Index: tree.h === --- tree.h (revision 203690) +++ tree.h (working copy) @@ -5184,14 +5184,11 @@ wi::int_traits const_tree::decompose ( / HOST_BITS_PER_WIDE_INT); unsigned int xprecision = get_precision (x); - gcc_assert (precision = xprecision); + gcc_checking_assert (precision = xprecision precision != 0); - /* Got to be careful of precision 0 values. */ - if (precision) -len = MIN (len, max_len); if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED) { - unsigned int small_prec = precision (HOST_BITS_PER_WIDE_INT - 1); + unsigned int small_prec = xprecision (HOST_BITS_PER_WIDE_INT - 1); if (small_prec) { /* We have to futz with this because the canonization for @@ -5201,7 +5198,7 @@ wi::int_traits const_tree::decompose ( { for (unsigned int i = 0; i len - 1; i++) scratch[i] = val[i]; - scratch[len - 1] = sext_hwi (val[len - 1], precision); + scratch[len - 1] = sext_hwi (val[len - 1], small_prec); return wi::storage_ref (scratch, len, precision); } } the precision 0 thing is gone as far as I understand? Also sign-extending the upper hwi must be done with precision % HOST_BITS_PER_WIDE_INT, no? And it assumes that we only ever have to extend the upper most HWI, thus 'len' is never too big. I note that we immediately return in the above case, so if (precision xprecision + HOST_BITS_PER_WIDE_INT) { len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); return wi::storage_ref (scratch, len, precision); } applies only when the desired precision is a multiple of a HWI. I assume it adds that extra zero word in case the MSB is set, right? I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given a 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here? After all an assert precision == xprecision does not work in this routine. Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like template typename T wide_int_ref wi::ref (const T t, HOST_WIDE_INT *scratch = XALLOCAVEC (HOST_WIDE_INT, WIDE_INT_MAX_ELTS)) { return wide_int_ref (t, scratch); } and amend the storage constructors to get a scratch argument. The reason for all this is that otherwise the wide_int_ref object escapes when we pass the scratch storage to the workers in int_traits const_tree. Thanks, Richard.
[c++-concepts] Shorthand notation
I've committed initial support for shorthand constraints. This patch adds support for parsing a concept-names as non-class names. When introducing a template parameter, the concept name is transformed into a constraint on the template parameter. Constrained parameters can introduce type, non-type and template template parameters. This has initial support for variadic constraints, but it's not well tested. This patch does not yet support default arguments for constrained template parameters, nor does it support the use of concept ids of this form: templatetypename T, FunctionT F void f(); There are a couple of interesting things in the patch. I'm using a PLACEHOLDER_EXPR as a template argument in order to resolve constraint names. Effectively, I deduce concepts by creating an expression like: Equality_comparable?() where ? is a placeholder, and after coerce_template_arguments completes, I can extract the matched parameter from the placeholder. This works nicely when concepts are overloaded or have default arguments (although I'm not sure I'm testing that very thoroughly right now). With variadic constraints, I've had to add functionality to expand a pack as a conjunction of requirements. For example, if you declare: templateClass... Ts // ClassT() must be true for each T in Ts void f(); The transformation is: templatetypename... Ts requires ClassTs()... void f(); Where ClassTs()... expands to ClassT1() ClassT2() ... etc. I feel like the current implementation is a bit hacky, and I'm wondering if I should introduce a new node for a pack conjunction. Change log follows. 2013-10-16 Andrew Sutton andrew.n.sut...@gmail.com * gcc/cp/constraint.cc (conjoin_requiremens): New. (resolve_constraint_check): Filter non-concept candidates before coercing arguments. Perform deduction in a template-decl processing context to prevent errors during diagnosis. (finish_concept_name), (finish_shorthand_requirement), (get_shorthand_requirements): New. * gcc/cp/pt.c (template_parm_to_arg): Make non-static. (process_templat_parm): Build shorthand requirements from the parameter description. (end_templat_parm_list): New. (convert_placeholder_argument): New. (convert_template_argument): Match placeholder arguments against any template parameter. (tsubst_pack_conjuction): New. (tsubst_expr): Expand a pack as a conjunction. (type_dependent_expression_p): Placeholders are always type dependent. * gcc/cp/parser.c (cp_is_constrained_parameter), (cp_finish_template_type_parm), (cp_finish_template_template_parm) (cp_finish_non_type_template_parm), (cp_finish_constrined_parameter): New. (cp_parser_template_parameter): Handle constrained parameters. (cp_parser_nonclass_name): An identifier naming an overload set may declare a constrained parameter. (cp_parser_type_parameter), (cp_parser_template_declaration_after_exp): Get shorthand requirements from the tmeplate parameter list. * gcc/cp/cp-tree.h (TEMPLATE_PARM_CONSTRAINTS): New. Committed in 203704. Andrew Sutton
Re: [patch] (un)fix loop unswitching.
On Wed, Oct 16, 2013 at 2:56 PM, Andrew MacLeod amacl...@redhat.com wrote: Bah, bounced due to html or something. My latest email client upgrade must have lost the 'plain text only' attribute.. On 10/16/2013 08:02 AM, Andrew MacLeod wrote: More or less the same patch that I put in bugzilla http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58697 I've bootstrapped it and run it through regressions. Jakubs test case fails again like it did before my changes, so it no longer masks the problem. Loop unswitching was calling estimate_loop_iterations_int, which for various annoying reasons ended up in cfgloop and called get_estimate_loop_iterations, so it wasn't doing the calculations. Hopefully its clearer now. The end result now is that cfgloop.c has 4 accessor routines which just retrieve info: get_estimated_loop_iterations, get_estimated_loop_iterations_int, get_max_loop_iterations, and get_max_loop_iterations_int. tree-ssa-loop-niter.c has 4 routines which create the information first, if they can, then retrieve the information. estimated_loop_iterations, estimated_loop_iterations_int, max_loop_iterations, and max_loop_iterations_int. all the cfg* routines call the 'get_' variety, and all the tree-ssa routines call the non 'get_' variety to calculate things if need be. Bootstraps on x86_64-unknown-linux-gnu and no regressions. OK? I Ok. Thanks, Richard. presume Jakub's test case isn't in the testsuite anywhere then? I presume it would go in along with a fix to the original problem Andrew
Re: [PATCH, rs6000] Fix vsx_unpack expansions for endianness
On Wed, Oct 16, 2013 at 8:22 AM, Bill Schmidt wschm...@linux.vnet.ibm.com wrote: For vector unpack operations, the meaning of high and low is reversed for little endian. Bootstrapped and tested on powerpc64{,le}-unknown-linux-gnu with no regressions. This fixes one test case for little endian (gcc.dg/vect/vect-122.c). Ok for trunk? Thanks, Bill 2013-10-16 Bill Schmidt wschm...@linux.vnet.ibm.com * gcc/config/rs6000/vector.md (vec_unpacks_hi_v4sf): Correct for endianness. (vec_unpacks_lo_v4sf): Likewise. (vec_unpacks_float_hi_v4si): Likewise. (vec_unpacks_float_lo_v4si): Likewise. (vec_unpacku_float_hi_v4si): Likewise. (vec_unpacku_float_lo_v4si): Likewise. Okay. Thanks, David
Re: Patch: Add #pragma ivdep support to the ME and C FE
Frederic Riss wrote: Just one question. You describe the pragma in the doco patch as: +This pragma tells the compiler that the immediately following @code{for} +loop can be executed in any loop index order without affecting the result. +The pragma aids optimization and in particular vectorization as the +compiler can then assume a vectorization safelen of infinity. I'm not a specialist, but I was always told that the 'original' meaning of ivdep (which I believe was introduced by Cray), was that the compiler could assume that there are only forward dependencies in the loop, but not that it can be executed in any order. The nice thing about #pragma ivdep is that there is no real standard. And the explanation of the different vendors is also not completely clear. Some overview about this is given in the following file on pages 13-14 for Cray Reaseach PVP, MIPSPRO Open64, Intel ICC, Multiflow http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf That's summerized as: - vector: ignore lexical upward dependencies (Cray PVP, Intel ICC) - parallel: ignore loop-carried dependencies (MIPSPRO, Open64) - liberal: ignore loop-variant dependencies (Multiflow) The quotes for Cray and Intel are below. Cray: http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH The ivdep directive tells the compiler to ignore vector dependencies for the loop immediately following the directive. Conditions other than vector dependencies can inhibit vectorization. If these conditions are satisfactory, the loop vectorizes. This directive is useful for some loops that contain pointers and indirect addressing. The format of this directive is as follows: #pragma _CRI ivdep Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore. The Intel docs give this example: ... Given your description, this loop wouldn't be a candidate for ivdep, as reversing the loop index order changes the semantics. I believe that the way you interpret it (ie. setting vectorization safe length to INT_MAX) is correct with respect to this other definition, though. Do you have a suggestion for a better wording? My idea was to interpret this part similar to OpenMP's simd with safelen=infinity. (Actually, I believe loop-safelen was added for OpenMPv4's and/or Cilk Plus's simd.) OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , states for this (excerpt from page 70): A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen clause is used then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined. Each concurrent iteration will be executed by a different SIMD lane. Each set of concurrent iterations is a SIMD chunk. Oh, and are there any plans to maintain this information in some way till the back-end? Software pipelining could be another huge winner for that kind of dependency analysis simplification. I don't know until when loop-safelen is kept. As it is late in the middle-end, providing the backend with this information should be simple. Tobias
[AArch64] Fix preferred_reload_class for regclass STACK_REG.
The existing AArch64 implementation of preferred_reload_class when called with a destination regclass STACK_REG returns the wider register class GENERAL_REG. This patch changes the behavior such that we only return a narrower class (or NO_REGS) and never return a wider class. I'll leave this on the list 24h before committing. /Marcus 2013-10-16 Marcus Shawcroft marcus.shawcr...@arm.com * config/aarch64/aarch64.c (aarch64_preferred_reload_class): Adjust handling of STACK_REG.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index da3962f..7fce7a0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4219,9 +4219,18 @@ aarch64_class_max_nregs (reg_class_t regclass, enum machine_mode mode) static reg_class_t aarch64_preferred_reload_class (rtx x, reg_class_t regclass) { - if (regclass == POINTER_REGS || regclass == STACK_REG) + if (regclass == POINTER_REGS) return GENERAL_REGS; + if (regclass == STACK_REG) +{ + if (REG_P(x) + reg_class_subset_p (REGNO_REG_CLASS (REGNO (x)), POINTER_REGS)) + return regclass; + + return NO_REGS; +} + /* If it's an integer immediate that MOVI can't handle, then FP_REGS is not an option, so we return NO_REGS instead. */ if (CONST_INT_P (x) reg_class_subset_p (regclass, FP_REGS)
[AArch64] Classify FRAME_POINTER_REGNUM and ARG_POINTER_REGNUM as POINTER_REGS.
The existing aarch64 implementation of REGNO_REGCLASS classifies the soft frame and arg registers as CORE_REGS. However either could be eliminated against FP or SP, in case of the latter CORE_REGS is too narrow. This patch changes the classification to POINTER_REGS a superset of CORE_REGS that includes SP. Regressed aarch64-none-elf, committed. /Marcus 2013-10-16 Marcus Shawcroft marcus.shawcr...@arm.com * config/aarch64/aarch64.c (aarch64_regno_regclass): Classify FRAME_POINTER_REGNUM and ARG_POINTER_REGNUM as POINTER_REGS.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index f6f587a..da3962f 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -3904,7 +3904,7 @@ aarch64_regno_regclass (unsigned regno) if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM) -return CORE_REGS; +return POINTER_REGS; if (FP_REGNUM_P (regno)) return FP_LO_REGNUM_P (regno) ? FP_LO_REGS : FP_REGS;
[AArch64] Fix output template for Scalar Neon-Neon register move.
Hi, To move a scalar char/short/int around in the vector registers there is no such instruction as: dup v0, v0.h[0] But there is: dup h0, v0.h[0] (Alternately there is dup v0.4h, v0.h[0], but I don't think that is what we are aiming for). Fix the output template we are using to reflect this. aarch64.exp came back clean and the correct instruction form is now generated. OK? Thanks, James --- 2013-10-14 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (*movmode_aarch64): Fix output template for DUP (element) Scalar. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 01664665e7d309f2cf2076bdc3ca6e0825612cea..758be47420e95fad74c57c1a9dcb7934b87c141e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -789,7 +789,7 @@ (define_insn *movmode_aarch64 case 8: return dup\t%0.Vallxd, %w1; case 9: - return dup\t%0, %1.v[0]; + return dup\t%Vetype0, %1.v[0]; default: gcc_unreachable (); }
Re: [AArch64] Fix output template for Scalar Neon-Neon register move.
On 16 October 2013 15:58, James Greenhalgh james.greenha...@arm.com wrote: Hi, To move a scalar char/short/int around in the vector registers there is no such instruction as: dup v0, v0.h[0] But there is: dup h0, v0.h[0] (Alternately there is dup v0.4h, v0.h[0], but I don't think that is what we are aiming for). Fix the output template we are using to reflect this. aarch64.exp came back clean and the correct instruction form is now generated. OK? OK /Marcus
Re: [C PATCH] Warn for _Alignas in an array declarator (PR c/58267)
On Wed, Oct 16, 2013 at 12:28:32PM +, Joseph S. Myers wrote: On Wed, 16 Oct 2013, Marek Polacek wrote: This PR is about _Alignas in contexts like char x[_Alignas (int) 20]; C grammar does not allow _Alignas to be in an array declarator, yet we don't issue any warning or an error. This patch implements a pedwarn for this. I'm however highly unsure whether we want pedwarn, error, or normal warning for this - currently we just DTRT and ignore the _Alignas altogether. I think this should be an error. Makes sense. I see this as a problem with how the parsing of array declarators uses c_parser_declspecs to parse a sequence of qualifiers and attributes, with scspec_ok=false and typespec_ok=false to ensure those kinds of specifiers can't occur in the list, but without any measure to exclude alignment specifiers. That is, I think c_parser_declspecs should gain an argument specifying whether alignment specifiers are OK, and the relevant calls from c_parser_direct_declarator_inner would pass false for that argument, and uses of alignment specifiers here would result in parse errors. Thanks, should be done in the attached patch. (Incidentally, the comments in c-parser.c that are supposed to give the grammar accepted are missing the syntax of array-declarator.) I've fixed that up. I think this should also be tested after static (given that there are two separate calls to c_parser_declspecs involved), as well as in an abstract declarator. Done. Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-16 Marek Polacek pola...@redhat.com PR c/58267 c/ * c-parser.c (c_parser_declspecs): Add alignspec_ok parameter. Document syntax of the array-declarator. (c_parser_declspecs) RID_ALIGNAS: Bail out if alignment specs are not permitted. (c_parser_declaration_or_fndef): Adjust c_parser_declspecs call. (c_parser_struct_declaration): Likewise. (c_parser_declarator): Likewise. (c_parser_direct_declarator_inner): Likewise. (c_parser_parameter_declaration): Likewise. (c_parser_type_name): Likewise. testsuite/ * gcc.dg/c1x-align-5.c: New test. --- gcc/c/c-parser.c.mp 2013-10-16 15:00:36.774530501 +0200 +++ gcc/c/c-parser.c2013-10-16 16:31:30.731611508 +0200 @@ -1124,7 +1124,7 @@ static void c_parser_declaration_or_fnde static void c_parser_static_assert_declaration_no_semi (c_parser *); static void c_parser_static_assert_declaration (c_parser *); static void c_parser_declspecs (c_parser *, struct c_declspecs *, bool, bool, - bool, enum c_lookahead_kind); + bool, bool, enum c_lookahead_kind); static struct c_typespec c_parser_enum_specifier (c_parser *); static struct c_typespec c_parser_struct_or_union_specifier (c_parser *); static tree c_parser_struct_declaration (c_parser *); @@ -1494,7 +1494,8 @@ c_parser_declaration_or_fndef (c_parser fndef_ok = !nested; } - c_parser_declspecs (parser, specs, true, true, start_attr_ok, cla_nonabstract_decl); + c_parser_declspecs (parser, specs, true, true, start_attr_ok, + true, cla_nonabstract_decl); if (parser-error) { c_parser_skip_to_end_of_block_or_statement (parser); @@ -1942,8 +1943,9 @@ c_parser_static_assert_declaration_no_se /* Parse some declaration specifiers (possibly none) (C90 6.5, C99 6.7), adding them to SPECS (which may already include some). Storage class specifiers are accepted iff SCSPEC_OK; type - specifiers are accepted iff TYPESPEC_OK; attributes are accepted at - the start iff START_ATTR_OK. + specifiers are accepted iff TYPESPEC_OK; alignment specifiers are + accepted iff ALIGNSPEC_OK; attributes are accepted at the start + iff START_ATTR_OK. declaration-specifiers: storage-class-specifier declaration-specifiers[opt] @@ -2039,7 +2041,7 @@ c_parser_static_assert_declaration_no_se static void c_parser_declspecs (c_parser *parser, struct c_declspecs *specs, bool scspec_ok, bool typespec_ok, bool start_attr_ok, - enum c_lookahead_kind la) + bool alignspec_ok, enum c_lookahead_kind la) { bool attrs_ok = start_attr_ok; bool seen_type = specs-typespec_kind != ctsk_none; @@ -2235,6 +2237,8 @@ c_parser_declspecs (c_parser *parser, st declspecs_add_attrs (loc, specs, attrs); break; case RID_ALIGNAS: + if (!alignspec_ok) + goto out; align = c_parser_alignas_specifier (parser); declspecs_add_alignas (loc, specs, align); break; @@ -2640,7 +2644,8 @@ c_parser_struct_declaration (c_parser *p } specs = build_null_declspecs (); decl_loc = c_parser_peek_token (parser)-location; - c_parser_declspecs (parser, specs, false, true, true, cla_nonabstract_decl); + c_parser_declspecs (parser, specs, false, true, true, +
Re: [PATCH v2 1/4] Ignore access-control keywords when parsing fields.
On Tue, 2013-10-15 at 12:36 -0600, Jeff Law wrote: On 09/24/13 11:49, David Malcolm wrote: [...] * gengtype-parse.c (struct_field_seq): Ignore access-control keywords (public: etc). OK. Thanks; committed to trunk as r203708.
Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
On Tue, 2013-10-15 at 13:19 +, Paulo Matos wrote: -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: 15 October 2013 10:51 To: Paulo Matos Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] tree_code_name wrapper On Tue, Oct 15, 2013 at 09:42:17AM +, Paulo Matos wrote: Thanks, regarding the indentation I was convinced we used 2 space indentation with maximum line length of 80 characters. We do, however 8 consecutive spaces in the indentation should be always replaced by a tab. This means that every sequence of 8 spaces should be converted into tabs? So, if we indent something 4 times, that becomes a tab instead of 4 times 2 spaces. Just to confirm so I can setup my emacs to use this. Probably a silly question, but what's the invocation in emacs to do this, and is it possible to set up the source tree so that this happens automatically for everyone? (I've been using whitespace-mode and manually copying-and-pasting tab characters, which is clearly suboptimal).
Re: [patch] combine ICE fix
On 10/15/13 12:16 PM, Jeff Law wrote: On 10/10/13 10:25, Jakub Jelinek wrote: On Thu, Oct 10, 2013 at 07:26:43AM -0700, Cesar Philippidis wrote: This patch addresses an ICE when building qemu for a Mips target in Yocto. Both gcc-trunk, gcc-4.8 and all of the targets are potentially affected. The problem occurs because the instruction combine phase uses two data structures to keep track of registers, reg_stat and regstat_n_sets_and_refs, but they potentially end up out of sync; when combine inserts a new register into reg_stat it does not update regstat_n_sets_and_refs. Failing to update the latter results in an occasional segmentation fault. Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it on Mips and x86-64 and no regressions showed up. That looks broken. You leave everything from the last size till the current one uninitialized, so it would only work if combine_split_insns always increments max_reg_num () by at most one. I don't think that assumption is safe. Consider a parallel with a bunch of (clobber (match_scratch)) expressions. I address that in the patch posted here http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00802.html. Is that still insufficient? Furthermore, there are macros which should be used to access the fields, and, if the vector is ever going to be resized, supposedly it should be vec.h vector rather than just array. Or perhaps take into account: /* If a pass need to change these values in some magical way or the pass needs to have accurate values for these and is not using incremental df scanning, then it should use REG_N_SETS and REG_N_USES. If the pass is doing incremental scanning then it should be getting the info from DF_REG_DEF_COUNT and DF_REG_USE_COUNT. */ and not use REG_N_SETS etc. but instead the df stuff. Which begs the question, how exactly is combine utilizing the regstat_n_* structures and is that use even valid for combine? I'll take a look at that. Cesar
RE: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
For me it worked with: (add-hook 'c-mode-hook 'gcc-c-mode) (defun gcc-c-mode() ;; set gnu style. (c-set-style gnu) ;; TAB offset set to 2 (setq c-basic-offset 2) ;; Tabs should be 8 (setq indent-tabs-mode t) ) Then I simply edit all my software as if it was GCC (haven't found a way to say : 'only use this for GCC sources'). I am just missing a good mode for .md files. Paulo Matos -Original Message- From: David Malcolm [mailto:dmalc...@redhat.com] Sent: 16 October 2013 16:33 To: Paulo Matos Cc: Jakub Jelinek; Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper) On Tue, 2013-10-15 at 13:19 +, Paulo Matos wrote: -Original Message- From: Jakub Jelinek [mailto:ja...@redhat.com] Sent: 15 October 2013 10:51 To: Paulo Matos Cc: Richard Biener; Paolo Carlini; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] tree_code_name wrapper On Tue, Oct 15, 2013 at 09:42:17AM +, Paulo Matos wrote: Thanks, regarding the indentation I was convinced we used 2 space indentation with maximum line length of 80 characters. We do, however 8 consecutive spaces in the indentation should be always replaced by a tab. This means that every sequence of 8 spaces should be converted into tabs? So, if we indent something 4 times, that becomes a tab instead of 4 times 2 spaces. Just to confirm so I can setup my emacs to use this. Probably a silly question, but what's the invocation in emacs to do this, and is it possible to set up the source tree so that this happens automatically for everyone? (I've been using whitespace-mode and manually copying-and-pasting tab characters, which is clearly suboptimal).
Re: [wide-int] int_traits tree
On 10/16/2013 09:55 AM, Richard Biener wrote: Speaking in terms of a patch: Index: tree.h === --- tree.h (revision 203690) +++ tree.h (working copy) @@ -5184,14 +5184,11 @@ wi::int_traits const_tree::decompose ( / HOST_BITS_PER_WIDE_INT); unsigned int xprecision = get_precision (x); - gcc_assert (precision = xprecision); + gcc_checking_assert (precision = xprecision precision != 0); - /* Got to be careful of precision 0 values. */ - if (precision) -len = MIN (len, max_len); if (TYPE_SIGN (TREE_TYPE (x)) == UNSIGNED) { - unsigned int small_prec = precision (HOST_BITS_PER_WIDE_INT - 1); + unsigned int small_prec = xprecision (HOST_BITS_PER_WIDE_INT - 1); if (small_prec) { /* We have to futz with this because the canonization for @@ -5201,7 +5198,7 @@ wi::int_traits const_tree::decompose ( { for (unsigned int i = 0; i len - 1; i++) scratch[i] = val[i]; - scratch[len - 1] = sext_hwi (val[len - 1], precision); + scratch[len - 1] = sext_hwi (val[len - 1], small_prec); return wi::storage_ref (scratch, len, precision); } } the precision 0 thing is gone as far as I understand? There is still the case where the front ends create types with precision 0, and that causes a few odd cases, but the precision 0 that you refer to is gone. Also sign-extending the upper hwi must be done with precision % HOST_BITS_PER_WIDE_INT, no? And it assumes that we only ever have to extend the upper most HWI, thus 'len' is never too big. yes I note that we immediately return in the above case, so if (precision xprecision + HOST_BITS_PER_WIDE_INT) { len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); return wi::storage_ref (scratch, len, precision); } applies only when the desired precision is a multiple of a HWI. yes I assume it adds that extra zero word in case the MSB is set, right? yes I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given a 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here? I do not think so. There are three cases here: 1) xprecision precision: say xprecision = 8 and precision = 32. The number is between 0 and 255 and after canonicalization the number will be between 0 and 255. 2) xprecision == precision not much to talk about here. 3) xprecision precision We need to loose the upper bits of the input and it does not matter what they were.The only thing that we are concerned about is that the bits above what is now the sign bit pointed to by precision are matched in the bits above precision. I do think we could change is that the if (precision xprecision + HWI) to become else if (...), which will save a test. After all an assert precision == xprecision does not work in this routine. Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like I want richard and mike to be the people who respond to the next point.I am not a c++ person and all of this storage manager layer stuff gives me a headache. template typename T wide_int_ref wi::ref (const T t, HOST_WIDE_INT *scratch = XALLOCAVEC (HOST_WIDE_INT, WIDE_INT_MAX_ELTS)) { return wide_int_ref (t, scratch); } and amend the storage constructors to get a scratch argument. The reason for all this is that otherwise the wide_int_ref object escapes when we pass the scratch storage to the workers in int_traits const_tree. Thanks, Richard.
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
Paulo Matos pma...@broadcom.com writes: (haven't found a way to say : 'only use this for GCC sources') You could put it in .dir-locals.el, or put a hook on find-file-hook that looks at buffer-file-name. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
[PATCH i386 AVX2] Remove redundant expands.
Hello, It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. ChangeLog: * config/i386/sse.md (vec_widen_umult_even_v8si): Remove expand, make insn visible, remove redundant check. (vec_widen_smult_even_v8si): Ditto. (avx2_pmaddwd): Ditto. (avx2_eqmode3): Ditto. (avx512f_eqmode3): Ditto. Bootrstrap pass. All AVX* tests pass. Is it ok to commit to main trunk? -- Thanks, K --- gcc/ChangeLog | 9 gcc/config/i386/sse.md | 119 ++--- 2 files changed, 23 insertions(+), 105 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index ebaa3e0..b25d8eb 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,12 @@ +2013-10-16 Kirill Yukhin kirill.yuk...@intel.com + + * config/i386/sse.md (vec_widen_umult_even_v8si): Remove expand, + make insn visible, remove redundant check. + (vec_widen_smult_even_v8si): Ditto. + (avx2_pmaddwd): Ditto. + (avx2_eqmode3): Ditto. + (avx512f_eqmode3): Ditto. + 2013-10-16 Yvan Roux yvan.r...@linaro.org * config/arm/arm.opt (mlra): New option. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 2046dd5..57e4c2b 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -6067,23 +6067,7 @@ (set_attr prefix orig,vex) (set_attr mode sseinsnmode)]) -(define_expand vec_widen_umult_even_v8si - [(set (match_operand:V4DI 0 register_operand) - (mult:V4DI - (zero_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 1 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)]))) - (zero_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 2 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)])] - TARGET_AVX2 - ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);) - -(define_insn *vec_widen_umult_even_v8si +(define_insn vec_widen_umult_even_v8si [(set (match_operand:V4DI 0 register_operand =x) (mult:V4DI (zero_extend:V4DI @@ -6096,7 +6080,7 @@ (match_operand:V8SI 2 nonimmediate_operand xm) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - TARGET_AVX2 ix86_binary_operator_ok (MULT, V8SImode, operands) + TARGET_AVX2 vpmuludq\t{%2, %1, %0|%0, %1, %2} [(set_attr type sseimul) (set_attr prefix vex) @@ -6137,28 +6121,12 @@ (set_attr prefix orig,vex) (set_attr mode TI)]) -(define_expand vec_widen_smult_even_v8si - [(set (match_operand:V4DI 0 register_operand) - (mult:V4DI - (sign_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 1 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)]))) - (sign_extend:V4DI - (vec_select:V4SI - (match_operand:V8SI 2 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) -(const_int 4) (const_int 6)])] - TARGET_AVX2 - ix86_fixup_binary_operands_no_copy (MULT, V8SImode, operands);) - -(define_insn *vec_widen_smult_even_v8si +(define_insn vec_widen_smult_even_v8si [(set (match_operand:V4DI 0 register_operand =x) (mult:V4DI (sign_extend:V4DI (vec_select:V4SI - (match_operand:V8SI 1 nonimmediate_operand x) + (match_operand:V8SI 1 nonimmediate_operand %x) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))) (sign_extend:V4DI @@ -6166,7 +6134,7 @@ (match_operand:V8SI 2 nonimmediate_operand xm) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - TARGET_AVX2 ix86_binary_operator_ok (MULT, V8SImode, operands) + TARGET_AVX2 vpmuldq\t{%2, %1, %0|%0, %1, %2} [(set_attr isa avx) (set_attr type sseimul) @@ -6210,41 +6178,7 @@ (set_attr prefix orig,vex) (set_attr mode TI)]) -(define_expand avx2_pmaddwd - [(set (match_operand:V8SI 0 register_operand) - (plus:V8SI - (mult:V8SI - (sign_extend:V8SI - (vec_select:V8HI - (match_operand:V16HI 1 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) - (const_int 4) (const_int 6) - (const_int 8) (const_int 10) - (const_int 12) (const_int 14)]))) - (sign_extend:V8SI - (vec_select:V8HI - (match_operand:V16HI 2 nonimmediate_operand) - (parallel [(const_int 0) (const_int 2) -
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 15 Oct 08:46, Richard Henderson wrote: On 10/09/2013 03:31 AM, Kirill Yukhin wrote: + rtx op1 = operands[1]; + if (REG_P (op1)) +op1 = gen_rtx_REG (V16HImode, REGNO (op1)); + else +op1 = gen_lowpart (V16HImode, op1); The IF case is incorrect. You need to use gen_lowpart always. I suspect gen_lowpart is bad turn when reload is completed, as far as it can create new pseudo. gen_lowpart () may call gen_reg_rtx (), which contain corresponging gcc_assert (). I've rewrote this pattern w/o explicit insn emit: (define_insn_and_split vec_extract_lo_v32hi [(set (match_operand:V16HI 0 nonimmediate_operand =v,m) (vec_select:V16HI (match_operand:V32HI 1 nonimmediate_operand vm,v) (parallel [(const_int 0) (const_int 1) (const_int 2) (const_int 3) (const_int 4) (const_int 5) (const_int 6) (const_int 7) (const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] TARGET_AVX512F !(MEM_P (operands[0]) MEM_P (operands[1])) # reload_completed [(set (match_dup 0) (match_dup 1))] { if (REG_P (operands[1])) operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1])); else operands[1] = adjust_address (operands[1], V16HImode, 0); }) The second alternative only matches for v/m/1. While I imagine that it doesn't really matter, it might be better to swap the two so that vmovddup gets used for the v/v/1 case too. Done. Why do you have separate define_expand and define_insn for this pattern? Expand removed. And is this really the best description for this insn? It's a store to memory. Surely it's better to say that we store V8QI. I believe it is a bug. We store 8 bytes, no zeroes. Fixed. We don't need to use ix86_fixup_binary_operands for any three-operand insn. That function is in order to help SSE's two-operand insns. You can drop the define-expand and just keep the define_insn. Fixed. You want a %v for operand 1, to make the operands commutative. Done. +(define_insn *vec_widen_smult_even_v16si + [(set (match_operand:V8DI 0 register_operand =x) +(mult:V8DI Similarly, plus errant x. Done. Whole patch below. Is it ok now? Bootstrap pass, all AVX* tests pass. -- Thanks, K gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 873 +- 3 files changed, 912 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend s) (zero_extend u)]) (define_code_attr u_bool [(sign_extend false) (zero_extend true)]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. +(define_code_attr trunsuffix [(ss_truncate s) (truncate ) (us_truncate us)]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 6, 7 +;; Match 8 to 9. +(define_predicate const_8_to_9_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 8, 9 + ;; Match 8 to 11. (define_predicate const_8_to_11_operand (and (match_code const_int) @@ -762,16 +767,51 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 8, 15 +;; Match 10 to 11. +(define_predicate const_10_to_11_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 10, 11 + +;; Match 12 to 13. +(define_predicate const_12_to_13_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 12, 13 + ;; Match 12 to 15. (define_predicate const_12_to_15_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 12, 15 +;; Match 14 to 15. +(define_predicate const_14_to_15_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 14, 15 + +;; Match 16 to 19. +(define_predicate const_16_to_19_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 16, 19 + ;; Match 16 to 31. (define_predicate const_16_to_31_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 16, 31 +;; Match 20 to 23. +(define_predicate const_20_to_23_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 20, 23 + +;; Match 24
powerpc support for TARGET_SUPPORTS_WIDE_INT.
This patch makes the PPC the first public target to use the CONST_WIDE_INT representation rather than CONST_DOUBLES to represent larger integers. There are two parts to this patch. The changes to genrecog.c represent fixes that would be needed for any port that supports CONST_WIDE_INTS and the rest are changes specific to the rs6000 port. Bootstrapped and tested on a ppc64. Committed as revision 204710. Kenny Index: gcc/genrecog.c === --- gcc/genrecog.c (revision 203701) +++ gcc/genrecog.c (working copy) @@ -588,6 +588,7 @@ validate_pattern (rtx pattern, rtx insn, GET_CODE (src) != PC GET_CODE (src) != CC0 !CONST_INT_P (src) + !CONST_WIDE_INT_P (src) GET_CODE (src) != CALL) { const char *which; @@ -772,13 +773,14 @@ add_to_sequence (rtx pattern, struct dec We can optimize the generated code a little if either (a) the predicate only accepts one code, or (b) the - predicate does not allow CONST_INT, in which case it - can match only if the modes match. */ + predicate does not allow CONST_INT or CONST_WIDE_INT, + in which case it can match only if the modes match. */ pred = lookup_predicate (pred_name); if (pred) { test-u.pred.data = pred; - allows_const_int = pred-codes[CONST_INT]; + allows_const_int = (pred-codes[CONST_INT] +|| pred-codes[CONST_WIDE_INT]); if (was_code == MATCH_PARALLEL pred-singleton != PARALLEL) message_with_line (pattern_lineno, Index: gcc/recog.c === --- gcc/recog.c (revision 203701) +++ gcc/recog.c (working copy) @@ -1198,7 +1198,8 @@ const_scalar_int_operand (rtx op, enum m /* Returns 1 if OP is an operand that is a CONST_WIDE_INT of mode MODE. This most likely is not as useful as - const_scalar_int_operand, but is here for consistancy. */ + const_scalar_int_operand since it does not accept CONST_INTs, but + is here for consistancy. */ int const_wide_int_operand (rtx op, enum machine_mode mode) { Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 203701) +++ gcc/config/rs6000/predicates.md (working copy) @@ -19,7 +19,7 @@ ;; Return 1 for anything except PARALLEL. (define_predicate any_operand - (match_code const_int,const_double,const,symbol_ref,label_ref,subreg,reg,mem)) + (match_code const_int,const_double,const_wide_int,const,symbol_ref,label_ref,subreg,reg,mem)) ;; Return 1 for any PARALLEL. (define_predicate any_parallel_operand @@ -596,7 +596,7 @@ (define_predicate easy_vector_constant_ ;; Return 1 if operand is constant zero (scalars and vectors). (define_predicate zero_constant - (and (match_code const_int,const_double,const_vector) + (and (match_code const_int,const_double,const_wide_int,const_vector) (match_test op == CONST0_RTX (mode ;; Return 1 if operand is 0.0. @@ -790,7 +790,7 @@ (define_predicate logical_operand ;; Return 1 if op is a constant that is not a logical operand, but could ;; be split into one. (define_predicate non_logical_cint_operand - (and (match_code const_int,const_double) + (and (match_code const_int,const_wide_int) (and (not (match_operand 0 logical_operand)) (match_operand 0 reg_or_logical_cint_operand @@ -1058,7 +1058,7 @@ (define_predicate current_file_function ;; Return 1 if this operand is a valid input for a move insn. (define_predicate input_operand (match_code symbol_ref,const,reg,subreg,mem, - const_double,const_vector,const_int) + const_double,const_wide_int,const_vector,const_int) { /* Memory is always valid. */ if (memory_operand (op, mode)) @@ -1071,8 +1071,7 @@ (define_predicate input_operand /* Allow any integer constant. */ if (GET_MODE_CLASS (mode) == MODE_INT - (GET_CODE (op) == CONST_INT - || GET_CODE (op) == CONST_DOUBLE)) + CONST_SCALAR_INT_P (op)) return 1; /* Allow easy vector constants. */ @@ -,7 +1110,7 @@ (define_predicate input_operand ;; Return 1 if this operand is a valid input for a vsx_splat insn. (define_predicate splat_input_operand (match_code symbol_ref,const,reg,subreg,mem, - const_double,const_vector,const_int) + const_double,const_wide_int,const_vector,const_int) { if (MEM_P (op)) { Index: gcc/config/rs6000/darwin.h === --- gcc/config/rs6000/darwin.h (revision 203701) +++ gcc/config/rs6000/darwin.h (working copy) @@ -421,3 +421,4 @@ do { \ /* So far, there is no rs6000_fold_builtin, if one is introduced, then this will need to be modified similar to the x86 case. */ #define TARGET_FOLD_BUILTIN SUBTARGET_FOLD_BUILTIN + Index: gcc/config/rs6000/rs6000.c
Re: [PATCH i386 AVX2] Remove redundant expands.
On Wed, Oct 16, 2013 at 6:06 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. ChangeLog: * config/i386/sse.md (vec_widen_umult_even_v8si): Remove expand, make insn visible, remove redundant check. (vec_widen_smult_even_v8si): Ditto. (avx2_pmaddwd): Ditto. (avx2_eqmode3): Ditto. (avx512f_eqmode3): Ditto. -(define_insn *vec_widen_smult_even_v8si +(define_insn vec_widen_smult_even_v8si [(set (match_operand:V4DI 0 register_operand =x) (mult:V4DI (sign_extend:V4DI (vec_select:V4SI - (match_operand:V8SI 1 nonimmediate_operand x) + (match_operand:V8SI 1 nonimmediate_operand %x) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)]))) (sign_extend:V4DI @@ -6166,7 +6134,7 @@ (match_operand:V8SI 2 nonimmediate_operand xm) (parallel [(const_int 0) (const_int 2) (const_int 4) (const_int 6)])] - TARGET_AVX2 ix86_binary_operator_ok (MULT, V8SImode, operands) + TARGET_AVX2 vpmuldq\t{%2, %1, %0|%0, %1, %2} [(set_attr isa avx) Please also remove the above set_attr, it is not needed when the insn is constrainted with TARGET_AVX2. OK with this addition. Thanks, Uros.
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
Andreas You could put it in .dir-locals.el, or put a hook on Andreas find-file-hook that looks at buffer-file-name. We checked in a .dir-locals.el for gdb. I recommend it for GCC as well. The gdb one is appended. You'll at least want to edit the bugzilla URL to suit. To make this work, just drop it in gcc/.dir-locals.el. Emacs will pick it up automatically. Not sure if the Tcl settings are right for gcc's test suite, but they are for gdb's. Tom ;; Emacs settings. ;; Copyright (C) 2012-2013 Free Software Foundation, Inc. ;; This program is free software; you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 3 of the License, or ;; (at your option) any later version. ;; This program is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with this program. If not, see http://www.gnu.org/licenses/. ( (tcl-mode . ((tcl-indent-level . 4) (tcl-continued-indent-level . 4) (indent-tabs-mode . t))) (nil . ((bug-reference-url-format . http://sourceware.org/bugzilla/show_bug.cgi?id=%s;))) (c-mode . ((c-file-style . GNU) (indent-tabs-mode . t))) )
Re: RFC: Add of type-demotion pass
On 10/16/13 03:31, Richard Biener wrote: I see two primary effects of type sinking. Note it was called type demotion ;) ;) It's a mental block of mine; it's been called type hoisting/sinking in various contexts and I see parallels between the code motion algorithms and how the type promotion/demotion exposes unnecessary type conversions. So I keep calling them hoisting/sinking. I'll try to use promotion/demotion. First and probably the most important in my mind is by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly. I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion applied). I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term). What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest. You need some kind of range information to do this, thus either integrate it into VRP (there is already code that does this there) or use range information from VRP which we now preserve. If the primary goal is to shrink types, then yes, you want to use whatever information you can, including VRP. But that's not the primary goal in my mind, at least not at this stage. There's no reason why this pass couldn't utilize VRP information to provide more opportunities to demote types and achieve the goal you want. But I'd consider that a follow-on opportunity. The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps. This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't type demotion but also can be promotion). Yes, I know. And we need to get this back down to a single implementation. I don't much care which of the 3 implementations we keep, but it really should just be one and it needs to be reusable. I probably should have stated this differently -- the second primary effect is to expose more cases where type conversions can be eliminated via type promotion/demotion. I don't much care which of the 3 blobs of code to eliminate the conversions we use -- I do care that we've got a consistent way to promote/demote conversions to expose the unnecessary type conversions. In contrast to the desire of expressing operations in the smallest required type there is the desire of exposing the effect of PROMOTE_MODE on GIMPLE instead of only during RTL expansion. This is because the truncations (sext and zext) PROMOTE_MODE introduced are easier to optimize away when range information is available (see the attempts to address this at RTL expansion time from Kugan from Linaro). Right. I'm aware of this work and the problem he's trying to solve and have been loosely watching it -- primarily for the persistent VRP information. Similarly, I know there's a type hoisting patch that's also queued up. I think it should be handled separately as well. I think we need to paint a picture of the final result - what is the main objective of the various(?!) passes in question? Where do we do the same kind of transformation already? I thought we'd done this at a high level already. At the heart of this work is to: 1. Isolate, to the fullest extent possible, code which promotes and demotes types. We have this stuff all over the place right now and it's very ad-hoc. 2. Promote/demote types to allow our optimizers to not concern themselves with walking back through type conversions when applying optimizations. 3. Promote/demote types to expose unnecessary type conversions. If we look at #2 and #3 we can expect that we'd want a structure which allows for a simplification/optimization step to occur after types are promoted or demoted. ie, a pipeline that looks like: promote types - optimize1 - demote types - optimize2 Now where that little mini pipeline lands is still a big question to me. optimize1 may be a fairly significant hunk of our pipeline. optimize2 probably isn't (may just be a final
Re: [PATCH i386 AVX2] Remove redundant expands.
On Wed, Oct 16, 2013 at 6:06 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. OTOH, I have some second thoughts on removing AVX2 expanders. Please consider the situation, where we have *both* operands in memory, and the insn is inside the loop. When reload comes around, it will fixup one of the operands with a load from memory. However, having insn in the loop, I suspect the load won't be moved out of loop. So, I guess even AVX/AVX2 insn patterns should call ix86_fixup_binary_operands_*, and this fixup function should be improved to load one of the operands into register, in case both operands are in memory. This also means, that you still need expanders for AVX512 commutative multiplies. Uros.
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
On 10/16/2013 09:07 AM, Kirill Yukhin wrote: I suspect gen_lowpart is bad turn when reload is completed, as far as it can create new pseudo. gen_lowpart () may call gen_reg_rtx (), which contain corresponging gcc_assert (). False. gen_lowpart is perfectly safe post-reload. Indeed, taking the subreg of a hard register should arrive x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset); in simplify_subreg. Have you encountered some specific problem with gen_lowpart? r~
Re: [PATCH i386 AVX2] Remove redundant expands.
On 10/16/2013 09:47 AM, Uros Bizjak wrote: On Wed, Oct 16, 2013 at 6:06 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. OTOH, I have some second thoughts on removing AVX2 expanders. Please consider the situation, where we have *both* operands in memory, and the insn is inside the loop. When reload comes around, it will fixup one of the operands with a load from memory. However, having insn in the loop, I suspect the load won't be moved out of loop. So, I guess even AVX/AVX2 insn patterns should call ix86_fixup_binary_operands_*, and this fixup function should be improved to load one of the operands into register, in case both operands are in memory. This also means, that you still need expanders for AVX512 commutative multiplies. Uros. Fair enough. r~
Re: [C PATCH] Warn for _Alignas in an array declarator (PR c/58267)
On Wed, 16 Oct 2013, Marek Polacek wrote: Regtested/bootstrapped on x86_64-linux, ok for trunk? 2013-10-16 Marek Polacek pola...@redhat.com PR c/58267 c/ * c-parser.c (c_parser_declspecs): Add alignspec_ok parameter. Document syntax of the array-declarator. (c_parser_declspecs) RID_ALIGNAS: Bail out if alignment specs are not permitted. (c_parser_declaration_or_fndef): Adjust c_parser_declspecs call. (c_parser_struct_declaration): Likewise. (c_parser_declarator): Likewise. (c_parser_direct_declarator_inner): Likewise. (c_parser_parameter_declaration): Likewise. (c_parser_type_name): Likewise. testsuite/ * gcc.dg/c1x-align-5.c: New test. OK. -- Joseph S. Myers jos...@codesourcery.com
[jit] Revamp of locals handling (changing API)
I've committed the following to dmalcolm/jit: Convert local creation to take a function rather than a context. Simplify the API by eliminating the separate local type in favor of just using the lvalue type, as we do for globals. As well as reducing the number of API entrypoints, this simplifies client code by eliminating various casts from local to lvalue. To port from the old API: s/gcc_jit_context_new_local/gcc_jit_function_new_local/g s/gcc_jit_local_as_rvalue/gcc_jit_lvalue_as_rvalue/g s/gcc_jit_local/gcc_jit_lvalue/g and eliminate calls to gcc_jit_local_as_lvalue, and update the arguments to gcc_jit_function_new_local (was gcc_jit_context_new_local) to reflect the type change. gcc/jit/ * TODO.rst (gcc_jit_context_new_local): Remove completed item. * internal-api.c (gcc::jit::context::new_local): Replace with... (gcc::jit::function::new_local): ...this, and change return type from (local*) to (lvalue*). * internal-api.h (gcc::jit::local): Eliminate. (gcc::jit::context::new_local): Replace with... (gcc::jit::function::new_local): ...this, and change return type from (local*) to (lvalue*). * libgccjit.c (gcc_jit_local): Eliminate. (gcc_jit_context_new_local): Replace with... (gcc_jit_function_new_local): ...this, and change return type from (gcc_jit_local*) to (gcc_jit_lvalue*). (gcc_jit_local_as_lvalue): Remove. (gcc_jit_local_as_rvalue): Remove. * libgccjit.h (gcc_jit_local): Remove. (gcc_jit_context_new_local): Replace with... (gcc_jit_function_new_local): ...this, and change return type from (gcc_jit_local*) to (gcc_jit_lvalue*). (gcc_jit_local_as_lvalue): Remove. (gcc_jit_local_as_rvalue): Remove. * libgccjit.map (gcc_jit_context_new_local): Replace with... (gcc_jit_function_new_local): ...this. (gcc_jit_local_as_lvalue): Remove. (gcc_jit_local_as_rvalue): Remove. gcc/testsuite/ * jit.dg/test-dot-product.c (code_making_callback): Update for API changes to locals. * jit.dg/test-sum-of-squares.c (code_making_callback): Likewise. --- gcc/jit/ChangeLog.jit | 27 ++ gcc/jit/TODO.rst | 3 --- gcc/jit/internal-api.c | 33 +-- gcc/jit/internal-api.h | 19 +--- gcc/jit/libgccjit.c| 36 +- gcc/jit/libgccjit.h| 23 +-- gcc/jit/libgccjit.map | 4 +--- gcc/testsuite/ChangeLog.jit| 6 + gcc/testsuite/jit.dg/test-dot-product.c| 24 ++-- gcc/testsuite/jit.dg/test-sum-of-squares.c | 28 +++ 10 files changed, 97 insertions(+), 106 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index da3e75a..54953b6 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,30 @@ +2013-10-16 David Malcolm dmalc...@redhat.com + + * TODO.rst (gcc_jit_context_new_local): Remove completed item. + * internal-api.c (gcc::jit::context::new_local): Replace with... + (gcc::jit::function::new_local): ...this, and change return type + from (local*) to (lvalue*). + * internal-api.h (gcc::jit::local): Eliminate. + (gcc::jit::context::new_local): Replace with... + (gcc::jit::function::new_local): ...this, and change return type + from (local*) to (lvalue*). + * libgccjit.c (gcc_jit_local): Eliminate. + (gcc_jit_context_new_local): Replace with... + (gcc_jit_function_new_local): ...this, and change return type + from (gcc_jit_local*) to (gcc_jit_lvalue*). + (gcc_jit_local_as_lvalue): Remove. + (gcc_jit_local_as_rvalue): Remove. + * libgccjit.h (gcc_jit_local): Remove. + (gcc_jit_context_new_local): Replace with... + (gcc_jit_function_new_local): ...this, and change return type + from (gcc_jit_local*) to (gcc_jit_lvalue*). + (gcc_jit_local_as_lvalue): Remove. + (gcc_jit_local_as_rvalue): Remove. + * libgccjit.map (gcc_jit_context_new_local): Replace with... + (gcc_jit_function_new_local): ...this. + (gcc_jit_local_as_lvalue): Remove. + (gcc_jit_local_as_rvalue): Remove. + 2013-10-15 David Malcolm dmalc...@redhat.com * libgccjit.h (gcc_jit_location): Rewrite comment to reflect diff --git a/gcc/jit/TODO.rst b/gcc/jit/TODO.rst index feb4d2b..d750b2a 100644 --- a/gcc/jit/TODO.rst +++ b/gcc/jit/TODO.rst @@ -1,8 +1,5 @@ TODOs: -* gcc_jit_context_new_local should take a function, not the context (is - it building a global?) - * error-handling: * have a client-provided error-handling callback for the context, and call it, rather than asserting/crashing etc, to make the API resilient and helpful diff
Re: [wide-int] int_traits tree
On Oct 16, 2013, at 8:47 AM, Kenneth Zadeck zad...@naturalbridge.com wrote: To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage I want richard and mike to be the people who respond to the next point.I am not a c++ person and all of this storage manager layer stuff gives me a headache. I don't have a problem with that direction if it improves code-gen...
Go patch committed: Updated 4.8 branch
I committed this patch to the 4.8 branch to bring in the last set of Go 1.1.2 bug fixes and patches. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Ian 2013-10-16 Ian Lance Taylor i...@google.com Bring in from mainline: 2013-10-11 Chris Manghane cm...@google.com * go-gcc.cc (Gcc_backend::function_code_expression): New function. 2013-10-10 Chris Manghane cm...@google.com * go-gcc.cc (Backend::error_function): New function. (Backend::function): New function. (Backend::make_function): New function. (function_to_tree): New function. Index: libgo/Makefile.am === --- libgo/Makefile.am (revision 203707) +++ libgo/Makefile.am (working copy) @@ -424,6 +424,7 @@ runtime_files = \ runtime/go-caller.c \ runtime/go-callers.c \ runtime/go-can-convert-interface.c \ + runtime/go-cdiv.c \ runtime/go-cgo.c \ runtime/go-check-interface.c \ runtime/go-construct-map.c \ Index: libgo/runtime/go-signal.c === --- libgo/runtime/go-signal.c (revision 203707) +++ libgo/runtime/go-signal.c (working copy) @@ -399,6 +399,9 @@ sig_tramp_info (int sig, Siginfo *info, { G *gp; M *mp; +#ifdef USING_SPLIT_STACK + void *stack_context[10]; +#endif /* We are now running on the stack registered via sigaltstack. (Actually there is a small span of time between runtime_siginit @@ -409,7 +412,7 @@ sig_tramp_info (int sig, Siginfo *info, if (gp != NULL) { #ifdef USING_SPLIT_STACK - __splitstack_getcontext (gp-stack_context[0]); + __splitstack_getcontext (stack_context[0]); #endif } @@ -432,7 +435,7 @@ sig_tramp_info (int sig, Siginfo *info, if (gp != NULL) { #ifdef USING_SPLIT_STACK - __splitstack_setcontext (gp-stack_context[0]); + __splitstack_setcontext (stack_context[0]); #endif } } Index: libgo/runtime/go-cdiv.c === --- libgo/runtime/go-cdiv.c (revision 0) +++ libgo/runtime/go-cdiv.c (revision 0) @@ -0,0 +1,46 @@ +/* go-cdiv.c -- complex division routines + + Copyright 2013 The Go Authors. All rights reserved. + Use of this source code is governed by a BSD-style + license that can be found in the LICENSE file. */ + +/* Calls to these functions are generated by the Go frontend for + division of complex64 or complex128. We use these because Go's + complex division expects slightly different results from the GCC + default. When dividing NaN+1.0i / 0+0i, Go expects NaN+NaNi but + GCC generates NaN+Infi. NaN+Infi seems wrong seems the rules of + C99 Annex G specify that if either side of a complex number is Inf, + the the whole number is Inf, but an operation involving NaN ought + to result in NaN, not Inf. */ + +__complex float +__go_complex64_div (__complex float a, __complex float b) +{ + if (__builtin_expect (b == 0+0i, 0)) +{ + if (!__builtin_isinff (__real__ a) + !__builtin_isinff (__imag__ a) + (__builtin_isnanf (__real__ a) || __builtin_isnanf (__imag__ a))) + { + /* Pass 1 to nanf to match math/bits.go. */ + return __builtin_nanf(1) + __builtin_nanf(1)*1i; + } +} + return a / b; +} + +__complex double +__go_complex128_div (__complex double a, __complex double b) +{ + if (__builtin_expect (b == 0+0i, 0)) +{ + if (!__builtin_isinf (__real__ a) + !__builtin_isinf (__imag__ a) + (__builtin_isnan (__real__ a) || __builtin_isnan (__imag__ a))) + { + /* Pass 1 to nan to match math/bits.go. */ + return __builtin_nan(1) + __builtin_nan(1)*1i; + } +} + return a / b; +} Index: libgo/runtime/go-make-slice.c === --- libgo/runtime/go-make-slice.c (revision 203707) +++ libgo/runtime/go-make-slice.c (working copy) @@ -34,7 +34,10 @@ __go_make_slice2 (const struct __go_type std = (const struct __go_slice_type *) td; ilen = (intgo) len; - if (ilen 0 || (uintptr_t) ilen != len) + if (ilen 0 + || (uintptr_t) ilen != len + || (std-__element_type-__size 0 + len MaxMem / std-__element_type-__size)) runtime_panicstring (makeslice: len out of range); icap = (intgo) cap; Index: gcc/go/go-gcc.cc === --- gcc/go/go-gcc.cc (revision 203707) +++ gcc/go/go-gcc.cc (working copy) @@ -232,6 +232,9 @@ class Gcc_backend : public Backend Bexpression* convert_expression(Btype* type, Bexpression* expr, Location); + Bexpression* + function_code_expression(Bfunction*, Location); + // Statements. Bstatement* @@ -334,6 +337,17 @@ class Gcc_backend : public Backend Bexpression* label_address(Blabel*, Location); + // Functions. + + Bfunction* + error_function() + { return this-make_function(error_mark_node); } + + Bfunction* + function(Btype*
Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk
On 10/02/13 16:53, Jason Merrill wrote: On 08/27/2013 04:03 PM, Aldy Hernandez wrote: + else if (!TREE_TYPE (e) || !TREE_CONSTANT (e) + || !INTEGRAL_TYPE_P (TREE_TYPE (e))) +cp_parser_error (parser, + step size must be an integer constant); Can't the step size be a value-dependent expression like a template non-type parameter? [Balaji, this is for the linear clause that is either linear(var) or linear(var : step)]. Jason, I'm not sure. The standard says: The conditional-expression in a simd-linear-step shall either satisfy the requirements of an integer constant expression, or be a reference to a variable with integer type. I take this to mean, an integral expression or a plain old variable. Is this the case, Balaji? Jason, are you asking about the validity of something like this: int *p; template int argoop void foobar(int a) { int j = 123; #pragma simd linear(j : argoop) for (int i=0; i a; ++i) p[i] = argoop; } void funky() { foobar 69 (1000); } If this is what you're asking, and the above standardeese allows it, then we have a problem, because the code parsing the j in the linear clause uses cp_parser_id_expression() which gets horrendously confused with the colon, expecting a '::'. For that matter, even doing this causes problems: int j = 123, k = 456; #pragma simd linear (j : k) ... Is this what you were asking about? If so, then perhaps something besides cp_parser_id_expression is in order, for the sake of the template argument or just a plain variable. Aldy
alias fix for PR58685
The sequence of events here can be summarized as shrink-wrapping causes the i386 backend to do something that confuses alias analysis. The miscompilation is that two instructions are swapped by the scheduler when they shouldn't be, due to an incorrect reg_base_value. The miscompiled function has the following parts: * a loop which uses %rdi as an incoming argument register * following that, a decrement of the stack pointer (shrink-wrapped to this point rather than the start of the function) * the rest of the function which has one set of %rdi to (symbol_ref LC0). The argument register %rdi is dead by the point where we decrement the stack pointer. The i386 backend splits the sub into a sequence of two insns, a clobber of %rdi and a push of it (which register is chosen appears to be somewhat random). When called from the scheduler, init_alias_analysis incorrectly deduces that %rdi has a base value of (symbol_ref LC0). Although it tries to track which registers are argument registers that may hold a pointer, this information is lost when we encounter the clobber. The main part of the following patch is to modify that piece of code to also set reg_seen for argument registers when encountering a clobber. There are other problems in this area, one of which showed up while testing an earlier patch. i386/pr57003.c demonstrates that return values of FUNCTION_ARG_REGNO are not constant across the whole compilation; they are affected by the ms_abi attribute. This necessitates changing from a static_reg_base_value array to a per-function one. Once that is done, it's better to look at DECL_ARGUMENTS in a similar way to what combine.c does to identify the arguments of the specific function we're compiling rather than using the less specific FUNCTION_ARG_REGNO. Lastly, I modified a test for Pmode to use the valid_pointer_mode hook which should be a little more correct. Bootstrapped and tested on x86_64-linux. Ok? Bernd PR rtl-optimization/58685 * alias.c (static_reg_base_value): Delete macro. (reg_base_value_initial): New macro to replace it. All uses changed. (record_set): For a clobber of an arg reg, set reg_seen. (setup_initial_base_values): Renamed from init_alias_target and made static. Check the DECL_INCOMING_RTL of the DECL_ARGUMENTS. Record that the array was set up for the current function. (init_alias_analysis): Call setup_initial_base_values if necessary. * function.h (struct emit_status): Add fields initial_reg_base_value and initial_reg_base_value_setup. * rtl.h (struct target_rtl): Remove x_static_reg_base_value. (init_alias_target): Don't declare. * toplev.c (backend_init_target): Don't call it. PR rtl-optimization/58685 * gcc.c-torture/execute/pr58685.c: New test. diff --git a/gcc/alias.c b/gcc/alias.c index a48bb51..b1189b7 100644 --- a/gcc/alias.c +++ b/gcc/alias.c @@ -229,8 +229,7 @@ static GTY((deletable)) vecrtx, va_gc *old_reg_base_value; #define UNIQUE_BASE_VALUE_FP -3 #define UNIQUE_BASE_VALUE_HFP -4 -#define static_reg_base_value \ - (this_target_rtl-x_static_reg_base_value) +#define reg_base_value_initial (crtl-emit.initial_reg_base_value) #define REG_BASE_VALUE(X) \ (REGNO (X) vec_safe_length (reg_base_value) \ @@ -1306,6 +1305,20 @@ record_set (rtx dest, const_rtx set, void *data ATTRIBUTE_UNUSED) set). */ if (GET_CODE (set) == CLOBBER) { + if (new_reg_base_value[regno] != 0 + regno != HARD_FRAME_POINTER_REGNUM + regno != FRAME_POINTER_REGNUM + regno != STACK_POINTER_REGNUM + regno != ARG_POINTER_REGNUM) + { + if (!bitmap_bit_p (reg_seen, regno)) + { + /* We have to make an exception here for an argument + register, in case it is used before the clobber. */ + gcc_assert (new_reg_base_value[regno] == arg_base_value); + bitmap_set_bit (reg_seen, regno); + } + } new_reg_base_value[regno] = 0; return; } @@ -1693,7 +1706,7 @@ find_base_term (rtx x) return ret; if (cselib_sp_based_value_p (val)) - return static_reg_base_value[STACK_POINTER_REGNUM]; + return reg_base_value_initial[STACK_POINTER_REGNUM]; f = val-locs; /* Temporarily reset val-locs to avoid infinite recursion. */ @@ -1795,7 +1808,7 @@ bool may_be_sp_based_p (rtx x) { rtx base = find_base_term (x); - return !base || base == static_reg_base_value[STACK_POINTER_REGNUM]; + return !base || base == reg_base_value_initial[STACK_POINTER_REGNUM]; } /* Return 0 if the addresses X and Y are known to point to different @@ -2809,34 +2822,34 @@ may_alias_p (const_rtx mem, const_rtx x) return rtx_refs_may_alias_p (x, mem, false); } -void -init_alias_target (void) +static void +setup_initial_base_values (void) { - int i; - if (!arg_base_value) arg_base_value = gen_rtx_ADDRESS (VOIDmode, 0); - memset (static_reg_base_value, 0, sizeof static_reg_base_value); + memset (reg_base_value_initial, 0, sizeof reg_base_value_initial); - for (i = 0; i
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On 10/16/13 02:21, Richard Biener wrote: On Tue, Oct 15, 2013 at 6:55 PM, Jeff Law l...@redhat.com wrote: On 10/15/13 10:53, Jakub Jelinek wrote: On Tue, Oct 15, 2013 at 10:50:39AM -0600, Jeff Law wrote: I noticed that we're now including rtl.h and tm_p.h in tree-ssa-reassoc.c, which seems wrong. Isn't that required for BRANCH_COST use? Other option would be to add some dummy wrapper around BRANCH_COST, put that wrapper into some file that already includes rtl.h and tm_p.h and just call it from there. Yea, looking at it now, I'm a bit surprised this hasn't been converted to a target hook which would avoid this problem entirely. You got the job! Joys :-) What's the policy on the GDFL stuff. I've tried so hard to avoid having to worry about that rats nest that I have no idea what our policy is. Basically I just want to take the old docs for BRANCH_COST and re-use those. Is that considered kosher with all the GDFL nonsense? Jeff
Re: [patch] combine ICE fix
On 10/16/13 09:34, Cesar Philippidis wrote: On 10/15/13 12:16 PM, Jeff Law wrote: On 10/10/13 10:25, Jakub Jelinek wrote: On Thu, Oct 10, 2013 at 07:26:43AM -0700, Cesar Philippidis wrote: This patch addresses an ICE when building qemu for a Mips target in Yocto. Both gcc-trunk, gcc-4.8 and all of the targets are potentially affected. The problem occurs because the instruction combine phase uses two data structures to keep track of registers, reg_stat and regstat_n_sets_and_refs, but they potentially end up out of sync; when combine inserts a new register into reg_stat it does not update regstat_n_sets_and_refs. Failing to update the latter results in an occasional segmentation fault. Is this OK for trunk and gcc-4.8? If so, please check it in. I tested it on Mips and x86-64 and no regressions showed up. That looks broken. You leave everything from the last size till the current one uninitialized, so it would only work if combine_split_insns always increments max_reg_num () by at most one. I don't think that assumption is safe. Consider a parallel with a bunch of (clobber (match_scratch)) expressions. I address that in the patch posted here http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00802.html. Is that still insufficient? Thanks. I wasn't aware of the follow-up. Furthermore, there are macros which should be used to access the fields, and, if the vector is ever going to be resized, supposedly it should be vec.h vector rather than just array. Or perhaps take into account: /* If a pass need to change these values in some magical way or the pass needs to have accurate values for these and is not using incremental df scanning, then it should use REG_N_SETS and REG_N_USES. If the pass is doing incremental scanning then it should be getting the info from DF_REG_DEF_COUNT and DF_REG_USE_COUNT. */ and not use REG_N_SETS etc. but instead the df stuff. Which begs the question, how exactly is combine utilizing the regstat_n_* structures and is that use even valid for combine? I'll take a look at that. This needs to be resolved before we can go forward with your patch. jeff
Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk
On 10/16/2013 01:48 PM, Aldy Hernandez wrote: On 10/02/13 16:53, Jason Merrill wrote: Can't the step size be a value-dependent expression like a template non-type parameter? Jason, I'm not sure. The standard says: The conditional-expression in a simd-linear-step shall either satisfy the requirements of an integer constant expression, or be a reference to a variable with integer type. I take this to mean, an integral expression or a plain old variable. Is this the case, Balaji? Right. An integral constant expression can be value-dependent. If this is what you're asking, and the above standardeese allows it, then we have a problem, because the code parsing the j in the linear clause uses cp_parser_id_expression() which gets horrendously confused with the colon, expecting a '::'. Sounds like you need to turn off parser-colon_corrects_to_scope_p when parsing the pragma arguments. Jason
RE: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches- ow...@gcc.gnu.org] On Behalf Of Aldy Hernandez Sent: Wednesday, October 16, 2013 1:49 PM To: Jason Merrill Cc: Richard Henderson; gcc-patches; Iyer, Balaji V Subject: Re: PING: Fwd: Re: [patch] implement Cilk Plus simd loops on trunk On 10/02/13 16:53, Jason Merrill wrote: On 08/27/2013 04:03 PM, Aldy Hernandez wrote: + else if (!TREE_TYPE (e) || !TREE_CONSTANT (e) + || !INTEGRAL_TYPE_P (TREE_TYPE (e))) +cp_parser_error (parser, + step size must be an integer constant); Can't the step size be a value-dependent expression like a template non-type parameter? [Balaji, this is for the linear clause that is either linear(var) or linear(var : step)]. Jason, I'm not sure. The standard says: The conditional-expression in a simd-linear-step shall either satisfy the requirements of an integer constant expression, or be a reference to a variable with integer type. I take this to mean, an integral expression or a plain old variable. Is this the case, Balaji? Yup. Jason, are you asking about the validity of something like this: int *p; template int argoop void foobar(int a) { int j = 123; #pragma simd linear(j : argoop) for (int i=0; i a; ++i) p[i] = argoop; } void funky() { foobar 69 (1000); } If this is what you're asking, and the above standardeese allows it, then we have a problem, because the code parsing the j in the linear clause uses cp_parser_id_expression() which gets horrendously confused with the colon, expecting a '::'. For that matter, even doing this causes problems: In parser structure, there is a variable that called colon_corrects_to_scope_p. Set that to false right before searching for a colon. That should fix this looking for :: issue. int j = 123, k = 456; #pragma simd linear (j : k) ... Is this what you were asking about? If so, then perhaps something besides cp_parser_id_expression is in order, for the sake of the template argument or just a plain variable. Thanks, Balaji V. Iyer. Aldy
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
On Oct 16, 2013, at 9:26 AM, Tom Tromey tro...@redhat.com wrote: Andreas You could put it in .dir-locals.el, or put a hook on Andreas find-file-hook that looks at buffer-file-name. We checked in a .dir-locals.el for gdb. I recommend it for GCC as well. I've copied the gdb one and updated it for our tree: $ cat .dir-locals.el ;; Emacs settings. ;; Copyright (C) 2012-2013 Free Software Foundation, Inc. ;; This program is free software; you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation; either version 3 of the License, or ;; (at your option) any later version. ;; This program is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with this program. If not, see http://www.gnu.org/licenses/. ( (tcl-mode . ((tcl-indent-level . 4) (tcl-continued-indent-level . 4) (indent-tabs-mode . t))) (nil . ((bug-reference-url-format . http://gcc.gnu.org/PR%s;))) (c-mode . ((c-file-style . GNU) (indent-tabs-mode . t))) ) Committed revision 203715. I've tested it to ensure everything works as expected, except I didn't see exactly how to test the bug-reference… If someone knows how to test that, I'd appreciate it. If I understand the net result of this, it should make the lives of people that work on multiple trees with different formatting easier. Thanks guys. Now, if you've been manually fidgeting with tab characters, you should be able to just hit tab on the line your editing and have it do the right thing (if your emacs isn't really old).
Re: [PATCH i386 AVX2] Remove redundant expands.
On Wed, Oct 16, 2013 at 7:01 PM, Richard Henderson r...@redhat.com wrote: On 10/16/2013 09:47 AM, Uros Bizjak wrote: On Wed, Oct 16, 2013 at 6:06 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: It seems that gang of AVX* patterns were copy and pasted from SSE, however as far as they are NDD, we may remove corresponding expands which sort operands. OTOH, I have some second thoughts on removing AVX2 expanders. Please consider the situation, where we have *both* operands in memory, and the insn is inside the loop. When reload comes around, it will fixup one of the operands with a load from memory. However, having insn in the loop, I suspect the load won't be moved out of loop. So, I guess even AVX/AVX2 insn patterns should call ix86_fixup_binary_operands_*, and this fixup function should be improved to load one of the operands into register, in case both operands are in memory. This also means, that you still need expanders for AVX512 commutative multiplies. Fair enough. I have checked ix86_fixup_binary_operands and ix86_binary_operator_ok, and they should be OK for destructive (SSE2) and non-destructive (AVX) commutative instructions. Vector instructions have always destination in a register, so most of fixups and checks do not apply at all. We can probably use: { if (MEM_P (operands[1]) MEM_P (operands[2])) operands[1] = force_reg (MODEmode, operands[1]); } in expanders and !(MEM_P (operands[0]) MEM_P (operands[1])) in insn constraints for most of SSE and AVX commutative insns. Uros.
Re: [wide-int] int_traits tree
Kenneth Zadeck zad...@naturalbridge.com writes: I note that we immediately return in the above case, so if (precision xprecision + HOST_BITS_PER_WIDE_INT) { len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); return wi::storage_ref (scratch, len, precision); } applies only when the desired precision is a multiple of a HWI. yes I assume it adds that extra zero word in case the MSB is set, right? yes I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given a 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here? I do not think so. There are three cases here: 1) xprecision precision: say xprecision = 8 and precision = 32. The number is between 0 and 255 and after canonicalization the number will be between 0 and 255. 2) xprecision == precision not much to talk about here. 3) xprecision precision We need to loose the upper bits of the input and it does not matter what they were.The only thing that we are concerned about is that the bits above what is now the sign bit pointed to by precision are matched in the bits above precision. (3) is gone with the assert though. As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? If so, xprecision precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. If that's right and xprecision == precision can use val with an adjusted len, then... After all an assert precision == xprecision does not work in this routine. Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like I want richard and mike to be the people who respond to the next point.I am not a c++ person and all of this storage manager layer stuff gives me a headache. ...the use of the scratch array goes away completely for addr_wide_int and max_wide_int. If this code is on a hot path for wide_int then maybe we should simply require that wide_int operations involving trees have explicit extensions, like in rtl. (I.e. only do the implicit extension for addr_wide_int and max_wide_int.) I'm not opposed to going back to a separate scratch array if all else fails, but it's really ugly, so I'd like to look at the alternatives first... Thanks, Richard
Merge from GCC 4.8 branch to gccgo branch
I've merged revision 203713 from the GCC 4.8 branch to the gccgo branch. Ian
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #5
On Mon, Oct 14, 2013 at 08:59:50PM -0400, David Edelsohn wrote: Please use VMX instead of AV, not because I am pedantic about IBM's name for the feature, but AV is meaningless to me and anyone else reading the code. If you need a three-letter name, use VMX. Okay with that change. I can change it to VMX, but VMX is not referenced very much in the port. For better or worse, Altivec is used everywhere. So I think I will just spell it out. I had used _av previously in rs6000_output_move_128bit. Did you want me to clean up those places as well? -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] decide edge's hotness when there is profile info
On Tue, Oct 15, 2013 at 3:39 PM, Teresa Johnson tejohn...@google.com wrote: On Tue, Oct 15, 2013 at 2:57 PM, Jan Hubicka hubi...@ucw.cz wrote: On Tue, Oct 15, 2013 at 2:03 PM, Jan Hubicka hubi...@ucw.cz wrote: Yes, that would work. So let's discard this patch because the fix for comdat can also fix this problem. Unforutnately ipa-profile-estimate is an IPA pass and as such it does not have access to profile_status_to_function. I see. I was coding that up today but hadn't tested it yet. You can probably just factor this out into a function that can be called and for normal FDO we call it where the loop stis now and for auto-FDO we can probably have another invocation from before early passes where auto-FDO is collected. Ok, let's go with that approach for now. It won't address the 0 count COMDAT calling another 0 count COMDAT problem, but I will probably just find a way to deal with this when inlining. You can still propagate, since tree-profile is an simple-ipa pass. Ok, I think I will leave that for the second patch, since I need to do some testing on the effects - i.e. the propagation I had in mind would make any 0-count routine called by a routine that was dropped to guessed to also be dropped to guessed (and no longer unlikely). It may be too aggressive, I need to check. + if (node-count) + continue; Also here we should sum the counts and consider function non unlikely executed in the same way as probably_never_executed does. I assume you mean by doing the same comparison to the number of profile-runs. Yes, this makes sense. Yes. I can prepare updated patch, but i am currently travelling, so i would not be disapointed if you beat me ;) I'm working on it, and I think based on Dehao's needs I am going to split up the patch into two phases, the one that does just the part you had sent a patch for (making sure 0 count routines with non-zero calls are marked guessed and have their node frequency set appropriately), and a subsequent one to do the count application when we inline a 0-count routine into a non-zero callsite. I'll shoot for getting this ready by tomorrow. BTW, in your original patch you are checking for both e-count or cgraph_maybe_hot_edge_p(e), but AFAICT the call to cgraph_maybe_hot_edge_p will never return true when e-count is zero. When there is a profile it will return false via maybe_hot_count_p since e-count == 0. When there is no profile it will return false when the callee has NODE_FREQUENCY_UNLIKELY_EXECUTED. So I think just checking for e-count 0 is sufficient here. I think I was checking caller count here (that is read) and the code was supposed to make functoin with hot caller to be hot... The code in your patch was just checking the edge count, not the caller count. cgraph_maybe_hot_edge_p also doesn't check the caller count, just the edge count. Do we want to make all functions with hot callers also be hot, even when the edge count is not hot? This may be to aggressive. I was simply going to make the code check e-count. Here is the patch from Honza, that I revised based on discussions and testing. Once this related patch goes in, I can change the check against the profile_info-runs that hardcodes the threshold with the new parameter proposed for probably_never_executed there: http://gcc.gnu.org/ml/gcc-patches/2013-10/msg00743.html Bootstrapped and tested on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Teresa 2013-10-16 Jan Hubicka j...@suse.cz Teresa Johnson tejohn...@google.com * predict.c (handle_missing_profiles): New function. (counts_to_freqs): Don't overwrite estimated frequencies when function has no profile counts. * predict.h (handle_missing_profiles): Declare. * tree-profile.c (tree_profiling): Invoke handle_missing_profiles. Index: predict.c === --- predict.c (revision 203633) +++ predict.c (working copy) @@ -2742,6 +2742,39 @@ estimate_loops (void) BITMAP_FREE (tovisit); } +void +handle_missing_profiles (void) +{ + struct cgraph_node *node; + /* See if 0 count function has non-0 count callers. In this case we + lost some profile. Drop its function profile to PROFILE_GUESSED. */ + FOR_EACH_DEFINED_FUNCTION (node) +{ + struct cgraph_edge *e; + gcov_type call_count = 0; + struct function *fn = DECL_STRUCT_FUNCTION (node-symbol.decl); + if (node-count) +continue; + for (e = node-callers; e; e = e-next_caller) +call_count += e-count; + if (call_count + fn fn-cfg + (call_count * 4 = profile_info-runs)) +{ + bool maybe_hot = maybe_hot_count_p (NULL, call_count); + if (dump_file) +fprintf (dump_file, + Dropping 0 profile for %s/%i. %s based on calls.\n, + cgraph_node_name (node),
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
Mike == Mike Stump mikest...@comcast.net writes: Mike I've tested it to ensure everything works as expected, except I Mike didn't see exactly how to test the bug-reference… If someone knows Mike how to test that, I'd appreciate it. If I understand the net Mike result of this, it should make the lives of people that work on Mike multiple trees with different formatting easier. Visit ChangeLog. Find a reference to a PR. M-x bug-reference-mode Click or C-c RET on the PR text If it pops up the bug in your browser, it works. There's also bug-reference-prog-mode, for programming modes. It only fontifies PR references in comments and strings. Tom
Re: [PATCH, powerpc] PR target/58673: Fix power8 quad memory/no-vsx-timode interaction (revised)
On Mon, Oct 14, 2013 at 08:51:15PM -0400, David Edelsohn wrote: As Ken Zadeck and Richard Sandiford noticed this morning, the F constraint seems to be a cut-and-paste error because the F constraint corresponds to an immediate scalar or vector float value, which never will match TImode. Should this constraint be n? The GCC testsuite apparently is missing a test to ensure that a large TImode immediate value can be loaded on targets that support int128_t. I'll try n. It probably should be something that allows CONST_DOUBLE ints as well as CONST_INT. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: [PATCH] Reassociate X == CST1 || X == CST2 if popcount (CST2 - CST1) == 1 into ((X - CST1) ~(CST2 - CST1)) == 0
On Wed, 16 Oct 2013, Jeff Law wrote: What's the policy on the GDFL stuff. I've tried so hard to avoid having to worry about that rats nest that I have no idea what our policy is. Basically I just want to take the old docs for BRANCH_COST and re-use those. Is that considered kosher with all the GDFL nonsense? When moving documentation text from the manual into target.def (so it ends up in both target.def and tm.texi, with tm.texi.in just having an @hook line to show where the regeneration of tm.texi should put the text), CC one of the people listed as docstring relicensing maintainers and ask us to approve the GPL relicensing in that case. -- Joseph S. Myers jos...@codesourcery.com
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On 10/15/13 15:30, Wei Mi wrote: Aren't you just trying to see if we have a comparison feeding the conditional jump and if they're already adjacent? Do you actually need to get the condition code regs to do that test? Yes, I am trying to see if we have a comparison feeding the conditional jump and if they're already adjacent. Do you have more easier way to do that test? Can't you just look at the last insn in the block and if it's a conditional peek at the previous insn and see if it sets CC mode register? Hmm, I guess that's effectively what you're doing, I guess I was just surprised by the need to first get the fixed_condition_code_regs as I expected you to just extract them from the conditional jump. But thinking a bit more about it now your solution seems rather clean. static void haifa_init_only_bb (basic_block, basic_block); diff --git a/gcc/sched-rgn.c b/gcc/sched-rgn.c index e1a2dce..156359e 100644 --- a/gcc/sched-rgn.c +++ b/gcc/sched-rgn.c @@ -2443,6 +2443,8 @@ add_branch_dependences (rtx head, rtx tail) cc0 setters remain at the end because they can't be moved away from their cc0 user. + Predecessors of SCHED_GROUP_P instructions at the end remain at the end. + COND_EXEC insns cannot be moved past a branch (see e.g. PR17808). Insns setting TARGET_CLASS_LIKELY_SPILLED_P registers (usually return @@ -2465,7 +2467,8 @@ add_branch_dependences (rtx head, rtx tail) #endif || (!reload_completed sets_likely_spilled (PATTERN (insn) -|| NOTE_P (insn)) +|| NOTE_P (insn) +|| (last != 0 SCHED_GROUP_P (last))) { if (!NOTE_P (insn)) { This looks like a straighforward bugfix and probably should go forward independent of this enhancement. Ok, I will separate it into another patch. Go ahead and consider that pre-approved. Just send it to the list with a note that I approved it in this thread. Jeff
Re: [PING] 3 patches waiting for approval/review
On 10/11/13 11:23, Andreas Krebbel wrote: On 10/10/13 18:41, Jeff Law wrote: On 10/10/13 04:00, Andreas Krebbel wrote: On 09/10/13 21:46, Jeff Law wrote: On 08/21/13 03:21, Andreas Krebbel wrote: [RFC] Allow functions calling mcount before prologue to be leaf functions http://gcc.gnu.org/ml/gcc-patches/2013-04/msg00993.html I don't think this is necessarily correct for all targets. ISTM the ability to consider a function calling mcount as a leaf needs to be a property of the target. We have already profile_before_prologue as a target property. Shouldn't this be enough to decide upon this? When a function calls mcount before the prologue it shouldn't matter whether the function is leaf or not. I don't think so, I think it'd break the PA's 32 bit ABI, maybe the 64 bit ABI as well. It's the caller's responsibility to build a mini stack frame if the function makes any calls. If the code in the prologue expander uses leafness to make the decision about whether or not to allocate the mini frame, then it'd do the wrong thing here. Since it seems to be about PROFILE_HOOKS vs FUNCTION_PROFILER targets what about the following patch: It's not about PROFILE_HOOKS vs FUNCTION_PROFILER, but the ABI and how the backend changes its behavior on the return value of leaf_function_p. Effectively you want to have functions which are not leaf functions (they call mcount) be treated as leaf functions. I have great concerns about the safety of allowing that without exports on each port weighing in on its correctness for their port. I still really feel this should be a target hook that is off by default so that the target maintainers can audit their target to ensure it operates correctly. Maybe I'm missing something, so perhaps another approach. Can you explain why you think it is safe to ignore calls to mcount when trying to determine if a function is a leaf or not? BTW, there is a hunk of this patch that can and should go forward now rather than later. Specifically removing the profile_arc_flag part of the test. If you wanted to push that forward immediately, I'd support that and you should consider it pre-approved. jeff
Re: [PATCH v4 04/20] add configury
Gerald == Gerald Pfeifer ger...@pfeifer.com writes: Gerald % mkdir /scratch2/tmp/gerald/OBJ-1014-1920 Gerald % cd /scratch2/tmp/gerald/OBJ-1014-1920 Gerald % $GCC_SOURCE/configure --prefix=/home/gerald/something Gerald % gmake bootstrap-lean Gerald % gmake install Gerald Wait a minute! Gerald It _looks_ like I can get this to succeed using gmake bootstrap Gerald and have the very same script fail using gmake bootstrap-lean. Gerald Can you reproduce that? Yeah, the bootstrap-lean + install combination fails for me. Tom
Re: [wide-int] int_traits tree
On 10/16/2013 02:45 PM, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: I note that we immediately return in the above case, so if (precision xprecision + HOST_BITS_PER_WIDE_INT) { len = wi::force_to_size (scratch, val, len, xprecision, precision, UNSIGNED); return wi::storage_ref (scratch, len, precision); } applies only when the desired precision is a multiple of a HWI. yes I assume it adds that extra zero word in case the MSB is set, right? yes I am confused about the condition written here and how we look at precision and not xprecision when deciding how to extend - given a 8-bit precision unsigned 0xff and precision == 64 we do not even consider sign-extending the value because we look at precision and not xprecision. Don't we need to look at xprecision here? I do not think so. There are three cases here: 1) xprecision precision: say xprecision = 8 and precision = 32. The number is between 0 and 255 and after canonicalization the number will be between 0 and 255. 2) xprecision == precision not much to talk about here. 3) xprecision precision We need to loose the upper bits of the input and it does not matter what they were.The only thing that we are concerned about is that the bits above what is now the sign bit pointed to by precision are matched in the bits above precision. (3) is gone with the assert though. you seem to be correct here. As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's. if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. the case actually comes up on the ppc because they do a lot of 128 bit math.I think i got thru the x86-64 without noticing this. If that's right and xprecision == precision can use val with an adjusted len, then... After all an assert precision == xprecision does not work in this routine. Quickly execute.exp tested only. To avoid code quality wreckage we have to implement a different way of allocating the 'scratch' storage of wide_int_ref_storage (well, ideally we wouldn't need it ...). I thought of simply allocating it via alloca (), but we need to do that in all callers that build a wide_int_ref (eventually via a hidden default arg). But as that's a template specialization of generic_wide_int ... so the option is to provide a function wrapper inside wi:: for this, like I want richard and mike to be the people who respond to the next point.I am not a c++ person and all of this storage manager layer stuff gives me a headache. ...the use of the scratch array goes away completely for addr_wide_int and max_wide_int. If this code is on a hot path for wide_int then maybe we should simply require that wide_int operations involving trees have explicit extensions, like in rtl. (I.e. only do the implicit extension for addr_wide_int and max_wide_int.) i do not understand how this is true. I'm not opposed to going back to a separate scratch array if all else fails, but it's really ugly, so I'd like to look at the alternatives first... Thanks, Richard
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
On 16/10/13 19:32, Mike Stump wrote: Committed revision 203715. Awesome! Can I backport your patch to 4_8? -- PMatos
Re: [PATCH, powerpc] Rework#2 VSX scalar floating point support, patch #5
On Wed, Oct 16, 2013 at 3:00 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: On Mon, Oct 14, 2013 at 08:59:50PM -0400, David Edelsohn wrote: Please use VMX instead of AV, not because I am pedantic about IBM's name for the feature, but AV is meaningless to me and anyone else reading the code. If you need a three-letter name, use VMX. Okay with that change. I can change it to VMX, but VMX is not referenced very much in the port. For better or worse, Altivec is used everywhere. So I think I will just spell it out. I had used _av previously in rs6000_output_move_128bit. Did you want me to clean up those places as well? Yes, please remove uses of av because I really don't think that it is meaningful or informative to someone reading the source code. If you change that use to _vmx, then VMX will be used more in the code. I'm not going to insist on VMX, but it never will be used more in the code if we always avoid using it. Thanks, David
Re: [PATCH, powerpc] PR target/58673: Fix power8 quad memory/no-vsx-timode interaction (revised)
On Wed, Oct 16, 2013 at 3:08 PM, Michael Meissner meiss...@linux.vnet.ibm.com wrote: I'll try n. It probably should be something that allows CONST_DOUBLE ints as well as CONST_INT. The 'n' constraint matches CONST_INT and CONST_DOUBLE. - David
Re: [PATCH] Cilk Keywords (_Cilk_spawn and _Cilk_sync) for C (and C++)
On 09/11/13 12:18, Iyer, Balaji V wrote: Hello Everyone, Couple weeks back, I had submitted a patch for review that will implement Cilk keywords (_Cilk_spawn and _Cilk_sync) into the C compiler. I recently finished C++ implementation also. In this email, I am attaching 2 patches: 1 for C (and the common parts for C and C++) and 1 for C++. The C++ Changelog is labelled cp-ChangeLog.cilkplus and the other one is just ChangeLog.cilkplus. There isn't much changes in the C patch. Only noticeable changes would be moving functions to the common parts so that C++ can use them. It passes all the tests and does not affect (by affect I mean fail a passing test or pass a failing one) any of the other tests in the testsuite directory. Is this Ok for trunk? This is not a complete review, but obviously I am starting to look at the patch and figured you could start addressing issues as I find them. I usually start by trying to filter out all the stuff that is fairly benign, so I'm looking at code in a fairly random order. You'll also note some items are just questions I'd like you to answer and don't (at this time) require any code changes -- they just help me understand everything. I'm also not looking at the C++ bits -- my familiarity with the C++ front-end is minimal at best and I'd probably do more harm than good reviewing that code. In gcc/Makefile.in, since the original submission of your patch we have integrated automatic dependency generation. As a result several hunks to provide dependencies for cilk.o and update dependencies for other objects are no longer needed. This is true for hte various Make-lang.in files as well. builtins.c: + if (flag_enable_cilkplus (!strcmp (name, __cilkrts_detach) + || !strcmp (name, __cilkrts_pop_frame))) Formatting nit. if (fubar (com | baz)) in cppbuiltin.c, presumably the __cilk predefine is some kind of version #? I'm going to assume that __clik is the same #define that ICC sets up, correct? In function.h: + /* This will indicate whether a function is a cilk function */ + unsigned int is_cilk_function : 1; Doesn't this really mean calls into the cilk runtime? In ira.c: + /* We need a frame pointer for all Cilk Plus functions that use + Cilk keywords. */ + || (flag_enable_cilkplus cfun-is_cilk_function) Can you explain to me a bit more why you need a frame pointer? I'm trying to determine if it's best to leave this as-is or have this code detect a property in the generated code for the function. From a modularity standpoint it seems pretty gross that we have to peek at this within IRA. In a couple places I saw this comment: + /* Cilk keywords currently need to replace some variables that + ordinary nested functions do not. */ + bool remap_var_for_cilk; I didn't see anywhere that explained exactly why some variables that do not ordinarily need replacing need replacing when cilk is enabled. If it's in the patch somewhere, just point me to it. If not, add documentation about why these variables need remapping for cilk. In gimplify.c: + /* Implicit _Cilk_sync must be inserted right before any return statement + if there is a _Cilk_spawn in the function. If the user has provided a + _Cilk_sync, the optimizer should remove this duplicate one. */ + if (fn_contains_cilk_spawn_p (cfun)) +{ + tree impl_sync = build0 (CILK_SYNC_STMT, void_type_node); + gimplify_and_add (impl_sync, pre_p); +} + Does anything actually ensure we don't have multiple syncs? What's the thinking behind parsing calls to cilk_spawn as a normal call if there's an error? Referring to this code in gimplify.c: + case CILK_SPAWN_STMT: + gcc_assert + (fn_contains_cilk_spawn_p (cfun) + lang_hooks.cilkplus.cilk_detect_spawn_and_unwrap (expr_p)); + if (!seen_error ()) + { + ret = (enum gimplify_status) + lang_hooks.cilkplus.gimplify_cilk_spawn (expr_p, pre_p, +post_p); + break; + } + /* If errors are seen, then just process it as a CALL_EXPR. */ + Meta-question, when we're not in cilk mode, should we be consuming the cilk tokens? I'm not familiar at all with our parser, so I'm not sure if we can handle this gracefully. Though I guess parsing hte token and warning/error if not in Cilk mode is probably the best course of action. Can you take a look at calls.c::special_function_p and determine if we need to do something special for spawn here? More tomorrow... jeff
Re: [PATCH][i386]Fix PR 57756
On Tue, Oct 15, 2013 at 10:54 PM, Alan Modra amo...@gmail.com wrote: On Tue, Oct 15, 2013 at 02:45:23PM -0700, Sriraman Tallam wrote: I committed this patch after making the above change. /src/gcc-virgin/gcc/config/rs6000/rs6000.c: At global scope: /src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion from ‘void (*)(cl_target_option*)’ to ‘void (*)(cl_target_option*, gcc_options*)’ [-fpermissive] /src/gcc-virgin/gcc/config/rs6000/rs6000.c:31122:29: error: invalid conversion from ‘void (*)(cl_target_option*)’ to ‘void (*)(gcc_options*, cl_target_option*)’ [-fpermissive] This patch fixes it, ok to submit? * config/rs6000/rs6000.c (rs6000_function_specific_save): New unused parameter. (rs6000_function_specifc_restore): Ditto. Index: config/rs6000/rs6000.c === --- config/rs6000/rs6000.c (revision 203711) +++ config/rs6000/rs6000.c (working copy) @@ -29995,7 +29995,8 @@ rs6000_set_current_function (tree fndecl) /* Save the current options */ static void -rs6000_function_specific_save (struct cl_target_option *ptr) +rs6000_function_specific_save (struct cl_target_option *ptr, + struct gcc_options *ARG_UNUSED (opts)) { ptr-x_rs6000_isa_flags = rs6000_isa_flags; ptr-x_rs6000_isa_flags_explicit = rs6000_isa_flags_explicit; @@ -30004,7 +30005,8 @@ static void /* Restore the current options */ static void -rs6000_function_specific_restore (struct cl_target_option *ptr) +rs6000_function_specific_restore (struct gcc_options *ARG_UNUSED (opts), + struct cl_target_option *ptr) { rs6000_isa_flags = ptr-x_rs6000_isa_flags; rs6000_isa_flags_explicit = ptr-x_rs6000_isa_flags_explicit; Thanks Sri -- Alan Modra Australia Development Lab, IBM
C++ PATCH for c++/57850 (-fdump-translation-unit and PCH)
Early exit from cp_write_global_declarations in the PCH case completely broke -fdump-translation-unit; the simple fix is just to do the dump on the PCH path as well. Tested x86_64-pc-linux-gnu, applying to trunk and 4.8. commit 106091d492179afa32d3042c4c74fb2ae6c440f4 Author: Jason Merrill ja...@redhat.com Date: Thu Sep 12 18:33:21 2013 -0400 PR c++/57850 * decl2.c (dump_tu): Split out from... (cp_write_global_declarations): ...here. Call it in PCH mode. diff --git a/gcc/cp/decl2.c b/gcc/cp/decl2.c index 4ebc155..5e5f5e8 100644 --- a/gcc/cp/decl2.c +++ b/gcc/cp/decl2.c @@ -4041,6 +4041,22 @@ handle_tls_init (void) expand_or_defer_fn (finish_function (0)); } +/* The entire file is now complete. If requested, dump everything + to a file. */ + +static void +dump_tu (void) +{ + int flags; + FILE *stream = dump_begin (TDI_tu, flags); + + if (stream) +{ + dump_node (global_namespace, flags ~TDF_SLIM, stream); + dump_end (TDI_tu, stream); +} +} + /* This routine is called at the end of compilation. Its job is to create all the code needed to initialize and destroy the global aggregates. We do the destruction @@ -4071,6 +4087,7 @@ cp_write_global_declarations (void) if (pch_file) { c_common_write_pch (); + dump_tu (); return; } @@ -4457,16 +4474,7 @@ cp_write_global_declarations (void) /* The entire file is now complete. If requested, dump everything to a file. */ - { -int flags; -FILE *stream = dump_begin (TDI_tu, flags); - -if (stream) - { - dump_node (global_namespace, flags ~TDF_SLIM, stream); - dump_end (TDI_tu, stream); - } - } + dump_tu (); if (flag_detailed_statistics) {
C++ Patch for core issue 1591 (deduction of array bound from initializer list)
If we're deducing an array type from an initializer list, we can also deduce the bound. Tested x86_64-pc-linux-gnu, applying to trunk. commit c1effcb5fe5b63628a4a7ef75c96d9415ee18f20 Author: Jason Merrill ja...@redhat.com Date: Mon Sep 23 06:15:25 2013 -0500 Core 1591 * pt.c (unify_array_domain): Split out from unify. (unify): Use it for list deduction, too. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index eed648a..a1eb7a2 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -17007,6 +17007,72 @@ unify_pack_expansion (tree tparms, tree targs, tree packed_parms, return unify_success (explain_p); } +/* Handle unification of the domain of an array. PARM_DOM and ARG_DOM are + INTEGER_TYPEs representing the TYPE_DOMAIN of ARRAY_TYPEs. The other + parameters and return value are as for unify. */ + +static int +unify_array_domain (tree tparms, tree targs, + tree parm_dom, tree arg_dom, + bool explain_p) +{ + tree parm_max; + tree arg_max; + bool parm_cst; + bool arg_cst; + + /* Our representation of array types uses N - 1 as the + TYPE_MAX_VALUE for an array with N elements, if N is + not an integer constant. We cannot unify arbitrarily + complex expressions, so we eliminate the MINUS_EXPRs + here. */ + parm_max = TYPE_MAX_VALUE (parm_dom); + parm_cst = TREE_CODE (parm_max) == INTEGER_CST; + if (!parm_cst) +{ + gcc_assert (TREE_CODE (parm_max) == MINUS_EXPR); + parm_max = TREE_OPERAND (parm_max, 0); +} + arg_max = TYPE_MAX_VALUE (arg_dom); + arg_cst = TREE_CODE (arg_max) == INTEGER_CST; + if (!arg_cst) +{ + /* The ARG_MAX may not be a simple MINUS_EXPR, if we are + trying to unify the type of a variable with the type + of a template parameter. For example: + + template unsigned int N + void f (char () [N]); + int g(); + void h(int i) { + char a[g(i)]; + f(a); + } + + Here, the type of the ARG will be int [g(i)], and + may be a SAVE_EXPR, etc. */ + if (TREE_CODE (arg_max) != MINUS_EXPR) + return unify_vla_arg (explain_p, arg_dom); + arg_max = TREE_OPERAND (arg_max, 0); +} + + /* If only one of the bounds used a MINUS_EXPR, compensate + by adding one to the other bound. */ + if (parm_cst !arg_cst) +parm_max = fold_build2_loc (input_location, PLUS_EXPR, +integer_type_node, +parm_max, +integer_one_node); + else if (arg_cst !parm_cst) +arg_max = fold_build2_loc (input_location, PLUS_EXPR, + integer_type_node, + arg_max, + integer_one_node); + + return unify (tparms, targs, parm_max, arg_max, + UNIFY_ALLOW_INTEGER, explain_p); +} + /* Deduce the value of template parameters. TPARMS is the (innermost) set of template parameters to a template. TARGS is the bindings for those template parameters, as determined thus far; TARGS may @@ -17092,13 +17158,17 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, flag_deduce_init_list) parm = listify (parm); - if (!is_std_init_list (parm)) + if (!is_std_init_list (parm) + TREE_CODE (parm) != ARRAY_TYPE) /* We can only deduce from an initializer list argument if the - parameter is std::initializer_list; otherwise this is a - non-deduced context. */ + parameter is std::initializer_list or an array; otherwise this + is a non-deduced context. */ return unify_success (explain_p); - elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0); + if (TREE_CODE (parm) == ARRAY_TYPE) + elttype = TREE_TYPE (parm); + else + elttype = TREE_VEC_ELT (CLASSTYPE_TI_ARGS (parm), 0); FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg), i, elt) { @@ -17121,6 +17191,15 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, explain_p); } + if (TREE_CODE (parm) == ARRAY_TYPE) + { + /* Also deduce from the length of the initializer list. */ + tree max = size_int (CONSTRUCTOR_NELTS (arg)); + tree idx = compute_array_index_type (NULL_TREE, max, tf_none); + return unify_array_domain (tparms, targs, TYPE_DOMAIN (parm), + idx, explain_p); + } + /* If the std::initializer_listT deduction worked, replace the deduced A with std::initializer_listA. */ if (orig_parm != parm) @@ -17494,63 +17573,8 @@ unify (tree tparms, tree targs, tree parm, tree arg, int strict, RECUR_AND_CHECK_FAILURE (tparms, targs, TREE_TYPE (parm), TREE_TYPE (arg), strict UNIFY_ALLOW_MORE_CV_QUAL, explain_p); if (TYPE_DOMAIN (parm) != NULL_TREE) - { - tree parm_max; - tree arg_max; - bool parm_cst; - bool arg_cst; - - /* Our representation of array types uses N - 1 as the - TYPE_MAX_VALUE for an array with N elements, if N is - not an integer constant. We cannot unify arbitrarily - complex expressions, so we eliminate the MINUS_EXPRs - here. */ - parm_max = TYPE_MAX_VALUE (TYPE_DOMAIN (parm)); - parm_cst = TREE_CODE (parm_max) == INTEGER_CST; -
[Patch] Fix undefined behaviors in regex
It's detected by `clang++ -g -std=c++11 -fsanitize=undefined` but not detected by g++ with flags `g++ -g -std=c++11 -fsanitize=undefined -lpthread`. Am I on the right way? -m32 and -m64 tested :) Thanks! -- Tim Shen a.patch Description: Binary data
C++ PATCH to printing 'noexcept' in diagnostics
Printing noexcept (true) is redundant. Tested x86_64-pc-linux-gnu, applying to trunk. commit 5305dc0cdb0736a0c56ce3d126b4baad0521f096 Author: Jason Merrill ja...@redhat.com Date: Thu Oct 3 15:35:11 2013 -0400 * error.c (dump_exception_spec): Print noexcept rather than noexcept (true). diff --git a/gcc/cp/error.c b/gcc/cp/error.c index 0c3cead..3f6f594 100644 --- a/gcc/cp/error.c +++ b/gcc/cp/error.c @@ -1595,13 +1595,16 @@ dump_exception_spec (cxx_pretty_printer *pp, tree t, int flags) if (t TREE_PURPOSE (t)) { pp_cxx_ws_string (pp, noexcept); - pp_cxx_whitespace (pp); - pp_cxx_left_paren (pp); - if (DEFERRED_NOEXCEPT_SPEC_P (t)) - pp_cxx_ws_string (pp, uninstantiated); - else - dump_expr (pp, TREE_PURPOSE (t), flags); - pp_cxx_right_paren (pp); + if (!integer_onep (TREE_PURPOSE (t))) + { + pp_cxx_whitespace (pp); + pp_cxx_left_paren (pp); + if (DEFERRED_NOEXCEPT_SPEC_P (t)) + pp_cxx_ws_string (pp, uninstantiated); + else + dump_expr (pp, TREE_PURPOSE (t), flags); + pp_cxx_right_paren (pp); + } } else if (t) {
Small C++ PATCH to apply_late_template_attributes
While reviewing a recent patch I noticed that we weren't using attribute_takes_identifier_p here. Fixed. Tested x86_64-pc-linux-gnu, applying to trunk. commit 021fe309badd55090a5bc212c44d24b7c23f37af Author: Jason Merrill ja...@redhat.com Date: Fri Oct 11 10:38:23 2013 -0400 * pt.c (apply_late_template_attributes): Use attribute_takes_identifier_p. diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index a1eb7a2..e9c3ce2 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -8610,10 +8610,7 @@ apply_late_template_attributes (tree *decl_p, tree attributes, int attr_flags, pass it through tsubst. Attributes like mode, format, cleanup and several target specific attributes expect it unmodified. */ - else if (TREE_VALUE (t) - TREE_CODE (TREE_VALUE (t)) == TREE_LIST - TREE_VALUE (TREE_VALUE (t)) - identifier_p (TREE_VALUE (TREE_VALUE (t + else if (attribute_takes_identifier_p (TREE_PURPOSE (t))) { tree chain = tsubst_expr (TREE_CHAIN (TREE_VALUE (t)), args, complain,
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
On Oct 16, 2013, at 1:51 PM, Paulo J. Matos pa...@matos-sorge.com wrote: On 16/10/13 19:32, Mike Stump wrote: Committed revision 203715. Awesome! Can I backport your patch to 4_8? First, we like to wait and let patches bake on mainline before considering back porting, this has no bake time yet in our tree. Second, only patches that impact quality enough should be back ported. I tend to think that this one does not impact quality enough to worry about. Also, you should develop on trunk, not 4_8. Arguably, I would say no. Now, a release manager can always review approve it; it should be very, very low risk. If you have your own private tree, you can also place it in your tree. If you merely have a checkout, you can place it into your checkout. This should get you 90% of the benefit. Additionally, you can remove customizations from your environment that conflict with the style set in that file, and instead, bundle such a file with the projects that want a non-default style. The default style works well with GNU software (as long as you don't edit .md files).
Re: [Patch] Fix undefined behaviors in regex
On 10/16/2013 11:48 PM, Tim Shen wrote: It's detected by `clang++ -g -std=c++11 -fsanitize=undefined` but not detected by g++ with flags `g++ -g -std=c++11 -fsanitize=undefined -lpthread`. Am I on the right way? -m32 and -m64 tested :) Thank you! Paolo.
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener rguent...@suse.de wrote: On Tue, 15 Oct 2013, Cong Hou wrote: Thank you for your reminder, Jeff! I just noticed Richard's comment. I have modified the patch according to that. The new patch is attached. (posting patches inline is easier for review, now you have to deal with no quoting markers ;)) Comments inline. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..2637309 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-15 Cong Hou co...@google.com + + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant + statement that contains data refs with zero-step. + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..9d0f4a5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-15 Cong Hou co...@google.com + + * gcc.dg/vect/pr58508.c: New test. + 2013-10-14 Tobias Burnus bur...@net-b.de PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c b/gcc/testsuite/gcc.dg/vect/pr58508.c new file mode 100644 index 000..cb22b50 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */ + + +/* The GCC vectorizer generates loop versioning for the following loop + since there may exist aliasing between A and B. The predicate checks + if A may alias with B across all iterations. Then for the loop in + the true body, we can assert that *B is a loop invariant so that + we can hoist the load of *B before the loop body. */ + +void foo (int* a, int* b) +{ + int i; + for (i = 0; i 10; ++i) +a[i] = *b + 1; +} + + +/* { dg-final { scan-tree-dump-times hoist 2 vect } } */ +/* { dg-final { cleanup-tree-dump vect } } */ diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..f4fdec2 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo, adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } Note that applying this kind of transform at this point invalidates some of the earlier analysis the vectorizer performed (namely the def-kind which now effectively gets vect_external_def from vect_internal_def). In this case it doesn't seem to cause any issues (we re-compute the def-kind everytime we need it (how wasteful)). + /* Extract load and store statements on pointers with zero-stride + accesses. */ + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) +{ + /* In the loop body, we iterate each statement to check if it is a load +or store. Then we check the DR_STEP of the data reference. If +DR_STEP is zero, then we will hoist the load statement to the loop +preheader, and move the store statement to the loop exit. */ We don't move the store yet. Micha has a patch pending that enables vectorization of zero-step stores. + for (gimple_stmt_iterator si = gsi_start_bb (loop-header); + !gsi_end_p (si);) While technically ok now (vectorized loops contain a single basic block) please use LOOP_VINFO_BBS () to get at the vector of basic-blcoks and iterate over them like other code does. Have done it. + { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + if (dr integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + hoist the statement to outside of the loop ); hoisting out of the vectorized loop: + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, \n); + } + + gsi_remove (si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), stmt); Note that this will result in a bogus VUSE on the stmt at this point which will be only fixed because of implementation details of loop versioning. Either get the correct VUSE from the loop header virtual PHI node preheader edge (if there is none then the current VUSE is the correct one to use) or clear it. I just cleared the VUSE since I noticed that after the vectorization pass the correct VUSE is reassigned to the load. + } + /* TODO: We also consider vectorizing loops containing zero-step +data refs as writes. For example:
Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
On Oct 16, 2013, at 12:03 PM, Tom Tromey tro...@redhat.com wrote: Visit ChangeLog. Find a reference to a PR. M-x bug-reference-mode Click or C-c RET on the PR text Tested, confirmed it works as expected; thanks.
Re: [Patch] Fix undefined behaviors in regex
On Wed, Oct 16, 2013 at 6:13 PM, Paolo Carlini paolo.carl...@oracle.com wrote: Thank you! Committed. Thanks! -- Tim Shen
Re: [Patch] Fix undefined behaviors in regex
On 10/17/2013 12:29 AM, Tim Shen wrote: On Wed, Oct 16, 2013 at 6:13 PM, Paolo Carlini paolo.carl...@oracle.com wrote: Thank you! Committed. By the way, please feel free to prepare a reduced testcase for the -fsanitize people (Marek, Jakub?) Thanks, Paolo.
Re: [Patch] Fix undefined behaviors in regex
On Wed, Oct 16, 2013 at 6:33 PM, Paolo Carlini paolo.carl...@oracle.com wrote: By the way, please feel free to prepare a reduced testcase for the -fsanitize people (Marek, Jakub?) Here it is. And we should get undefined behaviors before this commit(r203732). -- Tim Shen split.cc Description: Binary data
Re: [Patch] Fix undefined behaviors in regex
On 10/17/2013 12:39 AM, Tim Shen wrote: On Wed, Oct 16, 2013 at 6:33 PM, Paolo Carlini paolo.carl...@oracle.com wrote: By the way, please feel free to prepare a reduced testcase for the -fsanitize people (Marek, Jakub?) Here it is. And we should get undefined behaviors before this commit(r203732). To be honest, I was thinking something much smaller than the whole regex ;) But let's add Marek in CC. Paolo.
[rl78] add rtx paranoia
Building newlib uncovered a few invalid assumptions... Committed. * config/rl78/rl78.c (rl78_alloc_address_registers_macax): Verify op is a REG before checking REGNO. (rl78_alloc_physical_registers): Verify pattern is a SET before checking SET_SRC. Index: config/rl78/rl78.c === --- config/rl78/rl78.c (revision 203732) +++ config/rl78/rl78.c (working copy) @@ -3047,13 +3047,14 @@ rl78_alloc_address_registers_macax (rtx replace_in_op0 = (op 0 rtx_equal_p (OP (op), OP (0))); replace_in_op1 = (op 1 rtx_equal_p (OP (op), OP (1))); OP (op) = transcode_memory_rtx (OP (op), HL, insn); if (op == 2 MEM_P (OP (op)) - (REGNO (XEXP (OP (op), 0)) == SP_REG + ((GET_CODE (XEXP (OP (op), 0)) == REG + REGNO (XEXP (OP (op), 0)) == SP_REG) || (GET_CODE (XEXP (OP (op), 0)) == PLUS REGNO (XEXP (XEXP (OP (op), 0), 0)) == SP_REG))) { emit_insn_before (gen_movhi (HL, gen_rtx_REG (HImode, SP_REG)), insn); OP (op) = replace_rtx (OP (op), gen_rtx_REG (HImode, SP_REG), HL); } @@ -3137,13 +3138,14 @@ rl78_alloc_physical_registers (void) pattern = XVECEXP (pattern, 0, 0); if (JUMP_P (insn) || CALL_P (insn) || GET_CODE (pattern) == CALL) clear_content_memory (); if (GET_CODE (pattern) != SET GET_CODE (pattern) != CALL) continue; - if (GET_CODE (SET_SRC (pattern)) == ASM_OPERANDS) + if (GET_CODE (pattern) == SET + GET_CODE (SET_SRC (pattern)) == ASM_OPERANDS) continue; valloc_method = get_attr_valloc (insn); PATTERN (insn) = copy_rtx_if_shared (PATTERN (insn));