Re: [PATCH 2/6] Introduce new edge_summary class and replace ipa_edge_args_sum.
On 07/16/2015 04:05 PM, Martin Liška wrote: On 07/10/2015 03:31 PM, Martin Jambor wrote: Hi, thanks for working on this and sorry for a tad late review: On Thu, Jul 09, 2015 at 11:13:52AM +0200, Martin Liska wrote: gcc/ChangeLog: 2015-07-03 Martin Liska mli...@suse.cz * cgraph.c (symbol_table::create_edge): Introduce summary_uid for cgraph_edge. * cgraph.h (struct GTY): Likewise. struct GTY does not look right :-) * ipa-inline-analysis.c (estimate_function_body_sizes): Use new data structure. * ipa-profile.c (ipa_profile): Likewise. * ipa-prop.c (ipa_print_node_jump_functions): Likewise. (ipa_propagate_indirect_call_infos): Likewise. (ipa_free_edge_args_substructures): Likewise. (ipa_free_all_edge_args): Likewise. (ipa_edge_args_t::remove): Likewise. (ipa_edge_removal_hook): Likewise. (ipa_edge_args_t::duplicate): Likewise. (ipa_register_cgraph_hooks): Likewise. (ipa_unregister_cgraph_hooks): Likewise. * ipa-prop.h (ipa_check_create_edge_args): Likewise. (ipa_edge_args_info_available_for_edge_p): Likewise. Definition of ipa_edge_args_t is missing here. * symbol-summary.h (gt_ggc_mx): Indent properly. (gt_pch_nx): Likewise. (edge_summary): New class. --- gcc/cgraph.c | 2 + gcc/cgraph.h | 5 +- gcc/ipa-inline-analysis.c | 2 +- gcc/ipa-profile.c | 2 +- gcc/ipa-prop.c| 71 +++- gcc/ipa-prop.h| 44 ++ gcc/symbol-summary.h | 208 +- 7 files changed, 252 insertions(+), 82 deletions(-) ... diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h index e6725aa..f0af9b2 100644 --- a/gcc/ipa-prop.h +++ b/gcc/ipa-prop.h @@ -493,13 +493,36 @@ public: extern ipa_node_params_t *ipa_node_params_sum; /* Vector of IPA-CP transformation data for each clone. */ extern GTY(()) vecipcp_transformation_summary, va_gc *ipcp_transformations; -/* Vector where the parameter infos are actually stored. */ -extern GTY(()) vecipa_edge_args, va_gc *ipa_edge_args_vector; + +/* Function summary for ipa_node_params. */ +class GTY((user)) ipa_edge_args_t: public edge_summary ipa_edge_args * +{ +public: + ipa_edge_args_t (symbol_table *symtab): +edge_summary ipa_edge_args* (symtab, true) { } + + static ipa_edge_args_t *create_ggc (symbol_table *symtab) + { Please move the body of this function to where the bodies of the rest of the member functions are. +ipa_edge_args_t *summary = new (ggc_cleared_alloc ipa_edge_args_t ()) + ipa_edge_args_t (symtab); +return summary; + } + + /* Hook that is called by summary when a node is duplicated. */ + virtual void duplicate (cgraph_edge *edge, + cgraph_edge *edge2, + ipa_edge_args *data, + ipa_edge_args *data2); + + virtual void remove (cgraph_edge *edge, ipa_edge_args *data); +}; + +extern GTY(()) edge_summary ipa_edge_args * *ipa_edge_args_sum; /* Return the associated parameter/argument info corresponding to the given node/edge. */ #define IPA_NODE_REF(NODE) (ipa_node_params_sum-get (NODE)) -#define IPA_EDGE_REF(EDGE) ((*ipa_edge_args_vector)[(EDGE)-uid]) +#define IPA_EDGE_REF(EDGE) (ipa_edge_args_sum-get (EDGE)) /* This macro checks validity of index returned by ipa_get_param_decl_index function. */ #define IS_VALID_JUMP_FUNC_INDEX(I) ((I) != -1) @@ -532,19 +555,8 @@ ipa_check_create_node_params (void) static inline void ipa_check_create_edge_args (void) { - if (vec_safe_length (ipa_edge_args_vector) - = (unsigned) symtab-edges_max_uid) -vec_safe_grow_cleared (ipa_edge_args_vector, symtab-edges_max_uid + 1); -} - -/* Returns true if the array of edge infos is large enough to accommodate an - info for EDGE. The main purpose of this function is that debug dumping - function can check info availability without causing reallocations. */ - -static inline bool -ipa_edge_args_info_available_for_edge_p (struct cgraph_edge *edge) -{ - return ((unsigned) edge-uid vec_safe_length (ipa_edge_args_vector)); + if (ipa_edge_args_sum == NULL) +ipa_edge_args_sum = ipa_edge_args_t::create_ggc (symtab); } static inline ipcp_transformation_summary * diff --git a/gcc/symbol-summary.h b/gcc/symbol-summary.h index eefbfd9..5799443 100644 --- a/gcc/symbol-summary.h +++ b/gcc/symbol-summary.h @@ -108,7 +108,7 @@ public: /* Allocates new data that are stored within map. */ T* allocate_new () { -return m_ggc ? new (ggc_alloc T ()) T() : new T () ; +return m_ggc ? new (ggc_alloc T ()) T () : new T () ; } /* Release an item that is stored within map. */ @@ -234,7 +234,7 @@ private: template typename T void -gt_ggc_mx(function_summaryT ** const summary)
Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3
Le 03/08/2015 14:36, Paul Richard Thomas a écrit : Dear Mikael, Thanks for your green light! I have been mulling over the trans-decl part of the patch and having been wondering if it is necessary. You mean marking entities as public? Or setting the hidden visibility attribute? Or both? I think both are necessary. Without optimization, private entities can be linked to. Given the discussion concerning the combination of submodules and private entities, I wonder if this is not sufficient? Within submodule scope, an advisory could be given for undefined references to suggest recompiling the module without optimization or making the entities public. About recompiling without optimization: If the module contains no code, I guess that would be OK. But otherwise, it would be pretty bad. And one would have to do the same for submodules of a submodule: the parent submodule would be compiled without optimization. :-( About making the entities public: I think the goal of submodules is providing a way to specify a (hopefully) stable interface free of any internal implementation details that users would start playing with if the opportunity was given to them. Making all entities public would go against that. I've been reading about the hidden visibility attribute since you submitted the 3/3 patch(es). I think it's the right thing. :-) Mikael
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On 03/08/15 16:45, Uros Bizjak wrote: On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I don't think this is a good idea. In the expander, there is already quite some target-dependent code that goes great length to utilize sbb insn as much as possible, before cmove is used. IMO, as far as x86 is concerned, the best solution would be to revert the change. ix86_expand_int_movcc already does some tricks from your patch in a target-efficient way. Generic change that was introduced by your patch now interferes with this expansion. Well, technically the transformation was already there, it was just never reached for an x86 compilation because noce_try_cmove was tried in front of it and used a target-specific expansion. In any case, how's this proposal? The transformation noce_try_store_flag_constants /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ Is a catch-all-immediates transformation in noce_try_store_flag_constants. What if we moved it to noce_try_cmove and performed it only if the target-specific conditional move expansion there failed? That way we can try the x86_64-specific sequence first and still give the opportunity to noce_try_store_flag_constants to perform the transformations that can benefit targets that don't have highly specific conditional move expanders. Yes, let's try this approach. As was found out, some targets (e.g. x86) hide lots of different target-dependent expansion strategies into movcc expander. Perhaps this fact should be documented in the comment in the generic code? Ok, I'll work on that approach and add a comment. Thanks, Kyrill Uros.
Re: [PATCH] Build *-match.o as early as possible
On August 3, 2015 5:03:05 PM GMT+02:00, Segher Boessenkool seg...@kernel.crashing.org wrote: The two files *-match.o files always finish building last, so if we start building them as soon as possible (instead of pretty late) the total build time will be less on a parallel build. Bootstrapped and tested on powerpc64-linux. Is this okay for trunk? OK. Richard. Segher 2014-080-3 Segher Boessenkool seg...@kernel.crashing.org * Makefile.in (OBJS): Put gimple-match.o and generic-match.o first. --- gcc/Makefile.in | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index be259e8..683c42a 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1162,10 +1162,12 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \ c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o # Language-independent object files. -# We put the insn-*.o files first so that a parallel make will build -# them sooner, because they are large and otherwise tend to be the -# last objects to finish building. +# We put the *-match.o and insn-*.o files first so that a parallel make +# will build them sooner, because they are large and otherwise tend to be +# the last objects to finish building. OBJS = \ + gimple-match.o \ + generic-match.o \ insn-attrtab.o \ insn-automata.o \ insn-dfatab.o \ @@ -1260,8 +1262,6 @@ OBJS = \ gimple-fold.o \ gimple-laddress.o \ gimple-low.o \ - gimple-match.o \ - generic-match.o \ gimple-pretty-print.o \ gimple-ssa-isolate-paths.o \ gimple-ssa-strength-reduction.o \
Re: [BUILDROBOT] Go runtime: calling ‘__builtin_frame_address’ with a nonzero argument is unsafe
On 08/03/2015 08:48 AM, Martin Sebor wrote: On 08/03/2015 05:55 AM, Jan-Benedict Glaw wrote: On Sun, 2015-08-02 17:15:27 -0600, Martin Sebor mse...@gmail.com wrote: OK for the trunk. Sorry for the delay. Thank you. Committed in revision 226480. ...und breaks native builds. When doing builds using config-list.mk, I first build a GCC for the build machine, then re-build a cross-configured GCC with that. I don't know if pragma GCC diagnostic is valid in Go (still waiting for my build to finish to confirm) but disabling the warning in cases where the calls are known to be safe should fix the compilation error. My understanding is the runtime is shared across gcc-go and golang. So the push/pop diagnostics may not be appropriate. Ian Taylor should have the final say about the best way forward. Jeff
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On 03/08/15 15:15, Uros Bizjak wrote: On Mon, Aug 3, 2015 at 3:43 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 03/08/15 14:37, Uros Bizjak wrote: On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 03/08/15 13:33, Uros Bizjak wrote: Hello! 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. This patch regressed following tests on x86_64: FAIL: gcc.target/i386/cmov2.c scan-assembler sbb FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3] Sorry for that :( I could have sworn they passed for me during my run... While cmov3 case is questionable by itself in light of PR56309 [1], the cnov2 case regressed from: cmpl%edi, %esi sbbl%eax, %eax andl$-10, %eax addl$5, %eax ret to: xorl%eax, %eax cmpl%esi, %edi setbe %al negl%eax andl$10, %eax subl$5, %eax ret Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated anymore. Non-QImode setcc instructions are problematic on x86, since they need to be zero-extended to their full length. IMO, we can check if the target is able to generate insn that implements conditional move between 0 and -1 for certain comparison mode, like: #(insn:TI 27 26 28 2 (parallel [ #(set (reg:SI 0 ax [orig:87 D.1845 ] [87]) #(if_then_else:SI (ltu:SI (reg:CC 17 flags) #(const_int 0 [0])) #(const_int -1 [0x]) #(const_int 0 [0]))) #(clobber (reg:CC 17 flags)) #]) cmov2.c:9 947 {*x86_movsicc_0_m1} # (expr_list:REG_DEAD (reg:CC 17 flags) #(expr_list:REG_UNUSED (reg:CC 17 flags) #(nil sbbl%eax, %eax # 27*x86_movsicc_0_m1 [length = 2] this is the key insn, and as shown above, further asm sequence is similar between patched and unpatched compiler. Hmm yes. We have a HAVE_conditional_move check, but that's not fine grained enough. How about trying an emit_conditional_move and checking that the result is a single insn? Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I don't think this is a good idea. In the expander, there is already quite some target-dependent code that goes great length to utilize sbb insn as much as possible, before cmove is used. IMO, as far as x86 is concerned, the best solution would be to revert the change. ix86_expand_int_movcc already does some tricks from your patch in a target-efficient way. Generic change that was introduced by your patch now interferes with this expansion. Well, technically the transformation was already there, it was just never reached for an x86 compilation because noce_try_cmove was tried in front of it and used a target-specific expansion. In any case, how's this proposal? The transformation noce_try_store_flag_constants /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ Is a catch-all-immediates transformation in noce_try_store_flag_constants. What if we moved it to noce_try_cmove and performed it only if the target-specific conditional move expansion there failed? That way we can try the x86_64-specific sequence first and still give the opportunity to noce_try_store_flag_constants to perform the transformations that can benefit targets that don't have highly specific conditional move expanders. Kyrill Uros.
Re: C++ delayed folding branch review
On 08/03/2015 05:42 AM, Kai Tietz wrote: 2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com: On 07/31/2015 05:54 PM, Kai Tietz wrote: The STRIP_NOPS-requirement in 'reduced_constant_expression_p' I could remove, but for one case in constexpr. Without folding we don't do type-sinking/raising. Right. So binary/unary operations might be containing cast, which were in the past unexpected. Why aren't the casts folded away? On such cast constructs, as for this vector-sample, we can't fold away Which testcase is this? the cast chain. The difference here to none-delayed-folding branch is that the cast isn't moved out of the plus-expr. What we see now is (plus ((vec) (const vector ...) { }), ...). Before we had (vec) (plus (const vector ...) { ... }). How could a PLUS_EXPR be considered a reduced constant, regardless of where the cast is? On verify_constant we check by reduced_constant_expression_p, if value is a constant. We don't handle here, that NOP_EXPRs are something we want to look through here, as it doesn't change anything if this is a constant, or not. NOPs around constants should have been folded away by the time we get there. Not in this cases, as the we actually have here a switch from const to none-const. So there is an attribute-change, which we can't ignore in general. I wasn't suggesting we ignore it, we should be able to change the type of the vector_cst. But I agree that for constexpr's we could special case cast from const to none-const (as required in expressions like const vec v = v + 1). Right. But really this should happen in convert.c, it shouldn't be specific to C++. Jason
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I don't think this is a good idea. In the expander, there is already quite some target-dependent code that goes great length to utilize sbb insn as much as possible, before cmove is used. IMO, as far as x86 is concerned, the best solution would be to revert the change. ix86_expand_int_movcc already does some tricks from your patch in a target-efficient way. Generic change that was introduced by your patch now interferes with this expansion. Well, technically the transformation was already there, it was just never reached for an x86 compilation because noce_try_cmove was tried in front of it and used a target-specific expansion. In any case, how's this proposal? The transformation noce_try_store_flag_constants /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ Is a catch-all-immediates transformation in noce_try_store_flag_constants. What if we moved it to noce_try_cmove and performed it only if the target-specific conditional move expansion there failed? That way we can try the x86_64-specific sequence first and still give the opportunity to noce_try_store_flag_constants to perform the transformations that can benefit targets that don't have highly specific conditional move expanders. Yes, let's try this approach. As was found out, some targets (e.g. x86) hide lots of different target-dependent expansion strategies into movcc expander. Perhaps this fact should be documented in the comment in the generic code? Uros.
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
2015-08-03 17:04 GMT+03:00 Uros Bizjak ubiz...@gmail.com: On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 03/08/15 13:33, Uros Bizjak wrote: Hello! 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. This patch regressed following tests on x86_64: FAIL: gcc.target/i386/cmov2.c scan-assembler sbb FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3] The difference for cmov3.c on x86_64 is: cmpl%esi, %edi movl$-5, %edx movl$5, %eax cmovg %edx, %eax ret vs. new code: xorl%eax, %eax cmpl%esi, %edi setle %al negl%eax andl$10, %eax subl$5, %eax ret I'm not sure old code is really better than new. HJ, do you have any better insight? Uros. The original code looks better, tree height is just 2 and therefore it can be executed in 2 cycles. New code has more dependencies and tree height becomes 5. It is always hard to say for all x86 targets but as a generic code the original version is better. Thanks, Ilya
[patch] libstdc++/67078 change _N bad name in new container access functions
We can't use _N as an identifier, as per https://gcc.gnu.org/onlinedocs/libstdc++/manual/source_code_style.html Tested ppc64le-linux, committed to trunk. commit ae65b96cfb69c64d636f86d9f9154da2364e0440 Author: Jonathan Wakely jwak...@redhat.com Date: Mon Aug 3 16:29:44 2015 +0100 PR libstdc++/67078 * include/bits/range_access.h (size, empty, data): Fix _N bad name. diff --git a/libstdc++-v3/include/bits/range_access.h b/libstdc++-v3/include/bits/range_access.h index 2a10598..586d162 100644 --- a/libstdc++-v3/include/bits/range_access.h +++ b/libstdc++-v3/include/bits/range_access.h @@ -237,10 +237,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @brief Return the size of an array. * @param __array Array. */ - template typename _Tp, size_t _N + template typename _Tp, size_t _Nm constexpr size_t -size(const _Tp (/*__array*/)[_N]) noexcept -{ return _N; } +size(const _Tp (/*__array*/)[_Nm]) noexcept +{ return _Nm; } /** * @brief Return whether a container is empty. @@ -255,9 +255,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @brief Return whether an array is empty (always false). * @param __array Container. */ - template typename _Tp, size_t _N + template typename _Tp, size_t _Nm constexpr bool -empty(const _Tp (/*__array*/)[_N]) noexcept +empty(const _Tp (/*__array*/)[_Nm]) noexcept { return false; } /** @@ -291,9 +291,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION * @brief Return the data pointer of an array. * @param __array Array. */ - template typename _Tp, size_t _N + template typename _Tp, size_t _Nm constexpr _Tp* -data(_Tp (__array)[_N]) noexcept +data(_Tp (__array)[_Nm]) noexcept { return __array; } /**
Re: [BUILDROBOT] Go runtime: calling ‘__builtin_frame_address’ with a nonzero argument is unsafe
On 08/03/2015 05:55 AM, Jan-Benedict Glaw wrote: On Sun, 2015-08-02 17:15:27 -0600, Martin Sebor mse...@gmail.com wrote: OK for the trunk. Sorry for the delay. Thank you. Committed in revision 226480. ...und breaks native builds. When doing builds using config-list.mk, I first build a GCC for the build machine, then re-build a cross-configured GCC with that. I don't know if pragma GCC diagnostic is valid in Go (still waiting for my build to finish to confirm) but disabling the warning in cases where the calls are known to be safe should fix the compilation error. Index: runtime/mprof.goc === --- runtime/mprof.goc(revision 226505) +++ runtime/mprof.goc(working copy) @@ -404,10 +404,15 @@ func Stack(b Slice, all bool) (n int) { byte *pc, *sp; bool enablegc; - + +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored -Wframe-address + sp = runtime_getcallersp(b); pc = (byte*)(uintptr)runtime_getcallerpc(b); +#pragma GCC diagnostic pop + if(all) { runtime_semacquire(runtime_worldsema, false); runtime_m()-gcing = 1; Martin While building GCC targeting the build=host machine, I get ie: http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=459572 /bin/bash ./libtool --tag=CC --mode=compile /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/xgcc -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/ -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/bin/ -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/lib/ -isystem /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/include -isystem /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/sys-include -DHAVE_CONFIG_H -I. -I/home/jbglaw/repos-configlist_mk/gcc/libgo -I /home/jbglaw/repos-configlist_mk/gcc/libgo/runtime -I/home/jbglaw/repos-configlist_mk/gcc/libgo/../libffi/include -I../libffi/include -pthread -fexceptions -fnon-call-exceptions -fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE -D_LAR GEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -I /home/jbglaw/repos-configlist_mk/gcc/libgo/../libgcc -I /home/jbglaw/repos-configlist_mk/gcc/libgo/../libbacktrace -I ../../gcc/include -g -O2 -MT mprof.lo -MD -MP -MF .deps/mprof.Tpo -c -o mprof.lo mprof.c libtool: compile: /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/xgcc -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/build-gcc/native-compiler-build/./gcc/ -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/bin/ -B/home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/lib/ -isystem /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/include -isystem /home/jbglaw/build-configlist_mk/rs6000-ibm-aix5.1.0/_install_/x86_64-pc-linux-gnu/sys-include -DHAVE_CONFIG_H -I. -I/home/jbglaw/repos-configlist_mk/gcc/libgo -I /home/jbglaw/repos-configlist_mk/gcc/libgo/runtime -I/home/jbglaw/repos-configlist_mk/gcc/libgo/../libffi/include -I../libffi/include -pthread -fexceptions -fnon-call-exceptions -fplan9-extensions -fsplit-stack -Wall -Wextra -Wwrite-strings -Wcast-qual -Werror -minline-all-stringops -D_GNU_SOURCE -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BIT S=64 -I /home/jbglaw/repos-configlist_mk/gcc/libgo/../libgcc -I /home/jbglaw/repos-configlist_mk/gcc/libgo/../libbacktrace -I ../../gcc/include -g -O2 -MT mprof.lo -MD -MP -MF .deps/mprof.Tpo -c mprof.c -fPIC -DPIC -o .libs/mprof.o /home/jbglaw/repos-configlist_mk/gcc/libgo/runtime/mprof.goc: In function ‘runtime_Stack’: /home/jbglaw/repos-configlist_mk/gcc/libgo/runtime/mprof.goc:408:5: error: calling ‘__builtin_frame_address’ with a nonzero argument is unsafe [-Werror=frame-address] sp = runtime_getcallersp(b); ^ cc1: all warnings being treated as errors Makefile:2613: recipe for target 'mprof.lo' failed make[4]: *** [mprof.lo] Error 1 Seems Go's runtime uses that feature. MfG, JBG
[gomp4] OpenACC first private
I've committed this patch to gomp4. The existing implementation of firstprivate presumes the existence of memory at the CTA level. This patch does away with that, treating firstprivate as thread-private variables initialized from the host. During development there was some fallout from declare handling, as that wasn't creating the expected omp_region context object. The previous handling of firstprivate just happened to work. Jim has been working on resolving that problem. nathan 2015-08-03 Nathan Sidwell nat...@codesourcery.com * gimplify.c (GOVD_GANGLOCAL): Delete. (oacc_default_clause): Only derereference reference types. Mark firstprivate as GOVD_FIRSTPRIVATE. (gimplify_adjust_omp_clauses_1): Remove GANGLOCALL handling. (gimplify_omp_for): Remove bogus OpenACC outer context lookup. * omp-low.c (build_outer_var_ref): Simplify openacc outer ref lookup. (scan_sharing_clauses): Handle openacc firstprivate. (lower_omp_target): Handle openacc firstprivate. c/ * c-parser.c (c_parser_oacc_data_clause): Remove firstprivate handling. (c_parser_oac_all_clauses): Firstpribsste is a firstprivate clause. * c-typeck.c (c_finish_omp_clauses): Remove GANGLOCAL handling. fortran/ * trans-openmp.c (gfc_trans_omp_clauses_1): Remove GANGLOCAL handling. * gfortran.h (OMP_MAP_GANGLOCAL): Delete. (OMP_MAP_FORCE_TO_GANGLOCAL): Likewise. * openmp.c (gfc_match_omp_clauses): Remove openacc specific firstprivate handling. testsuite/ * gfortran.dg/goacc/parallel-tree.f95: Remove ganglocal expectation. * gfortran.dg/goacc/list.f95: Stop expected firstprivate to be a data clause. * c-c++-common/goacc/firstprivate.c: Likewise. cp/ * semantics.c (finish_omp_clauses): Remove OpenACC-specific firstprivate handling. * parser.c (cp_parser_oacc_data_clause): Remove firstprivate here. (cp_parser_oacc_all_clauses): First private is a firstprivate clause. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 226462) +++ gcc/gimplify.c (working copy) @@ -94,9 +94,6 @@ enum gimplify_omp_var_data GOVD_FORCE_MAP = 1 16, - /* Gang-local OpenACC variable. */ - GOVD_GANGLOCAL = 1 17, - /* OpenACC deviceptr clause. */ GOVD_USE_DEVPTR = 1 18, @@ -5937,14 +5934,13 @@ oacc_default_clause (struct gimplify_omp if (is_global_var (decl) device_resident_p (decl)) flags |= GOVD_MAP_TO_ONLY | GOVD_MAP; else if (ctx-acc_region_kind == ARK_KERNELS) - /* Scalars under kernels are default 'copy'. */ + /* Everything under kernels are default 'copy'. */ flags |= GOVD_FORCE_MAP | GOVD_MAP; else if (ctx-acc_region_kind == ARK_PARALLEL) { tree type = TREE_TYPE (decl); - /* Should this be REFERENCE_TYPE_P? */ - if (POINTER_TYPE_P (type)) + if (TREE_CODE (type) == REFERENCE_TYPE) type = TREE_TYPE (type); if (AGGREGATE_TYPE_P (type)) @@ -5952,12 +5948,12 @@ oacc_default_clause (struct gimplify_omp flags |= GOVD_MAP; else /* Scalars default to 'firstprivate'. */ - flags |= GOVD_GANGLOCAL | GOVD_MAP_TO_ONLY | GOVD_MAP; + flags |= GOVD_FIRSTPRIVATE; } else gcc_unreachable (); } - break; +break; } return flags; @@ -6812,10 +6808,7 @@ gimplify_adjust_omp_clauses_1 (splay_tre else if (code == OMP_CLAUSE_MAP) { OMP_CLAUSE_SET_MAP_KIND (clause, - flags GOVD_MAP_TO_ONLY - ? (flags GOVD_GANGLOCAL - ? GOMP_MAP_FORCE_TO_GANGLOCAL - : GOMP_MAP_TO) + flags GOVD_MAP_TO_ONLY ? GOMP_MAP_TO : (flags GOVD_FORCE_MAP ? GOMP_MAP_FORCE_TOFROM : GOMP_MAP_TOFROM)); @@ -7542,11 +7535,7 @@ gimplify_omp_for (tree *expr_p, gimple_s else if (omp_is_private (gimplify_omp_ctxp, decl, 0)) omp_notice_variable (gimplify_omp_ctxp, decl, true); else - { - if (ork == ORK_OACC gimplify_omp_ctxp-outer_context) - omp_notice_variable (gimplify_omp_ctxp-outer_context, decl, true); - omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN); - } + omp_add_variable (gimplify_omp_ctxp, decl, GOVD_PRIVATE | GOVD_SEEN); /* If DECL is not a gimple register, create a temporary variable to act as an iteration counter. This is valid, since DECL cannot be Index: gcc/c/c-parser.c === --- gcc/c/c-parser.c (revision 226462) +++ gcc/c/c-parser.c (working copy) @@ -10719,9 +10719,6 @@ c_parser_oacc_data_clause (c_parser *par case PRAGMA_OACC_CLAUSE_DEVICE_RESIDENT: kind = GOMP_MAP_DEVICE_RESIDENT; break; -case PRAGMA_OACC_CLAUSE_FIRSTPRIVATE: - kind = GOMP_MAP_FORCE_TO_GANGLOCAL; - break; case PRAGMA_OACC_CLAUSE_HOST: kind = GOMP_MAP_FORCE_FROM; break; @@ -12316,7 +12313,7 @@ c_parser_oacc_all_clauses (c_parser *par c_name = deviceptr; break; case PRAGMA_OACC_CLAUSE_FIRSTPRIVATE: - clauses =
[gomp4] Fix ptx warning
I've committed this to gomp4. I was reusing a size_t variable in a loop where unsigned would do, and fprintf got upset. nathan 2015-08-03 Nathan Sidwell nat...@codesourcery.com * config/nvptx/mkoffload.c (process): Avoid printf formatting warning. Index: config/nvptx/mkoffload.c === --- config/nvptx/mkoffload.c (revision 226504) +++ config/nvptx/mkoffload.c (working copy) @@ -225,16 +225,16 @@ access_check (const char *name, int mode static void process (FILE *in, FILE *out) { - size_t len; + size_t len = 0; const char *input = read_file (in, len); const char *comma; id_map const *id; unsigned obj_count = 0; - size_t i; + unsigned ix; /* Dump out char arrays for each PTX object file. These are terminated by a NUL. */ - for (i = 0; i != len;) + for (size_t i = 0; i != len;) { char c; @@ -280,8 +280,8 @@ process (FILE *in, FILE *out) const char *code;\n __SIZE_TYPE__ size;\n } ptx_objs[] = {); - for (comma = , i = 0; i != obj_count; comma = ,, i++) -fprintf (out, %s\n\t{ptx_code_%u, sizeof (ptx_code_%u)}, comma, i, i); + for (comma = , ix = 0; ix != obj_count; comma = ,, ix++) +fprintf (out, %s\n\t{ptx_code_%u, sizeof (ptx_code_%u)}, comma, ix, ix); fprintf (out, \n};\n\n); /* Dump out variable idents. */
[PING][PATCH][AArch64] Change aarch64 vector cost to match vectorizer
Hi, Ping for the patch submitted here https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02251.html Thanks, Pawel Kupidura
Re: Fix two more memory leaks in threader
On 07/20/2015 08:19 AM, James Greenhalgh wrote: I think we either want to defer the unordered_remove until we're done processing all the vector elements, or make sure to look at element 'i' again after we've moved something new in to it. Correct. Two loops had this mistake -- while others got it right. Sigh. The easiest fix is to change how we increment i in those loops so that it's only incremented if we do not delete the path. That happens to also match how other cases are handled. Thanks for catching these mistakes. A testcase would need to expose at least two threads which we later want to cancel, one of which ends up at the end of the vector of threading opportunities. We should then see only the first of those threads actually get cancelled, and the second skipped over. Reproducing these conditions is quite tough, which has stopped me finding a useful example for the list, I'll be sure to follow-up this thread if I do get to one. Thankfully BZ had two closely related testcases. Bootstrapped and regression tested on x86_64-unknown-linux-gnu. Installed on the trunk. (Unfortunately it didn't resolve the ppc64 issue I was looking at ;( Jeff diff --git a/gcc/ChangeLog b/gcc/ChangeLog index fb9d115..9abde9f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,11 @@ +2015-08-03 Jeff Law l...@redhat.com + + PR middle-end/66314 + PR gcov-profile/66899 + * tree-ssa-threadupdate.c (mark_threaded_blocks): Correctly + iterate over the jump threading paths when an element in the + jump threading paths array is eliminated. + 2015-08-03 Segher Boessenkool seg...@kernel.crashing.org * Makefile.in (OBJS): Put gimple-match.o and generic-match.o first. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index a403767..0a841b5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,10 @@ +2015-08-03 Jeff Law l...@redhat.com + + PR middle-end/66314 + PR gcov-profile/66899 + * gcc.dg/pr66899.c: New test. + * gcc.dg/pr66314.c: New test. + 2015-08-03 Marek Polacek pola...@redhat.com PR c/67088 diff --git a/gcc/testsuite/gcc.dg/pr66314.c b/gcc/testsuite/gcc.dg/pr66314.c new file mode 100644 index 000..f74ab5b --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr66314.c @@ -0,0 +1,33 @@ +/* { dg-do compile } */ +/* { dg-options -std=gnu89 -Os -fprofile-arcs -fsanitize=kernel-address } */ + +char *a; +int d; + +static int +fn1 (int b, int c) +{ + while (a) +if (*a) + return -126; + if (b) +return -12; + if (c == -12) +return c; +} + +void +fn2 (int b, int c) +{ + for (;;) +{ + d = fn1 (b, c); + switch (d) +{ +case -126: + continue; +default: + return; +} +} +} diff --git a/gcc/testsuite/gcc.dg/pr66899.c b/gcc/testsuite/gcc.dg/pr66899.c new file mode 100644 index 000..1fff181 --- /dev/null +++ b/gcc/testsuite/gcc.dg/pr66899.c @@ -0,0 +1,41 @@ +/* { dg-do compile } */ +/* { dg-options -Os -fprofile-arcs } */ + +struct +{ + int authority; +} * a, *b, c, d; +int e, f; +static int +fn1 () +{ + if (a) +goto verified; + if (b) +goto matched; + return -126; +matched: + e = 0; +verified: + if (b) +for (; c != b; c = d) + ; + return 0; +} + +int +fn2 () +{ + for (;;) +{ + f = fn1 (); + switch (f) +{ +case -126: + continue; +default: + return 0; +} +} +} + diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c index 31ddf25..5a5f8df 100644 --- a/gcc/tree-ssa-threadupdate.c +++ b/gcc/tree-ssa-threadupdate.c @@ -2130,7 +2130,7 @@ mark_threaded_blocks (bitmap threaded_blocks) cases where the second path starts at a downstream edge on the same path). First record all joiner paths, deleting any in the unexpected case where there is already a path for that incoming edge. */ - for (i = 0; i paths.length (); i++) + for (i = 0; i paths.length ();) { vecjump_thread_edge * *path = paths[i]; @@ -2140,6 +2140,7 @@ mark_threaded_blocks (bitmap threaded_blocks) if ((*path)[0]-e-aux == NULL) { (*path)[0]-e-aux = path; + i++; } else { @@ -2149,10 +2150,15 @@ mark_threaded_blocks (bitmap threaded_blocks) delete_jump_thread_path (path); } } + else + { + i++; + } } + /* Second, look for paths that have any other jump thread attached to them, and either finish converting them or cancel them. */ - for (i = 0; i paths.length (); i++) + for (i = 0; i paths.length ();) { vecjump_thread_edge * *path = paths[i]; edge e = (*path)[0]-e; @@ -2167,7 +2173,10 @@ mark_threaded_blocks (bitmap threaded_blocks) /* If we iterated through the entire path without exiting the loop, then we are good to go,
Re: [PATCH][AArch64] Change aarch64 vector cost to match vectorizer
On Mon, Jul 27, 2015 at 02:22:41PM +0100, Pawel Kupidura wrote: diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 10df325..ffafc3f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,7 @@ +2015-07-27 Pawel Kupidura pawel.kupid...@arm.com Two spaces between your name and your email address, like so: 2015-07-27 Pawel Kupidura pawel.kupid...@arm.com + +* config/aarch64/aarch64.c: Changed inner loop statement cost +to be consistent with vectorizer code. + s/Changed/Change 2015-07-26 Uros Bizjak ubiz...@gmail.com * config/alpha/alpha.c: Use SUBREG_P predicate. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 020f63c..3b6f8c5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -7079,15 +7079,9 @@ aarch64_add_stmt_cost (void *data, int count, enum vect_cost_for_stmt kind, /* Statements in an inner loop relative to the loop being vectorized are weighted more heavily. The value here is - a function (linear for now) of the loop nest level. */ + arbitrary and could potentially be improved with analysis. */ Your mail client has mangled the tabs in this diff, so the patch will not apply in this form. Could you try posting again having resolved the issues with your mail client? if (where == vect_body stmt_info stmt_in_inner_loop_p (stmt_info)) -{ - loop_vec_info loop_info = STMT_VINFO_LOOP_VINFO (stmt_info); - struct loop *loop = LOOP_VINFO_LOOP (loop_info); - unsigned nest_level = loop_depth (loop); - - count *= nest_level; -} +count *= 50; /* FIXME */ Likewise here. Thanks, James
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On 08/03/2015 09:50 AM, Ilya Enkovich wrote: The original code looks better, tree height is just 2 and therefore it can be executed in 2 cycles. New code has more dependencies and tree height becomes 5. It is always hard to say for all x86 targets but as a generic code the original version is better. Agreed. Reducing tree height is definitely a good thing as a general rule. jeff
[gomp4] Remove superfluous code
I've committed this to gomp4 branch. It tidies up the error message processing and removes some code that shows a remarkable confusion about how shared libraries work. nathan -- Nathan Sidwell - Director, Sourcery Services - Mentor Embedded 2015-08-03 Nathan Sidwell nat...@codesourcery.com libgomp/ * plugin/plugin-nvptx.c: Don't include dlfcn.h. (cuda_errlist): Constify. (errmsg): Move into ... (cuda_error): ... here. Make smaller. (_XSTR, _STR): Delete. (cuda_synames): Delete. (verify_device_library): Delete. (nvptx_init): Don't call it. Index: libgomp/plugin/plugin-nvptx.c === --- libgomp/plugin/plugin-nvptx.c (revision 226533) +++ libgomp/plugin/plugin-nvptx.c (working copy) @@ -43,16 +43,15 @@ #include stdint.h #include string.h #include stdio.h -#include dlfcn.h #include unistd.h #include assert.h #define ARRAYSIZE(X) (sizeof (X) / sizeof ((X)[0])) -static struct +static const struct { CUresult r; - char *m; + const char *m; } cuda_errlist[]= { { CUDA_ERROR_INVALID_VALUE, invalid value }, @@ -109,9 +108,7 @@ static struct { CUDA_ERROR_UNKNOWN, unknown } }; -static char errmsg[128]; - -static char * +static const char * cuda_error (CUresult r) { int i; @@ -119,12 +116,14 @@ cuda_error (CUresult r) for (i = 0; i ARRAYSIZE (cuda_errlist); i++) { if (cuda_errlist[i].r == r) - return cuda_errlist[i].m[0]; + return cuda_errlist[i].m; } - sprintf (errmsg[0], unknown result code: %5d, r); + static char errmsg[30]; + + snprintf (errmsg, sizeof (errmsg), unknown error code: %d, r); - return errmsg[0]; + return errmsg; } static unsigned int instantiated_devices = 0; @@ -383,74 +382,6 @@ static struct ptx_event *ptx_events; static struct ptx_device **ptx_devices; -#define _XSTR(s) _STR(s) -#define _STR(s) #s - -static struct _synames -{ - char *n; -} cuda_symnames[] = -{ - { _XSTR (cuCtxCreate) }, - { _XSTR (cuCtxDestroy) }, - { _XSTR (cuCtxGetCurrent) }, - { _XSTR (cuCtxPushCurrent) }, - { _XSTR (cuCtxSynchronize) }, - { _XSTR (cuDeviceGet) }, - { _XSTR (cuDeviceGetAttribute) }, - { _XSTR (cuDeviceGetCount) }, - { _XSTR (cuEventCreate) }, - { _XSTR (cuEventDestroy) }, - { _XSTR (cuEventQuery) }, - { _XSTR (cuEventRecord) }, - { _XSTR (cuInit) }, - { _XSTR (cuLaunchKernel) }, - { _XSTR (cuLinkAddData) }, - { _XSTR (cuLinkComplete) }, - { _XSTR (cuLinkCreate) }, - { _XSTR (cuMemAlloc) }, - { _XSTR (cuMemAllocHost) }, - { _XSTR (cuMemcpy) }, - { _XSTR (cuMemcpyDtoH) }, - { _XSTR (cuMemcpyDtoHAsync) }, - { _XSTR (cuMemcpyHtoD) }, - { _XSTR (cuMemcpyHtoDAsync) }, - { _XSTR (cuMemFree) }, - { _XSTR (cuMemFreeHost) }, - { _XSTR (cuMemGetAddressRange) }, - { _XSTR (cuMemHostGetDevicePointer) }, - { _XSTR (cuMemHostRegister) }, - { _XSTR (cuMemHostUnregister) }, - { _XSTR (cuModuleGetFunction) }, - { _XSTR (cuModuleLoadData) }, - { _XSTR (cuStreamDestroy) }, - { _XSTR (cuStreamQuery) }, - { _XSTR (cuStreamSynchronize) }, - { _XSTR (cuStreamWaitEvent) } -}; - -static int -verify_device_library (void) -{ - int i; - void *dh, *ds; - - dh = dlopen (libcuda.so, RTLD_LAZY); - if (!dh) -return -1; - - for (i = 0; i ARRAYSIZE (cuda_symnames); i++) -{ - ds = dlsym (dh, cuda_symnames[i].n); - if (!ds) -return -1; -} - - dlclose (dh); - - return 0; -} - static inline struct nvptx_thread * nvptx_thread (void) { @@ -631,16 +562,11 @@ static bool nvptx_init (void) { CUresult r; - int rc; int ndevs; if (instantiated_devices != 0) return true; - rc = verify_device_library (); - if (rc 0) -return false; - r = cuInit (0); if (r != CUDA_SUCCESS) GOMP_PLUGIN_fatal (cuInit error: %s, cuda_error (r));
Re: [PING] Re: [PATCH] c/66516 - missing diagnostic on taking the address of a builtin function
I suspect the back end or even the middle end route isn't going to work even if there was enough context to diagnose the problem expressions because some of them will have been optimized away by then (e.g., 'if ( __builtin_foo != 0)' is optimized into if (1) by gimple). I was thinking that if they're optimized away, they aren't problematic anymore; that was part of the attraction for me of handling this lower down. Yes, they're not problematic in that they don't cause linker unsats. I have a few concerns with the approach of accepting or rejecting constructs based on optimization (e.g., that it might lead to similar problems as with -Wmaybe-uninitialized; or that it could be perceived as inconsistent). But it may not be as bad as it seems -- let me look into it if only as a learning exercise and see how it goes. I've prototyped this approach in a couple places in the middle end. Both implementations are very simple and work great when the code isn't optimized away. The problem is that the optimizations done at various points between the front end and the final gimplification make the diagnostics inconsistent. For instance, with F being the type of the builtin, this is diagnosed in the first prototype: F* foo () { if (0) return __builtin_trap; return 0; } but this isn't: F* bar () { return 0 ? __builtin_trap : 0; } because the ternary ?: expression is folded into a constant by the front end even before it reaches the gimplifier, while the if-else statement isn't folded until the control flow graph is built. (As an aside: I'm wondering why that is. Why have the front end do any optimization at all if the middle can and does them too, and better?) Diagnosing neither of these cases might perhaps seem like the way to go since they are both safe (don't result in a linker error). I implemented this approach in the second prototype in a later pass but that exposed even more inconsistencies as even more code is optimized (for instance, exception handling). I think this sort of inconsistency in diagnosing semantically equivalent language constructs would be viewed as arbitrary and be unhelpful to users. It's too bad because otherwise this solution would have been quite elegant and simple (touching just one function). Unless you have some other ideas I'm going to abandon this approach see if the original patch can be simplified by consoldating some of the handling into mark_rvalue_use. Martin PS Another problem, one that in my view sinks the middle end approach, is the fact that references in unused static inline functions aren't diagnosed either, even if calling the inline function would result in taking the address of the builtin function. E.g., this isn't diagnosed: static inline F* baz (void) { return __builtin_trap; } unless baz() is called. This would make the feature largely ineffective in C++ template libraries.
[PATCH], PR target/67071, Improve easy_altivec_constants on PowerPC
In preparing the next IEEE 128-bit floating point patch, I needed a quick way to load -0.0q into a vector registers (i.e. just the MSB set). I originally had a special purpose insn to load this value, but I decided to widen it to allow the easy_altivec_constant support to generate constants where you use the VSLDOI instruction to create the bottom part of all 0's or all 1's. When I started doing the coding, I noticed that the current support to load vectors with the MSB set in each element no longer worked, because the test assumed the constant was stored as an unsigned value, and we now store a sign extended number. I raised PR 67071 about this, and this patch fixes that problem and adds more patterns of vector constants that can be loaded without using memory. I have built this on both big endian power7 and little endian power8 machines with no regressions. Can I install the patch? I would also like to backport this patch to the active branches. [gcc] 2015-08-03 Michael Meissner meiss...@linux.vnet.ibm.com PR target/67071 * config/rs6000/predicates.md (easy_vector_constant_vsldoi): New predicate to allow construction of vector constants using the VSLDOI vector shift instruction. * config/rs6000/rs6000-protos.h (vspltis_shifted): Add declaration. * config/rs6000/rs6000.c (vspltis_shifted): New function to return the number of bytes to be shifted left and filled in with either all zero or all one bits. (gen_easy_altivec_constant): Call vsplitis_shifted if no other methods exist. (output_vec_const_move): On power8, generate XXLORC to generate a vector constant with all 1's. Do a split if we need to use a VSLDOI instruction. * config/rs6000/rs6000.h (EASY_VECTOR_MSB): Use mode mask to properly test for the MSB. * config/rs6000/altivec.md (VSLDOI splitter): Add splitter for vector constants that can be created with VSLDOI. [gcc/testsuite] 2015-08-03 Michael Meissner meiss...@linux.vnet.ibm.com PR target/67071 * gcc.target/powerpc/pr67071-1.c: New file to test PR 67071 new vector constants. * gcc.target/powerpc/pr67071-2.c: Likewise. * gcc.target/powerpc/pr67071-3.c: Likewise. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797 Index: gcc/config/rs6000/predicates.md === --- gcc/config/rs6000/predicates.md (revision 226414) +++ gcc/config/rs6000/predicates.md (working copy) @@ -562,6 +562,14 @@ (define_predicate easy_vector_constant_ return EASY_VECTOR_MSB (val, GET_MODE_INNER (mode)); }) +;; Return true if this is an easy altivec constant that we form +;; by using VSLDOI. +(define_predicate easy_vector_constant_vsldoi + (and (match_code const_vector) + (and (match_test TARGET_ALTIVEC) + (and (match_test easy_altivec_constant (op, mode)) +(match_test vspltis_shifted (op) != 0) + ;; Return 1 if operand is constant zero (scalars and vectors). (define_predicate zero_constant (and (match_code const_int,const_double,const_wide_int,const_vector) Index: gcc/config/rs6000/rs6000-protos.h === --- gcc/config/rs6000/rs6000-protos.h (revision 226413) +++ gcc/config/rs6000/rs6000-protos.h (working copy) @@ -31,6 +31,7 @@ extern void init_cumulative_args (CUMULA #endif /* TREE_CODE */ extern bool easy_altivec_constant (rtx, machine_mode); +extern int vspltis_shifted (rtx); extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); extern bool macho_lo_sum_memory_operand (rtx, machine_mode); extern int num_insns_constant (rtx, machine_mode); Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 226414) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -5448,6 +5448,96 @@ vspltis_constant (rtx op, unsigned step, return true; } +/* Like vsplitis_constant, but allow the value to be shifted left with a VSLDOI + instruction, filling in the bottom elements with 0 or -1. + + Return 0 if the constant cannot be generated with VSLDOI. Return positive + for the number of zeroes to shift in, or negative for the number of 0xff + bytes to shift in. + + OP is a CONST_VECTOR. */ + +int +vspltis_shifted (rtx op) +{ + machine_mode mode = GET_MODE (op); + machine_mode inner = GET_MODE_INNER (mode); + + unsigned i, j; + unsigned nunits; + unsigned mask; + + HOST_WIDE_INT val; + + if (mode != V16QImode mode != V8HImode mode != V4SImode) +return false; + + /* We need to create pseudo registers to do the shift, so don't recognize + shift vector constants after reload. */ + if (!can_create_pseudo_p ()) +return false; + + nunits
Re: [fortran,patch] Extend IEEE support to all real kinds
On Mon, Aug 3, 2015 at 11:14 PM, FX fxcoud...@gmail.com wrote: The attached patch extends the IEEE modules to all floating-point kinds. Last time, when I added IEEE support, I restricted it to the float and double types (real kinds 4 and 8), to be extra safe. After discussion with Uros Bizjak and some reading, I’ve come to the conclusion that on most hardware where we support IEEE at all, we do support enough IEEE features on extended and quad prec (long double and __float128, on x86_64 hardware) to satisfy the Fortran standard. So, this enables full IEEE support for all real kinds. Nothing changes to the underlying architecture, it’s almost exclusively mechanical changes (adding the necessary variants to the interfaces, etc.). Bootstrapped and regtested on x86_64-apple-darwin14 (with associated libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html). OK to commit to trunk? Index: libgfortran/config/fpu-387.h === --- libgfortran/config/fpu-387.h (revision 226429) +++ libgfortran/config/fpu-387.h (working copy) @@ -461,12 +461,12 @@ set_fpu_state (void *state) int -support_fpu_underflow_control (int kind) +support_fpu_underflow_control (int kind __attribute__((unused))) { if (!has_sse()) return 0; - return (kind == 4 || kind == 8) ? 1 : 0; + return 1; } x86 doesn't support underflow control for long double (x87) and __float128 types, so the above change is wrong. Index: libgfortran/config/fpu-glibc.h === --- libgfortran/config/fpu-glibc.h (revision 226429) +++ libgfortran/config/fpu-glibc.h (working copy) @@ -439,7 +439,7 @@ int support_fpu_underflow_control (int kind __attribute__((unused))) { #if defined(__alpha__) defined(FE_MAP_UMZ) - return (kind == 4 || kind == 8) ? 1 : 0; + return 1; #else return 0; #endif I'm not sure for alpha, if it supports underflow control for extended types. Let's keep this as is for now, it is trivial to change later. Uros.
Re: [fortran,patch] Extend IEEE support to all real kinds
On Mon, Aug 3, 2015 at 11:14 PM, FX fxcoud...@gmail.com wrote: The attached patch extends the IEEE modules to all floating-point kinds. Last time, when I added IEEE support, I restricted it to the float and double types (real kinds 4 and 8), to be extra safe. After discussion with Uros Bizjak and some reading, I’ve come to the conclusion that on most hardware where we support IEEE at all, we do support enough IEEE features on extended and quad prec (long double and __float128, on x86_64 hardware) to satisfy the Fortran standard. So, this enables full IEEE support for all real kinds. Nothing changes to the underlying architecture, it’s almost exclusively mechanical changes (adding the necessary variants to the interfaces, etc.). Bootstrapped and regtested on x86_64-apple-darwin14 (with associated libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html). OK to commit to trunk? Can you please also introduce tests for ieee exceptions for long double and __float128 types (as is done for real and double in gfortran.dg/ieee/ieee_1.F90) as well as test for supported rounding modes as done for real and double in gfortran.dg/ieee/rounding_1.f90 ? On x86_64, these new tests will exercise exceptions and rounding modes for __float128 soft-fp library, in addition to x87 long double tests. Uros.
Re: [fortran,patch] Extend IEEE support to all real kinds
On Mon, Aug 03, 2015 at 11:14:41PM +0200, FX wrote: So, this enables full IEEE support for all real kinds. Nothing changes to the underlying architecture, it???s almost exclusively mechanical changes (adding the necessary variants to the interfaces, etc.). Bootstrapped and regtested on x86_64-apple-darwin14 (with associated libquadmath patch: https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00124.html). OK to commit to trunk? This is OK to commit. It's unclear whether I've sufficient authority to OK the libquadmath path. PS: Once this is in, I intend to focus on the next item: allowing all standard-mandated IEEE functions in constant expressions. Then, I believe our IEEE support will be entirely bug-free (in the sense that there will be no known bugs in it!). See PR libquadmath/65757. I'm not sure if it effects the IEEE support other than libquadmath has issues. -- Steve
Re: [PR64164] drop copyrename, integrate into expand
On Jul 30, 2015, H.J. Lu hjl.to...@gmail.com wrote: aoliva/pr64164 is fine on x32. Thanks. I have made a large number of changes since you tested it, fixing all the reported issues and then some. Now, x86_64-linux-gnu (-m64 and -m32), i686-pc-linux-gnu, powerpc64-linux-gnu and powerpc64el-linux-gnu pass regstrap (r226317), and the many tens of targets I cross-tested still get the same 'make all' errors that the pristine tree did. The bulk of the incremental changes had to do with handling splitting and unsplitting of complex args, and BLKmode types passed by reference. For the former, I had naively assumed complex args would always be represented as CONCATs. I now use read_complex_part to split the expand-assigned parm rtl into components, and I use it again at unsplit time to make sure the expand-assigned parm rtl matches that of the split components. The latter, in turn, almost required me to give up the entire notion of coalescing parms. The problem is that, for arguments passed as a BLKmode pointer, copying the argument to an expand-assigned stack slot is not only wasteful, it doesn't really work: we'd expand the copy in assign_parms* and insert it before the stack allocation performed by expand_user_vars, so we'd initialize the pseudo holding the address of the stack slot only after its first use. The solution I came up with was to detect BLKmode parms and NOT allow them to coalesce with other variables, so that we can easily detect partitions that need special handling. The special handling amounts to not allocating a stack slot for the partition holding the param default def, and leaving it for assign_parms to do so. We do, however, allocate a MEM, in theory assigned to all partition members (though they're all the same parm ATM, but not necessarily all SSA_NAMEs of the same parm, since optimization causes different versions to conflict). We leave the address of that MEM unset, so that assign_parm knows it is to fill it in with a pseudo holding a copy of the incoming parm address, or with the address of the local stack slot created to hold a copy of the parameter. It took me several rounds of trial and error to get these to pass all complex and vector tests on x86 and ppc. The last remaining failure was a regression in gcc.target/powerpc/pr16458-4, caused by our inability to hold SSA_NAMEs as REG_EXPRs in pseudos. emit_case_decision_tree attempted to preserve the decl as the REG_EXPR of a pseudo holding a copy of the switch expr, and its type appears to be used to decide whether to emit signed or unsigned compares, even though we explicitly pass mode and unsignedp down to the cmp_and_jump expanders. I figured there was no good reason to prevent SSA_NAMEs in REG_EXPRs, just like MEM_EXPRs, so I went ahead and adjusted the DECL_MODE that prevented it, and now we expand the case decision tree as intended. Here's a consolidated patch, followed by the consolidated incremental patch. I don't intend to install it this week, even if approved, because I'm going to be away Aug 5-10, and I would like to be around should any further problems arise. So, ok to install when I return? [PR64164] Drop copyrename, use coalescible partition as base when optimizing. From: Alexandre Oliva aol...@redhat.com for gcc/ChangeLog PR rtl-optimization/64164 PR bootstrap/66978 PR middle-end/66983 PR rtl-optimization/67000 PR middle-end/67034 PR middle-end/67035 * Makefile.in (OBJS): Drop tree-ssa-copyrename.o. * tree-ssa-copyrename.c: Removed. * opts.c (default_options_table): Drop -ftree-copyrename. Add -ftree-coalesce-vars. * passes.def: Drop all occurrences of pass_rename_ssa_copies. * common.opt (ftree-copyrename): Ignore. (ftree-coalesce-inlined-vars): Likewise. * doc/invoke.texi: Remove the ignored options above. * gimple-expr.h (gimple_can_coalesce_p): Move declaration * tree-ssa-coalesce.h: ... here. * tree-ssa-uncprop.c: Include tree-ssa-coalesce.h and other headers required by it. * gimple-expr.c (gimple_can_coalesce_p): Allow coalescing across variables when flag_tree_coalesce_vars. Check register use and promoted modes to allow coalescing. Do not coalesce maybe-byref parms with SSA_NAMEs of other variables, or anonymous SSA_NAMEs. Moved to tree-ssa-coalesce.c. * tree-ssa-live.c (struct tree_int_map_hasher): Move along with its member functions to tree-ssa-coalesce.c. (var_map_base_init): Likewise. Renamed to compute_samebase_partition_bases. (partition_view_normal): Drop want_bases parameter. (partition_view_bitmap): Likewise. * tree-ssa-live.h: Adjust declarations. * tree-ssa-coalesce.c: Include explow.h and cfgexpand.h. (build_ssa_conflict_graph): Process PARM_ and RESULT_DECLs's default defs at the entry point.
Re: [gomp4] Remove superfluous code
On 08/03/15 19:52, Nathan Sidwell wrote: I've committed this to gomp4 branch. It tidies up the error message processing and removes some code that shows a remarkable confusion about how shared libraries work. I've also committed it to trunk as obvious. nathan
Re: [gofrontend-dev] Re: Go patch committed: Fix error reporting for invalid builtin calls
Now I get ../../../gcc/libgo/runtime/mprof.goc: In function ‘runtime_Stack’: ../../../gcc/libgo/runtime/mprof.goc:437:19: error: ‘enablegc’ may be used uninitialized in this function [-Werror=maybe-uninitialized] mstats.enablegc = enablegc; ^ ../../../gcc/libgo/runtime/mprof.goc:406:7: note: ‘enablegc’ was declared here bool enablegc; ^ Am I doing something wrong? Cheers, mwh On 4 August 2015 at 05:55, Ian Lance Taylor i...@golang.org wrote: On Mon, Aug 3, 2015 at 2:10 AM, Andreas Schwab sch...@suse.de wrote: ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack': ../../../libgo/runtime/mprof.goc:408:5: error: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Werror=frame-address] sp = runtime_getcallersp(b); Fixed by this patch by Chris Manghane. The call was not actually necessary. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. This fixes PR 67101. Ian -- You received this message because you are subscribed to the Google Groups gofrontend-dev group. To unsubscribe from this group and stop receiving emails from it, send an email to gofrontend-dev+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [PATCH, rs6000] Fix source operand contraints for tabort. pattern
On Mon, 2015-08-03 at 16:27 -0400, David Edelsohn wrote: On Mon, Aug 3, 2015 at 3:39 PM, Peter Bergner berg...@vnet.ibm.com wrote: This has passed bootstrapping and regtesting on trunk. Ok for mainline? I'd like to also backport this to the release branches. Is this ok for them once bootstrapping and regtesting are complete on them? [snip] Okay. Thanks, I have now committed this to trunk and the 4.9 and 5 branches. Peter
Re: [gofrontend-dev] Re: Go patch committed: Fix error reporting for invalid builtin calls
On Mon, Aug 3, 2015 at 7:24 PM, Michael Hudson-Doyle michael.hud...@canonical.com wrote: Now I get ../../../gcc/libgo/runtime/mprof.goc: In function ‘runtime_Stack’: ../../../gcc/libgo/runtime/mprof.goc:437:19: error: ‘enablegc’ may be used uninitialized in this function [-Werror=maybe-uninitialized] mstats.enablegc = enablegc; ^ ../../../gcc/libgo/runtime/mprof.goc:406:7: note: ‘enablegc’ was declared here bool enablegc; I don't know why I am not seeing this, but I've committed this patch that should fix it. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 226533) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -a850225433a66a58613c22185c3b09626f5545eb +bdd98c601f2c8dbd0bf821548ba09c038f7645c4 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/mprof.goc === --- libgo/runtime/mprof.goc (revision 226525) +++ libgo/runtime/mprof.goc (working copy) @@ -403,7 +403,7 @@ func ThreadCreateProfile(p Slice) (n int func Stack(b Slice, all bool) (n int) { byte *pc; - bool enablegc; + bool enablegc = false; pc = (byte*)(uintptr)runtime_getcallerpc(b);
Minor typo fixes
I was starting to look at Abe's changes to the gimple if-converter and realized a handful of the changes were just fixing comments. No reason those shouldn't go in immediately. So I pulled them out and applied those changes to the trunk. Abe -- if you find more of those kind of changes, don't hesitate to break them out into their own patch and they can go forward very quickly. Attached are the fixes that were actually applied. They were bootstrapped for completeness. Thanks, Jeff * tree-if-conv.c: Fix various typos in comments. * tree-vect-stmts.c: Likewise. diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c index a284c2a..b5de1b2 100644 --- a/gcc/tree-if-conv.c +++ b/gcc/tree-if-conv.c @@ -2283,8 +2283,8 @@ combine_blocks (struct loop *loop, bool any_mask_load_store) ifc_bbs = NULL; } -/* Version LOOP before if-converting it, the original loop - will be then if-converted, the new copy of the loop will not, +/* Version LOOP before if-converting it; the original loop + will be if-converted, the new copy of the loop will not, and the LOOP_VECTORIZED internal call will be guarding which loop to execute. The vectorizer pass will fold this internal call into either true or false. */ @@ -2456,7 +2456,7 @@ ifcvt_walk_pattern_tree (tree var, vecgimple *defuse_list, return; } -/* Returns true if STMT can be a root of bool pattern apllied +/* Returns true if STMT can be a root of bool pattern applied by vectorizer. */ static bool @@ -2486,7 +2486,7 @@ stmt_is_root_of_bool_pattern (gimple stmt) return false; } -/* Traverse all statements in BB which correspondent to loop header to +/* Traverse all statements in BB which correspond to loop header to find out all statements which can start bool pattern applied by vectorizer and convert multiple uses in it to conform pattern restrictions. Such case can occur if the same predicate is used both @@ -2567,7 +2567,7 @@ ifcvt_local_dce (basic_block bb) gimple_set_plf (phi, GF_PLF_2, true); worklist.safe_push (phi); } - /* Consider load/store statemnts, CALL and COND as live. */ + /* Consider load/store statements, CALL and COND as live. */ for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) { stmt = gsi_stmt (gsi); @@ -2655,7 +2655,7 @@ tree_if_conversion (struct loop *loop) ifc_bbs = NULL; bool any_mask_load_store = false; - /* Set-up aggressive if-conversion for loops marked with simd pragma. */ + /* Set up aggressive if-conversion for loops marked with simd pragma. */ aggressive_if_conv = loop-force_vectorize; /* Check either outer loop was marked with simd pragma. */ if (!aggressive_if_conv) @@ -2687,7 +2687,7 @@ tree_if_conversion (struct loop *loop) combine_blocks (loop, any_mask_load_store); /* Delete dead predicate computations and repair tree correspondent - to bool pattern to delete multiple uses of preidcates. */ + to bool pattern to delete multiple uses of predicates. */ if (aggressive_if_conv) { ifcvt_local_dce (loop-header); diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index 1256b09..33a7e0c 100644 --- a/gcc/tree-vect-stmts.c +++ b/gcc/tree-vect-stmts.c @@ -7125,7 +7125,7 @@ vect_is_simple_cond (tree cond, gimple stmt, loop_vec_info loop_vinfo, When STMT is vectorized as nested cycle, REDUC_DEF is the vector variable to be used at REDUC_INDEX (in then clause if REDUC_INDEX is 1, and in - else caluse if it is 2). + else clause if it is 2). Return FALSE if not a vectorizable STMT, TRUE otherwise. */
Re: revised and updated new-if-converter patch… [PATCH] fix PR46029: reimplement if conversion of loads and stores
On 07/17/2015 01:57 PM, Abe wrote: Dear all, Relative to the previous submission of this same patch, the below corrects some minor spacing and/or indentation issues, misc. other formatting fixes, and makes the disabled vectorization tests be disabled via xfail rather than by adding spaces to deliberately cause the relevant scanned-for text to not be found by DejaGNU so as to prevent the DejaGNU line from being interpreted. The below is also based on a Git checkout that was rebased to the latest upstream check-in from today, so it should merge cleanly with trunk as of today. Regards, Abe 2015-06-12 Sebastian Pop s@samsung.com Abe Skolnik a.skol...@samsung.com PR tree-optimization/46029 * tree-data-ref.c (struct data_ref_loc_d): Moved... (get_references_in_stmt): Exported. * tree-data-ref.h (struct data_ref_loc_d): ... here. (get_references_in_stmt): Declared. * tree-if-conv.c (struct ifc_dr): Removed. (IFC_DR): Removed. (DR_WRITTEN_AT_LEAST_ONCE): Removed. (DR_RW_UNCONDITIONALLY): Removed. (memrefs_read_or_written_unconditionally): Removed. (write_memrefs_written_at_least_once): Removed. (ifcvt_could_trap_p): Does not take refs parameter anymore. (ifcvt_memrefs_wont_trap): Removed. (has_non_addressable_refs): New. (if_convertible_gimple_assign_stmt_p): Call has_non_addressable_refs. Removed use of refs. (if_convertible_stmt_p): Removed use of refs. (if_convertible_gimple_assign_stmt_p): Same. (if_convertible_loop_p_1): Removed use of refs. Remove initialization of dr-aux, DR_WRITTEN_AT_LEAST_ONCE, and DR_RW_UNCONDITIONALLY. (insert_address_of): New. (create_scratchpad): New. (create_indirect_cond_expr): New. (predicate_mem_writes): Call create_indirect_cond_expr. Take an extra parameter for scratch_pad. (combine_blocks): Same. (tree_if_conversion): Same. testsuite/ * g++.dg/tree-ssa/ifc-pr46029.C: New. * gcc.dg/tree-ssa/ifc-5.c: Make it exactly like the FFmpeg kernel. * gcc.dg/tree-ssa/ifc-8.c: New. * gcc.dg/tree-ssa/ifc-9.c: New. * gcc.dg/tree-ssa/ifc-10.c: New. * gcc.dg/tree-ssa/ifc-11.c: New. * gcc.dg/tree-ssa/ifc-12.c: New. * gcc.dg/vect/if-cvt-stores-vect-ifcvt-18.c: Disabled. diff --git a/gcc/common.opt b/gcc/common.opt index 6b2ccbc..49f6b9f 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -1413,7 +1413,7 @@ Common Report Var(flag_tree_loop_if_convert) Init(-1) Optimization Convert conditional jumps in innermost loops to branchless equivalents ftree-loop-if-convert-stores -Common Report Var(flag_tree_loop_if_convert_stores) Optimization +Common Report Var(flag_tree_loop_if_convert_stores) Init(-1) Optimization Also if-convert conditional jumps containing memory writes ; -finhibit-size-directive inhibits output of .size for ELF. I don't see this change mentioned anywhere in the ChangeLog. That seems to be a relatively common situation. I called out some of those issues, but didn't try to catch them all. Please make sure all changes are reflected in the ChangeLog. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c index f392fbe..775fcd5 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/cunroll-10.c This change isn't mentioned in the ChangeLog. diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c index 875d2d3..fc69ca2 100644 --- a/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c +++ b/gcc/testsuite/gcc.dg/tree-ssa/ifc-5.c @@ -1,5 +1,5 @@ /* { dg-do compile } */ -/* { dg-options -c -O2 -ftree-vectorize -fdump-tree-ifcvt-stats { target *-*-* } } */ +/* { dg-options -c -O2 -ftree-vectorize -ftree-loop-if-convert-stores -fdump-tree-ifcvt-stats { target *-*-* } } */ ISTM this really should be two tests, one with this code as-is, another that exactly matches the ffmpeg kernel. diff --git a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c index 11e9533..cbd3378 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c +++ b/gcc/testsuite/gcc.dg/vect/vect-mask-loadstore-1.c I don't see this mentioned in the ChangeLog. It also doesn't look like you actually disabled the test. Obviously this will need to be addressed before your patch could go in. diff --git a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c index 180b490..aedc66a 100644 --- a/gcc/testsuite/gcc.target/i386/avx2-gather-6.c +++ b/gcc/testsuite/gcc.target/i386/avx2-gather-6.c Not mentioned in the ChangeLog. xfail needs to be fixed. Similarly for the others were you added xfails. diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c index e6ff4ef..2a1e27d 100644 --- a/gcc/tree-data-ref.c +++ b/gcc/tree-data-ref.c @@ -4363,22 +4363,11 @@
Re: [PATCH], PowerPC IEEE 128-bit patch #4
On Mon, 3 Aug 2015, Michael Meissner wrote: The intention of theese changes (currently unwritten) is to change the existing problematical names that use TF in their name to be something else, and provide via a weak reference an alias for the old name. So if for example, we change the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses say __gcc_ltoq to call convert a 64-bit integer to IBM extended double instead of __floatditf. Older code that refers to __floatditf would still work fine. But as those names are in the implementation namespace and aren't user-visible, I don't see the point in such a change (other than as part of a complete new incompatible ABI which gets its own copies of libgcc, libstdc++, libc, libm and other affected libraries). Of course it *can* be done via symbol versioning to keep working with existing binaries / shared libraries using the existing symbols from libgcc, but the libgcc build system doesn't make that sort of target-specific symbol versioning particularly convenient. And as usual with symbol versioning, you'd break compatibility with existing static libraries / .o files. -- Joseph S. Myers jos...@codesourcery.com
libgo patch committed: Fix type of RawSockaddr Data field for PPC
The gc compiler determines the type of the Data field in the syscall.RawSockaddr type by looking at the system header files. The gccgo library just has a fixed definition for it. Unfortunately the fixed definition does not match the header files on PPC64 systems. This is https://golang.org/issue/11469 . This patch from Lynn Boger fixes this. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian
Re: [PATCH] Avoid recomputing multiple times the defining predicate-chain for the same PHI
On Sun, Aug 2, 2015 at 9:40 PM, Patrick Palka patr...@parcs.ath.cx wrote: On Sun, Aug 2, 2015 at 2:44 PM, Bernhard Reutner-Fischer rep.dot@gmail.com wrote: On August 2, 2015 8:18:51 PM GMT+02:00, Patrick Palka patr...@parcs.ath.cx wrote: No comment on the patch itself, but please s/visted/visited/ s/VISIED_PHIS/VISITED_PHIS/ while at it. TIA, Ok. Thanks, Richard. Sure, no problem.
Re: [PATCH][AArch64][10/14] Implement target pragmas
On 24/07/15 09:36, Kyrill Tkachov wrote: On 21/07/15 17:52, James Greenhalgh wrote: On Thu, Jul 16, 2015 at 04:21:05PM +0100, Kyrill Tkachov wrote: Hi all, This patch implements target pragmas for aarch64. The pragmas accepted are the same as for target attributes (as required). In addition pragmas will need to redefine the target-specific preprocessor macros if appropriate. A new file: aarch64-c.c is added and the code from TARGET_CPU_CPP_BUILTINS is moved there and split up into the unconditional parts that are always defined and the conditional stuff that depends on certain architectural features. The pragma processing code calls that to redefine preprocessor macros on the fly. The implementation is similar to the rs6000 one. With target pragmas implemented, we can use them in the arm_neon.h and arm_acle.h headers to specify the architectural features required for those intrinsics, rather than #ifdef'ing them out when FP/SIMD is not available from the command line. We need to do this in order to handle cases where the user compiles a file with -mgeneral-regs-only but has a function tagged with +simd and tries to use the arm_neon.h intrinsics. Tests and documentation comes as a separate patch later on in the series Bootstrapped and tested on aarch64. Ok for trunk? A couple of ChangeLog nits and some comments below. 2015-07-16 Kyrylo Tkachov kyrylo.tkac...@arm.com * config.gcc (aarch64*-*-*): Specify c_target_objs and cxx_target_objs. * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): This should say * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): New. Presumably (or maybe Define.). (TARGET_CPU_CPP_BUILTINS): Redefine to call aarch64_cpu_cpp_builtins. * config/aarch64/aarch64.c (aarch64_override_options_internal): Remove static keyword. (aarch64_reset_previous_fndecl): New function. * config/aarch64/aarch64-c.c: New file. * config/aarch64/arm_acle.h: Add pragma +crc+nofp at the top. Push and pop options at beginning and end. Remove ifdef __ARM_FEATURE_CRC32. * config/aarch64/arm_neon.h: Remove #ifdef check on __ARM_NEON. Add pragma arch=armv8-a+simd and +crypto where appropriate. * config/aarch64/t-aarch64 (aarch64-c.o): New rule. I don't see a ChangeLog entry for these hunks: diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 3a5482d..4704736 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -360,6 +360,10 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, enum machine_mode, RTX_CODE); #endif /* RTX_CODE */ void aarch64_init_builtins (void); + +bool aarch64_process_target_attr (tree, const char*); +void aarch64_override_options_internal (struct gcc_options *); + rtx aarch64_expand_builtin (tree exp, rtx target, rtx subtarget ATTRIBUTE_UNUSED, @@ -376,6 +380,9 @@ extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); extern bool aarch64_madd_needs_nop (rtx_insn *); extern void aarch64_final_prescan_insn (rtx_insn *); +extern void aarch64_reset_previous_fndecl (void); +extern void aarch64_cpu_cpp_builtins (cpp_reader *); +extern void aarch64_register_pragmas (void); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); bool aarch64_handle_option (struct gcc_options *, struct gcc_options *, +static bool +aarch64_pragma_target_parse (tree args, tree pop_target) +{ + + bool ret; + + /* If args is not NULL then process it and setup the target-specific + information that it specifies. */ + if (args) +{ + ret = aarch64_process_target_attr (args, pragma); + if (ret) +aarch64_override_options_internal (global_options); RET must equal true. + else +return false; Early return of false closes the other control path here. +} + + /* args is NULL, restore to the state described in pop_target. */ + else +{ + pop_target = pop_target ? pop_target : target_option_default_node; + cl_target_option_restore (global_options, +TREE_TARGET_OPTION (pop_target)); + ret = true; +} Therefore RET must equal true here. + + target_option_current_node += build_target_option_node (global_options); + + aarch64_reset_previous_fndecl (); + /* For the definitions, ensure all newly defined macros are considered + as used for -Wunused-macros. There is no point warning about the + compiler predefined macros. */ + cpp_options *cpp_opts = cpp_get_options (parse_in); + unsigned char saved_warn_unused_macros = cpp_opts-warn_unused_macros; + cpp_opts-warn_unused_macros = 0; + + aarch64_update_cpp_builtins (parse_in); + + cpp_opts-warn_unused_macros = saved_warn_unused_macros; + + return ret; So we don't need RET
Re: [PATCH][AArch64][3/14] Refactor option override code
On Fri, Jul 24, 2015 at 09:21:46AM +0100, Kyrill Tkachov wrote: Thanks for the review, here's an updated version. In the end, I chose to retain the use alloca (other patches in the series are reworked to use it too). How's this? A nit or two from code you were moving or that got caught up in the diff, but otherwise OK to commit with the issues highlighted below fixed up. - aarch64_build_bitmask_table (); +/* Validate a command-line -mcpu option. Parse the cpu and extensions (if any) + specified in STR and throw errors if appropriate. Put the results if + they are valid in RES and ISA_FLAGS. */ +static void +aarch64_validate_mcpu (const char *str, const struct processor **res, +unsigned long *isa_flags) Newline between comment and function. +/* Validate a command-line -march option. Parse the arch and extensions + (if any) specified in STR and throw errors if appropriate. Put the + results, if they are valid, in RES and ISA_FLAGS. */ +static void +aarch64_validate_march (const char *str, const struct processor **res, +unsigned long *isa_flags) And here. +/* Validate a command-line -mtune option. Parse the cpu + specified in STR and throw errors if appropriate. Put the + result, if it is valid, in RES. */ +static void +aarch64_validate_mtune (const char *str, const struct processor **res) And here. Thanks, James
[PATCH] Improve genmatch code-gen
Since we allow arbitrary IDs in captures the capture array is dense and thus we don't need to clear it before initializing. We can also emit more compact source by simply using an initializer. - tree captures[3] ATTRIBUTE_UNUSED = {}; - captures[0] = o20; - captures[1] = op1; - captures[2] = op2; + tree captures[3] ATTRIBUTE_UNUSED = { o20, op1, op2 }; this has a surprinsingly big effect on code-size at -O0 as well (on the positive side). Boostrap regtest running on x86_64-unknown-linux-gnu. Richard. 2015-08-03 Richard Biener rguent...@suse.de * genmatch.c (dt_simplify::gen): Create captures array with an initializer. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 226490) +++ gcc/genmatch.c (working copy) @@ -3023,16 +3195,15 @@ dt_simplify::gen (FILE *f, int indent, b output_line_directive (f, s-result ? s-result-location : s-match-location); if (s-capture_max = 0) -fprintf_indent (f, indent, tree captures[%u] ATTRIBUTE_UNUSED = {};\n, - s-capture_max + 1); - - for (int i = 0; i = s-capture_max; ++i) -if (indexes[i]) - { - char opname[20]; - fprintf_indent (f, indent, captures[%u] = %s;\n, - i, indexes[i]-get_name (opname)); - } +{ + char opname[20]; + fprintf_indent (f, indent, tree captures[%u] ATTRIBUTE_UNUSED = { %s, + s-capture_max + 1, indexes[0]-get_name (opname)); + + for (int i = 1; i = s-capture_max; ++i) + fprintf (f, , %s, indexes[i]-get_name (opname)); + fprintf (f, };\n); +} /* If we have a split-out function for the actual transform, call it. */ if (info info-fname)
Re: [PATCH 3/4] Add libgomp plugin for Intel MIC
Could you probably review the patch, please? 2015-07-28 18:42 GMT+03:00 Maxim Blumental bvm...@gmail.com: Applied the idea with python script alternative. Review, please. 2015-07-24 17:18 GMT+03:00 David Malcolm dmalc...@redhat.com: On Fri, 2015-07-24 at 10:01 +0200, Jakub Jelinek wrote: #!/usr/bin/python import sys with open(sys.argv[1],rb) as f: nextblock = f.read(12) while 1: block = nextblock nextblock = f.read(12) if block == : break str = for ch in block: if str == : str = else: str += , if ord(ch) 10: str += 0x0 + chr(ord('0')+ord(ch)) elif ord(ch) 16: str += 0x0 + chr(ord('a')+ord(ch)-10) else: str += hex(ord(ch)) if nextblock != : str += , print str python ./xxd.py $ $@ does the same thing as cat $ | xxd -include $@ (CCing David as python expert, my python knowledge is limited and 15 years old, not sure how portable this is (python 2 vs. python 3, and even python 2 minimal versions)). It doesn't work with Python 3 for various reasons (print syntax, and str vs bytes issues). I'm attaching a version which works with both Python 2 and Python 3 (2.7.5 and 3.3.2 were the versions I tried). It ought to work with much older python 2 versions (as your script appears to), but I don't have them handy. Presumably it would need a license header and some descriptive comments. (snip) Dave -- - Sincerely yours, Maxim Blumental -- - Sincerely yours, Maxim Blumental
Re: [gomp4, committed] Handle double reduction in oacc kernels pass group
Hi! On Tue, 28 Jul 2015 10:20:39 +0200, Tom de Vries tom_devr...@mentor.com wrote: this patch adds a test-case with a double reduction in an oacc kernels region. In order to get it in the proper shape for parloops to deal with, I needed to repeat the pass_lim/pass_copy_prop sequence. Bootstrapped and reg-tested on x86_64. Committed to gomp-4_0-branch. Thanks! --- a/gcc/passes.def +++ b/gcc/passes.def @@ -96,6 +96,8 @@ along with GCC; see the file COPYING3. If not see NEXT_PASS (pass_tree_loop_init); NEXT_PASS (pass_lim); NEXT_PASS (pass_copy_prop); + NEXT_PASS (pass_lim); + NEXT_PASS (pass_copy_prop); NEXT_PASS (pass_scev_cprop); NEXT_PASS (pass_parallelize_loops_oacc_kernels); NEXT_PASS (pass_expand_omp_ssa); Very much in line with the older r83, http://news.gmane.org/find-root.php?message_id=%3C871tjd1ch2.fsf%40kepler.schwinge.homeip.net%3E, now committed to gomp-4_0-branch in r226494: commit 3f2d750170034024232cf619f1ba6b703e37bc89 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Aug 3 10:31:23 2015 + Update testsuite for new pass_lim ... that got added in r226305. gcc/testsuite/ * c-c++-common/restrict-2.c: Update for new pass_lim. * c-c++-common/restrict-4.c: Same. * g++.dg/tree-ssa/pr33615.C: Same. * g++.dg/tree-ssa/restrict1.C: Same. * gcc.dg/tm/pub-safety-1.c: Same. * gcc.dg/tm/reg-promotion.c: Same. * gcc.dg/tree-ssa/20050314-1.c: Same. * gcc.dg/tree-ssa/loop-32.c: Same. * gcc.dg/tree-ssa/loop-33.c: Same. * gcc.dg/tree-ssa/loop-34.c: Same. * gcc.dg/tree-ssa/loop-35.c: Same. * gcc.dg/tree-ssa/loop-7.c: Same. * gcc.dg/tree-ssa/pr23109.c: Same. * gcc.dg/tree-ssa/restrict-3.c: Same. * gcc.dg/tree-ssa/restrict-5.c: Same. * gcc.dg/tree-ssa/ssa-lim-1.c: Same. * gcc.dg/tree-ssa/ssa-lim-10.c: Same. * gcc.dg/tree-ssa/ssa-lim-11.c: Same. * gcc.dg/tree-ssa/ssa-lim-12.c: Same. * gcc.dg/tree-ssa/ssa-lim-2.c: Same. * gcc.dg/tree-ssa/ssa-lim-3.c: Same. * gcc.dg/tree-ssa/ssa-lim-6.c: Same. * gcc.dg/tree-ssa/ssa-lim-7.c: Same. * gcc.dg/tree-ssa/ssa-lim-8.c: Same. * gcc.dg/tree-ssa/ssa-lim-9.c: Same. * gcc.dg/tree-ssa/structopt-1.c: Same. * gfortran.dg/pr32921.f: Same. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226494 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog.gomp| 30 +++ gcc/testsuite/c-c++-common/restrict-2.c |4 ++-- gcc/testsuite/c-c++-common/restrict-4.c |4 ++-- gcc/testsuite/g++.dg/tree-ssa/pr33615.C |4 ++-- gcc/testsuite/g++.dg/tree-ssa/restrict1.C |4 ++-- gcc/testsuite/gcc.dg/tm/pub-safety-1.c |4 ++-- gcc/testsuite/gcc.dg/tm/reg-promotion.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/20050314-1.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/loop-32.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/loop-33.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/loop-34.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/loop-35.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/loop-7.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/pr23109.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/restrict-3.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/restrict-5.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-1.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-10.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-11.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-12.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-2.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-3.c |6 +++--- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-6.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-7.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-8.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-9.c |4 ++-- gcc/testsuite/gcc.dg/tree-ssa/structopt-1.c |4 ++-- gcc/testsuite/gfortran.dg/pr32921.f |4 ++-- 28 files changed, 86 insertions(+), 56 deletions(-) diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index 04286dd..33519ef 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,3 +1,33 @@ +2015-08-03 Thomas Schwinge tho...@codesourcery.com + + * c-c++-common/restrict-2.c: Update for new pass_lim. + * c-c++-common/restrict-4.c: Same. + * g++.dg/tree-ssa/pr33615.C: Same. + * g++.dg/tree-ssa/restrict1.C: Same. + * gcc.dg/tm/pub-safety-1.c: Same. + * gcc.dg/tm/reg-promotion.c: Same. + * gcc.dg/tree-ssa/20050314-1.c: Same. + * gcc.dg/tree-ssa/loop-32.c: Same. + * gcc.dg/tree-ssa/loop-33.c: Same. + * gcc.dg/tree-ssa/loop-34.c: Same.
[PATCH] Fix PR66917
The following patch fixes PR66917 - the vectorizer assumes element alignment even for unaligned accesses, which in case of packed ones is not a valid assumption. This breaks testcases on arm at least which can do unaligned (but not completely unaligned) loads and stores. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk sofar. Richard. 2015-08-03 Richard Biener rguent...@suse.de PR tree-optimization/66917 * tree-vectorizer.h (struct dataref_aux): Add base_element_aligned field. (DR_VECT_AUX): New macro. (set_dr_misalignment): Adjust. (dr_misalignment): Likewise. * tree-vect-data-refs.c (vect_compute_data_ref_alignment): Compute whether the base is at least element aligned. * tree-vect-stmts.c (ensure_base_align): Adjust. (vectorizable_store): If the base is not element aligned preserve alignment of the original access if misalignment is unknown. (vectorizable_load): Likewise. Index: gcc/tree-vectorizer.h === --- gcc/tree-vectorizer.h (revision 226396) +++ gcc/tree-vectorizer.h (working copy) @@ -707,11 +707,16 @@ typedef struct _stmt_vec_info { #define STMT_SLP_TYPE(S) (S)-slp_type struct dataref_aux { - tree base_decl; - bool base_misaligned; int misalignment; + /* If true the alignment of base_decl needs to be increased. */ + bool base_misaligned; + /* If true we know the base is at least vector element alignment aligned. */ + bool base_element_aligned; + tree base_decl; }; +#define DR_VECT_AUX(dr) ((dataref_aux *)(dr)-aux) + #define VECT_MAX_COST 1000 /* The maximum number of intermediate steps required in multi-step type @@ -910,14 +915,13 @@ destroy_cost_data (void *data) targetm.vectorize.destroy_cost_data (data); } - /*-*/ /* Info on data references alignment. */ /*-*/ inline void set_dr_misalignment (struct data_reference *dr, int val) { - dataref_aux *data_aux = (dataref_aux *) dr-aux; + dataref_aux *data_aux = DR_VECT_AUX (dr); if (!data_aux) { @@ -931,8 +935,7 @@ set_dr_misalignment (struct data_referen inline int dr_misalignment (struct data_reference *dr) { - gcc_assert (dr-aux); - return ((dataref_aux *) dr-aux)-misalignment; + return DR_VECT_AUX (dr)-misalignment; } /* Reflects actual alignment of first access in the vectorized loop, Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 226396) +++ gcc/tree-vect-data-refs.c (working copy) @@ -622,7 +622,6 @@ vect_compute_data_ref_alignment (struct tree ref = DR_REF (dr); tree vectype; tree base, base_addr; - bool base_aligned; tree misalign = NULL_TREE; tree aligned_to; unsigned HOST_WIDE_INT alignment; @@ -698,6 +697,19 @@ vect_compute_data_ref_alignment (struct } } + /* To look at alignment of the base we have to preserve an inner MEM_REF + as that carries alignment information of the actual access. */ + base = ref; + while (handled_component_p (base)) +base = TREE_OPERAND (base, 0); + if (TREE_CODE (base) == MEM_REF) +base = build2 (MEM_REF, TREE_TYPE (base), base_addr, + build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0)); + unsigned int base_alignment = get_object_alignment (base); + + if (base_alignment = TYPE_ALIGN (TREE_TYPE (vectype))) +DR_VECT_AUX (dr)-base_element_aligned = true; + alignment = TYPE_ALIGN_UNIT (vectype); if ((compare_tree_int (aligned_to, alignment) 0) @@ -713,21 +725,7 @@ vect_compute_data_ref_alignment (struct return true; } - /* To look at alignment of the base we have to preserve an inner MEM_REF - as that carries alignment information of the actual access. */ - base = ref; - while (handled_component_p (base)) -base = TREE_OPERAND (base, 0); - if (TREE_CODE (base) == MEM_REF) -base = build2 (MEM_REF, TREE_TYPE (base), base_addr, - build_int_cst (TREE_TYPE (TREE_OPERAND (base, 1)), 0)); - - if (get_object_alignment (base) = TYPE_ALIGN (vectype)) -base_aligned = true; - else -base_aligned = false; - - if (!base_aligned) + if (base_alignment TYPE_ALIGN (vectype)) { /* Strip an inner MEM_REF to a bare decl if possible. */ if (TREE_CODE (base) == MEM_REF @@ -757,8 +755,9 @@ vect_compute_data_ref_alignment (struct dump_printf (MSG_NOTE, \n); } - ((dataref_aux *)dr-aux)-base_decl = base; - ((dataref_aux *)dr-aux)-base_misaligned = true; + DR_VECT_AUX (dr)-base_decl = base; + DR_VECT_AUX (dr)-base_misaligned = true; + DR_VECT_AUX (dr)-base_element_aligned = true; } /* If this is a
Re: [PATCH] Unswitching outer loops.
On Fri, Jul 31, 2015 at 1:17 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi Richard, I learned your updated patch for 23825 and it is more general in comparison with my. I'd like to propose you a compromise - let's consider my patch only for force-vectorize outer loop only to allow outer-loop vecctorization. I don't see why we should special-case that if the approach in 23825 is sensible. Note that your approach will not hoist invariant guards if loops contains something else except for inner-loop, i.e. it won't be empty for taken branch. Yes, it does not perform unswitching but guard hoisting. Note that this is originally Zdenek Dvoraks patch. I also would like to answer on your last question - CFG cleanup is invoked to perform deletion of single-argument phi nodes from tail block through substitution - such phi's prevent outer-loop vectorization. But it is clear that such transformation can be done other pass. Hmm, I wonder why the copy_prop pass after unswitching does not get rid of them? What is your opinion? My opinion is that if we want to enhance unswitching to catch this (or similar) cases then we should make it a lot more general than your pattern-matching approach. I see nothing that should prevent us from considering unswitching non-innermost loops in general. It should be only a cost consideration to not do non-innermost loop unswitching (in addition to maybe a --param specifying the maximum depth of a loop nest to unswitch). So my first thought when seeing your patch still holds - the patch looks very much too specific. Richard. Yuri. 2015-07-28 13:50 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Thu, Jul 23, 2015 at 4:45 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi Richard, I checked that both test-cases from 23855 are sucessfully unswitched by proposed patch. I understand that it does not catch deeper loop nest as for (i=0; i10; i++) for (j=0;jn;j++) for (k=0;k20;k++) ... but duplication of middle-loop does not look reasonable. Here is dump for your second test-case: void foo(int *ie, int *je, double *x) { int i, j; for (j=0; j*je; ++j) for (i=0; i*ie; ++i) x[i+j] = 0.0; } grep -i unswitch t6.c.119t.unswitch ;; Unswitching outer loop I was saying that why go with a limited approach when a patch (in unknown state...) is available that does it more generally? Also unswitching is quite expensive compared to moving the invariant condition. In your patch: + if (!nloop-force_vectorize) +nloop-force_vectorize = true; + if (loop-safelen != 0) +nloop-safelen = loop-safelen; I see no guard on force_vectorize so = true looks bogus here. Please just use copy_loop_info. + if (integer_nonzerop (cond_new)) +gimple_cond_set_condition_from_tree (cond_stmt, boolean_true_node); + else if (integer_zerop (cond_new)) +gimple_cond_set_condition_from_tree (cond_stmt, boolean_false_node); gimple_cond_make_true/false (cond_stmt); btw, seems odd that we have to recompute which loop is the true / false variant when we just fed a guard condition to loop_version. Can't we statically determine whether loop or nloop has the in-loop condition true or false? + /* Clean-up cfg to remove useless one-argument phi in exit block of + outer-loop. */ + cleanup_tree_cfg (); I know unswitching is already O(number-of-unswitched-loops * size-of-function) because it updates SSA form after each individual unswitching (and it does that because it invokes itself recursively on unswitched loops). But do you really need to invoke CFG cleanup here? Richard. Yuri. 2015-07-14 14:06 GMT+03:00 Richard Biener richard.guent...@gmail.com: On Fri, Jul 10, 2015 at 12:02 PM, Yuri Rumyantsev ysrum...@gmail.com wrote: Hi All, Here is presented simple transformation which tries to hoist out of outer-loop a check on zero trip count for inner-loop. This is very restricted transformation since it accepts outer-loops with very simple cfg, as for example: acc = 0; for (i = 1; i = m; i++) { for (j = 0; j n; j++) if (l[j] == i) { v[j] = acc; acc++; }; acc = 1; } Note that degenerative outer loop (without inner loop) will be completely deleted as dead code. The main goal of this transformation was to convert outer-loop to form accepted by outer-loop vectorization (such test-case is also included to patch). Bootstrap and regression testing did not show any new failures. Is it OK for trunk? I think this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=23855 as well. It has a patch adding a invariant loop guard hoisting phase to loop-header copying. Yeah, it needs updating to trunk again I suppose. It's always non-stage1 when I come back to that patch. Your patch seems to be very specific and only handles outer loops of innermost loops. Richard. ChangeLog: 2015-07-10 Yuri Rumyantsev ysrum...@gmail.com *
Re: [ARM] Optimize compare against smin/umin
Hi Michael, It looks like this patch introduces regressions on armeb in: gcc.dg/vect/vect-reduc-7.c gcc.dg/vect/vect-reduc-8.c gcc.dg/vect/vect-reduc-9.c See http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/226476/report-build-info.html for a bit more details. Can you have a look? Thanks, Christophe. On 27 July 2015 at 10:17, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi Michael, On 26/07/15 23:54, Michael Collison wrote: Here is an updated patch that addresses the issues you mentioned: 2015-07-24 Michael Collison michael.colli...@linaro.org Ramana Radhakrishnan ramana.radhakrish...@arm.com * gcc/config/arm/arm.md (*arm_smin_cmp): New pattern. (*arm_umin_cmp): Likewise. * gcc.target/arm/mincmp.c: Test min compare idiom. Just New test. would be a better ChangeLog entry for this. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 0be70a8..361c292 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -3455,6 +3455,44 @@ (set_attr type multiple,multiple)] ) +;; t = (s/u)min (x, y) +;; cc = cmp (t, z) +;; is the same as +;; cmp x, z +;; cmpge(u) y, z + +(define_insn_and_split *arm_smin_cmp + [(set (reg:CC CC_REGNUM) +(compare:CC + (smin:SI (match_operand:SI 0 s_register_operand r) + (match_operand:SI 1 s_register_operand r)) + (match_operand:SI 2 s_register_operand r)))] + TARGET_32BIT + # + reload_completed + [(set (reg:CC CC_REGNUM) +(compare:CC (match_dup 0) (match_dup 2))) + (cond_exec (ge:CC (reg:CC CC_REGNUM) (const_int 0)) + (set (reg:CC CC_REGNUM) + (compare:CC (match_dup 1) (match_dup 2] +) + +(define_insn_and_split *arm_umin_cmp + [(set (reg:CC CC_REGNUM) +(compare:CC + (umin:SI (match_operand:SI 0 s_register_operand r) + (match_operand:SI 1 s_register_operand r)) + (match_operand:SI 2 s_register_operand r)))] + TARGET_32BIT + # + reload_completed + [(set (reg:CC CC_REGNUM) +(compare:CC (match_dup 0) (match_dup 2))) + (cond_exec (geu:CC (reg:CC CC_REGNUM) (const_int 0)) + (set (reg:CC CC_REGNUM) + (compare:CC (match_dup 1) (match_dup 2] +) + (define_expand umaxsi3 [(parallel [ (set (match_operand:SI 0 s_register_operand ) diff --git a/gcc/testsuite/gcc.target/arm/mincmp.c b/gcc/testsuite/gcc.target/arm/mincmp.c new file mode 100644 index 000..2a55c6d --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/mincmp.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-require-effective-target arm32 } */ + +#define min(x, y) ((x) = (y)) ? (x) : (y) + +unsigned int foo (unsigned int i, unsigned int x ,unsigned int y) Comma after unsigned int x in the wrong place. +{ + return i (min (x, y)); +} + +int bar (int i, int x, int y) +{ + return i (min (x, y)); +} Can you please use GNU style for the testcase. New line after return type, function name an arguments on their own line. Ok with these changes. Thanks, Kyrill
[PATCH, PING*2] Track indirect calls for call site information in debug info.
On 07/20/2015 02:45 PM, Pierre-Marie de Rodat wrote: On PowerPC targets with -mlongcall, most subprogram calls are turned into indirect calls: the call target is read from a register even though it is compile-time known. This makes it difficult for machine code static analysis engines to recover the callee information. The attached patch is an attempt to help such engines, generating DW_AT_abstract_origin attributes for all DW_TAG_GNU_call_site we are interested in. Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01641.html. Thanks! -- Pierre-Marie de Rodat
Re: C++ delayed folding branch review
2015-08-03 5:49 GMT+02:00 Jason Merrill ja...@redhat.com: On 07/31/2015 05:54 PM, Kai Tietz wrote: The STRIP_NOPS-requirement in 'reduced_constant_expression_p' I could remove, but for one case in constexpr. Without folding we don't do type-sinking/raising. Right. So binary/unary operations might be containing cast, which were in the past unexpected. Why aren't the casts folded away? On such cast constructs, as for this vector-sample, we can't fold away the cast chain. The difference here to none-delayed-folding branch is that the cast isn't moved out of the plus-expr. What we see now is (plus ((vec) (const vector ...) { }), ...). Before we had (vec) (plus (const vector ...) { ... }). On verify_constant we check by reduced_constant_expression_p, if value is a constant. We don't handle here, that NOP_EXPRs are something we want to look through here, as it doesn't change anything if this is a constant, or not. NOPs around constants should have been folded away by the time we get there. Not in this cases, as the we actually have here a switch from const to none-const. So there is an attribute-change, which we can't ignore in general. But I agree that for constexpr's we could special case cast from const to none-const (as required in expressions like const vec v = v + 1). Jason Kai
Re: [PATCH][AArch64][8/14] Implement TARGET_OPTION_VALID_ATTRIBUTE_P
On Fri, Jul 24, 2015 at 11:43:32AM +0100, Kyrill Tkachov wrote: On 21/07/15 16:37, James Greenhalgh wrote: On Thu, Jul 16, 2015 at 04:20:59PM +0100, Kyrill Tkachov wrote: Hi all, This patch implements target attribute support via the TARGET_OPTION_VALID_ATTRIBUTE_P hook. The aarch64_handle_option function in common/config/aarch64/aarch64-common.c is exported to the backend and beefed up a bit. The target attributes supported by this patch reflect the command-line options that we specified as Save earlier in the series. Explicitly, the target attributes supported are: - general-regs-only - fix-cortex-a53-835769 and no-fix-cortex-a53-835769 - cmodel= - strict-align - omit-leaf-frame-pointer and no-omit-leaf-frame-pointer - tls-dialect - arch= - cpu= - tune= These correspond to equivalent command-line options when prefixed with a '-m'. Additionally, this patch supports specifying architectural features, as in the -march and -mcpu options by themselves. So, for example we can write: __attribute__((target(+simd+crypto))) to enable 'simd' and 'crypto' on a per-function basis. The documentation and tests for this come as a separate patch later after the target pragma support and the inlining rules are implemented. Bootstrapped and tested on aarch64. In addition to the comments below, you may want to run contrib/check_GNU_style.sh on this patch, it shows a number of other issues that would be nice to fix. Thanks, here's the updated patch. I used alloca for memory allocation to keep consistent with the rest of aarch64.c, fixed the error message issues, spacing and control flow issues you pointed out. How's this? +static bool +aarch64_process_one_target_attr (char *arg_str, const char* pragma_or_attr) +{ + bool ret; + bool invert = false; + snip + ret = false; I don't think this variable is exactly what you want. Surely as soon as we see a failure case, we want to return false. Otherwise we could see junk,junk,ok_stuff and return true. So I would drop ret and rewrite this to bail out as soon as we see an error. The downside is we won't catch multiple errors that way. The other thing to do would be to invert the sense of your flag to something like failed and only set it where we fail. It depends what behaviour you're looking for. + for (p_attr = aarch64_attributes; !ret p_attr-name; p_attr++) +{ + /* If the names don't match up, or the user has given an argument + to an attribute that doesn't accept one, or didn't give an argument + to an attribute that expects one, fail to match. */ + if (strcmp (str_to_check, p_attr-name) != 0) + continue; + + bool attr_need_arg_p = p_attr-attr_type == aarch64_attr_custom + || p_attr-attr_type == aarch64_attr_enum; + + if (attr_need_arg_p ^ (arg != NULL)) + { + error (target %s %qs does not accept an argument, + pragma_or_attr, str_to_check); + return false; + } + + /* If the name matches but the attribute does not allow no- versions + then we can't match. */ + if (invert !p_attr-allow_neg) + { + error (target %s %qs does not allow a negated form, + pragma_or_attr, str_to_check); + return false; + } + + switch (p_attr-attr_type) + { + /* Has a custom handler registered. +For example, cpu=, arch=, tune=. */ + case aarch64_attr_custom: + gcc_assert (p_attr-handler); + ret = p_attr-handler (arg, pragma_or_attr); + break; + + /* Either set or unset a boolean option. */ + case aarch64_attr_bool: + { + struct cl_decoded_option decoded; + + generate_option (p_attr-opt_num, NULL, !invert, +CL_TARGET, decoded); + aarch64_handle_option (global_options, global_options_set, + decoded, input_location); + ret = true; + break; + } + /* Set or unset a bit in the target_flags. aarch64_handle_option + should know what mask to apply given the option number. */ + case aarch64_attr_mask: + { + struct cl_decoded_option decoded; + /* We only need to specify the option number. + aarch64_handle_option will know which mask to apply. */ + decoded.opt_index = p_attr-opt_num; + decoded.value = !invert; + aarch64_handle_option (global_options, global_options_set, + decoded, input_location); + ret = true; + break; + } + /* Use the option setting machinery to set an option to an enum. */ + case aarch64_attr_enum: + { + gcc_assert (arg); + bool valid; + int
Re: [PATCH][ARM] PR target/66731 Fix vnmul insn with -frounding-math
On 28/07/15 11:21, Szabolcs Nagy wrote: On 24/07/15 12:27, Kyrill Tkachov wrote: On 24/07/15 12:10, Szabolcs Nagy wrote: (-a)*b should not be compiled to vnmul a,b with -frounding-math. Added a new -(a*b) pattern for vnmul and the old one is only used if !flag_rounding_math. Updated the costs too. This is the ARM version of https://gcc.gnu.org/ml/gcc-patches/2015-07/msg00300.html Tested with arm-none-linux-gnueabihf cross compiler. is this OK? gcc/Changelog: 2015-07-20 Szabolcs Nagy szabolcs.n...@arm.com PR target/66731 * config/arm/arm.md (muldf3negdf_vfp): Handle -frounding-math. (mulsf3negsf_vfp): Likewise. This entry is misleading. You disable two existing patterns for flag_rounding_math and you add two new patterns. The entry should reflect that. .. Can you give the new pattern a different name to reflect that the neg is on the outside? Something like *negmulsf3_vfp. .. +/* { dg-options -O2 -mfpu=vfp -mfloat-abi=hard } */ Can you please add an explicit -fno-rounding-math here? That way we get a hint as to why these tests exist. Alternatively, you can rename the tests to be pr66731_1.c, pr66731_2.c etc. That way in the future we'll know what issue they're testing for. .. +float +foo_s (float a, float b) +{ + /* { dg-final { scan-assembler vneg\.f32 } } */ + /* { dg-final { scan-assembler vmul\.f32 } } */ + return -a * b; +} I'd prefer if you just do a scan-assembler not vnmul, which is what this patch really fixes. Whether the midend decides to use a pair of vneg+vmul is tangential to this patch, it's the vnmul that we're trying to avoid. [v2]: - used different names for the new patterns - fixed change log accordingly - used explicit -fno-rounding-math in tests - used scan-assembler-not vnmul (I havent changed the names of the tests to be consistent with the aarch64 patches but if ppl prefer prN.c name I can do that.) gcc/Changelog: 2015-07-28 Szabolcs Nagy szabolcs.n...@arm.com PR target/66731 * config/arm/arm.md (negmuldf3_vfp): Add new pattern. (negmulsf3_vfp): Likewise. (muldf3negdf_vfp): Disable for -frounding-math. (mulsf3negsf_vfp): Likewise. * config/arm/arm.c (arm_new_rtx_costs): Fix NEG cost for VNMUL, fix MULT cost with -frounding-math. gcc/testsuite/Changelog: 2015-07-28 Szabolcs Nagy szabolcs.n...@arm.com PR target/66731 * gcc.target/arm/vnmul-1.c: New. * gcc.target/arm/vnmul-2.c: New. * gcc.target/arm/vnmul-3.c: New. * gcc.target/arm/vnmul-4.c: New. This is ok, thanks. Kyrill
Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3
Le 29/07/2015 17:08, Paul Richard Thomas a écrit : Dear All, On 24 July 2015 at 10:08, Damian Rouson dam...@sourceryinstitute.org wrote: I love this idea and had similar thoughts as well. :D Sent from my iPhone On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas paul.richard.tho...@gmail.com wrote: Dear Mikael, It had crossed my mind also that a .mod and a .smod file could be written. Normally, the .smod files are produced by the submodules themselves, so that their descendants can pick up the symbols that they generate. There is no reason at all why this could not be implemented; early on in the development I did just this, although I think that it would now be easier to modify this patch. One huge advantage of proceeding in this way is that any resulting library can be distributed with the .mod file alone so that the private entities are never exposed. The penalty is that a second file is output. With best regards Paul Please find attached the implementation of this suggestion. Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the original preferred? There hasn't been a lot of voices about this among the other active and less active team members. I prefer this private members to separate smod variant. It's OK for trunk as far as I'm concerned. Thanks. Mikael PS: Regarding redundant initializations: rather have too many than too few. ;-)
Re: [PATCH][AArch64][9/14] Implement TARGET_CAN_INLINE_P
On Thu, Jul 23, 2015 at 11:17:20AM +0100, Kyrill Tkachov wrote: Thanks, I've implemented the suggestions. Re-bootstrapped and tested on aarch64. How's this? This is good. OK for trunk. Thanks, James
Re: [gomp4] openacc default handling
Hi! On Wed, 29 Jul 2015 19:13:01 -0400, Nathan Sidwell nat...@acm.org wrote: gcc/ * gimplify.c (oacc_default_clause): Outer scope searching moved to omp_notice_variable. (omp_notice_variable): For OpenACC search enclosing scopes before applying default. Committed to gomp-4_0-branch in r226495: commit 9446d5a23e74653407f079188e8be4bc9343e15e Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Aug 3 11:06:38 2015 + Resolve bootstrap failure in oacc_default_clause [...]/source-gcc/gcc/gimplify.c: In function 'unsigned int oacc_default_clause(gimplify_omp_ctx*, tree, bool, unsigned int)': [...]/source-gcc/gcc/gimplify.c:5913:13: error: unused parameter 'in_code' [-Werror=unused-parameter] bool in_code, unsigned flags) ^ Fixup for r226373. gcc/ * gimplify.c (oacc_default_clause): Remove in_code formal parameter. Adjust all users. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226495 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |5 + gcc/gimplify.c |5 ++--- 2 files changed, 7 insertions(+), 3 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 3b907ea..a30f6a3 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,8 @@ +2015-08-03 Thomas Schwinge tho...@codesourcery.com + + * gimplify.c (oacc_default_clause): Remove in_code formal + parameter. Adjust all users. + 2015-08-02 Nathan Sidwell nat...@codesourcery.com * omp-low.c (oacc_xform_on_device): New function. diff --git gcc/gimplify.c gcc/gimplify.c index 0f8011d..f74f842 100644 --- gcc/gimplify.c +++ gcc/gimplify.c @@ -5909,8 +5909,7 @@ device_resident_p (tree decl) but not declared in an enclosing clause. */ static unsigned -oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, -bool in_code, unsigned flags) +oacc_default_clause (struct gimplify_omp_ctx *ctx, tree decl, unsigned flags) { switch (ctx-default_kind) { @@ -6036,7 +6035,7 @@ omp_notice_variable (struct gimplify_omp_ctx *ctx, tree decl, bool in_code) } if (is_oacc) - flags = oacc_default_clause (ctx, decl, in_code, flags); + flags = oacc_default_clause (ctx, decl, flags); else flags |= GOVD_MAP; Grüße, Thomas signature.asc Description: PGP signature
[PATCH, PING*2] PR debug/53927: fix value for DW_AT_static_link
On 07/20/2015 09:39 AM, Pierre-Marie de Rodat wrote: This patch fixes the static link description in DWARF to comply with the specification. In order to do so, it appends a field to all FRAME objects to hold the frame base address (DW_AT_frame_base) so that the nested subrograms can directly reference this field. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53927 for the context (in particular why we need this additional field in FRAME objects). Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01629.html. Thanks! -- Pierre-Marie de Rodat
Re: [PATCH] Use single shared memory block pool for all pool allocators
On Jul 26, 2015, at 11:50 AM, Andi Kleen a...@firstfloor.org wrote: I've been compiling gcc with tcmalloc to do a similar speedup. It would be interesting to compare that to your patch. I repeated the test with TCMalloc and jemalloc. TCMalloc shows nice results, though it required some tweaks: this allocator has a threshold block size equal to 32 KB, larger blocks are allocated from global heap, rather than thread cache (and this operation is expensive), so the original patch shows worse performance when used with TCMalloc. In order to fix this, I reduced the block size to 8 KB. Here there are 5 columns for each value: pristine version, pristine version + TCMalloc (and the difference in parenthesis), and patched version with TCMalloc (difference is relative to pristine version). Likewise, for memory usage. 400.perlbench26.86 26.17 ( -2.57%) 26.17 ( -2.57%) user 0.56 0.64 ( +14.29%) 0.61 ( +8.93%) sys 27.45 26.84 ( -2.22%) 26.81 ( -2.33%) real 401.bzip2 2.532.5 ( -1.19%) 2.48 ( -1.98%) user 0.07 0.09 ( +28.57%)0.1 ( +42.86%) sys 2.612.6 ( -0.38%) 2.59 ( -0.77%) real 403.gcc 73.59 72.62 ( -1.32%) 71.72 ( -2.54%) user 1.59 1.88 ( +18.24%) 1.88 ( +18.24%) sys 75.27 74.58 ( -0.92%) 73.67 ( -2.13%) real 429.mcf0.4 0.41 ( +2.50%)0.4 ( +0.00%) user 0.03 0.05 ( +66.67%) 0.05 ( +66.67%) sys 0.44 0.47 ( +6.82%) 0.47 ( +6.82%) real 433.milc 3.22 3.24 ( +0.62%) 3.25 ( +0.93%) user 0.22 0.32 ( +45.45%)0.3 ( +36.36%) sys 3.48 3.59 ( +3.16%) 3.59 ( +3.16%) real 444.namd 7.54 7.41 ( -1.72%) 7.37 ( -2.25%) user 0.1 0.15 ( +50.00%) 0.15 ( +50.00%) sys 7.66 7.58 ( -1.04%) 7.54 ( -1.57%) real 445.gobmk20.24 19.59 ( -3.21%) 19.6 ( -3.16%) user 0.52 0.67 ( +28.85%) 0.59 ( +13.46%) sys 20.8 20.29 ( -2.45%) 20.23 ( -2.74%) real 450.soplex 19.08 18.47 ( -3.20%) 18.51 ( -2.99%) user 0.87 1.11 ( +27.59%) 1.06 ( +21.84%) sys 19.99 19.62 ( -1.85%) 19.6 ( -1.95%) real 453.povray 42.27 41.42 ( -2.01%) 41.32 ( -2.25%) user 2.71 3.11 ( +14.76%) 3.09 ( +14.02%) sys 45.04 44.58 ( -1.02%) 44.47 ( -1.27%) real 456.hmmer 7.27 7.22 ( -0.69%) 7.15 ( -1.65%) user 0.31 0.36 ( +16.13%) 0.39 ( +25.81%) sys 7.61 7.61 ( +0.00%) 7.57 ( -0.53%) real 458.sjeng 3.22 3.14 ( -2.48%) 3.15 ( -2.17%) user 0.09 0.16 ( +77.78%) 0.14 ( +55.56%) sys 3.32 3.32 ( +0.00%)3.3 ( -0.60%) real 462.libquantum0.86 0.87 ( +1.16%) 0.85 ( -1.16%) user 0.05 0.08 ( +60.00%) 0.08 ( +60.00%) sys 0.92 0.96 ( +4.35%) 0.94 ( +2.17%) real 464.h264ref 27.62 27.27 ( -1.27%) 27.16 ( -1.67%) user 0.63 0.73 ( +15.87%) 0.75 ( +19.05%) sys 28.28 28.03 ( -0.88%) 27.95 ( -1.17%) real 470.lbm 0.27 0.27 ( +0.00%) 0.27 ( +0.00%) user 0.01 0.01 ( +0.00%) 0.01 ( +0.00%) sys 0.29 0.29 ( +0.00%) 0.29 ( +0.00%) real 471.omnetpp 28.29 27.63 ( -2.33%) 27.54 ( -2.65%) user 1.5 1.57 ( +4.67%) 1.62 ( +8.00%) sys 29.84 29.25 ( -1.98%) 29.21 ( -2.11%) real 473.astar 1.14 1.12 ( -1.75%) 1.11 ( -2.63%) user 0.05 0.07 ( +40.00%) 0.09 ( +80.00%) sys 1.21 1.21 ( +0.00%)1.2 ( -0.83%) real 482.sphinx3 4.65 4.57 ( -1.72%) 4.59 ( -1.29%) user 0.20.3 ( +50.00%) 0.26 ( +30.00%) sys 4.88 4.89 ( +0.20%) 4.88 ( +0.00%) real 483.xalancbmk284.5 276.4 ( -2.85%) 276.48 ( -2.82%) user 20.29 23.03 ( +13.50%) 22.82 ( +12.47%) sys 305.19 299.79 ( -1.77%) 299.67 ( -1.81%) real 400.perlbench 102308kB123004kB ( +20696kB)116104kB ( +13796kB) 401.bzip2 74628kB 86936kB ( +12308kB) 84316kB ( +9688kB) 403.gcc 190284kB218180kB ( +27896kB)212480kB ( +22196kB) 429.mcf19804kB 24464kB ( +4660kB) 24320kB ( +4516kB) 433.milc 36940kB 45308kB ( +8368kB) 44652kB ( +7712kB) 444.namd 183548kB193856kB ( +10308kB)192632kB ( +9084kB) 445.gobmk 73724kB 78792kB (
Re: [PATCH] Optimize certain end of loop conditions into min/max operation
On Fri, Jul 31, 2015 at 8:27 PM, Jeff Law l...@redhat.com wrote: On 07/31/2015 12:18 PM, Michael Collison wrote: Hi Jeff, Yes I will create a test case. I'm not quite sure what to check for even in the machine dependent test case. It's quite possible for the instructions that are generated to change over time. I think we're going to want to look at the gimple IR and search for the MIN/MAX expressions rather than the instructions. Given we don't know where the transformation is going to land (yet), you can probably start with -fdump-tree-optimized and scanning the .optimized dump. Yeah. FWIW I'm ok with the specialized pattern in match.pd. Indeed phiopt isn't a good fit here (though on some targets you may end up seeing a PHI node, dependent on LOGICAL_OP_NON_SHORT_CIRCUIT). For longer chains I'd say reassoc is the appropriate place to handle this as it has already code to combine s and ||s of conditions. Maybe you can give it a short try to only handle it there. I'm also somewhat worried about code-generation on random targets. So can you have a quick look (just for your testcase) what the code-gen difference is on primary platforms (just build some stage1 cc1s)? Thanks, Richard. We can still do that and have the test be target specific. jeff
Re: Go patch committed: Fix error reporting for invalid builtin calls
Ian Lance Taylor i...@golang.org writes: This patch from Chris Manghane fixes the Go frontend error reporting for invalid builtin calls, by not losing track of whether the call is erroneous. This fixes https://golang.org/issue/11561. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack': ../../../libgo/runtime/mprof.goc:408:5: error: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Werror=frame-address] sp = runtime_getcallersp(b); ^ Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH] Add __builtin_stack_top to x86 backend
On Thu, Jul 30, 2015 at 8:41 PM, H.J. Lu hongjiu...@intel.com wrote: On Tue, Jul 21, 2015 at 02:45:39PM -0700, H.J. Lu wrote: When __builtin_frame_address is used to retrieve the address of the function stack frame, the frame pointer is always kept, which wastes one register and 2 instructions. For x86-32, one less register means significant negative impact on performance. This patch adds a new builtin function, __builtin_ia32_stack_top, to x86 backend. It returns the stack address when the function is called. Any comments, feedbacks? Although this function is generic, but implementation is target specific. I submitted a generic patch: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01859.html So far there are no interests from other backends. Here is a patch to implement __builtin_stack_top in x86 backend. We can update x86 backedn after it is added to middle-end. OK for trunk? I think that the discussion about generic implementation should come to some conclusion first. From the discussion, here was no resolution on which way to go. Uros.
[PATCH, PING] DWARF: materialize subprogram renamings in Ada as imported declarations
On 07/25/2015 09:44 PM, Pierre-Marie de Rodat wrote: This change makes GCC materialize subprogram renamings in Ada as imported declarations (DW_TAG_imported_declarations). For instance, procedure Foo renames Bar; will output: DW_TAG_imported_declaration: DW_AT_name: foo DW_AT_import: reference to Bar Ping for the patch submitted in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02166.html. -- Pierre-Marie de Rodat
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On Mon, Aug 3, 2015 at 7:04 AM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Aug 3, 2015 at 3:02 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: On 03/08/15 13:33, Uros Bizjak wrote: Hello! 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. This patch regressed following tests on x86_64: FAIL: gcc.target/i386/cmov2.c scan-assembler sbb FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3] The difference for cmov3.c on x86_64 is: cmpl%esi, %edi movl$-5, %edx movl$5, %eax cmovg %edx, %eax ret vs. new code: xorl%eax, %eax cmpl%esi, %edi setle %al negl%eax andl$10, %eax subl$5, %eax ret I'm not sure old code is really better than new. HJ, do you have any better insight? We are looking into it. -- H.J.
Re: [PATCH 2/5] Add preferred_for_{size,speed} attributes
On Fri, Oct 17, 2014 at 7:48 AM, Richard Sandiford richard.sandif...@arm.com wrote: This is the main patch, to add new preferred_for_size and preferred_for_speed attributes that can be used to selectively disable alternatives when optimising for size or speed. As explained in the docs, the new attributes are just optimisation hints and it is possible that size-only alternatives will sometimes end up in a block that's optimised for speed, or vice versa. The patch deals with code that directly accesses the enabled_attributes mask and that ought to take size/speed choices into account. The next patch deals with indirect uses. Note that I'm not making reload support these attributes for hopefully obvious reasons :-) Richard gcc/ * doc/md.texi: Document preferred_for_size and preferred_for_speed attributes. * genattr.c (main): Handle preferred_for_size and preferred_for_speed in the same way as enabled. * recog.h (bool_attr): New enum. (target_recog): Replace x_enabled_alternatives with x_bool_attr_masks. (get_preferred_alternatives, check_bool_attrs): Declare. * recog.c (have_bool_attr, get_bool_attr, get_bool_attr_mask_uncached) (get_bool_attr_mask, get_preferred_alternatives, check_bool_attrs): New functions. (get_enabled_alternatives): Use get_bool_attr_mask. * ira-costs.c (record_reg_classes): Use get_preferred_alternatives instead of recog_data.enabled_alternatives. * ira.c (ira_setup_alts): Likewise. * postreload.c (reload_cse_simplify_operands): Likewise. * config/i386/i386.c (ix86_legitimate_combined_insn): Likewise. * ira-lives.c (preferred_alternatives): New variable. (process_bb_node_lives): Set it. (check_and_make_def_conflict, make_early_clobber_and_input_conflicts) (single_reg_class, ira_implicitly_set_insn_hard_regs): Use it instead of recog_data.enabled_alternatives. * lra-int.h (lra_insn_recog_data): Replace enabled_alternatives to preferred_alternatives. * lra-constraints.c (process_alt_operands): Update accordingly. * lra.c (lra_set_insn_recog_data): Likewise. (lra_update_insn_recog_data): Assert check_bool_attrs. This caused: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67029 -- H.J.
[PATCH] Build *-match.o as early as possible
The two files *-match.o files always finish building last, so if we start building them as soon as possible (instead of pretty late) the total build time will be less on a parallel build. Bootstrapped and tested on powerpc64-linux. Is this okay for trunk? Segher 2014-080-3 Segher Boessenkool seg...@kernel.crashing.org * Makefile.in (OBJS): Put gimple-match.o and generic-match.o first. --- gcc/Makefile.in | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/gcc/Makefile.in b/gcc/Makefile.in index be259e8..683c42a 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1162,10 +1162,12 @@ C_COMMON_OBJS = c-family/c-common.o c-family/c-cppbuiltin.o c-family/c-dump.o \ c-family/array-notation-common.o c-family/cilk.o c-family/c-ubsan.o # Language-independent object files. -# We put the insn-*.o files first so that a parallel make will build -# them sooner, because they are large and otherwise tend to be the -# last objects to finish building. +# We put the *-match.o and insn-*.o files first so that a parallel make +# will build them sooner, because they are large and otherwise tend to be +# the last objects to finish building. OBJS = \ + gimple-match.o \ + generic-match.o \ insn-attrtab.o \ insn-automata.o \ insn-dfatab.o \ @@ -1260,8 +1262,6 @@ OBJS = \ gimple-fold.o \ gimple-laddress.o \ gimple-low.o \ - gimple-match.o \ - generic-match.o \ gimple-pretty-print.o \ gimple-ssa-isolate-paths.o \ gimple-ssa-strength-reduction.o \ -- 1.8.1.4
[hsa] HSA: introduce more pure kernel information emission.
Hello. Following patch has been installed to HSA branch. Thanks, Martin From 64f7f75a8b16dee2071fef7f546fbf6b06f82c4f Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Fri, 31 Jul 2015 17:53:57 +0200 Subject: [PATCH 1/3] HSA: introduce more pure kernel information emission. libgomp/ChangeLog: 2015-07-31 Martin Liska mli...@suse.cz * plugin/plugin-hsa.c (struct hsa_kernel_description): New structure. (struct brig_image_desc): Use it. (struct kernel_info): Add omp_data_size member. (struct module_info): Remove unused field. (GOMP_OFFLOAD_load_image): Load the newly added format. (create_kernel_dispatch): Allocate exactly ammount of memory needed for OMP data passing. (init_single_kernel): Print the OMP memory. hsa.c (init_hsa_image): Use new structure. gcc/ChangeLog: 2015-07-31 Martin Liska mli...@suse.cz * hsa-brig.c (hsa_output_kernel_mapping): Serialize new data structure. * hsa-gen.c (hsa_function_representation::hsa_function_representation): Add new field. (gen_hsa_insns_for_kernel_call): Calculate maximum amount of memory needed for OMP data. (generate_hsa): Likewise. * hsa.c (hsa_add_kern_decl_mapping): New function. (hsa_get_decl_kernel_mapping_omp_size): Likewise. * hsa.h: Add declaration of these functions. --- gcc/hsa-brig.c | 231 ++-- gcc/hsa-gen.c | 16 ++- gcc/hsa.c | 13 ++- gcc/hsa.h | 8 +- libgomp/hsa.c | 35 +++ libgomp/plugin/plugin-hsa.c | 107 +--- 6 files changed, 209 insertions(+), 201 deletions(-) diff --git a/gcc/hsa-brig.c b/gcc/hsa-brig.c index 27c41a5..1cc0b22 100644 --- a/gcc/hsa-brig.c +++ b/gcc/hsa-brig.c @@ -1753,7 +1753,7 @@ hsa_output_kernel_mapping (tree brig_decl) unsigned map_count = hsa_get_number_decl_kernel_mappings (); tree int_num_of_kernels; - int_num_of_kernels = build_int_cst (integer_type_node, (int) map_count); + int_num_of_kernels = build_int_cst (uint32_type_node, map_count); tree kernel_num_index_type = build_index_type (int_num_of_kernels); tree host_functions_array_type = build_array_type (ptr_type_node, kernel_num_index_type); @@ -1768,7 +1768,7 @@ hsa_output_kernel_mapping (tree brig_decl) tree host_functions_ctor = build_constructor (host_functions_array_type, host_functions_vec); char tmp_name[64]; - ASM_GENERATE_INTERNAL_LABEL (tmp_name, hsa_host_functions, 1); + ASM_GENERATE_INTERNAL_LABEL (tmp_name, __hsa_host_functions, 1); tree hsa_host_func_table = build_decl (UNKNOWN_LOCATION, VAR_DECL, get_identifier (tmp_name), host_functions_array_type); @@ -1782,115 +1782,161 @@ hsa_output_kernel_mapping (tree brig_decl) DECL_INITIAL (hsa_host_func_table) = host_functions_ctor; varpool_node::finalize_decl (hsa_host_func_table); - int len = 0; - for (unsigned i = 0; i map_count; ++i) -{ - char *name = hsa_get_decl_kernel_mapping_name (i); - /* We add 1 for the terminating zero and 1 for an ampersand prefix. */ - len = len + strlen (name) + 2; -} - len++; + /* Following code emits list of kernel_info structures. */ - /* Kernel mapping is a list of string names terminated by '\0'. */ - char *buf = XNEWVEC (char, len); - char *p = buf; - for (unsigned i = 0; i map_count; ++i) -{ - char *name = hsa_get_decl_kernel_mapping_name (i); - int ll = strlen (name); - gcc_assert (ll 0); - *p = ''; - p++; - memcpy (p, name, ll); - p += ll; - *p = '\0'; - p++; -} - *p = '\0'; - tree kern_names = build_string (len, buf); - TREE_TYPE (kern_names) = build_array_type (char_type_node, - build_index_type (size_int (len))); - free (buf); + tree kernel_info_type = make_node (RECORD_TYPE); + tree id_f1 = build_decl (BUILTINS_LOCATION, FIELD_DECL, + get_identifier (name), ptr_type_node); + DECL_CHAIN (id_f1) = NULL_TREE; + tree id_f2 = build_decl (BUILTINS_LOCATION, FIELD_DECL, + get_identifier (omp_data_size), + unsigned_type_node); + DECL_CHAIN (id_f2) = id_f1; + tree id_f3 = build_decl (BUILTINS_LOCATION, FIELD_DECL, + get_identifier (kernel_dependencies_count), + unsigned_type_node); + DECL_CHAIN (id_f3) = id_f2; + tree id_f4 = build_decl (BUILTINS_LOCATION, FIELD_DECL, + get_identifier (kernel_dependencies), + build_pointer_type (build_pointer_type + (char_type_node))); + DECL_CHAIN (id_f4) = id_f3; + finish_builtin_struct (kernel_info_type, __hsa_kernel_info, id_f4, + NULL_TREE); - /* Kernel dependencies is a list of lists, where for a given kernel A all - direct kernel dispatches to A1, A2, .., An are organized are joined by '.' - character and each entry in the list is separated by '\0'. */ + int_num_of_kernels = build_int_cstu (uint32_type_node, map_count); + tree kernel_info_vector_type = build_array_type +(kernel_info_type, build_index_type
[hsa] HSA: fix kernel initialization for dependent kernels.
Hello. Following patch has been installed to HSA branch. Thanks, Martin From 125ee1ed4123527d4ce841631d4930ac96bec281 Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Mon, 3 Aug 2015 13:00:42 +0200 Subject: [PATCH 2/3] HSA: fix kernel initialization for dependent kernels libgomp/ChangeLog: 2015-08-03 Martin Liska mli...@suse.cz * plugin/plugin-hsa.c (init_single_kernel): Initialize recursively all dependent kernels. (init_kernel): Moved from this location. --- libgomp/plugin/plugin-hsa.c | 19 +-- 1 file changed, 9 insertions(+), 10 deletions(-) diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c index 144548a..c05de1e 100644 --- a/libgomp/plugin/plugin-hsa.c +++ b/libgomp/plugin/plugin-hsa.c @@ -682,6 +682,15 @@ init_single_kernel (struct kernel_info *kernel) fprintf (stderr, omp_data_size: %u\n, kernel-omp_data_size); } + + /* FIXME: do not consider all kernels to live in a same module. */ + struct module_info *module = kernel-agent-first_module; + for (unsigned i = 0; i kernel-dependencies_count; i++) +{ + struct kernel_info *dependency = get_kernel_in_module + (module, kernel-dependencies[i]); + init_single_kernel (dependency); +} } /* Indent stream F by INDENT spaces. */ @@ -766,16 +775,6 @@ init_kernel (struct kernel_info *kernel) init_single_kernel (kernel); - struct agent_info *agent = kernel-agent; - struct module_info *module = agent-first_module; - - for (unsigned i = 0; i kernel-dependencies_count; i++) -{ - struct kernel_info *dependency = get_kernel_in_module - (module, kernel-dependencies[i]); - init_single_kernel (dependency); -} - if (debug) fprintf (stderr, \n); -- 2.4.6
[hsa] HSA: calculate properly maximum OMP data size which is, necessary.
Hello. Following patch has been installed to HSA branch. Thanks, Martin From 163c6a4bb917ae76dad265dca5762a3bd87c947a Mon Sep 17 00:00:00 2001 From: mliska mli...@suse.cz Date: Mon, 3 Aug 2015 14:35:26 +0200 Subject: [PATCH 3/3] HSA: calculate properly maximum OMP data size which is necessary. libgomp/ChangeLog: 2015-08-03 Martin Liska mli...@suse.cz * plugin/plugin-hsa.c (create_kernel_dispatch): Calculate maximum memory necessary for OMP data. (init_single_kernel): Likewise. (create_kernel_dispatch_recursive): Pass new argument with maximum OMP data size. (init_kernel): Likewise. (GOMP_OFFLOAD_run): Likewise. --- libgomp/plugin/plugin-hsa.c | 40 1 file changed, 24 insertions(+), 16 deletions(-) diff --git a/libgomp/plugin/plugin-hsa.c b/libgomp/plugin/plugin-hsa.c index c05de1e..4270245 100644 --- a/libgomp/plugin/plugin-hsa.c +++ b/libgomp/plugin/plugin-hsa.c @@ -573,17 +573,15 @@ create_and_finalize_hsa_program (struct agent_info *agent) /* Create kernel dispatch data structure for given KERNEL. */ static struct hsa_kernel_dispatch * -create_kernel_dispatch (struct kernel_info *kernel) +create_kernel_dispatch (struct kernel_info *kernel, unsigned omp_data_size) { struct agent_info *agent = kernel-agent; struct hsa_kernel_dispatch *shadow = GOMP_PLUGIN_malloc_cleared (sizeof (struct hsa_kernel_dispatch)); shadow-queue = agent-command_q; - - /* Compute right size needed for memory allocation. */ - shadow-omp_data_memory = GOMP_PLUGIN_malloc (kernel-omp_data_size); - + shadow-omp_data_memory = omp_data_size 0 +? GOMP_PLUGIN_malloc (omp_data_size) : NULL; unsigned dispatch_count = kernel-dependencies_count; shadow-kernel_dispatch_count = dispatch_count; @@ -634,10 +632,11 @@ release_kernel_dispatch (struct hsa_kernel_dispatch *shadow) free (shadow); } -/* Initialize a KERNEL without its dependencies. */ +/* Initialize a KERNEL without its dependencies. MAX_OMP_DATA_SIZE is used + to calculate maximum necessary memory for OMP data allocation. */ static void -init_single_kernel (struct kernel_info *kernel) +init_single_kernel (struct kernel_info *kernel, unsigned *max_omp_data_size) { hsa_status_t status; struct agent_info *agent = kernel-agent; @@ -683,13 +682,16 @@ init_single_kernel (struct kernel_info *kernel) kernel-omp_data_size); } + if (kernel-omp_data_size *max_omp_data_size) +*max_omp_data_size = kernel-omp_data_size; + /* FIXME: do not consider all kernels to live in a same module. */ struct module_info *module = kernel-agent-first_module; for (unsigned i = 0; i kernel-dependencies_count; i++) { struct kernel_info *dependency = get_kernel_in_module (module, kernel-dependencies[i]); - init_single_kernel (dependency); + init_single_kernel (dependency, max_omp_data_size); } } @@ -738,11 +740,14 @@ print_kernel_dispatch (struct hsa_kernel_dispatch *dispatch, unsigned indent) dependencies. */ static struct hsa_kernel_dispatch * -create_kernel_dispatch_recursive (struct kernel_info *kernel) +create_kernel_dispatch_recursive (struct kernel_info *kernel, + unsigned omp_data_size) { // TODO: find correct module struct module_info *module = kernel-agent-first_module; - struct hsa_kernel_dispatch *shadow = create_kernel_dispatch (kernel); + + struct hsa_kernel_dispatch *shadow = create_kernel_dispatch (kernel, + omp_data_size); shadow-debug = 0; for (unsigned i = 0; i kernel-dependencies_count; i++) @@ -750,7 +755,7 @@ create_kernel_dispatch_recursive (struct kernel_info *kernel) struct kernel_info *dependency = get_kernel_in_module (module, kernel-dependencies[i]); shadow-children_dispatches[i] = create_kernel_dispatch_recursive - (dependency); + (dependency, omp_data_size); } return shadow; @@ -760,7 +765,7 @@ create_kernel_dispatch_recursive (struct kernel_info *kernel) The function assumes the program has been created, finalized and frozen by create_and_finalize_hsa_program. */ -void +unsigned init_kernel (struct kernel_info *kernel) { if (pthread_mutex_lock (kernel-init_mutex)) @@ -770,10 +775,11 @@ init_kernel (struct kernel_info *kernel) if (pthread_mutex_unlock (kernel-init_mutex)) GOMP_PLUGIN_fatal (Could not unlock an HSA kernel initialization mutex); - return; + return 0; } - init_single_kernel (kernel); + unsigned max_omp_data_size = 0; + init_single_kernel (kernel, max_omp_data_size); if (debug) fprintf (stderr, \n); @@ -782,6 +788,8 @@ init_kernel (struct kernel_info *kernel) if (pthread_mutex_unlock (kernel-init_mutex)) GOMP_PLUGIN_fatal (Could not unlock an HSA kernel initialization mutex); + + return max_omp_data_size; } /* Part of the libgomp plugin interface. Run a kernel on a device N and pass @@ -797,9 +805,9 @@ GOMP_OFFLOAD_run (int n, void *fn_ptr,
Re: [PATCH] Simplify vector compare-not-select sequence
On Sun, 2015-08-02 at 12:18 +0200, Andreas Schwab wrote: Bill Schmidt wschm...@linux.vnet.ibm.com writes: * gcc.target/powerpc/vec-cmp-sel.c: New test. FAIL: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors) Excess errors: /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:14:1: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:15:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:16:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:3: error: use of 'long long' in AltiVec types is invalid without -mvsx /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:17:29: error: incompatible types when initializing type '__vector unsigned long long' using type '__vector __bool long long' /daten/gcc/gcc-20150801/gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c:18:3: error: use of 'long long' in AltiVec types is invalid without -mvsx Andreas. Hi Andreas, Can you please verify that this patch works for you? Thanks, Bill Index: gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c === --- gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c (revision 226505) +++ gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do compile { target powerpc64*-*-* } } */ /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-options -maltivec -O2 } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options -maltivec -O2 -mvsx -mpower8-vector } */ /* { dg-final { scan-assembler vcmpgtsd } } */ /* { dg-final { scan-assembler-not xxlnor } } */
Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.
PING. Thanks, Martin On 07/20/2015 09:51 AM, Martin Liška wrote: On 07/16/2015 01:03 PM, Martin Liška wrote: On 07/09/2015 06:24 PM, Jeff Law wrote: On 07/09/2015 07:56 AM, mliska wrote: gcc/ChangeLog: 2015-07-09 Martin Liska mli...@suse.cz * dbgcnt.def: Add new debug counter. * ipa-icf-gimple.c (func_checker::compare_ssa_name): Add flag for strict mode. (func_checker::compare_memory_operand): Likewise. (func_checker::compare_cst_or_decl): Handle if we are in tail_merge_mode. (func_checker::compare_operand): Pass strict flag properly. (func_checker::stmt_local_def): New function. (func_checker::compare_phi_node): Move from sem_function class. (func_checker::compare_bb_tail_merge): New function. (func_checker::compare_bb): Improve STMT iteration. (func_checker::compare_gimple_call): Pass strict flag. (func_checker::compare_gimple_assign): Likewise. (func_checker::compare_gimple_label): Remove unused flag. (ssa_names_set): New class. (ssa_names_set::build): New function. * ipa-icf-gimple.h (func_checker::gsi_next_nonlocal): New function. (ssa_names_set::contains): New function. (ssa_names_set::add): Likewise. * ipa-icf.c (sem_function::equals_private): Use transformed function. (sem_function::compare_phi_node): Move to func_checker class. * ipa-icf.h: Add new declarations. * tree-ssa-tail-merge.c (check_edges_correspondence): New function. (find_duplicate): Add usage of IPA ICF gimple infrastructure. (find_clusters_1): Pass new sem_function argument. (find_clusters): Likewise. (tail_merge_optimize): Call IPA ICF comparison machinery. So a general question. We're passing in STRICT to several routines, which is fine. But then we're also checking M_TAIL_MERGE_MODE. What's the difference between the two? Can they be unified? Hello. I would say that STRICT is a bit generic mechanism that was introduced some time before. It's e.g. used for checking of THIS arguments for methods and make checking more sensitive in situations that are somehow special. The newly added state is orthogonal to the previous one. -/* Verifies that trees T1 and T2 are equivalent from perspective of ICF. */ +/* Verifies that trees T1 and T2 are equivalent from perspective of ICF. + If STRICT flag is true, versions must match strictly. */ bool -func_checker::compare_ssa_name (tree t1, tree t2) +func_checker::compare_ssa_name (tree t1, tree t2, bool strict) This (and other) functions would seem to be used more than just ICF at this point. A pass over the comments to update them as appropriate would be appreciated. @@ -626,6 +648,136 @@ func_checker::parse_labels (sem_bb *bb) } } +/* Return true if gimple STMT is just a local difinition in a + basic block. Used SSA names are contained in SSA_NAMES_SET. */ s/difinition/definition/ Thanks. I didn't find this comment particularly useful in understanding what this function does. AFAICT the function looks as the uses of the LHS of STMT and verifies they're all in the same block as STMT, right? It also verifies that the none of the operands within STMT are part of SSA_NAMES_SET. What role do those properties play in the meaning of local definition? I tried to explain it more deeply what's the purpose of this function. @@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2) return true; } +void +ssa_names_set::build (basic_block bb) Needs a function comment. What are the important names we're collecting here? Is a single forward and backward pass really sufficient to find all the important names? In the backward pass, do you have to consider things like ASMs? I guess it's difficult to understand what you need to look at because it's not entirely clear the set of SSA_NAMEs you're building. @@ -149,12 +153,20 @@ public: mapping between basic blocks and labels. */ void parse_labels (sem_bb *bb); + /* For given basic blocks BB1 and BB2 (from functions FUNC1 and FUNC), + true value is returned if phi nodes are semantically + equivalent in these blocks. */ + bool compare_phi_node (sem_bb *sem_bb1, sem_bb *sem_bb2); Presumably in the case of tail merging, FUNC1 and FUNC will be the same :-) Yes, the function is not called from tail-merge pass. /* Verifies that trees T1 and T2 are equivalent from perspective of ICF. */ - bool compare_ssa_name (tree t1, tree t2); + bool compare_ssa_name (tree t1, tree t2, bool strict = true); /* Verification function for edges E1 and E2. */ bool compare_edge (edge e1, edge e2); @@ -204,7 +216,7 @@ public: bool compare_tree_ssa_label (tree t1, tree t2); /* Function compare for equality given memory operands T1 and T2. */ - bool compare_memory_operand (tree t1, tree t2); + bool
Re: [PATCH][AArch64][8/14] Implement TARGET_OPTION_VALID_ATTRIBUTE_P
On 03/08/15 11:52, James Greenhalgh wrote: On Fri, Jul 24, 2015 at 11:43:32AM +0100, Kyrill Tkachov wrote: On 21/07/15 16:37, James Greenhalgh wrote: On Thu, Jul 16, 2015 at 04:20:59PM +0100, Kyrill Tkachov wrote: Hi all, This patch implements target attribute support via the TARGET_OPTION_VALID_ATTRIBUTE_P hook. The aarch64_handle_option function in common/config/aarch64/aarch64-common.c is exported to the backend and beefed up a bit. The target attributes supported by this patch reflect the command-line options that we specified as Save earlier in the series. Explicitly, the target attributes supported are: - general-regs-only - fix-cortex-a53-835769 and no-fix-cortex-a53-835769 - cmodel= - strict-align - omit-leaf-frame-pointer and no-omit-leaf-frame-pointer - tls-dialect - arch= - cpu= - tune= These correspond to equivalent command-line options when prefixed with a '-m'. Additionally, this patch supports specifying architectural features, as in the -march and -mcpu options by themselves. So, for example we can write: __attribute__((target(+simd+crypto))) to enable 'simd' and 'crypto' on a per-function basis. The documentation and tests for this come as a separate patch later after the target pragma support and the inlining rules are implemented. Bootstrapped and tested on aarch64. In addition to the comments below, you may want to run contrib/check_GNU_style.sh on this patch, it shows a number of other issues that would be nice to fix. Thanks, here's the updated patch. I used alloca for memory allocation to keep consistent with the rest of aarch64.c, fixed the error message issues, spacing and control flow issues you pointed out. How's this? +static bool +aarch64_process_one_target_attr (char *arg_str, const char* pragma_or_attr) +{ + bool ret; + bool invert = false; + snip + ret = false; I don't think this variable is exactly what you want. Surely as soon as we see a failure case, we want to return false. Otherwise we could see junk,junk,ok_stuff and return true. So I would drop ret and rewrite this to bail out as soon as we see an error. The downside is we won't catch multiple errors that way. The other thing to do would be to invert the sense of your flag to something like failed and only set it where we fail. It depends what behaviour you're looking for. + for (p_attr = aarch64_attributes; !ret p_attr-name; p_attr++) +{ + /* If the names don't match up, or the user has given an argument +to an attribute that doesn't accept one, or didn't give an argument +to an attribute that expects one, fail to match. */ + if (strcmp (str_to_check, p_attr-name) != 0) + continue; + + bool attr_need_arg_p = p_attr-attr_type == aarch64_attr_custom + || p_attr-attr_type == aarch64_attr_enum; + + if (attr_need_arg_p ^ (arg != NULL)) + { + error (target %s %qs does not accept an argument, + pragma_or_attr, str_to_check); + return false; + } + + /* If the name matches but the attribute does not allow no- versions +then we can't match. */ + if (invert !p_attr-allow_neg) + { + error (target %s %qs does not allow a negated form, + pragma_or_attr, str_to_check); + return false; + } + + switch (p_attr-attr_type) + { + /* Has a custom handler registered. + For example, cpu=, arch=, tune=. */ + case aarch64_attr_custom: + gcc_assert (p_attr-handler); + ret = p_attr-handler (arg, pragma_or_attr); + break; + + /* Either set or unset a boolean option. */ + case aarch64_attr_bool: + { + struct cl_decoded_option decoded; + + generate_option (p_attr-opt_num, NULL, !invert, + CL_TARGET, decoded); + aarch64_handle_option (global_options, global_options_set, + decoded, input_location); + ret = true; + break; + } + /* Set or unset a bit in the target_flags. aarch64_handle_option +should know what mask to apply given the option number. */ + case aarch64_attr_mask: + { + struct cl_decoded_option decoded; + /* We only need to specify the option number. +aarch64_handle_option will know which mask to apply. */ + decoded.opt_index = p_attr-opt_num; + decoded.value = !invert; + aarch64_handle_option (global_options, global_options_set, + decoded, input_location); + ret = true; + break; + } + /* Use the option setting machinery to set an option to an enum. */ + case aarch64_attr_enum: + { + gcc_assert (arg); + bool valid; +
Re: [PATCH] PR fortran/66942 -- avoid referencing a NULL C++ thing
On Thu, Jul 30, 2015 at 04:08:47PM +0200, Mikael Morin wrote: Le 29/07/2015 18:45, Steve Kargl a écrit : This builds and passes regression testing on x86_64-*-freebsd. OP found the problem by running the sanatizers. I don't know how to build gcc with this as a build option. I'll commit whichever diff you recommend. I prefer this second variant. OK for trunk without further comment from Richi. Thanks. Thanks Mikael. Committed revision 226517. -- Steve
[committed] Fix error: unable to find a register to spill in class 'R1_REGS' reloading indirect call operand
The 64-bit register indirect call does not use register %r1. So, %r1 shouldn't be clobbered in the call pattern. Normally, this doesn't matter as register %r1 is call clobbered. However, reload can fail when the register used for the indirect call needs to be reloaded from a static location. In this case, register %r1 is needed for the reload and the clobber blocks the reload. Tested on hppa64-hp-hpux11.11 and hppa2.0w-hp-hpux11.11. Committed to trunk, gcc-5 and gcc-4.9. Dave -- John David Anglin dave.ang...@bell.net 2015-08-03 John David Anglin dang...@gcc.gnu.org PR target/67060 * config/pa/pa.md (call_reg_64bit): Remove reg:DI 1 clobber. Adjust splits to match new pattern. Index: config/pa/pa.md === --- config/pa/pa.md (revision 226363) +++ config/pa/pa.md (working copy) @@ -7440,7 +7440,6 @@ (define_insn call_reg_64bit [(call (mem:SI (match_operand:DI 0 register_operand r)) (match_operand 1 i)) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (clobber (match_operand 2)) (use (reg:DI 27)) @@ -7461,7 +7460,6 @@ (define_split [(parallel [(call (mem:SI (match_operand 0 register_operand )) (match_operand 1 )) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (clobber (match_operand 2)) (use (reg:DI 27)) @@ -7472,7 +7470,6 @@ [(set (match_dup 2) (reg:DI 27)) (parallel [(call (mem:SI (match_dup 0)) (match_dup 1)) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (use (reg:DI 27)) (use (reg:DI 29)) @@ -7482,7 +7479,6 @@ (define_split [(parallel [(call (mem:SI (match_operand 0 register_operand )) (match_operand 1 )) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (clobber (match_operand 2)) (use (reg:DI 27)) @@ -7492,7 +7488,6 @@ [(set (match_dup 2) (reg:DI 27)) (parallel [(call (mem:SI (match_dup 0)) (match_dup 1)) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (use (reg:DI 27)) (use (reg:DI 29)) @@ -7503,7 +7498,6 @@ (define_insn *call_reg_64bit_post_reload [(call (mem:SI (match_operand:DI 0 register_operand r)) (match_operand 1 i)) - (clobber (reg:DI 1)) (clobber (reg:DI 2)) (use (reg:DI 27)) (use (reg:DI 29))
libgo patch committed: Kill sleep processes in testsuite
This patch from Andrew Wilkins kills off extra sleep processes when running the libgo testsuite. Otherwise, the sleep process would continue after the test completes. This doesn't affect the testsuite but the extra sleep processes are not desirable. Committed to mainline. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 226525) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -d5aad2f400a0f21724e33e4ae48e1583ed8b1a87 +33d59eff1bd5de29f1fbde3b7625db28595835fd The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/testsuite/gotest === --- libgo/testsuite/gotest (revision 226510) +++ libgo/testsuite/gotest (working copy) @@ -518,7 +518,7 @@ xno) wait $pid status=$? if ! test -f gotest-timeout; then - kill $alarmpid + ps -o pid,ppid | grep $alarmpid | cut -f1 -d | xargs kill -9 fi else if test $trace = true; then
Re: [RFC, patch] New attribute to create target clones
On 07/30/2015 04:19 PM, Evgeny Stupachenko wrote: Hi All, The patch enables new attribute 'ctarget', The attribute force compiler to create clones of a function with the attribute. For example: __attribute__((ctarget(avx,arch=slm,arch=core-avx2,default))) So presumably we're allowing both changing the ISA and the tuning options? In fact, it looks like we're able to change any -m option, right? What about something like -mregparm? I think the docs need to disallow clones with different ABI requirements. Currently default ctarget means that foo() will be optimized with current compiler options and target. The other option is to switch target to target specific minimum (like target x86-64). Is it better? I could make an argument for either. Do we have anything to guide us from other compilers such as ICC that may have a similar capability? What do you think about attribute name? 'ctarget' is short but not informative. Other variants are 'target_clones', 'targets'... target_clones seems good. For multiple_target.c: Can you please trim down the #include set. I can't believe you need all that stuff.If you really need backend stuff (tm.h), then use backend.h instead. It generally looks reasonable, but Jan knows the IPA code much better than I do and I'd like him to chime in. You might also ask Ilya to review the cloning and rules around clones since he head to deal with a lot of that stuff in his MPX work. jeff
Re: [PATCH], PowerPC IEEE 128-bit patch #4
On Fri, Jul 31, 2015 at 12:43:20AM +, Joseph Myers wrote: On Wed, 29 Jul 2015, Michael Meissner wrote: #6Add support for using different names for the 64/128-bit integer conversion to IBM extended double, to allow a future version to switch the default for what long double is. It is not expected that GCC 6.x will make this switch, but we would like to eventually use the standard TF names for the library when the default change is made. If this isn't clear, the following names use 'tf' in them, when they use IBM extended double: That would be for a completely separate ABI, incompatible with all existing objects both static and shared, since the existing ABI defines these names to have their existing meanings? The OpenPower 1.1 ABI for little endian PowerPC 64-bit does not mention the names at all. This ABI leaves it open whether long double is IBM extended double or IEEE 128-bit floating point. One of the goals of these patches is someday in the future change the default. A lot of work, particularly in the library space needs to be done to change the default. Until we can make the switch, users that want IEEE 128-bit support will need to use __float128. The intention of theese changes (currently unwritten) is to change the existing problematical names that use TF in their name to be something else, and provide via a weak reference an alias for the old name. So if for example, we change the default in GCC 7.0, code compiled by GCC 6.0 would work because it uses say __gcc_ltoq to call convert a 64-bit integer to IBM extended double instead of __floatditf. Older code that refers to __floatditf would still work fine. Then sometime later (such as GCC 8.0) we could make __floatditf be a weak reference to __floatdikt. If you have any ideas of how to do this seemlessly, please let me know. Steve Monroe and David Edelsohn requested that I explore that some date in the future, we will be able to use the standard names. I tend to be skeptical that we can do it without running into some incompatibility, and I feel that we just have to live with the existing TF names, and not use TF for IEEE 128-bit. Currently, I'm using KF for the IEEE 128-bit functions, even if long double is mapped to IEEE 128-bit instead of long double. Another wrinkle is the 32-bit RTEMS port actually uses IEEE 128-bit floating point with the standard names, because they never used the IBM extended double. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Re: PR c/c++/diagnostics/66098 Take -Werror into account when deciding what was the command-line status
PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg02581.html Thanks, Manuel. On 30 July 2015 at 17:35, Manuel López-Ibáñez lopeziba...@gmail.com wrote: When I fixed PR59304, I forgot that a command-line warning can be also an error if -Werror was enabled. This introduced a regression since anything enabled in the command-line together with -Werror would get initially classified as a warning when reaching the first #pragma GCC diagnostic, and this will be the setting after a #pragma pop. Options that appear as arguments of -W[no-]error= are not affected by this since those are initially classified as errors/warnings even before reaching the first #pragma, thus the pop sets them correctly (before and after this patch). Nonetheless, the tests also check that they work correctly. Bootregtested on x86_64-linux-gnu. OK? gcc/ChangeLog: 2015-07-29 Manuel López-Ibáñez m...@gcc.gnu.org PR c/66098 PR c/66711 * diagnostic.c (diagnostic_classify_diagnostic): Take -Werror into account when deciding what was the command-line status. gcc/testsuite/ChangeLog: 2015-07-29 Manuel López-Ibáñez m...@gcc.gnu.org PR c/66098 PR c/66711 * gcc.dg/pragma-diag-3.c: New test. * gcc.dg/pragma-diag-4.c: New test.
Re: [PATCH 1/2] Allow REG_EQUAL for ZERO_EXTRACT
On 07/28/2015 06:00 AM, Kugan wrote: On 27/07/15 05:38, Andreas Schwab wrote: Kugan kugan.vivekanandara...@linaro.org writes: * cse.c (cse_insn): Fix missing check for STRICT_LOW_PART and minor clean up. This breaks gcc.target/m68k/tls-ie-xgot.c scan-assembler jsr __m68k_read_tp gcc.target/m68k/tls-ie.c scan-assembler jsr __m68k_read_tp gcc.target/m68k/tls-le-xtls.c scan-assembler jsr __m68k_read_tp gcc.target/m68k/tls-le.c scan-assembler jsr __m68k_read_tp Andreas. Sorry for the breakage. My patch to add ZERO_EXTRACT unfortunately restricts the behaviour in one other case. That is, even when REG_EQUAL note and src are same, we were setting src_eqv to src when it is STRICT_LOW_PART. Not sure why but restored the old behaviour. I could reproduce this issue by inspecting the generated asm and made sure that it is fixed. However I could not run regression for m68k (Sorry I don’t have access to the set-up). I bootstrapped and regression tested on x86_64-linux-gnu and arm-none-linux-gnu with no new regressions. Thanks, Kugan gcc/ChangeLog: 2015-07-27 Kugan Vivekanandarajah kug...@linaro.org * cse.c (cse_insn): Restoring old behaviour for src_eqv when dest and value in the REG_EQUAL are same and dest is STRICT_LOW_PART. OK. I verified there were no regressions on m68k.exp with a cross compiler and that the tests referenced by Andreas indeed passes with the patch installed. Thanks, jeff
Re: PR middle-end/16351 NULL dereference warnings
PING: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01860.html Actually, the xfailed test was because the function folded to nothing and the offending code was removed without warning. Fixed in the attached version. Same changelog. On 22 July 2015 at 17:52, Manuel López-Ibáñez lopeziba...@gmail.com wrote: I took the patch in https://gcc.gnu.org/ml/gcc-patches/2014-01/msg01715.html and removed the Wnull-attribute part, since most of it can be done from the FE as shown in https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01857.html and also to make the patch smaller and easier to review. I also fixed the comments by Florian here: https://gcc.gnu.org/ml/gcc-patches/2014-02/msg00149.html and added more tests from the PR and its duplicates (one xfailed, I'll open a new PR about it). Futher cleanups may be possible (infer_nonnull_range_by_attribute checks flag_delete_null_pointer_checks, which seems weird to me but it matches the existing behavior of infer_nonnull_range). I added this to Wall to get as much testing as possible, we can always move it to Wextra or disable it by default just before the release if it turns out to be too noisy. Boostrapped and regression tested on x86_64-linux-gnu. OK? gcc/ChangeLog: 2015-07-22 Manuel López-Ibáñez m...@gcc.gnu.org Jeff Law l...@redhat.com PR c/16351 * doc/invoke.texi (Wnull-dereference): New. * tree-vrp.c (infer_value_range): Update call to infer_nonnull_range. * gimple-ssa-isolate-paths.c (find_implicit_erroneous_behaviour): Warn for potential NULL dereferences. (find_explicit_erroneous_behaviour): Warn for NULL dereferences. * ubsan.c (instrument_nonnull_arg): Call infer_nonnull_range_by_attribute. (instrument_nonnull_return): Likewise. * common.opt (Wnull-dereference); New. * gimple.c (infer_nonnull_range): Remove bool arguments. (infer_nonnull_range_by_dereference): New. (infer_nonnull_range_by_attribute): New. * gimple.h: Update declarations. gcc/testsuite/ChangeLog: 2015-07-22 Manuel López-Ibáñez m...@gcc.gnu.org Jeff Law l...@redhat.com PR c/16351 * gcc.dg/tree-ssa/isolate-2.c: Close comment. * gcc.dg/tree-ssa/isolate-4.c: Likewise. * gcc.dg/tree-ssa/wnull-dereference.c: New test. * gcc.dg/tree-ssa/isolate-1.c: Test warnings with -Wnull-dereference. * gcc.dg/tree-ssa/isolate-3.c: Likewise. * gcc.dg/tree-ssa/isolate-5.c: Likewise. * c-c++-common/wnonnull-1.c: New test. Index: gcc/tree-vrp.c === --- gcc/tree-vrp.c (revision 225868) +++ gcc/tree-vrp.c (working copy) @@ -4936,11 +4936,11 @@ infer_value_range (gimple stmt, tree op, break; if (e == NULL) return false; } - if (infer_nonnull_range (stmt, op, true, true)) + if (infer_nonnull_range (stmt, op)) { *val_p = build_int_cst (TREE_TYPE (op), 0); *comp_code_p = NE_EXPR; return true; } Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 225868) +++ gcc/doc/invoke.texi (working copy) @@ -258,10 +258,11 @@ Objective-C and Objective-C++ Dialects}. -Wframe-larger-than=@var{len} -Wno-free-nonheap-object -Wjump-misses-init @gol -Wignored-qualifiers -Wincompatible-pointer-types @gol -Wimplicit -Wimplicit-function-declaration -Wimplicit-int @gol -Winit-self -Winline -Wno-int-conversion @gol -Wno-int-to-pointer-cast -Wno-invalid-offsetof @gol +-Wnull-dereference @gol -Winvalid-pch -Wlarger-than=@var{len} -Wunsafe-loop-optimizations @gol -Wlogical-op -Wlogical-not-parentheses -Wlong-long @gol -Wmain -Wmaybe-uninitialized -Wmemset-transposed-args @gol -Wmisleading-indentation -Wmissing-braces @gol -Wmissing-field-initializers -Wmissing-include-dirs @gol @@ -4130,10 +4133,20 @@ All the above @option{-Wunused} options In order to get a warning about an unused function parameter, you must either specify @option{-Wextra -Wunused} (note that @option{-Wall} implies @option{-Wunused}), or separately specify @option{-Wunused-parameter}. +@item -Wnull-dereference +@opindex Wnull-dereference +@opindex Wno-null-dereference +Warn if the compiler detects paths which trigger erroneous or +undefined behaviour due to dereferencing a NULL pointer. This option +is only active when @option{-fdelete-null-pointer-checks} is active, +which is enabled by optimizations in most targets. The precision of +the warnings depends on the optimization options used. This option is +enabled by @option{-Wall}. + @item -Wuninitialized @opindex Wuninitialized @opindex Wno-uninitialized Warn if an automatic variable is used without first being initialized or if a variable may be clobbered by a @code{setjmp} call. In C++, Index: gcc/testsuite/gcc.dg/tree-ssa/isolate-2.c === ---
Go patch committed: Don't make erroneous type descriptors
This patch by Chris Manghane changes the Go frontend to not make type descriptor for named types if we have seen errors during the compilation. This avoids a compiler crash, and fixes https://golang.org/issue/11560 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian
[committed, PATCH]: Add a testcase for PR tree-optimization/67077
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67077 is fixed by https://gcc.gnu.org/viewcvs/gcc?view=revisionrevision=226089 I checked in this patch to add a testcase for PR tree-optimization/67077. Index: ChangeLog === --- ChangeLog (revision 226516) +++ ChangeLog (working copy) @@ -1,3 +1,8 @@ +2015-08-03 H.J. Lu hongjiu...@intel.com + + PR tree-optimization/67077 + * gcc.dg/pr67077.c: New test. + 2015-08-03 Jeff Law l...@redhat.com PR middle-end/66314 Index: gcc.dg/pr67077.c === --- gcc.dg/pr67077.c(revision 0) +++ gcc.dg/pr67077.c(working copy) @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -Wall -O2 } */ + +unsigned char buffer[8]; +unsigned long +foo (void) +{ + unsigned long i; + i = buffer[0]; + if (i = 8) +return i - 7; + i++; + while (i 8) +{ + if (buffer[i-1] != 0) +return 0; + i--; +} + return 1; +}
Re: [RFC] [Patch]: Try and vectorize with shift for mult expr with power 2 integer constant.
On 08/02/2015 05:03 AM, Kumar, Venkataramanan wrote: Hi Jakub, Thank you for reviewing the patch. I have incorporated your comments in the attached patch. Note Jakub is on PTO for the next 3 weeks. vectorize_mults_via_shift.diff.txt diff --git a/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c b/gcc/testsuite/gcc.dg/vect/vect-mult-patterns.c Jakub would probably like more testcases :-) The most obvious thing to test would be other shift factors. A negative test to verify we don't try to turn a multiply by non-constant or multiply by a constant that is not a power of 2 into shifts. [ Would it make sense, for example, to turn a multiply by 3 into a shift-add sequence? As Jakub said, choose_mult_variant can be your friend. ] @@ -2147,6 +2152,140 @@ vect_recog_vector_vector_shift_pattern (vecgimple *stmts, return pattern_stmt; } +/* Detect multiplication by constant which are postive or negatives of power 2, s/postive/positive/ Jeff
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On 03/08/15 16:53, Kyrill Tkachov wrote: On 03/08/15 16:45, Uros Bizjak wrote: On Mon, Aug 3, 2015 at 5:37 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I don't think this is a good idea. In the expander, there is already quite some target-dependent code that goes great length to utilize sbb insn as much as possible, before cmove is used. IMO, as far as x86 is concerned, the best solution would be to revert the change. ix86_expand_int_movcc already does some tricks from your patch in a target-efficient way. Generic change that was introduced by your patch now interferes with this expansion. Well, technically the transformation was already there, it was just never reached for an x86 compilation because noce_try_cmove was tried in front of it and used a target-specific expansion. In any case, how's this proposal? The transformation noce_try_store_flag_constants /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ Is a catch-all-immediates transformation in noce_try_store_flag_constants. What if we moved it to noce_try_cmove and performed it only if the target-specific conditional move expansion there failed? That way we can try the x86_64-specific sequence first and still give the opportunity to noce_try_store_flag_constants to perform the transformations that can benefit targets that don't have highly specific conditional move expanders. Yes, let's try this approach. As was found out, some targets (e.g. x86) hide lots of different target-dependent expansion strategies into movcc expander. Perhaps this fact should be documented in the comment in the generic code? Ok, I'll work on that approach and add a comment. I'm testing a patch that fix the testcases on x86_64 and does not harm codegen on aarch64. Feel free to file a PR and assign it to me. Thanks, Kyrill Thanks, Kyrill Uros.
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On Mon, Aug 3, 2015 at 7:20 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Looking at the x86 movcc expansion code (ix86_expand_int_movcc) I don't think this is a good idea. In the expander, there is already quite some target-dependent code that goes great length to utilize sbb insn as much as possible, before cmove is used. IMO, as far as x86 is concerned, the best solution would be to revert the change. ix86_expand_int_movcc already does some tricks from your patch in a target-efficient way. Generic change that was introduced by your patch now interferes with this expansion. Well, technically the transformation was already there, it was just never reached for an x86 compilation because noce_try_cmove was tried in front of it and used a target-specific expansion. In any case, how's this proposal? The transformation noce_try_store_flag_constants /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ Is a catch-all-immediates transformation in noce_try_store_flag_constants. What if we moved it to noce_try_cmove and performed it only if the target-specific conditional move expansion there failed? That way we can try the x86_64-specific sequence first and still give the opportunity to noce_try_store_flag_constants to perform the transformations that can benefit targets that don't have highly specific conditional move expanders. Yes, let's try this approach. As was found out, some targets (e.g. x86) hide lots of different target-dependent expansion strategies into movcc expander. Perhaps this fact should be documented in the comment in the generic code? Ok, I'll work on that approach and add a comment. I'm testing a patch that fix the testcases on x86_64 and does not harm codegen on aarch64. Feel free to file a PR and assign it to me. PR67103 [1] [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67103 Thanks, Uros.
Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.
On 07/16/2015 05:03 AM, Martin Liška wrote: So a general question. We're passing in STRICT to several routines, which is fine. But then we're also checking M_TAIL_MERGE_MODE. What's the difference between the two? Can they be unified? Hello. I would say that STRICT is a bit generic mechanism that was introduced some time before. It's e.g. used for checking of THIS arguments for methods and make checking more sensitive in situations that are somehow special. The newly added state is orthogonal to the previous one. Fair enough. There's some cases where we've documented STRICT, and others where we haven't. If STRICT flag is true, version must match strictly Appears as documentation for STRICT. It seems like it'd be better to describe what strictly means here. Elsewhere we have comments like: Be strict in case of tail-merge optimization Which tends to confuse things a bit. Perhaps something more like: In the case of tail merge optimization, XYZ must match It seems like a nit, but to someone else reading the code I don't think the distinctions are all that clear right now, so if we can improve things, it'd be good. I didn't find this comment particularly useful in understanding what this function does. AFAICT the function looks as the uses of the LHS of STMT and verifies they're all in the same block as STMT, right? It also verifies that the none of the operands within STMT are part of SSA_NAMES_SET. What role do those properties play in the meaning of local definition? I tried to explain it more deeply what's the purpose of this function. Thanks. It's much clearer now. We've actually got similar code in a couple places (ifcvt). I wonder if we could unify those implementations as a follow-up cleanup. @@ -1037,4 +1205,60 @@ func_checker::compare_gimple_asm (const gasm *g1, const gasm *g2) return true; } +void +ssa_names_set::build (basic_block bb) Needs a function comment. What are the important names we're collecting here? Is a single forward and backward pass really sufficient to find all the important names? In the backward pass, do you have to consider things like ASMs? I guess it's difficult to understand what you need to look at because it's not entirely clear the set of SSA_NAMEs you're building. These questions and lack of function comment don't seem to have been addressed yet. + +using namespace ipa_icf; +using namespace ipa_icf_gimple; Is this wise? Does it significantly help with conciseness within the tail merging pass where it wants things ipa-icf and ipa-icf-gimple? I'm not objecting at this point, I want to hear your thoughts. I must agree with you, as I've started using both namespaces in tree-tail merge pass, it makes not much sense anymore. I suggest to come up with a namespace that will encapsulate 'identical code folding'-related stuff. What about: namespace icf Sure if it helps and is clean. GCC does not have a policy against using namespace, but other codebases do (google for example) as it does introduce some long term maintenance issues. So when I see a using namespace of that nature, I'm naturally going to question if it really helps in a significant way. If it does, then I won't object. If it's not helping in a significant way, then I'm likely to object. I think the updated version is fine WRT namespaces. ? /* Describes a group of bbs with the same successors. The successor bbs are cached in succs, and the successor edge flags are cached in succ_flags. @@ -1220,17 +1231,48 @@ gsi_advance_bw_nondebug_nonlocal (gimple_stmt_iterator *gsi, tree *vuse, } } +static bool +check_edges_correspondence (basic_block bb1, basic_block bb2) Needs a function comment. Still needs function comment. I think we're pretty close here. Most of the issues are comments -- I still haven't looked *real* close at ssa_names_set::build. With a function comment I ought to be able to look at it more closely in the next (and hopefully final) iteration. Jeff
Re: [PR66726] Factor conversion out of COND_EXPR
On 07/26/2015 07:05 PM, Kugan wrote: I thought that when !gimple_assign_cast_p (stmt), RHS will always boolean. I have now added this check in the attached patch. Thanks. I also noticed that in maybe_optimize_range_tests, GIMPLE_COND can have non compatible types when new_op is updated (boolean types coming from tcc_compare results) and hence need to be converted. Changed that as well. Did you find this by examination or with some testcode? If the latter, including a testcase for this issue would be appreciated. Bootstrapped and regression tested on x86-64-none-linux-gnu with no new regressions. Is this OK for trunk? OK with an updated changelog. Jeff
Re: [PATCH] Simplify vector compare-not-select sequence
Bill Schmidt wschm...@linux.vnet.ibm.com writes: Index: gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c === --- gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c(revision 226505) +++ gcc/testsuite/gcc.target/powerpc/vec-cmp-sel.c(working copy) @@ -1,6 +1,7 @@ /* { dg-do compile { target powerpc64*-*-* } } */ If you want -m64 you need dg-require-effective-target lp64, but I see no need for that. /* { dg-require-effective-target powerpc_p8vector_ok } */ -/* { dg-options -maltivec -O2 } */ +/* { dg-require-effective-target powerpc_vsx_ok } */ +/* { dg-options -maltivec -O2 -mvsx -mpower8-vector } */ /* { dg-final { scan-assembler vcmpgtsd } } */ /* { dg-final { scan-assembler-not xxlnor } } */ Looks good. PASS: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors) PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler vcmpgtsd PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler-not xxlnor PASS: gcc.target/powerpc/vec-cmp-sel.c (test for excess errors) PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler vcmpgtsd PASS: gcc.target/powerpc/vec-cmp-sel.c scan-assembler-not xxlnor Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 And now for something completely different.
Re: [PATCH 1/3] tree-ssa-tail-merge: add IPA ICF infrastructure.
On 07/20/2015 01:52 AM, Martin Liška wrote: Due to changes dump file, it will be necessary to amend scanning of dump files. Those changes are fine and can be installed once the prerequisites have been approved/installed. Thanks, Jeff
Re: [PATCH 1/2] gcc parts of timer refactoring
On 07/31/2015 03:45 PM, David Malcolm wrote: In r223092 (aka dd4d567f4b6b498242097c41d63666bdae320ac1) I moved the state of timevar.c from being global data into a class timer with a global instance g_timer. This followup patch generalizes the timing within toplev so that an external timer instance can (optionally) be passed in; this is used in patch 2 to add an API to libgccjit for client code: gcc_jit_timer. The gcc_jit_timer API allows client code to create arbitrary new timing items (beyond the hardcoded list in timevar.def), giving them names, so that client code can time whatever things are relevant to it; this patch adds the support necessary to timevar.[ch] for this. The patch also generalizes timevar.h's auto_timevar to avoid it depending on the g_timer global (which is only set during the lifetime of toplev. We can use this to add TV_JIT_ACQUIRING_MUTEX, and thus measure wallclock time spent waiting for the JIT mutex. Generalize timevar.c to add support for timing jit client code, by: * adding support for timing client items with arbitrary client-supplied names (e.g. compiling code, running code etc). * supporting having a timer instance created externally and passed in to toplev, allowing clients to control where toplev's timing information is accumulated * add a couple of jit-specific timevars This goes with the next patch; I've split them up for ease of review. OK for trunk? gcc/ChangeLog: * main.c (main): Pass in NULL for toplev's external_timer. * timevar.c: Include coretypes.h, hash-map.h, vec.h. (class timer::named_items): New. (timer::named_items::named_items): New. (timer::named_items::~named_items): New. (timer::named_items::push): New. (timer::named_items::pop): New. (timer::named_items::print): New. (timer::timer): Initialize field m_jit_client_items. (timer::~timer): New. (timer::push): Move bulk of implementation to... (timer::push_internal): ...here. New function. (timer::pop): Move bulk of implementation to... (timer::pop_internal): ...here. New function. (timer::push_client_item): New. (timer::pop_client_item): New. (timer::print_row): New function, taken from timer::print. (timer::print): Print GCC items header if we also have client items. Move row-printing to timer::print_row. Print any client items. (timer::get_topmost_item_name): New method. * timevar.def (TV_JIT_ACQUIRING_MUTEX): New. (TV_JIT_CLIENT_CODE): New. * timevar.h (timer::push_client_item): New declaration. (timer::pop_client_item): New declaration. (timer::get_topmost_item_name): New method. (timer::push_internal): New declaration. (timer::pop_internal): New declaration. (timer::print_row): New declaration. (timer::named_items): New declaration. (timer::m_jit_client_items): New field. (timer): Add friend class named_items. (auto_timevar::auto_timevar): Add timer param. (auto_timevar::~auto_timevar): Use field m_timer. (auto_timevar::m_timer): New field. * toplev.c (initialize_rtl): Add g_timer as param when constructing auto_timevar instance. (toplev::toplev): Add external_timer param, and use it to initialize the g_timer global if non-NULL. (toplev::~toplev): If this created g_timer, delete it. * toplev.h (toplev::toplev): Replace use_TV_TOTAL bool param with external_timer timer *. --- gcc/main.c | 2 +- gcc/timevar.c | 247 gcc/timevar.def | 2 + gcc/timevar.h | 35 +++- gcc/toplev.c| 18 +++-- gcc/toplev.h| 4 +- 6 files changed, 261 insertions(+), 47 deletions(-) diff --git a/gcc/main.c b/gcc/main.c index 79baf0d..bbd8b67 100644 --- a/gcc/main.c +++ b/gcc/main.c @@ -33,7 +33,7 @@ int main (int argc, char **argv); int main (int argc, char **argv) { - toplev toplev (true, /* use_TV_TOTAL */ + toplev toplev (NULL, /* external_timer */ true /* init_signals */); return toplev.main (argc, argv); diff --git a/gcc/timevar.c b/gcc/timevar.c index 76ad22a..eebb1dc 100644 --- a/gcc/timevar.c +++ b/gcc/timevar.c @@ -20,7 +20,10 @@ along with GCC; see the file COPYING3. If not see #include config.h #include system.h +#include coretypes.h #include timevar.h +#include hash-map.h +#include vec.h You shouldn't need to include hash-map.h and vec.h, as those get pulled in via core-types.h including hash-table.h now. + for (iter = m_stack; iter; iter=next) Nit, whitespace around the '=' in iter=next assignment. +{ + next = iter-next; + free (iter); +} + for (iter = m_unused_stack_instances; iter; iter=next) Here too. OK with those nits fixed. jeff
Re: Go patch committed: Fix error reporting for invalid builtin calls
On Mon, Aug 3, 2015 at 2:10 AM, Andreas Schwab sch...@suse.de wrote: ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack': ../../../libgo/runtime/mprof.goc:408:5: error: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Werror=frame-address] sp = runtime_getcallersp(b); Fixed by this patch by Chris Manghane. The call was not actually necessary. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. This fixes PR 67101. Ian Index: gcc/go/gofrontend/MERGE === --- gcc/go/gofrontend/MERGE (revision 226510) +++ gcc/go/gofrontend/MERGE (working copy) @@ -1,4 +1,4 @@ -2bf7c643a1d2f8503070c8e6cb87852026e32400 +d5aad2f400a0f21724e33e4ae48e1583ed8b1a87 The first line of this file holds the git revision number of the last merge done from the gofrontend repository. Index: libgo/runtime/mprof.goc === --- libgo/runtime/mprof.goc (revision 226510) +++ libgo/runtime/mprof.goc (working copy) @@ -402,10 +402,9 @@ func ThreadCreateProfile(p Slice) (n int } func Stack(b Slice, all bool) (n int) { - byte *pc, *sp; + byte *pc; bool enablegc; - sp = runtime_getcallersp(b); pc = (byte*)(uintptr)runtime_getcallerpc(b); if(all) { @@ -423,7 +422,6 @@ func Stack(b Slice, all bool) (n int) { g-writebuf = (byte*)b.__values; g-writenbuf = b.__count; USED(pc); - USED(sp); runtime_goroutineheader(g); runtime_traceback(); runtime_printcreatedby(g);
Re: Go patch committed: Fix error reporting for invalid builtin calls
On Mon, Aug 3, 2015 at 11:10 AM, Andreas Schwab sch...@suse.de wrote: Ian Lance Taylor i...@golang.org writes: This patch from Chris Manghane fixes the Go frontend error reporting for invalid builtin calls, by not losing track of whether the call is erroneous. This fixes https://golang.org/issue/11561. Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. ../../../libgo/runtime/mprof.goc: In function 'runtime_Stack': ../../../libgo/runtime/mprof.goc:408:5: error: calling '__builtin_frame_address' with a nonzero argument is unsafe [-Werror=frame-address] sp = runtime_getcallersp(b); ^ Introduced by Martin Sebors patch 2015-08-02 Martin Sebor mse...@redhat.com * c-family/c.opt (-Wframe-address): New warning option. * doc/invoke.texi (Wframe-address): Document it. * doc/extend.texi (__builtin_frame_address, __builtin_return_address): Clarify possible effects of calling the functions with non-zero arguments and mention -Wframe-address. * builtins.c (expand_builtin_frame_address): Handle -Wframe-address. Richard. Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [gomp4] expand acc_on_device earlier
Hi! On Mon, 3 Aug 2015 08:00:36 -0400, Nathan Sidwell nat...@codesourcery.com wrote: On 08/03/15 07:37, Thomas Schwinge wrote: On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote: I've committed this to gomp4 branch. It expands the acc_on_device builtin earlier in the new oacc_xform pass. This will allow more optimization earlier on. Thanks! The existing expansion point now only needs to deal with the host-side case. Actually, no; committed to gomp-4_0-branch in r226498: Please clarify. This suggests a logic fault elsewhere. Hmm, I had the following in the commit log: When building an x86_64-intelmicemul-linux-gnu offloading compiler with r226484, the __builtin_acc_on_device usage in libgomp/oacc-init.c:acc_on_device makes it blow up: [...]/source-gcc/libgomp/oacc-init.c: In function 'acc_on_device': [...]/source-gcc/libgomp/oacc-init.c:643:10: internal compiler error: in expand_builtin_acc_on_device, at builtins.c:5891 return __builtin_acc_on_device (dev); ^ 0x6d9632 expand_builtin_acc_on_device [...]/source-gcc/gcc/builtins.c:5891 In case that has been too terse (sorry), here's a bit more detail. GCC can be configured to use nvptx-none as well as x86_64-intelmicemul-linux-gnu offloading compilers. In offloading compilers' configurations, ACCEL_COMPILER is defined. The nvptx-none offloading compilers' libgcc/libgomp build will use the specific libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device implementation, but the x86_64-intelmicemul-linux-gnu one will use the default libgomp/oacc-init.c:acc_on_device implementation (which is implemented in terms of __builtin_acc_on_device). Thus, in the latter case, we'd run into this gcc_unreachable during the x86_64-intelmicemul-linux-gnu offloading compiler's libgomp build: /* Expand OpenACC acc_on_device. This is expanded in the openacc transform pass, but if the user has this outside of an offloaded region, we'll find it here. In that case we must be host or none. */ static rtx expand_builtin_acc_on_device (tree exp, rtx target) { #ifdef ACCEL_COMPILER gcc_unreachable (); #else gcc_assert (!get_oacc_fn_attrib (current_function_decl)); In r226498, I then restored the earlier logic of gcc/builtins.c:expand_builtin_acc_on_device, and could then also simplify libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device to the very same generic/simple implementation in terms of __builtin_acc_on_device. Does that clarify? Grüße, Thomas pgp1H8R9bv5JO.pgp Description: PGP signature
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
On 03/08/15 13:33, Uros Bizjak wrote: Hello! 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. This patch regressed following tests on x86_64: FAIL: gcc.target/i386/cmov2.c scan-assembler sbb FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3] Sorry for that :( I could have sworn they passed for me during my run... While cmov3 case is questionable by itself in light of PR56309 [1], the cnov2 case regressed from: cmpl%edi, %esi sbbl%eax, %eax andl$-10, %eax addl$5, %eax ret to: xorl%eax, %eax cmpl%esi, %edi setbe %al negl%eax andl$10, %eax subl$5, %eax ret Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated anymore. Non-QImode setcc instructions are problematic on x86, since they need to be zero-extended to their full length. The transformation at fault in noce_try_store_flag_constants is: /* if (test) x = a; else x = b; = x = (-(test != 0) (b - a)) + a; */ I suppose we want to not perform this transformation on x86. The question is, on what characteristic of the target do we want to gate this on? [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 Uros.
Re: [PATCH][AArch64][10/14] Implement target pragmas
On Mon, Aug 03, 2015 at 10:36:17AM +0100, Kyrill Tkachov wrote: And here is a rebased version to resolve a conflict after Alan's patches went in. OK with the nits below fixed. 2015-08-03 Kyrylo Tkachov kyrylo.tkac...@arm.com * config.gcc (aarch64*-*-*): Specify c_target_objs and cxx_target_objs. * config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define. (TARGET_CPU_CPP_BUILTINS): Redefine to call aarch64_cpu_cpp_builtins. * config/aarch64/aarch64.c (aarch64_override_options_internal): Remove static keyword. (aarch64_reset_previous_fndecl): New function. (aarch64_handle_attr_isa_flags): Handle +nothing in the beginning of the string. * config/aarch64/aarch64-c.c: New file. * config/aarch64/arm_acle.h: Add pragma +crc+nofp at the top. Push and pop options at beginning and end. Remove ifdef __ARM_FEATURE_CRC32. * config/aarch64/arm_neon.h: Remove #ifdef check on __ARM_NEON. Add pragma arch=armv8-a+simd and +crypto where appropriate. * config/aarch64/t-aarch64 (aarch64-c.o): New rule. * config/aarch64/aarch64-protos.h (aarch64_cpu_cpp_builtins): Define prototype. (aarch64_register_pragmas): Likewise. (aarch64_reset_previous_fndecl): Likewise. (aarch64_process_target_attr): Likewise. (aarch64_override_options_internal): Likewise. 2015-08-03 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/arm_neon-nosimd-error.c: Delete. +/* Define the macros that we always expect to have on AArch64. */ + +static void +aarch64_define_unconditional_macros (cpp_reader *pfile) +{ + builtin_define (__aarch64__); + builtin_define (__ARM_64BIT_STATE); + + builtin_define (__ARM_ARCH_ISA_A64); + builtin_define_with_int_value (__ARM_ALIGN_MAX_PWR, 28); + builtin_define_with_int_value (__ARM_ALIGN_MAX_STACK_PWR, 16); + + /* __ARM_ARCH_8A is not mandated by ACLE but we define it unconditionally + as interoperability with the same arm macro. */ + builtin_define (__ARM_ARCH_8A); + + builtin_define_with_int_value (__ARM_ARCH_PROFILE, 'A'); + builtin_define (__ARM_FEATURE_CLZ); + builtin_define (__ARM_FEATURE_IDIV); + builtin_define (__ARM_FEATURE_UNALIGNED); + builtin_define (__ARM_PCS_AAPCS64); + builtin_define_with_int_value (__ARM_SIZEOF_WCHAR_T, WCHAR_TYPE_SIZE / 8); + +} Extra newline. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 3a5482d..6844c90 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -249,6 +249,7 @@ enum aarch64_symbol_type aarch64_classify_symbolic_expression (rtx, enum aarch64_symbol_context); bool aarch64_const_vec_all_same_int_p (rtx, HOST_WIDE_INT); bool aarch64_constant_address_p (rtx); +extern void aarch64_cpu_cpp_builtins (cpp_reader *); No need for this extern - and keep this in alphabetical order (first by return type, then by name). +#pragma GCC push_options +#pragma GCC target (+crypto) Keep things simple to manage and understand by inspection, and make these +nothing+crypto. @@ -21067,7 +21065,8 @@ vrsrad_n_u64 (uint64_t __a, uint64_t __b, const int __c) return __builtin_aarch64_ursra_ndi_uuus (__a, __b, __c); } -#ifdef __ARM_FEATURE_CRYPTO +#pragma GCC push_options +#pragma GCC target (+crypto) /* vsha1 */ Likewise here. Thanks, James
Re: offload data version number
Hi! On Sat, 1 Aug 2015 20:20:49 -0400, Nathan Sidwell nat...@acm.org wrote: this is the gomp4 committed version. --- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (revision 226462) +++ liboffloadmic/plugin/libgomp-plugin-intelmic.cpp (working copy) @@ -327,12 +327,26 @@ offload_image (const void *target_image) free (image); } +/* Return the libgomp version number we're compatible with. There is + no requirement for cross-version compatibility. */ + +extern C unsigned +GOMP_OFFLOAD_version (void) +{ + return GOMP_VERSION; +} + extern C int -GOMP_OFFLOAD_load_image (int device, const void *target_image, - addr_pair **result) +GOMP_OFFLOAD_load_image (int device, const unsigned version, + void *target_image, addr_pair **result) { TRACE ((device = %d, target_image = %p), device, target_image); + if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) +GOMP_PLUGIN_fatal (Offload data incompatible with intelmic plugin + (expected %u, received %u), +GOMP_VERSION_INTEL_MIC, GOMP_VERSION_DEV (version)); + /* If target_image is already present in address_table, then there is no need to offload it. */ if (address_table-count (target_image) == 0) @@ -353,8 +367,12 @@ GOMP_OFFLOAD_load_image (int device, con } extern C void -GOMP_OFFLOAD_unload_image (int device, const void *target_image) +GOMP_OFFLOAD_unload_image (int device, unsigned version, +const void *target_image) { + if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) +return; + TRACE ((device = %d, target_image = %p), device, target_image); /* TODO: Currently liboffloadmic doesn't support __offload_unregister_image Committed to gomp-4_0-branch in r226497: commit 4e0158f41a00d6c4d09ca8a48eb63832abdd2f84 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Aug 3 11:14:24 2015 + Fix intelmic libgomp plugin build ... which got broken in r226469: [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In function 'unsigned int GOMP_OFFLOAD_version()': [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:336:10: error: 'GOMP_VERSION' was not declared in this scope return GOMP_VERSION; ^ [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In function 'int GOMP_OFFLOAD_load_image(int, unsigned int, void*, addr_pair**)': [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:345:32: error: 'GOMP_VERSION_DEV' was not declared in this scope if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) ^ [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:345:36: error: 'GOMP_VERSION_INTEL_MIC' was not declared in this scope if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) ^ [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp: In function 'void GOMP_OFFLOAD_unload_image(int, unsigned int, const void*)': [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:373:32: error: 'GOMP_VERSION_DEV' was not declared in this scope if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) ^ [...]/source-gcc/liboffloadmic/plugin/libgomp-plugin-intelmic.cpp:373:36: error: 'GOMP_VERSION_INTEL_MIC' was not declared in this scope if (GOMP_VERSION_DEV (version) GOMP_VERSION_INTEL_MIC) ^ make[6]: *** [libgomp_plugin_intelmic_la-libgomp-plugin-intelmic.lo] Error 1 liboffloadmic/ * plugin/Makefile.am (include_src_dir): Set. [PLUGIN_HOST] (libgomp_plugin_intelmic_la_CPPFLAGS): Use it. * plugin/Makefile.in: Regenerate. * plugin/libgomp-plugin-intelmic.cpp: Include gomp-constants.h. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226497 138bc75d-0d04-0410-961f-82ee72b054a4 --- liboffloadmic/ChangeLog.gomp |7 +++ liboffloadmic/plugin/Makefile.am |3 ++- liboffloadmic/plugin/Makefile.in |3 ++- liboffloadmic/plugin/libgomp-plugin-intelmic.cpp |3 ++- 4 files changed, 13 insertions(+), 3 deletions(-) diff --git liboffloadmic/ChangeLog.gomp liboffloadmic/ChangeLog.gomp index 93d1e02..adb9e05 100644 --- liboffloadmic/ChangeLog.gomp +++ liboffloadmic/ChangeLog.gomp @@ -1,3 +1,10 @@ +2015-08-03 Thomas Schwinge tho...@codesourcery.com + + * plugin/Makefile.am (include_src_dir): Set. + [PLUGIN_HOST] (libgomp_plugin_intelmic_la_CPPFLAGS): Use it. + * plugin/Makefile.in: Regenerate. + * plugin/libgomp-plugin-intelmic.cpp: Include gomp-constants.h. + 2015-08-01
Re: [PATCH][AArch64][11/14] Re-layout SIMD builtin types on builtin expansion
On Fri, Jul 24, 2015 at 09:38:34AM +0100, Kyrill Tkachov wrote: Thanks, here's an updated version. 2015-07-24 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_option_valid_attribute_p): Initialize simd builtins if TARGET_SIMD. * config/aarch64/aarch64-builtins.c (aarch64_init_simd_builtins): Make sure that the builtins are initialized only once no matter how many times the function is called. (aarch64_init_builtins): Unconditionally initialize crc builtins. (aarch64_relayout_simd_param): New function. (aarch64_simd_expand_args): Use above during argument expansion. * config/aarch64/aarch64-c.c (aarch64_pragma_target_parse): Initialize simd builtins if TARGET_SIMD. * config/aarch64/aarch64-protos.h (aarch64_init_simd_builtins): New prototype. (aarch64_relayout_simd_types): Likewise. 2015-07-24 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/target_attr_crypto_ice_1.c: New test. OK with a minor fix. diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 6844c90..99fd80e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -255,6 +255,7 @@ bool aarch64_float_const_zero_rtx_p (rtx); bool aarch64_function_arg_regno_p (unsigned); bool aarch64_gen_movmemqi (rtx *); bool aarch64_gimple_fold_builtin (gimple_stmt_iterator *); +void aarch64_init_simd_builtins (void); bool aarch64_is_extend_from_extract (machine_mode, rtx, rtx); bool aarch64_is_long_call_p (rtx); bool aarch64_label_mentioned_p (rtx); These should be first ordered by return type, then alphabetical order. @@ -325,6 +326,7 @@ void aarch64_print_operand (FILE *, rtx, char); void aarch64_print_operand_address (FILE *, rtx); void aarch64_emit_call_insn (rtx); void aarch64_register_pragmas (void); +void aarch64_relayout_simd_types (void); void aarch64_reset_previous_fndecl (void); /* Initialize builtins for SIMD intrinsics. */ Thanks, James
Re: [PATCH][AArch64][12/14] Target attributes and target pragmas tests
On Fri, Jul 24, 2015 at 09:40:28AM +0100, Kyrill Tkachov wrote: On 21/07/15 18:14, James Greenhalgh wrote: On Thu, Jul 16, 2015 at 04:21:15PM +0100, Kyrill Tkachov wrote: Hi all, These are the tests for target attributes and pragmas. I've tried to test for the inlining rules, some of the possible errors and the preprocessor macros changed from target pragmas. Ok for trunk? Mechanical changes in the pragma tests for the sake of grammar! s/defined but shouldn't/is defined but should not be/ s/not defined but should/is not defined but should be/ Note that some of the errors have different text, so you'll have to run through by hand and check these are consistent. It would be good to hand some of these target attribute tests off to the assembler to make sure we are also putting out appropriate directives in our output. Perhaps assemble is the more appropriate dg-do directive? Some more nits below (mostly missing comments on testcases). Thanks, here's an updated version. I've also added a test for the +nothing architectural feature attribute introduced in patch 10/14 and renamed the tests to use underscores in their names. How's this? Run s/is is/is/ To fix typos like this: +#pragma GCC push_options +#pragma GCC target (arch=armv8-a+fp+nosimd) +#ifndef __ARM_FP +#error __ARM_FP is is not defined but should be! +#endif diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_1.c b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c new file mode 100644 index 000..72d0838 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_1.c @@ -0,0 +1,12 @@ +/* { dg-do assemble } */ +/* { dg-options -O2 -mcpu=thunderx -save-temps } */ + +__attribute__ ((target (cpu=cortex-a72.cortex-a53))) +int +foo (int a) +{ + return a + 1; +} + +/* { dg-final { scan-assembler //.tune cortex-a72.cortex-a53 } } */ +/* { dg-final { scan-assembler-not thunderx } } */ Comment on what we are testing. diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_2.c b/gcc/testsuite/gcc.target/aarch64/target_attr_2.c new file mode 100644 index 000..0cd6866 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_2.c @@ -0,0 +1,39 @@ +/* { dg-do assemble } */ +/* { dg-options -O2 -mcpu=cortex-a57 -ftree-vectorize -fdump-tree-vect-all } */ + +/* The various was to turn off simd availability should s/was/ways/ + turn off vectorization. */ + +__attribute__ ((target (+nosimd))) +int +baz (int *a) +{ + for (int i = 0; i 1024; i++) +a[i] += 5; +} + +__attribute__ ((target (arch=armv8-a+nosimd))) +int +baz2 (int *a) +{ + for (int i = 0; i 1024; i++) +a[i] += 5; +} + +__attribute__ ((target (cpu=cortex-a53+nosimd))) +int +baz3 (int *a) +{ + for (int i = 0; i 1024; i++) +a[i] += 5; +} + +__attribute__ ((target (general-regs-only))) +int +baz4 (int *a) +{ + for (int i = 0; i 1024; i++) +a[i] += 5; +} + +/* { dg-final { scan-tree-dump-not vectorized 1 loops vect } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_8.c b/gcc/testsuite/gcc.target/aarch64/target_attr_8.c new file mode 100644 index 000..3f32569 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_8.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -save-temps } */ + +/* bar has a subset arch of bam. Inlining should be allowed. */ s/arch/set of architectural flags/ Or words to that effect. + +__attribute__ ((target (arch=armv8-a+nocrc))) +int +bar (int a) +{ + return a - 6; +} + +__attribute__ ((target (arch=armv8-a+crc))) +int +bam (int a) +{ + return a - bar (a); +} + + +/* { dg-final { scan-assembler-not bl.*bar } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/target_attr_9.c b/gcc/testsuite/gcc.target/aarch64/target_attr_9.c new file mode 100644 index 000..22bd99b --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/target_attr_9.c @@ -0,0 +1,21 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -save-temps } */ + +/* bar is not a subset arch of bam. Inlining should be rejected. */ Likewise. + +__attribute__ ((target (arch=armv8-a+crc))) +int +bar (int a) +{ + return a - 6; +} + +__attribute__ ((target (arch=armv8-a+nocrc))) +int +bam (int a) +{ + return a - bar (a); +} + + +/* { dg-final { scan-assembler bl.*bar } } */ OK with those changes. Thanks, James
Re: [PATCH] Work around host compiler placement new aliasing bug
Richard Biener wrote: On Wed, Jul 29, 2015 at 3:57 PM, Ulrich Weigand uweig...@de.ibm.com wrote: Hello, this patch is a workaround for the problem discussed here: https://gcc.gnu.org/ml/gcc-patches/2015-07/msg01597.html The problem is that the new pool allocator code relies on C++ aliasing rules related to placement new (basically, that placement new changes the dynamic type of the referenced memory). GCC compilers prior to version 4.3 did not implement this rule correctly (PR 29286). When building current GCC with a host compiler that is affected by this bug, and we build with optimization enabled (this typically only happens when building a cross-compiler), the resulting compiler binary may be miscompiled. The patch below attempts to detect this situation by checking whether the host compiler is a version of GCC prior to 4.3 (but stil accepts the -fno-strict-aliasing flag). If so, -fno-strict-aliasing is added to the flags when building the compiler binary. Tested on i686-linux, and when building an SPU cross-compiler using a gcc 4.1 powerpc64-linux host compiler. OK for mainline? Ok if nobody objects. I've checked this in now. Thanks, Ulrich -- Dr. Ulrich Weigand GNU/Linux compilers and toolchain ulrich.weig...@de.ibm.com
Re: [gomp4] expand acc_on_device earlier
Hi! On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote: I've committed this to gomp4 branch. It expands the acc_on_device builtin earlier in the new oacc_xform pass. This will allow more optimization earlier on. As far as I remember, the Fortran front end doesn't know about/doesn't know how to use these GCC builtins, so it's not currently able to do such compile-time folding of acc_on_device. libgomp/ * openacc.h (acc_on_device): Take int and explain why. --- libgomp/openacc.h (revision 226462) +++ libgomp/openacc.h (working copy) @@ -78,7 +78,11 @@ void acc_wait_all (void) __GOACC_NOTHROW void acc_wait_all_async (int) __GOACC_NOTHROW; void acc_init (acc_device_t) __GOACC_NOTHROW; void acc_shutdown (acc_device_t) __GOACC_NOTHROW; -int acc_on_device (acc_device_t) __GOACC_NOTHROW; +/* Library function declaration. Although it should take an + acc_device_t argument, that causes problems with matching the + builtin, which takes an int (to avoid declaring the enumeration + inside the compiler). */ +int acc_on_device (int) __GOACC_NOTHROW; void *acc_malloc (size_t) __GOACC_NOTHROW; void acc_free (void *) __GOACC_NOTHROW; /* Some of these would be more correct with const qualifiers, but Hmm, too bad this doesn't resolve the following C++ XFAIL (the test case is fine for C), gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c: /* Have to enable optimizations, as otherwise builtins won't be expanded. */ /* { dg-additional-options -O -fdump-rtl-expand } */ /* Duplicate parts of libgomp/openacc.h, because we can't include it here. */ #if __cplusplus extern C { #endif typedef enum acc_device_t { acc_device_X = 123 } acc_device_t; extern int acc_on_device (int); #if __cplusplus } #endif int f (void) { const acc_device_t dev = acc_device_X; return acc_on_device (dev); } /* With -fopenacc, we're expecting the builtin to be expanded, so no calls. TODO: in C++, the use of enum acc_device_t for acc_on_device's parameter perturbs expansion as a builtin, which expects an int parameter. It's fine when changing acc_device_t to plain int, but that's not necessarily what a user will be doing. { dg-final { scan-rtl-dump-times \\\(call \[^\\n\]* acc_on_device 0 expand { xfail c++ } } } */ Do you happen to have any insight into that? Meanwhile, committed to gomp-4_0-branch in r226501: commit 38d7f116bf18f4a4012fda532aa70d47f3f20652 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Aug 3 11:59:15 2015 + Update acc_on_device test cases ... to match r226484 changes. gcc/testsuite/ * c-c++-common/goacc/acc_on_device-2-off.c (acc_on_device): Change formal parameter to int. * c-c++-common/goacc/acc_on_device-2.c (acc_on_device): Likewise. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226501 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/testsuite/ChangeLog.gomp | 4 gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c | 4 +++- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c | 10 ++ 3 files changed, 13 insertions(+), 5 deletions(-) diff --git gcc/testsuite/ChangeLog.gomp gcc/testsuite/ChangeLog.gomp index 33519ef..8dd9a4c 100644 --- gcc/testsuite/ChangeLog.gomp +++ gcc/testsuite/ChangeLog.gomp @@ -1,5 +1,9 @@ 2015-08-03 Thomas Schwinge tho...@codesourcery.com + * c-c++-common/goacc/acc_on_device-2-off.c (acc_on_device): Change + formal parameter to int. + * c-c++-common/goacc/acc_on_device-2.c (acc_on_device): Likewise. + * c-c++-common/restrict-2.c: Update for new pass_lim. * c-c++-common/restrict-4.c: Same. * g++.dg/tree-ssa/pr33615.C: Same. diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c index 59c72f7..71abe11 100644 --- gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c +++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2-off.c @@ -1,12 +1,14 @@ /* Have to enable optimizations, as otherwise builtins won't be expanded. */ /* { dg-additional-options -O -fdump-rtl-expand -fno-openacc } */ +/* Duplicate parts of libgomp/openacc.h, because we can't include it here. */ + #if __cplusplus extern C { #endif typedef enum acc_device_t { acc_device_X = 123 } acc_device_t; -extern int acc_on_device (acc_device_t); +extern int acc_on_device (int); #if __cplusplus } diff --git gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c index ef622a8..243e562 100644 --- gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c +++ gcc/testsuite/c-c++-common/goacc/acc_on_device-2.c @@ -1,12 +1,14 @@ /* Have to enable optimizations, as otherwise builtins won't be expanded. */ /* {
Re: [PATCH][RTL-ifcvt] Improve conditional select ops on immediates
Hello! 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * ifcvt.c (noce_try_store_flag_constants): Make logic of the case when diff == STORE_FLAG_VALUE or diff == -STORE_FLAG_VALUE more explicit. Prefer to add the flag whenever possible. (noce_process_if_block): Try noce_try_store_flag_constants before noce_try_cmove. 2015-07-30 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/csel_bfx_1.c: New test. * gcc.target/aarch64/csel_imms_inc_1.c: Likewise. This patch regressed following tests on x86_64: FAIL: gcc.target/i386/cmov2.c scan-assembler sbb FAIL: gcc.target/i386/cmov3.c scan-assembler cmov[^3] While cmov3 case is questionable by itself in light of PR56309 [1], the cnov2 case regressed from: cmpl%edi, %esi sbbl%eax, %eax andl$-10, %eax addl$5, %eax ret to: xorl%eax, %eax cmpl%esi, %edi setbe %al negl%eax andl$10, %eax subl$5, %eax ret Please note that sbb (aka {*x86_movsicc_0_m1} ) is not generated anymore. Non-QImode setcc instructions are problematic on x86, since they need to be zero-extended to their full length. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56309 Uros.
Re: [Bug fortran/52846] [F2008] Support submodules - part 3/3
Dear Mikael, Thanks for your green light! I have been mulling over the trans-decl part of the patch and having been wondering if it is necessary. Without optimization, private entities can be linked to. Given the discussion concerning the combination of submodules and private entities, I wonder if this is not sufficient? Within submodule scope, an advisory could be given for undefined references to suggest recompiling the module without optimization or making the entities public. Cheers Paul On 3 August 2015 at 12:44, Mikael Morin mikael.mo...@sfr.fr wrote: Le 29/07/2015 17:08, Paul Richard Thomas a écrit : Dear All, On 24 July 2015 at 10:08, Damian Rouson dam...@sourceryinstitute.org wrote: I love this idea and had similar thoughts as well. :D Sent from my iPhone On Jul 24, 2015, at 1:06 AM, Paul Richard Thomas paul.richard.tho...@gmail.com wrote: Dear Mikael, It had crossed my mind also that a .mod and a .smod file could be written. Normally, the .smod files are produced by the submodules themselves, so that their descendants can pick up the symbols that they generate. There is no reason at all why this could not be implemented; early on in the development I did just this, although I think that it would now be easier to modify this patch. One huge advantage of proceeding in this way is that any resulting library can be distributed with the .mod file alone so that the private entities are never exposed. The penalty is that a second file is output. With best regards Paul Please find attached the implementation of this suggestion. Bootstraps and regtests on FC21/x86_64 - OK for trunk or is the original preferred? There hasn't been a lot of voices about this among the other active and less active team members. I prefer this private members to separate smod variant. It's OK for trunk as far as I'm concerned. Thanks. Mikael PS: Regarding redundant initializations: rather have too many than too few. ;-) -- Outside of a dog, a book is a man's best friend. Inside of a dog it's too dark to read. Groucho Marx
Re: [gomp4] expand acc_on_device earlier
On 08/03/15 07:37, Thomas Schwinge wrote: Hi! On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote: I've committed this to gomp4 branch. It expands the acc_on_device builtin earlier in the new oacc_xform pass. This will allow more optimization earlier on. Thanks! The existing expansion point now only needs to deal with the host-side case. Actually, no; committed to gomp-4_0-branch in r226498: Please clarify. This suggests a logic fault elsewhere. nathan -- Nathan Sidwell
RE: [PATCH, MIPS, Ping] Inline memcpy for MipsR6
Catherine, Inline-memcpy-2.c updated to not run with -Os. Patch rebased off current gcc sources. Thanks, Simon diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c index 1733457..627e078 100644 --- a/gcc/config/mips/mips.c +++ b/gcc/config/mips/mips.c @@ -7520,12 +7520,22 @@ mips_block_move_straight (rtx dest, rtx src, HOST_WIDE_INT length) half-word alignment, it is usually better to move in half words. For instance, lh/lh/sh/sh is usually better than lwl/lwr/swl/swr and lw/lw/sw/sw is usually better than ldl/ldr/sdl/sdr. - Otherwise move word-sized chunks. */ - if (MEM_ALIGN (src) == BITS_PER_WORD / 2 - MEM_ALIGN (dest) == BITS_PER_WORD / 2) -bits = BITS_PER_WORD / 2; + Otherwise move word-sized chunks. + + For ISA_HAS_LWL_LWR we rely on the lwl/lwr swl/swr load. Otherwise + picking the minimum of alignment or BITS_PER_WORD gets us the + desired size for bits. */ + + if (!ISA_HAS_LWL_LWR) +bits = MIN (BITS_PER_WORD, MIN (MEM_ALIGN (src), MEM_ALIGN (dest))); else -bits = BITS_PER_WORD; +{ + if (MEM_ALIGN (src) == BITS_PER_WORD / 2 + MEM_ALIGN (dest) == BITS_PER_WORD / 2) + bits = BITS_PER_WORD / 2; + else + bits = BITS_PER_WORD; +} mode = mode_for_size (bits, MODE_INT, 0); delta = bits / BITS_PER_UNIT; @@ -7644,8 +7654,9 @@ mips_block_move_loop (rtx dest, rtx src, HOST_WIDE_INT length, bool mips_expand_block_move (rtx dest, rtx src, rtx length) { - /* Disable entirely for R6 initially. */ - if (!ISA_HAS_LWL_LWR) + if (!ISA_HAS_LWL_LWR + (MEM_ALIGN (src) MIPS_MIN_MOVE_MEM_ALIGN + || MEM_ALIGN (dest) MIPS_MIN_MOVE_MEM_ALIGN)) return false; if (CONST_INT_P (length)) diff --git a/gcc/config/mips/mips.h b/gcc/config/mips/mips.h index ec69ed5..4b1787d 100644 --- a/gcc/config/mips/mips.h +++ b/gcc/config/mips/mips.h @@ -2969,6 +2969,9 @@ while (0) #undef PTRDIFF_TYPE #define PTRDIFF_TYPE (POINTER_SIZE == 64 ? long int : int) +/* The minimum alignment of any expanded block move. */ +#define MIPS_MIN_MOVE_MEM_ALIGN 16 + /* The maximum number of bytes that can be copied by one iteration of a movmemsi loop; see mips_block_move_loop. */ #define MIPS_MAX_MOVE_BYTES_PER_LOOP_ITER \ diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c b/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c new file mode 100644 index 000..5a254b1 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-1.c @@ -0,0 +1,16 @@ +/* { dg-options -fno-common isa_rev=6 } */ +/* { dg-skip-if code quality test { *-*-* } { -O0 -Os } { } } */ +/* { dg-final { scan-assembler-not \tmemcpy } } */ + +/* Test that memcpy is inline for target hardware + without swl, swr. */ + +#include string.h + +char c[40] __attribute__ ((aligned(8))); + +void +f1 () +{ + memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32); +} diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c b/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c new file mode 100644 index 000..e144e61 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-2.c @@ -0,0 +1,17 @@ +/* { dg-options -fno-common isa_rev=6 } */ +/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} { } } */ +/* { dg-final { scan-assembler-not \tmemcpy } } */ +/* { dg-final { scan-assembler-times \tsh\t 16 } } */ + +/* Test that inline memcpy is expanded for target hardware without + swl, swr when alignment is halfword and sufficent shs are produced. */ + +#include string.h + +char c[40] __attribute__ ((aligned(2))); + +void +f1 () +{ + memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32); +} diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c new file mode 100644 index 000..96a0387 --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-3.c @@ -0,0 +1,18 @@ +/* { dg-options -fno-common isa_rev=5 } */ +/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} { } } */ +/* { dg-final { scan-assembler-not \tmemcpy } } */ +/* { dg-final { scan-assembler-times swl 8 } } */ +/* { dg-final { scan-assembler-times swr 8 } } */ + +/* Test that inline memcpy for hardware with swl, swr handles subword + alignment and produces enough swl/swrs for mips32. */ + +#include string.h + +char c[40] __attribute__ ((aligned(2))); + +void +f1 () +{ + memcpy (c, 1234567890QWERTYUIOPASDFGHJKLZXCVBNM, 32); +} diff --git a/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c b/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c new file mode 100644 index 000..0e7a22e --- /dev/null +++ b/gcc/testsuite/gcc.target/mips/inline-memcpy-4.c @@ -0,0 +1,18 @@ +/* { dg-options -fno-common isa_rev=5 -mabi=64 } */ +/* { dg-skip-if code quality test { *-*-* } { -O0 -Os} { } } */ +/* { dg-final { scan-assembler-not \tmemcpy } } */ +/* { dg-final { scan-assembler-times sdl 4 } } */ +/* { dg-final { scan-assembler-times sdr 4 } } */ + +/* Test that inline memcpy for hardware with sdl,
Re: [PATCH][AArch64][1/3] Expand signed mod by power of 2 using CSNEG
On Fri, Jul 24, 2015 at 11:55:33AM +0100, Kyrill Tkachov wrote: Hi all, This patch implements an aarch64-specific expansion of the signed modulo by a power of 2. The proposed sequence makes use of the conditional negate instruction CSNEG. For a power of N, x % N can be calculated with: negs x1, x0 andx0, x0, #(N - 1) andx1, x1, #(N - 1) csneg x0, x0, x1, mi So, for N == 256 this would be: negs x1, x0 andx0, x0, #255 andx1, x1, #255 csneg x0, x0, x1, mi For comparison, the existing sequence emitted by expand_smod_pow2 in expmed.c is: asr x1, x0, 63 lsr x1, x1, 56 add x0, x0, x1 and x0, x0, 255 sub x0, x0, x1 Note that the CSNEG sequence is one instruction shorter and that the two and operations are independent, compared to the existing sequence where all instructions are dependent on the preceeding instructions. For the special case of N == 2 we can do even better: cmp x0, xzr and x0, x0, 1 csneg x0, x0, x0, ge I first tried implementing this in the generic code in expmed.c but that didn't work out for a few reasons: * This relies on having a conditional-negate instruction. We could gate it on HAVE_conditional_move and the combiner is capable of merging the final negate into the conditional move if a conditional negate is available (like on aarch64) but on targets without a conditional negate this would end up emitting a separate negate. * The first negs has to be a negs for the sequence to be a win i.e. having a separate negate and compare makes the sequence slower than the existing one (at least in my microbenchmarking) and I couldn't get subsequent passes to combine the negate and combine into the negs (presumably due to the use of the negated result in one of the ands). Doing it in the aarch64 backend where I could just call the exact gen_* functions that I need worked much more cleanly. The costing logic is updated to reflect this sequence during the intialisation of expmed.c where it calculates the smod_pow2_cheap metric. The tests will come in patch 3 of the series which are partly shared with the equivalent arm implementation. Bootstrapped and tested on aarch64. Ok for trunk? diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9d88a60..7bb4a55 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -6639,8 +6639,26 @@ cost_plus: if (VECTOR_MODE_P (mode)) *cost += extra_cost-vect.alu; else if (GET_MODE_CLASS (mode) == MODE_INT) - *cost += (extra_cost-mult[mode == DImode].add - + extra_cost-mult[mode == DImode].idiv); + { + /* We can expand signed mod by power of 2 using a + NEGS, two parallel ANDs and a CSNEG. Assume here + that CSNEG is COSTS_N_INSNS (1). This case should + only ever be reached through the set_smod_pow2_cheap check + in expmed.c. */ + if (code == MOD +CONST_INT_P (XEXP (x, 1)) +exact_log2 (INTVAL (XEXP (x, 1))) 0 +(mode == SImode || mode == DImode)) + { + *cost += COSTS_N_INSNS (3) ++ 2 * extra_cost-alu.logical ++ extra_cost-alu.arith; + return true; + } + + *cost += (extra_cost-mult[mode == DImode].add + + extra_cost-mult[mode == DImode].idiv); + } else if (mode == DFmode) *cost += (extra_cost-fp[1].mult + extra_cost-fp[1].div); This looks like it calculates the wrong cost for !speed. I think we will still expand through modmode3 when compiling for size, so we probably still want to cost the multiple instructions. Have I misunderstood? Thanks, James
Re: [gomp4] expand acc_on_device earlier
Hi! On Sun, 2 Aug 2015 21:23:30 -0400, Nathan Sidwell nat...@acm.org wrote: I've committed this to gomp4 branch. It expands the acc_on_device builtin earlier in the new oacc_xform pass. This will allow more optimization earlier on. Thanks! The existing expansion point now only needs to deal with the host-side case. Actually, no; committed to gomp-4_0-branch in r226498: commit 5d1009bd4b959da699cf439200ffa571bf547026 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Mon Aug 3 11:36:01 2015 + Fix offload compiler build, and de-duplicate acc_on_device logic When building an x86_64-intelmicemul-linux-gnu offloading compiler with r226484, the __builtin_acc_on_device usage in libgomp/oacc-init.c:acc_on_device makes it blow up: [...]/source-gcc/libgomp/oacc-init.c: In function 'acc_on_device': [...]/source-gcc/libgomp/oacc-init.c:643:10: internal compiler error: in expand_builtin_acc_on_device, at builtins.c:5891 return __builtin_acc_on_device (dev); ^ 0x6d9632 expand_builtin_acc_on_device [...]/source-gcc/gcc/builtins.c:5891 0x6d9632 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int) [...]/source-gcc/gcc/builtins.c:7127 0x8021ad expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool) [...]/source-gcc/gcc/expr.c:10361 0x80e8b5 store_expr_with_bounds(tree_node*, rtx_def*, int, bool, tree_node*) [...]/source-gcc/gcc/expr.c:5400 0x810eb4 expand_assignment(tree_node*, tree_node*, bool) [...]/source-gcc/gcc/expr.c:5169 0x6f927b expand_call_stmt [...]/source-gcc/gcc/cfgexpand.c:2349 0x6f927b expand_gimple_stmt_1 [...]/source-gcc/gcc/cfgexpand.c:3238 0x6fa77c expand_gimple_stmt [...]/source-gcc/gcc/cfgexpand.c:3399 0x6fa842 expand_gimple_tailcall [...]/source-gcc/gcc/cfgexpand.c:3446 0x6fc615 expand_gimple_basic_block [...]/source-gcc/gcc/cfgexpand.c:5388 0x701456 execute [...]/source-gcc/gcc/cfgexpand.c:6023 (This is not a problem for a nvptx-none offloading compiler, because of its open-coded libgcc/config/nvptx/gomp-acc_on_device.c:acc_on_device implementation.) gcc/ * builtins.c (expand_builtin_on_device): Don't expect to be expanded on host compiler. libgcc/ * config/nvptx/gomp-acc_on_device.c: Don't include gomp-constants.h. (acc_on_device): Don't code directly here. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@226498 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp |3 +++ gcc/builtins.c | 30 ++ gcc/omp-low.c|7 ++- libgcc/ChangeLog.gomp| 10 -- libgcc/config/nvptx/gomp-acc_on_device.c | 16 +++- libgomp/oacc-init.c |2 +- 6 files changed, 39 insertions(+), 29 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index a30f6a3..d5cc1ed 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,5 +1,8 @@ 2015-08-03 Thomas Schwinge tho...@codesourcery.com + * builtins.c (expand_builtin_on_device): Don't expect to be + expanded on host compiler. + * gimplify.c (oacc_default_clause): Remove in_code formal parameter. Adjust all users. diff --git gcc/builtins.c gcc/builtins.c index 9bc08f6..ce369a1 100644 --- gcc/builtins.c +++ gcc/builtins.c @@ -5880,39 +5880,45 @@ expand_stack_save (void) } -/* Expand OpenACC acc_on_device. This is expanded in the openacc - transform pass, but if the user has this outside of an offloaded - region, we'll find it here. In that case we must be host or none. */ +/* Expand OpenACC acc_on_device. This is usually expanded in the + oacc_transform pass, earlier on, but if used outside of an offloaded region, + we'll find it here. */ static rtx expand_builtin_acc_on_device (tree exp, rtx target) { -#ifdef ACCEL_COMPILER - gcc_unreachable (); -#else +#ifndef ACCEL_COMPILER gcc_assert (!get_oacc_fn_attrib (current_function_decl)); +#endif if (!validate_arglist (exp, INTEGER_TYPE, VOID_TYPE)) return NULL_RTX; tree arg = CALL_EXPR_ARG (exp, 0); - rtx val = expand_normal (arg); + + /* Return (arg == v1 || arg == v2) ? 1 : 0. */ + machine_mode v_mode = TYPE_MODE (TREE_TYPE (arg)); + rtx v = expand_normal (arg), v1, v2; +#ifdef ACCEL_COMPILER + v1 = GEN_INT (GOMP_DEVICE_NOT_HOST); + v2 = GEN_INT (ACCEL_COMPILER_acc_device); +#else + v1 = GEN_INT (GOMP_DEVICE_NONE); + v2 = GEN_INT (GOMP_DEVICE_HOST); +#endif machine_mode target_mode = TYPE_MODE (integer_type_node); if
[gomp4] PTX launch dimensions
I've committed this to gomp4. The ptx backend can now examine the openacc attribute to determine launch dimensions and figure out whether vector or worker single neutering is needed. nathan 2015-08-03 Nathan Sidwell nat...@codesourcery.com * config/nvptx/nvptx.c (nvptx_reorg): Check get_oacc_fn_attrib for launch dimensions and only do parallel processing when present. Check dimensions to determine neutering requirements. (nvptx_record_offload_symbol): Launch dimension attribute must be present on offloaded functions. Index: gcc/config/nvptx/nvptx.c === --- gcc/config/nvptx/nvptx.c (revision 226485) +++ gcc/config/nvptx/nvptx.c (working copy) @@ -2980,13 +2980,42 @@ nvptx_reorg (void) if (REG_N_SETS (i) == 0 REG_N_REFS (i) == 0) regno_reg_rtx[i] = const0_rtx; - parallel *pars = nvptx_discover_pars (bb_insn_map); - - nvptx_process_pars (pars); - nvptx_neuter_pars (pars, (GOMP_DIM_MASK (GOMP_DIM_VECTOR) - | GOMP_DIM_MASK (GOMP_DIM_WORKER)), 0); - - delete pars; + /* Determine launch dimensions of the function. If it is not an + offloaded function (i.e. this is a regular compiler), the + function has no neutering. */ + tree attr = get_oacc_fn_attrib (current_function_decl); + if (attr) +{ + unsigned mask = 0; + tree dims = TREE_VALUE (attr); + unsigned ix; + + for (ix = 0; ix != GOMP_DIM_MAX; ix++) + { + unsigned HOST_WIDE_INT dim = 0; + + if (dims) + { + tree cst = TREE_VALUE (dims); + + dim = TREE_INT_CST_LOW (cst); + dims = TREE_CHAIN (dims); + } + if (dim != 1) + mask |= GOMP_DIM_MASK (ix); + } + /* If there is worker neutering, there must be vector + neutering. Otherwise the hardware will fail. This really + should be dealt with earlier because it indicates faulty + logic in determining launch dimensions. */ + if (mask GOMP_DIM_MASK (GOMP_DIM_WORKER)) + mask |= GOMP_DIM_MASK (GOMP_DIM_VECTOR); + + parallel *pars = nvptx_discover_pars (bb_insn_map); + nvptx_process_pars (pars); + nvptx_neuter_pars (pars, mask, 0); + delete pars; +} nvptx_reorg_subreg (); @@ -3073,32 +3102,25 @@ nvptx_record_offload_symbol (tree decl) case FUNCTION_DECL: { tree attr = get_oacc_fn_attrib (decl); - tree dims = NULL_TREE; + tree dims = TREE_VALUE (attr); unsigned ix; - if (attr) - dims = TREE_VALUE (attr); fprintf (asm_out_file, //:FUNC_MAP \%s\, IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl))); - for (ix = 0; ix != GOMP_DIM_MAX; ix++) + for (ix = 0; ix != GOMP_DIM_MAX; ix++, dims = TREE_CHAIN (dims)) { - unsigned HOST_WIDE_INT dim = 0; - if (dims) - { - tree cst = TREE_VALUE (dims); - - /* When device_type support is added an ealier pass - should have massaged the attribute to be - ptx-specific. */ - gcc_assert (TREE_CODE (cst) == INTEGER_CST); - - dim = TREE_INT_CST_LOW (cst); - dims = TREE_CHAIN (dims); - } + tree cst = TREE_VALUE (dims); + + /* When device_type support is added an earlier pass + should have massaged the attribute to be + ptx-specific. */ + gcc_assert (TREE_CODE (cst) == INTEGER_CST); + + unsigned HOST_WIDE_INT dim = TREE_INT_CST_LOW (cst); fprintf (asm_out_file, , HOST_WIDE_INT_PRINT_HEX, dim); } - + fprintf (asm_out_file, \n); } break;
[PATCH] Merge even more tails (from for expansion) in genmatch.c
This changes for lowering for most cases that can end up being tail-merged to not substitute the operator into the result expression but refer to it using a local variable of the name of the for iterator. Boostrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. This drops the number of tail implementations from 744 to 224 (in my dev tree). Richard. 2015-08-03 Richard Biener rguent...@suse.de * genmatch.c (simplify::for_subst_vec): New member. (binary_ok): New helper for for lowering. (lower_for): Delay substituting operators into result expressions if we can merge the results eventually again. (capture_info::walk_result): Adjust for user_id appearing as result expression operator. (expr::gen_transform): Likewise. (dt_simplify::gen_1): Likewise. (dt_simplify::gen): Pass not substituted operators to tail functions or initialize local variable with it. (decision_tree::gen): Adjust function signature. * match.pd: Fix tests against global code and add default cases to switch stmts. Index: gcc/genmatch.c === --- gcc/genmatch.c (revision 226490) +++ gcc/genmatch.c (working copy) @@ -683,7 +683,7 @@ struct simplify simplify (simplify_kind kind_, operand *match_, operand *result_, vecvecuser_id * for_vec_, cid_map_t *capture_ids_) : kind (kind_), match (match_), result (result_), - for_vec (for_vec_), + for_vec (for_vec_), for_subst_vec (vNULL), capture_ids (capture_ids_), capture_max (capture_ids_-elements () - 1) {} simplify_kind kind; @@ -697,6 +697,7 @@ struct simplify /* Collected 'for' expression operators that have to be replaced in the lowering phase. */ vecvecuser_id * for_vec; + vecstd::pairuser_id *, id_base * for_subst_vec; /* A map of capture identifiers to indexes. */ cid_map_t *capture_ids; int capture_max; @@ -1135,6 +1136,38 @@ replace_id (operand *o, user_id *id, id_ return o; } +/* Return true if the binary operator OP is ok for delayed substitution + during for lowering. */ + +static bool +binary_ok (operator_id *op) +{ + switch (op-code) +{ +case PLUS_EXPR: +case MINUS_EXPR: +case MULT_EXPR: +case TRUNC_DIV_EXPR: +case CEIL_DIV_EXPR: +case FLOOR_DIV_EXPR: +case ROUND_DIV_EXPR: +case TRUNC_MOD_EXPR: +case CEIL_MOD_EXPR: +case FLOOR_MOD_EXPR: +case ROUND_MOD_EXPR: +case RDIV_EXPR: +case EXACT_DIV_EXPR: +case MIN_EXPR: +case MAX_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: +case BIT_AND_EXPR: + return true; +default: + return false; +} +} + /* Lower recorded fors for SIN and output to SIMPLIFIERS. */ static void @@ -1154,9 +1187,46 @@ lower_for (simplify *sin, vecsimplify * vecuser_id * ids = for_vec[fi]; unsigned n_ids = ids.length (); unsigned max_n_opers = 0; + bool can_delay_subst = (sin-kind == simplify::SIMPLIFY); for (unsigned i = 0; i n_ids; ++i) - if (ids[i]-substitutes.length () max_n_opers) - max_n_opers = ids[i]-substitutes.length (); + { + if (ids[i]-substitutes.length () max_n_opers) + max_n_opers = ids[i]-substitutes.length (); + /* Require that all substitutes are of the same kind so that +if we delay substitution to the result op code generation +can look at the first substitute for deciding things like +types of operands. */ + enum id_base::id_kind kind = ids[i]-substitutes[0]-kind; + for (unsigned j = 0; j ids[i]-substitutes.length (); ++j) + if (ids[i]-substitutes[j]-kind != kind) + can_delay_subst = false; + else if (operator_id *op + = dyn_cast operator_id * (ids[i]-substitutes[j])) + { + operator_id *op0 + = as_a operator_id * (ids[i]-substitutes[0]); + if (strcmp (op-tcc, tcc_comparison) == 0 +strcmp (op0-tcc, tcc_comparison) == 0) + ; + /* Unfortunately we can't just allow all tcc_binary. */ + else if (strcmp (op-tcc, tcc_binary) == 0 + strcmp (op0-tcc, tcc_binary) == 0 + binary_ok (op) + binary_ok (op0)) + ; + else if ((strcmp (op-id + 1, SHIFT_EXPR) == 0 + || strcmp (op-id + 1, ROTATE_EXPR) == 0) + (strcmp (op0-id + 1, SHIFT_EXPR) == 0 +|| strcmp (op0-id + 1, ROTATE_EXPR) == 0)) + ; + else + can_delay_subst = false; + } + else if (is_a fn_id * (ids[i]-substitutes[j])) + ; + else + can_delay_subst = false; + }
Re: [C++/66443] virtual base of abstract class
On 08/02/15 23:44, Jason Merrill wrote: It seems to me that DR 1658 ignores vbases of abstract classes for determining whether a destructor is deleted, but says nothing about exception specifications. DR 1351 specifically ignores vbases of abstract classes for determining the exception specification of a constructor, but only for constructors. So I think that for destructors we want to walk the base, but pass in a fake delete_p. Ok. that all makes sense. Why the check for inherited_parms? I would think that inheriting constructors would be handled like other ctors. I think I continue to be confused ... nathan