Re: [PATCH] Don't necessarily emit object size checks for ARRAY_REFs
On Thu, Nov 06, 2014 at 11:19:08PM +0100, Marek Polacek wrote:

First part of this patch is about removing the useless check that we talked about earlier today. The rest is about not emitting UBSAN_OBJECT_SIZE checks (those often come with multiple statements to compute a pointer difference) for ARRAY_REFs that are already instrumented by UBSAN_BOUNDS. I do this by moving the UBSAN_OBJECT_SIZE instrumentation so that it is done first in the ubsan pass - then I can just check whether the statement before that ARRAY_REF is a UBSAN_BOUNDS check. If it is, it should be clear that it is checking the ARRAY_REF, and I can drop the UBSAN_OBJECT_SIZE check. (Moving the UBSAN_OBJECT_SIZE instrumentation means that there won't be e.g. a UBSAN_NULL check in between the ARRAY_REF and UBSAN_BOUNDS.) Earlier, I thought I should check that both UBSAN_OBJECT_SIZE and UBSAN_BOUNDS checks are checking the same array index, but that wouldn't work for multidimensional arrays, and just should not be needed.

IMHO it is needed and is highly desirable, otherwise you risk missed diagnostics from -fsanitize=object-size when it is needed. Consider e.g. this testcase:

extern int a[][10][10];
int
foo (int x, int y, int z)
{
  return a[x][y][z];
}
int a[10][10][10] = {};

Here only the y and z indexes are bounds checked, but the x index is not (UBSAN_BOUNDS is added early, before the a var definition is parsed, while the ubsan pass runs afterwards, so it can know the object size). If you have a multi-dimensional array, you can just walk backwards within the same bb, looking for UBSAN_BOUNDS calls that verify the indexes where needed.

Say on:

struct S { int a:3; };
extern struct S a[][10][10];
int
foo (int x, int y, int z)
{
  return a[5][11][z].a;
}
struct S a[10][10][10] = {};

you have:

  UBSAN_BOUNDS (0B, 11, 9);
  z.0_4 = z_3(D);
  UBSAN_BOUNDS (0B, z.0_4, 9);
  _6 = a[5][11][z.0_4].a;

and you walk the handled components:
1) COMPONENT_REF - ok
2) ARRAY_REF with index z.0_4; the array index maximum is 9, and there is a UBSAN_BOUNDS call right above it checking that
3) ARRAY_REF with index 11; 11 is bigger than the index maximum 9, and there is a UBSAN_BOUNDS call for it in the same bb
4) ARRAY_REF with index 5; 5 is smaller than or equal to the index maximum 9, so no UBSAN_BOUNDS is needed
5) decl inside of the innermost handled component, so we can avoid the object-size instrumentation; if the base is not a decl, never omit the object-size instrumentation.

Jakub
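The walk over the handled components can be sketched as a toy model. This is only an illustration under stated assumptions: the `Component` type, the string-keyed set of checked indexes, and the function names are hypothetical stand-ins and are not GCC's tree or gimple API.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical stand-ins, not GCC's tree/gimple API.
enum class Kind { ComponentRef, ArrayRef };

struct Component {
  Kind kind;
  std::string index;  // textual index of an ArrayRef, e.g. "z.0_4" or "11"
  bool is_constant;   // true when the index is a compile-time constant
  long value;         // constant value, meaningful only if is_constant
  long max_index;     // largest valid index of this dimension
};

// components: handled components from the outermost one down to the base.
// checked_indexes: indexes verified by a UBSAN_BOUNDS call in the same bb.
// base_is_decl: whether the innermost base is a declaration.
bool can_omit_object_size_check(const std::vector<Component>& components,
                                const std::set<std::string>& checked_indexes,
                                bool base_is_decl) {
  if (!base_is_decl)
    return false;  // never omit the check for a non-decl base
  for (const Component& c : components) {
    assert(c.max_index >= 0);
    if (c.kind == Kind::ComponentRef)
      continue;  // field accesses need no bounds check
    if (c.is_constant && c.value >= 0 && c.value <= c.max_index)
      continue;  // constant index known to be in range
    if (checked_indexes.count(c.index) == 0)
      return false;  // index is neither provably in range nor checked
  }
  return true;
}

// The a[5][11][z.0_4].a reference from the message, outermost first.
std::vector<Component> example_reference() {
  return {
      {Kind::ComponentRef, "", false, 0, 0},
      {Kind::ArrayRef, "z.0_4", false, 0, 9},
      {Kind::ArrayRef, "11", true, 11, 9},
      {Kind::ArrayRef, "5", true, 5, 9},
  };
}
```

With both `11` and `z.0_4` recorded as checked and a decl base, the model reports that the object-size check can be dropped; removing either condition makes it keep the check.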
Re: [PATCH, i386]: Use std::swap
On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote:
On Mon, 10 Nov 2014, Richard Biener wrote:

No extra includes required?

<utility> is already included in wide-int.h and rtl.h, should probably move those.

Bah, we hit a problem. std::swap has been moved from <algorithm> to <utility> in C++11, and the patch breaks the build on CentOS 5.11 (gcc-4.1.2). Short of reverting the i386.c patch, is there a quick solution by including some additional headers?

Uros.
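One portable way around the header move is to include both headers wherever std::swap is used. A minimal sketch, assuming a hypothetical helper `order_pair` that is not part of GCC:

```cpp
// std::swap is declared in <algorithm> before C++11 and in <utility> from
// C++11 onwards, so including both works with old and new compilers.
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical helper, not part of GCC: orders a pair so first <= second.
void order_pair(int& first, int& second) {
  if (second < first)
    std::swap(first, second);
  assert(first <= second);
}
```

Calling `order_pair(a, b)` with `a = 7` and `b = 3` leaves `a == 3` and `b == 7`.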
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!

do $subject, and cleanup for always 64 bit hwi. bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Index: system.h
===================================================================
--- system.h (revision 217338)
+++ system.h (working copy)
@@ -27,6 +27,7 @@
    event inttypes.h gets pulled in by another header it is already
    defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS
 
 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

Jakub
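The effect of the proposed define can be shown in isolation. This is a sketch, not the GCC change itself: the helper name `int64_max_via_macro` is made up for illustration, and the `#ifndef` guard is only there so the snippet also compiles where the macro is already defined.

```cpp
// Older C++ toolchains only expose INT64_MAX and the other C99 limit macros
// when __STDC_LIMIT_MACROS is defined before <stdint.h> is first included.
#ifndef __STDC_LIMIT_MACROS
#define __STDC_LIMIT_MACROS
#endif
#include <stdint.h>
#include <cassert>

// Hypothetical helper: the largest int64_t value, taken from the C macro.
int64_t int64_max_via_macro() {
  return INT64_MAX;
}
```

As the reply points out, this relies on the order of inclusion, which is why a macro that does not depend on <stdint.h> is suggested instead.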
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who have to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this.

IMO that's the sort of thing some of us were afraid of when the C++ switch was being discussed and IIRC we were told this would not happen...

--
Eric Botcazou
Re: [Patch,ARM/Thumb1]Fix 'mov' instruction for Thumb-1 UAL
On 11/11/14 08:40, Terry Guo wrote: Hi there, Attached patch intends to fix below trunk failure caused by recent thumb-1 UAL patch: /tmp/cc9EfnXy.s: Assembler messages: /tmp/cc9EfnXy.s:69: Error: MOV Rd, Rs with two low registers is not permitted on this architecture -- `mov r6,r7' Now for pre-v6 Thumb-1, the 'movs' will be used rather than the 'mov'. The multilib for ARM/Thumb1/hard-float all can be built. Tested with regression test on armv4t thumb and v6m thumb. No regression. Is it ok to trunk? This is OK. Ramana BR, Terry 2014-11-11 Terry Guo terry@arm.com * doc/invoke.texi (-masm-syntax-unified): Reword and fix typo. * config/arm/thumb1.md (*thumb_mulsi3): Use movs to move low registers. (*thumb1_movhf): Likewise.
[Patch,ARM/Thumb1]Fix 'mov' instruction for Thumb-1 UAL
Hi there, Attached patch intends to fix below trunk failure caused by recent thumb-1 UAL patch: /tmp/cc9EfnXy.s: Assembler messages: /tmp/cc9EfnXy.s:69: Error: MOV Rd, Rs with two low registers is not permitted on this architecture -- `mov r6,r7' Now for pre-v6 Thumb-1, the 'movs' will be used rather than the 'mov'. The multilib for ARM/Thumb1/hard-float all can be built. Tested with regression test on armv4t thumb and v6m thumb. No regression. Is it ok to trunk? BR, Terry 2014-11-11 Terry Guo terry@arm.com * doc/invoke.texi (-masm-syntax-unified): Reword and fix typo. * config/arm/thumb1.md (*thumb_mulsi3): Use movs to move low registers. (*thumb1_movhf): Likewise.diff --git a/gcc/config/arm/thumb1.md b/gcc/config/arm/thumb1.md index 8a2abe9..3d6f80b 100644 --- a/gcc/config/arm/thumb1.md +++ b/gcc/config/arm/thumb1.md @@ -131,12 +131,10 @@ (mult:SI (match_operand:SI 1 register_operand %l,*h,0) (match_operand:SI 2 register_operand l,l,l)))] TARGET_THUMB1 !arm_arch6 - * - if (which_alternative 2) -return \mov\\t%0, %1\;muls\\t%0, %2\; - else -return \muls\\t%0, %2\; - + @ + movs\\t%0, %1\;muls\\t%0, %2 + mov\\t%0, %1\;muls\\t%0, %2 + muls\\t%0, %2 [(set_attr length 4,4,2) (set_attr type muls)] ) @@ -787,6 +785,8 @@ * switch (which_alternative) { +case 0: + return \movs\\t%0, %1\; case 1: { rtx addr; diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index cd20b6e..13270bc 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -13040,13 +13040,11 @@ off by default. @item -masm-syntax-unified @opindex masm-syntax-unified -Assume the Thumb1 inline assembly code are using unified syntax. -The default is currently off, which means divided syntax is assumed. +Assume inline assembler is using unified asm syntax. The default is +currently off which implies divided syntax. Currently this option is +available only for Thumb1 and has no effect on ARM state and Thumb2. However, this may change in future releases of GCC. 
Divided syntax -should be considered deprecated. This option has no effect when -generating Thumb2 code. Thumb2 assembly code always uses unified syntax. -This option has no effect for ARM state assembly code which will still -uses divided syntax. +should be considered deprecated. @item -mrestrict-it @opindex mrestrict-it
Re: [AARCH64, NEON] Any regression testcase for AARCH64 NEON intrinsics in GCC testsuite?
Hello,

I have written a testsuite for AArch32 Neon intrinsics, available at https://gitorious.org/arm-neon-tests

I am in the process of converting it into DejaGnu form for integration into GCC. My most recent submission was https://gcc.gnu.org/ml/gcc-patches/2014-07/msg00022.html but I plan to submit another version soon. As you'll notice, this first submission only covers a small subset of the original testsuite, but I do plan to convert it all. That being said, the current testsuite only covers AArch32 Neon intrinsics, and needs to be expanded to cover AArch64. It is still useful to test the AArch32 subset on AArch64.

Christophe.

Hello Christophe,

Is the testsuite https://gitorious.org/arm-neon-tests written for little-endian? I noticed that some testcases treat result_int8x8 as an array and access it by array indexing when checking the test result. This will not work for big-endian. An example:

exec_vzip
{
  int8_t buffer_int8x8 [] = { (int8_t)-16, (int8_t)-15, (int8_t)-14, (int8_t)-13,
                              (int8_t)-12, (int8_t)-11, (int8_t)-10, (int8_t)-9, };
  int8x8_t vector1_int8x8;
  int8x8_t vector2_int8x8;
  vector1_int8x8 = vld1_s8(buffer_int8x8);
  vector2_int8x8 = vdup_n_s8(0x11);
  result_vec_int8x8x2 = vzip_s8(vector1_int8x8, vector2_int8x8);
  vst2_s8(result_bis_int8x8, result_vec_int8x8x2);
  memcpy(result_int8x8, result_bis_int8x8, sizeof(result_int8x8));
  {
    {
      int i;
      for(i=0; i<8 ; i++)
        {
          if (result_int8x8[i] != expected0_int8x8[i])
            {
              fprintf(stderr, "ERROR in %s (%s line %d in buffer '%s') at type %s index %d: got 0x% x != 0x% x %s\n",
                      "VZIP/VZIPQ", "./gcc.target/aarch64/advsimd-intrinsics/vzip.c", 232,
                      "expected0", "int8x8", i, result_int8x8[i], expected0_int8x8[i],
                      strlen("(chunk 0)") > 0 ? "(chunk 0)" : "");
              abort();
            }
        }
    };
  };
}
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 9:11 AM, Jakub Jelinek ja...@redhat.com wrote:

do $subject, and cleanup for always 64 bit hwi. bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

Yes, the following patch also bootstraps:

--cut here--
Index: sreal.c
===================================================================
--- sreal.c (revision 217338)
+++ sreal.c (working copy)
@@ -156,7 +156,7 @@ sreal::to_int () const
   if (m_exp <= -SREAL_BITS)
     return 0;
   if (m_exp >= SREAL_PART_BITS)
-    return INT64_MAX;
+    return INTTYPE_MAXIMUM (int64_t);
   if (m_exp > 0)
     return m_sig << m_exp;
   if (m_exp < 0)
--cut here--

Uros.
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 09:45:38AM +0100, Uros Bizjak wrote:
On Tue, Nov 11, 2014 at 9:11 AM, Jakub Jelinek ja...@redhat.com wrote:

do $subject, and cleanup for always 64 bit hwi. bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

Yes, following patch also bootstraps:

This is ok for trunk with appropriate ChangeLog entry. Thanks.

--- sreal.c (revision 217338)
+++ sreal.c (working copy)
@@ -156,7 +156,7 @@ sreal::to_int () const
   if (m_exp <= -SREAL_BITS)
     return 0;
   if (m_exp >= SREAL_PART_BITS)
-    return INT64_MAX;
+    return INTTYPE_MAXIMUM (int64_t);
   if (m_exp > 0)
     return m_sig << m_exp;
   if (m_exp < 0)
--cut here--

Jakub
Re: [match-and-simplify] operator-lists in expression
On Mon, Nov 10, 2014 at 2:39 PM, Prathamesh Kulkarni bilbotheelffri...@gmail.com wrote:

Hi,
This patch adds support for operator-lists to be used in expression. I reuse operator-list as the iterator. This is not really valid since user-defined operator-lists cannot be iterator in 'for', but it was convenient to reuse operator-list as a 'for' iterator and lower_for doesn't care about that. eg:

(define_operator_list list1 plus minus)
(simplify
  (list1 @x integer_zerop)
  (non_lvalue @x))

is wrapped into 'for' as: (lower_operator_list):

(for list1 (plus minus)
  (simplify
    (list1 @x integer_zerop)
    (non_lvalue @x)))

this is not really valid since we reject list1 to be used as iterator if it were written by user. Is this okay or should I introduce an explicit temporary iterator ?

No, it's ok to re-use it. I think you should get rid of the extra lowering step and instead in parse_simplify create the extra for directly when building a simplify (the multiple simplify buildings really ask for factoring it out to a method in the parser class which has access to active_fors, active_ifs and friends). Also you use a vector to store operator_lists - this will gobble up duplicates. It's probably better to use a pointer_hash <user_id *> for this.

Thanks for continuing to work on this!
Richard.

so it gets lowered to something like:

(for tmp1 (list1)
  (simplify
    (tmp1 @x integer_zerop)
    (non_lvalue @x)))

* genmatch.c (fatal_at): New overloaded function.
(simplify::oper_lists): New member.
(simplify::simplify): Add default argument.
(lower_commutative): Adjust call to simplify::simplify.
(lower_opt_convert): Likewise.
(lower_operator_list): New function.
(lower): Call lower_operator_list.
(parser::parsing_for_p): New member function.
(parser::oper_lists): New member.
(parser::parse_operation): Check for operator-list.
(parser::parse_c_expr): Likewise.
(parser::parse_simplify): Reset parser::oper_lists. Adjust call to simplify::simplify.
(parser::parser): Initialize parser::oper_lists.
* match-builtin.pd: Adjust pattern to use SQRTs and POWs.

Thanks,
Prathamesh
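The review comment about duplicates can be illustrated with a pointer-keyed set. This is a sketch under assumptions: it uses the standard `std::set` rather than GCC's own hash containers, and `UserId` and `OperListSet` are hypothetical names, not genmatch's types.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical stand-in for genmatch's user_id; only identity matters here.
struct UserId {
  std::string name;
};

// Records each operator list at most once, keyed by pointer identity, so an
// operator list that is used several times in one pattern is stored once.
class OperListSet {
 public:
  // Returns true if the list was newly recorded, false if already present.
  bool add(const UserId* list) {
    assert(list != nullptr);
    return seen_.insert(list).second;
  }
  std::size_t size() const { return seen_.size(); }

 private:
  std::set<const UserId*> seen_;
};
```

A plain vector with `push_back` would instead grow by one entry for every use of the same operator list.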
Fix libtool.m4 for Darwin >= 10.10
libtool.m4 has a globbing pattern that assumes Mac OS version numbers 10.x are one digit for x. That’s unfortunate, especially now that Mac OS 10.10 was released :)

libtool has released a new version to fix this bug. The attached patch, bootstrapped and regtested on x86_64-apple-darwin14 (Mac OS 10.10), incorporates this fix into our libtool.m4 and regenerates the configures under our control. OK to commit? This touches so many areas that it probably needs a build maintainer or global maintainer to approve it.

FX

PS: Let me know what the procedure is for the toplevel files (libtool.m4 and configure).
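The failure mode of a one-digit glob can be reproduced with POSIX `fnmatch`. This is a sketch under assumptions: the two patterns below are illustrative guesses at the kind of pattern involved and should be checked against libtool.m4 itself, and `glob_matches` is a hypothetical helper.

```cpp
#include <cassert>
#include <fnmatch.h>  // POSIX; not available on every platform

// Returns true if the shell glob PATTERN matches TEXT.
bool glob_matches(const char* pattern, const char* text) {
  assert(pattern != nullptr && text != nullptr);
  return fnmatch(pattern, text, 0) == 0;
}

// Assumed patterns, for illustration only.
// The first accepts any text starting with "10.0", "10.1" or "10.2", so it
// also accepts "10.10". The second requires a ',' or '.' after that digit.
const char* const kOneDigitPattern = "10.[012]*";
const char* const kSeparatorPattern = "10.[012][,.]*";
```

With these patterns, `10.10,x86_64-apple-darwin14` matches the first but not the second, while `10.2,powerpc-apple-darwin8` still matches the second.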
Re: Fix libtool.m4 for Darwin >= 10.10
On Tue, Nov 11, 2014 at 09:58:45AM +0100, FX wrote: libtool.m4 has a globbing pattern that assumes Mac OS version numbers 10.x are one digit for x. That’s unfortunate, especially now that Mac OS 10.10 was released :) libtool has released a new version to fix this bug. The attached patch, bootstrapped and regtested on x86_64-apple-darwin14 (Mac OS 10.10), incorporates this fix into our libtool.m4 and regenerates the configures under our control. OK to commit? This touches so many area it probably needs a build maintainer or global maintainer to approve it. FX PS: Let me know what the procedure is for the toplevel files (libtool.m4 and configure). Your patch contains lots of other changes, not just the libtool.m4 change. Please filter those out. 2014-11-11 Francois-Xavier Coudert fxcoud...@gcc.gnu.org PR target/63610 * boehm-gc/configure: Regenerate. boehm-gc/ etc. have their own ChangeLog files, so the entries should say just * configure: Regenerate. 2014-11-11 Francois-Xavier Coudert fxcoud...@gcc.gnu.org PR target/63610 * libcc1/plugin.cc: ??? 2014-11-11 Francois-Xavier Coudert fxcoud...@gcc.gnu.org PR target/63610 * libvtv/configure (else): ?? Jakub
Re: [PATCH][Revisedx2] Fix PR63750
On Mon, Nov 10, 2014 at 3:58 PM, FX fxcoud...@gmail.com wrote:

My knowledge of C++ is limited, but I think this additional patch to wide-int.h is the proper fix to the issue reported by Jack, no? I’m bootstrapping it right now, it already passed stage 2.

Bootstrap succeeded on x86_64-apple-darwin14. OK to commit to trunk?

Ok.

Thanks,
Richard.
Re: Fix libtool.m4 for Darwin >= 10.10
Your patch contains lots of other changes, not just the libtool.m4 change. Please filter those out.

Sorry about that. The patch attached should be clean, and the ChangeLog entries formatted as they should. OK to commit? This touches so many area it probably needs a build maintainer or global maintainer to approve it.

FX
Re: [PATCH][Revisedx2] Fix PR63750
Ok. Committed as rev. 217342. Thanks for the review! FX
Re: [PATCH] c++ify sreal
On Tue, 11 Nov 2014, Jakub Jelinek wrote:
On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!

do $subject, and cleanup for always 64 bit hwi. bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Index: system.h
===================================================================
--- system.h (revision 217338)
+++ system.h (working copy)
@@ -27,6 +27,7 @@
    event inttypes.h gets pulled in by another header it is already
    defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS
 
 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

We could also start using the standard C++ mechanism (numeric_limits).

(nothing wrong with INTTYPE_MAXIMUM, just an alternative)

--
Marc Glisse
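The numeric_limits alternative mentioned above might look like the following. It is a minimal sketch, not a proposed GCC change; the helper name `int64_max_via_limits` is made up for illustration.

```cpp
#include <cassert>
#include <limits>
#include <stdint.h>

// Hypothetical helper: the largest int64_t value, obtained from
// std::numeric_limits, so no __STDC_LIMIT_MACROS define is required.
int64_t int64_max_via_limits() {
  return std::numeric_limits<int64_t>::max();
}
```

Unlike INT64_MAX, this does not depend on a macro being defined before the first inclusion of <stdint.h>.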
RE: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target
-----Original Message-----
From: Terry Guo [mailto:terry@arm.com]
Sent: Friday, November 07, 2014 6:01 PM
To: 'Christian Bruel'
Cc: gcc-patches@gcc.gnu.org
Subject: RE: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

-----Original Message-----
From: Christian Bruel [mailto:christian.br...@st.com]
Sent: Friday, November 07, 2014 5:27 PM
To: Terry Guo
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target

hi,

the ARM bootstrap seems to fail for libgcc2.c on the thumb multilib for libgcc2:

muldi3 -mthumb -O2 -g
/tmp/ccYrycUw.s: Assembler messages:
/tmp/ccYrycUw.s:69: Error: MOV Rd, Rs with two low registers is not permitted on this architecture -- `mov r6,r7'

preprocessed attached.

Thanks
Christian

Many thanks. I am looking into it now.

BR, Terry

Fix is committed to trunk at https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=217341.

BR, Terry
Re: [x86, 6/n] Replace builtins with vector extensions
Hello Marc, Uroš,

On 10 Nov 21:33, Uros Bizjak wrote:
On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

<, > and == for integer vectors of size 128. I was surprised not to find _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not 512, there is no corresponding intrinsic, there are only _mask versions that return a mask.

Let's ask Kirill (CC'd) about missing intrinsics.

We have no `_mm_cmplt_epi64' intrinsic because there's no such instruction in Intel ISA. All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from AVX-512*. VPCMP is able to model VPCMPLT by specifying the corresponding immediate and we have intrinsics for that (config/i386/avx512fintrin.h):

extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)

--
Thanks, K
Re: [x86, 6/n] Replace builtins with vector extensions
On Tue, 11 Nov 2014, Kirill Yukhin wrote:

Hello Marc, Uroš,

On 10 Nov 21:33, Uros Bizjak wrote:
On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello,

<, > and == for integer vectors of size 128. I was surprised not to find _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not 512, there is no corresponding intrinsic, there are only _mask versions that return a mask.

Let's ask Kirill (CC'd) about missing intrinsics.

We have no `_mm_cmplt_epi64' intrinsic because there's no such instruction in Intel ISA.

We have _mm_cmplt_epi32 without a corresponding instruction though ;-) (yes, it is useless)

All we have is [V]PCMP[EQ|GT] on pre-AVX-512* and VPCMP starting from AVX-512*. VPCMP is able to model VPCMPLT by specifying the corresponding immediate and we have intrinsics for that (config/i386/avx512fintrin.h):

extern __inline __mmask16
__attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
_mm512_cmplt_epu32_mask (__m512i __X, __m512i __Y)

--
Marc Glisse
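How a "less than" on 64-bit lanes can be written without a dedicated intrinsic may be sketched with GNU vector extensions. This is an illustration under assumptions: it needs GCC or Clang, the type `v2di` is declared locally rather than taken from the intrinsic headers, and both function names are hypothetical.

```cpp
#include <cassert>

// GNU vector extension (GCC and Clang); not standard C++.
typedef long long v2di __attribute__((vector_size(16)));

// Hypothetical name: there is no _mm_cmplt_epi64 intrinsic. Each result lane
// is -1 (all bits set) where a < b and 0 elsewhere.
v2di cmplt_epi64_sketch(v2di a, v2di b) {
  return (v2di)(a < b);
}

// The same predicate written as "greater than" with the operands swapped,
// which is the form a PCMPGT-style instruction provides directly.
v2di cmplt_via_gt_sketch(v2di a, v2di b) {
  return (v2di)(b > a);
}
```

Both functions give the same lanes for any input, which is why a separate "less than" instruction is not strictly needed.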
Re: [PATCH, i386]: Use std::swap
On Tue, Nov 11, 2014 at 9:09 AM, Uros Bizjak ubiz...@gmail.com wrote: On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote: On Mon, 10 Nov 2014, Richard Biener wrote: No extra includes required? utility is already included in wide-int.h and rtl.h, should probably move those. Bah, we hit a problem. std::swap has been moved from algorithm to utility in C++11, and the patch breaks build on CentOS 5.11 (gcc-4.1.2). Short of reverting the i386.c patch, is there a quick solution by including some additional headers? Attached patch that implements both suggestions from Richi and Marc fixes the bootstrap. 2014-11-11 Uros Bizjak ubiz...@gmail.com * system.h: Include algorithm and utility. * rtl.h: Do not include utility here. * wide-int.h: Ditto. * tree-vect-data-refs.c (swap): Remove template. (vect_prune_runtime_alias_test_list): Use std::swap instead of swap. Bootstrapped on x86_64-linux-gnu (CentOS 5.11). OK for mainline? BTW: There are lots of places where std::swap can be used, a nice search-and-replace task for someone to start with gcc development. ;) Uros. Index: tree-vect-data-refs.c === --- tree-vect-data-refs.c (revision 217340) +++ tree-vect-data-refs.c (working copy) @@ -2718,14 +2718,6 @@ return 0; } -template class T static void -swap (T a, T b) -{ - T c (a); - a = b; - b = c; -} - /* Function vect_vfa_segment_size. Create an expression that computes the size of segment @@ -2858,7 +2850,7 @@ vect_prune_runtime_alias_test_list (loop_vec_info dr_with_seg_len (dr_b, segment_length_b)); if (compare_tree (DR_BASE_ADDRESS (dr_a), DR_BASE_ADDRESS (dr_b)) 0) - swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); + std::swap (dr_with_seg_len_pair.first, dr_with_seg_len_pair.second); comp_alias_ddrs.safe_push (dr_with_seg_len_pair); } @@ -2908,8 +2900,8 @@ vect_prune_runtime_alias_test_list (loop_vec_info and DR_A1 and DR_A2 are two consecutive memrefs. 
*/ if (*dr_a1 == *dr_a2) { - swap (dr_a1, dr_b1); - swap (dr_a2, dr_b2); + std::swap (dr_a1, dr_b1); + std::swap (dr_a2, dr_b2); } if (!operand_equal_p (DR_BASE_ADDRESS (dr_a1-dr), Index: wide-int.h === --- wide-int.h (revision 217340) +++ wide-int.h (working copy) @@ -216,8 +216,6 @@ the same result as X + X; the precision of the shift amount Y can be arbitrarily different from X. */ - -#include utility #include system.h #include hwint.h #include signop.h Index: rtl.h === --- rtl.h (revision 217340) +++ rtl.h (working copy) @@ -20,7 +20,6 @@ #ifndef GCC_RTL_H #define GCC_RTL_H -#include utility #include statistics.h #include machmode.h #include input.h Index: system.h === --- system.h(revision 217340) +++ system.h(working copy) @@ -208,7 +208,9 @@ #endif #ifdef __cplusplus +# include algorithm # include cstring +# include utility #endif /* Some of glibc's string inlines cause warnings. Plus we'd rather
Re: libstdc++ new deque failures
The patch below breaks bootstrap on darwin (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63811):

Fix std::deque move construction with non-equal allocators.
* include/bits/stl_deque.h (_Deque_base::_Deque_base(_Deque_base&&)): Dispatch according to whether allocators are always equal.
(_Deque_base::_M_move_impl()): Implement move-from state.

In file included from /Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/deque:64:0,
                 from /Users/fx/devel/gcc/trunk2/libstdc++-v3/include/precompiled/stdc++.h:67:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h: In member function ‘std::_Deque_base<_Tp, _Alloc>::_Deque_impl std::_Deque_base<_Tp, _Alloc>::_M_move_impl()’:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h:645:17: error: expected primary-expression before ‘__attribute’
  _Tp_alloc_type __attribute((__unused__)) {std::move(__alloc)};
                 ^
make[2]: *** [x86_64-apple-darwin14.0.0/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1
Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code
This patch failed regtesting -- and on second thought I'm not too confident that the refactoring is strictly an improvement so I will try to fix the main issue (that is to make the test vrp-1.c fail to compile) in a more direct way.
Re: [PATCH 1/2] VRP: Simplify logic for checking if any asserts need to be inserted
On Tue, Nov 11, 2014 at 4:51 AM, Patrick Palka patr...@parcs.ath.cx wrote: Hi, This patch tweaks the VRP code to simply inspect the need_assert_for bitmap when determining whether any asserts need to be inserted. Consequently we no longer have to manually keep track of whether a call to register_new_assert_for() was made. This patch is an updated version of a patch that was approved a few months ago but was never committed. Bootstrapped and regtested on x86_64-unknown-linux-gnu with no new regressions. Is it OK to commit? Ok. Thanks, Richard. 2014-08-13 Patrick Palka ppa...@gcc.gnu.org * tree-vrp.c (register_edge_assert_for_2): Change return type to void and adjust accordingly. (register_edge_assert_for_1): Likewise. (register_edge_assert_for): Likewise. (find_conditional_asserts): Likewise. (find_switch_asserts): Likewise. (find_assert_locations_1): Likewise. (find_assert_locations): Likewise. (insert_range_insertions): Inspect the need_assert_for bitmap. --- gcc/tree-vrp.c | 157 ++--- 1 file changed, 49 insertions(+), 108 deletions(-) diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c index 4e4ebe0..f0a4382 100644 --- a/gcc/tree-vrp.c +++ b/gcc/tree-vrp.c @@ -4977,32 +4977,27 @@ masked_increment (const wide_int val_in, const wide_int mask, /* Try to register an edge assertion for SSA name NAME on edge E for the condition COND contributing to the conditional jump pointed to by BSI. - Invert the condition COND if INVERT is true. - Return true if an assertion for NAME could be registered. */ + Invert the condition COND if INVERT is true. 
*/ -static bool +static void register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, enum tree_code cond_code, tree cond_op0, tree cond_op1, bool invert) { tree val; enum tree_code comp_code; - bool retval = false; if (!extract_code_and_val_from_cond_with_ops (name, cond_code, cond_op0, cond_op1, invert, comp_code, val)) -return false; +return; /* Only register an ASSERT_EXPR if NAME was found in the sub-graph reachable from E. */ if (live_on_edge (e, name) !has_single_use (name)) -{ - register_new_assert_for (name, name, comp_code, val, NULL, e, bsi); - retval = true; -} +register_new_assert_for (name, name, comp_code, val, NULL, e, bsi); /* In the case of NAME = CST and NAME being defined as NAME = (unsigned) NAME2 + CST2 we can assert NAME2 = -CST2 @@ -5063,8 +5058,6 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, } register_new_assert_for (name3, tmp, comp_code, val, NULL, e, bsi); - - retval = true; } /* If name2 is used later, create an ASSERT_EXPR for it. 
*/ @@ -5094,8 +5087,6 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, } register_new_assert_for (name2, tmp, comp_code, val, NULL, e, bsi); - - retval = true; } } @@ -5133,7 +5124,6 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, cst = int_const_binop (code, val, cst); register_new_assert_for (name2, name2, comp_code, cst, NULL, e, bsi); - retval = true; } } } @@ -5197,8 +5187,6 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, register_new_assert_for (name2, tmp, new_comp_code, cst, NULL, e, bsi); - - retval = true; } } @@ -5276,7 +5264,6 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, register_new_assert_for (name2, tmp, new_comp_code, new_val, NULL, e, bsi); - retval = true; } } @@ -5297,8 +5284,7 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, TREE_CODE (TREE_TYPE (val)) == INTEGER_TYPE TYPE_UNSIGNED (TREE_TYPE (val)) TYPE_PRECISION (TREE_TYPE (gimple_assign_rhs1 (def_stmt))) - prec - !retval)) + prec)) { name2 = gimple_assign_rhs1 (def_stmt); if (rhs_code == BIT_AND_EXPR) @@ -5522,13 +5508,10 @@ register_edge_assert_for_2 (tree name, edge e, gimple_stmt_iterator bsi, register_new_assert_for (names[i], tmp, LE_EXPR, new_val, NULL, e, bsi); -
[PATCH] [AArch64, RTL] Bics instruction generation for aarch64
From 98bb6d7323ce79e28be8ef892b919391ed857e1f Mon Sep 17 00:00:00 2001 From: Alex Velenko alex.vele...@arm.com Date: Fri, 31 Oct 2014 18:43:32 + Subject: [PATCH] [AArch64, RTL] Bics instruction generation for aarch64 Hi, This patch adds rtl patterns for aarch64 to generate bics instructions in cases when the computed value gets discarded and only the status register change of the instruction gets reused. Previously, bics would only be generated if the value computed by bics would later be reused, which is not necessarily the case when computing this value for if statements. Is this patch ok? Thanks, Alex gcc/ 2014-11-10 Alex Velenko alex.vele...@arm.com * gcc/config/aarch64/aarch64.md (and_one_cmplmode3_compare0_no_reuse): New define_insn. * (and_one_cmpl_SHIFT:optabmode3_compare0_no_reuse): Likewise. gcc/testsuite/ 2014-11-10 Alex Velenko alex.vele...@arm.com * gcc.target/aarch64/bics1.c : New testcase. --- gcc/config/aarch64/aarch64.md | 26 gcc/testsuite/gcc.target/aarch64/bics_3.c | 69 +++ 2 files changed, 95 insertions(+) create mode 100644 gcc/testsuite/gcc.target/aarch64/bics_3.c diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 341c26f..6158d82 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2845,6 +2845,18 @@ [(set_attr type logics_reg)] ) +(define_insn *and_one_cmplmode3_compare0_no_reuse + [(set (reg:CC_NZ CC_REGNUM) +(compare:CC_NZ + (and:GPI (not:GPI + (match_operand:GPI 0 register_operand r)) + (match_operand:GPI 1 register_operand r)) + (const_int 0)))] + + bics\\twzr, %w1, %w0 + [(set_attr type logics_reg)] +) + (define_insn *LOGICAL:optab_one_cmpl_SHIFT:optabmode3 [(set (match_operand:GPI 0 register_operand =r) (LOGICAL:GPI (not:GPI @@ -2894,6 +2906,20 @@ [(set_attr type logics_shift_imm)] ) +(define_insn *and_one_cmpl_SHIFT:optabmode3_compare0_no_reuse + [(set (reg:CC_NZ CC_REGNUM) +(compare:CC_NZ + (and:GPI (not:GPI + (SHIFT:GPI +(match_operand:GPI 0 register_operand r) 
+(match_operand:QI 1 aarch64_shift_imm_mode n))) + (match_operand:GPI 2 register_operand r)) + (const_int 0)))] + + bics\\twzr, %w2, %w0, SHIFT:shift %1 + [(set_attr type logics_shift_imm)] +) + (define_insn clzmode2 [(set (match_operand:GPI 0 register_operand =r) (clz:GPI (match_operand:GPI 1 register_operand r)))] diff --git a/gcc/testsuite/gcc.target/aarch64/bics_3.c b/gcc/testsuite/gcc.target/aarch64/bics_3.c new file mode 100644 index 000..ecb53e9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bics_3.c @@ -0,0 +1,69 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps } */ + +extern void abort (void); + +int __attribute__ ((noinline)) +bics_si_test (int a, int b) +{ + if (a ~b) +return 1; + else +return 0; +} + +int __attribute__ ((noinline)) +bics_si_test2 (int a, int b) +{ + if (a ~ (b 2)) +return 1; + else +return 0; +} + +typedef long long s64; + +int __attribute__ ((noinline)) +bics_di_test (s64 a, s64 b) +{ + if (a ~b) +return 1; + else +return 0; +} + +int __attribute__ ((noinline)) +bics_di_test2 (s64 a, s64 b) +{ + if (a ~(b 2)) +return 1; + else +return 0; +} + +int +main (void) +{ + int a = 5; + int b = 5; + int c = 20; + s64 d = 5; + s64 e = 5; + s64 f = 20; + if (bics_si_test (a, b)) +abort (); + if (bics_si_test2 (c, b)) +abort (); + if (bics_di_test (d, e)) +abort (); + if (bics_di_test2 (f, e)) +abort (); + return 0; +} + +/* { dg-final { scan-assembler-times bics\twzr, w\[0-9\]+, w\[0-9\]+ 2 } } */ +/* { dg-final { scan-assembler-times bics\twzr, w\[0-9\]+, w\[0-9\]+, lsl 2 1 } } */ +/* { dg-final { scan-assembler-times bics\txzr, x\[0-9\]+, x\[0-9\]+ 2 } } */ +/* { dg-final { scan-assembler-times bics\txzr, x\[0-9\]+, x\[0-9\]+, lsl 2 1 } } */ + +/* { dg-final { cleanup-saved-temps } } */ -- 1.8.1.2
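The source shape the new patterns target is a bit-clear whose value is used only as a condition. A minimal sketch of the two 32-bit test functions follows; the `&` and `<<` operators are inferred from the bics semantics and from the expected results in main, since the archived copy of the test dropped them.

```cpp
#include <cassert>

// Returns 1 when A has any bit set that is clear in B. Only the truth value
// of (a & ~b) is used, so the masked value itself can be discarded and just
// the condition flags are needed.
int bics_si_test(int a, int b) {
  if (a & ~b)
    return 1;
  else
    return 0;
}

// Same test with B shifted left by two before being inverted.
int bics_si_test2(int a, int b) {
  if (a & ~(b << 2))
    return 1;
  else
    return 0;
}
```

With the values used in the test's main, `bics_si_test (5, 5)` and `bics_si_test2 (20, 5)` both return 0, which is why the test aborts on a non-zero result.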
Re: libstdc++ new deque failures
On 11/11/14 10:49 +0100, FX wrote:

The patch below breaks bootstrap on darwin (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63811):

Fix std::deque move construction with non-equal allocators.
 * include/bits/stl_deque.h (_Deque_base::_Deque_base(_Deque_base&&)): Dispatch according to whether allocators are always equal.
 (_Deque_base::_M_move_impl()): Implement move-from state.

In file included from /Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/deque:64:0,
                 from /Users/fx/devel/gcc/trunk2/libstdc++-v3/include/precompiled/stdc++.h:67:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h: In member function ‘std::_Deque_base<_Tp, _Alloc>::_Deque_impl std::_Deque_base<_Tp, _Alloc>::_M_move_impl()’:
/Users/fx/devel/gcc/ibin2/x86_64-apple-darwin14.0.0/libstdc++-v3/include/bits/stl_deque.h:645:17: error: expected primary-expression before ‘__attribute’
  _Tp_alloc_type __attribute((__unused__)) {std::move(__alloc)};
                 ^
make[2]: *** [x86_64-apple-darwin14.0.0/bits/stdc++.h.gch/O2ggnu++0x.gch] Error 1

Should be fixed with this renaming.

Tested x86_64-linux, committed to trunk.

commit 3a81c243672bd721f15bc6320fc7a82e850fc3d8
Author: Jonathan Wakely jwak...@redhat.com
Date: Tue Nov 11 10:11:09 2014 +

    PR libstdc++/63811
    * include/bits/stl_deque.h (_Deque_base::_M_move_impl()): Avoid using badname.

diff --git a/libstdc++-v3/include/bits/stl_deque.h b/libstdc++-v3/include/bits/stl_deque.h
index c0052b3..3a1c85d 100644
--- a/libstdc++-v3/include/bits/stl_deque.h
+++ b/libstdc++-v3/include/bits/stl_deque.h
@@ -642,7 +642,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 	// Create a copy of the current allocator.
 	_Tp_alloc_type __alloc{_M_get_Tp_allocator()};
 	// Put that copy in a moved-from state.
-	_Tp_alloc_type __unused __attribute((__unused__)) {std::move(__alloc)};
+	_Tp_alloc_type __sink __attribute((__unused__)) {std::move(__alloc)};
 	// Create an empty map that allocates using the moved-from allocator.
 	_Deque_base __empty{__alloc};
 	// Now safe to modify current allocator and perform non-throwing swaps.
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek ja...@redhat.com wrote:

On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:

On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:

On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:

To be constructive here - the above case is from within a GIMPLE_ASSIGN case label and thus I'd have expected

    case GIMPLE_ASSIGN:
      {
        gassign *a1 = as_a <gassign *> (s1);
        gassign *a2 = as_a <gassign *> (s2);
        lhs1 = gimple_assign_lhs (a1);
        lhs2 = gimple_assign_lhs (a2);
        if (TREE_CODE (lhs1) != SSA_NAME
            && TREE_CODE (lhs2) != SSA_NAME)
          return (operand_equal_p (lhs1, lhs2, 0)
                  && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
                                                   gimple_assign_rhs1 (a2)));
        else if (TREE_CODE (lhs1) == SSA_NAME
                 && TREE_CODE (lhs2) == SSA_NAME)
          return vn_valueize (lhs1) == vn_valueize (lhs2);
        return false;
      }

instead. That's the kind of changes I have expected and have approved of.

But even that looks like just adding extra work for all developers, with no gain. You only have to add extra code and extra temporaries, in switches typically also have to add {} because of the temporaries and thus extra indentation level, and it doesn't simplify anything in the code.

The branch attempts to use the C++ typesystem to capture information about the kinds of gimple statement we expect, both:
 (A) so that the compiler can detect type errors, and
 (B) as a comprehension aid to the human reader of the code

The ideal here is when function params and struct field can be strengthened from gimple to a subclass ptr. This captures the knowledge that every use of a function or within a struct has a given gimple code.

I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who have to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this.

Can you e.g. compare the size of - lines in your patchset combined, and size of + lines in your patchset? As in, if your changes lead to less typing or more.

I see two ways out here. One is to add overloads to all the functions taking the special types like

    tree gimple_assign_rhs1 (gimple *);

or simply add

    gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }

into a gimple-compat.h header which you include in places that are not converted nicely.

Both avoid manually making the compiler happy (which the explicit as_a stuff is! It doesn't add any checking - it's just placing the as_a at the callers and thus makes the runtime ICE fire there).

As much as I don't like global conversion operators I don't like adding overloads to all of the accessor functions even more.

Whether you enable them generally or just for selected files via a gimple-compat.h will be up to you (but I'd rather get rid of them at some point).

Note this allows seamless transform of random functions taking a gimple now but really only expecting a single kind.

Note that we don't absolutely have to rush this all in for GCC 5. Being the very first for GCC 6 stage1 is another possibility. We just should get it right.

Thanks,
Richard.

Jakub
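To make the two styles under discussion concrete, here is a toy sketch. It is not GCC's gimple implementation: the types and the helpers is_assign, as_assign, assign_lhs and assign_lhs_checked are made up, and only the shape of the trade-off (check inside the accessor versus checked downcast at the caller) follows the thread.

```cpp
#include <cassert>
#include <cstdlib>

// Toy statement hierarchy; illustrative only, not GCC's gimple classes.
enum stmt_code { STMT_ASSIGN, STMT_CALL };

struct stmt {
    stmt_code code;
    explicit stmt(stmt_code c) : code(c) {}
};

struct assign_stmt : stmt {
    int lhs;
    explicit assign_stmt(int l) : stmt(STMT_ASSIGN), lhs(l) {}
};

// Style 1: the accessor takes the base pointer and checks the code itself.
inline int assign_lhs_checked(const stmt* s)
{
    if (s->code != STMT_ASSIGN)
        std::abort();                 // runtime check inside the accessor
    return static_cast<const assign_stmt*>(s)->lhs;
}

// Style 2: a checked downcast at the caller, then a statically typed accessor.
inline bool is_assign(const stmt* s) { return s->code == STMT_ASSIGN; }

inline const assign_stmt* as_assign(const stmt* s)
{
    if (!is_assign(s))
        std::abort();                 // same check, moved to the call site
    return static_cast<const assign_stmt*>(s);
}

// No runtime check needed: the parameter type already carries the knowledge.
inline int assign_lhs(const assign_stmt* a) { return a->lhs; }
```

In this sketch the runtime check is the same in both styles; what changes is where it sits and whether later accessors can rely on the static type, which is roughly the disagreement in the messages above.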
Re: [PATCH, i386]: Use std::swap
On Tue, Nov 11, 2014 at 10:41 AM, Uros Bizjak ubiz...@gmail.com wrote:

On Tue, Nov 11, 2014 at 9:09 AM, Uros Bizjak ubiz...@gmail.com wrote:

On Mon, Nov 10, 2014 at 10:51 PM, Marc Glisse marc.gli...@inria.fr wrote:

On Mon, 10 Nov 2014, Richard Biener wrote:

No extra includes required?

<utility> is already included in wide-int.h and rtl.h, should probably move those.

Bah, we hit a problem. std::swap has been moved from <algorithm> to <utility> in C++11, and the patch breaks build on CentOS 5.11 (gcc-4.1.2). Short of reverting the i386.c patch, is there a quick solution by including some additional headers?

Attached patch that implements both suggestions from Richi and Marc fixes the bootstrap.

2014-11-11 Uros Bizjak ubiz...@gmail.com

 * system.h: Include <algorithm> and <utility>.
 * rtl.h: Do not include <utility> here.
 * wide-int.h: Ditto.
 * tree-vect-data-refs.c (swap): Remove template.
 (vect_prune_runtime_alias_test_list): Use std::swap instead of swap.

Bootstrapped on x86_64-linux-gnu (CentOS 5.11). OK for mainline?

BTW: There are lots of places where std::swap can be used, a nice search-and-replace task for someone to start with gcc development. ;)

Agreed ;) Note that we have to be careful to avoid pulling all of libstdc++ into all files via system.h (system.h is so a bad thing... :/).

Ok.

Thanks,
Richard.

Uros.
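As a small illustration of the header issue, the sketch below includes both headers so that std::swap is declared whether the compiler follows the C++98 or the C++11 placement. The helper order_pair is hypothetical and only there to give std::swap a caller.

```cpp
#include <algorithm>  // where std::swap lives in C++98/03
#include <utility>    // where std::swap lives since C++11
#include <cassert>

// Hypothetical helper (not from the patch): order a pair so lo <= hi.
template <typename T>
void order_pair(T& lo, T& hi)
{
    if (hi < lo)
        std::swap(lo, hi);
}
```

Including both headers costs little in a single file; the concern raised above is about doing so in a header such as system.h that every file pulls in.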
[PATCH][fortran] PR 63701 Make sure variable is always used initialised
Hi all,

As this trivial PR says, found is not initialised, later conditionally set to true in the for loop that follows and gcc_asserted in the end. It is expected that the found = true; statement will always be hit, but in case something elsewhere goes wrong and it is not, we want the gcc_assert to use a properly initialised found = false value.

Ok for trunk?

Thanks,
Kyrill

2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com

 PR fortran/63701
 * trans-expr.c (gfc_get_tree_for_caf_expr): Initialise found to false.

diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 18bc502..b36acbe 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -1406,7 +1406,7 @@ tree
 gfc_get_tree_for_caf_expr (gfc_expr *expr)
 {
   tree caf_decl;
-  bool found;
+  bool found = false;
   gfc_ref *ref;
 
   gcc_assert (expr && expr->expr_type == EXPR_VARIABLE);
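The pattern being fixed can be sketched in a few lines. The function below is a made-up stand-in, only loosely shaped like the flag/loop/assert sequence described in the message; it is not the gfortran code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical search helper illustrating the flag-then-check pattern.
inline bool contains(const int* items, std::size_t count, int wanted)
{
    bool found = false;   // defined value even if the loop body never runs
    for (std::size_t i = 0; i < count; ++i)
        if (items[i] == wanted) {
            found = true;
            break;
        }
    return found;         // a caller can now assert on this reliably
}
```

Without the initialiser, the count == 0 case would read an indeterminate value, so an assertion on the result could pass or fail arbitrarily.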
Re: [x86, 6/n] Replace builtins with vector extensions
On 11 Nov 10:28, Marc Glisse wrote:

On Tue, 11 Nov 2014, Kirill Yukhin wrote:

Hello Marc, Uroš,

On 10 Nov 21:33, Uros Bizjak wrote:

On Sun, Nov 9, 2014 at 5:26 PM, Marc Glisse marc.gli...@inria.fr wrote:

Hello, and == for integer vectors of size 128. I was surprised not to find _mm_cmplt_epi64 anywhere. Note that I can do the same for size 256, but not 512, there is no corresponding intrinsic, there are only _mask versions that return a mask.

Let's ask Kirill (CC'd) about missing intrinsics.

We have no `_mm_cmplt_epi64' intrinsic because there's no such instruction in Intel ISA.

We have _mm_cmplt_epi32 without a corresponding instruction though ;-) (yes, it is useless)

Right, but not in official SDM [1]. I believe these extra intrinsics were added for compatibility w/ ICC, which also features them.

-- Marc Glisse

[1] - http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-software-developer-manual-325462.pdf

--
Thanks, K
Re: [C PATCH] warn for empty struct -Wc++-compat
On Tue, Nov 11, 2014 at 04:45:46AM +0530, Prathamesh Kulkarni wrote:

Index: gcc/c/c-decl.c
===
--- gcc/c/c-decl.c (revision 217287)
+++ gcc/c/c-decl.c (working copy)
@@ -606,6 +606,8 @@
   /* If warn_cxx_compat, a list of typedef names used when defining
      fields in this struct.  */
   vec<tree> typedefs_seen;
+  /* code to distinguish between struct/union */
+  enum tree_code code;

I don't think this is desirable, you might just pass T down from finish_struct to warn_cxx_compat_finish_struct.

@@ -7506,12 +7509,19 @@
 /* Finish up struct info used by -Wc++-compat.  */

 static void
-warn_cxx_compat_finish_struct (tree fieldlist)
+warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
 {
   unsigned int ix;
   tree x;
   struct c_binding *b;

+  if (fieldlist == NULL_TREE)
+    {
+      warning_at (record_loc, OPT_Wc___compat,
+                  "empty %s has size 0 in C, 1 in C++",
+                  (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
+    }
+

I think this won't work well wrt translations, so you need to have an if here. See the pedwarns at the beginning of finish_struct.

Index: gcc/testsuite/gcc.dg/Wcxx-compat-22.c
===
--- gcc/testsuite/gcc.dg/Wcxx-compat-22.c (revision 0)
+++ gcc/testsuite/gcc.dg/Wcxx-compat-22.c (working copy)
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-Wc++-compat" } */
+struct A {}; /* { dg-warning "empty struct has size 0 in C" } */
+union B {}; /* { dg-warning "empty union has size 0 in C" } */

Please also test an empty struct in a struct.

Thanks,
Marek
Re: [C PATCH] warn for empty struct -Wc++-compat
On Tue, 11 Nov 2014, Marek Polacek wrote:

@@ -7506,12 +7509,19 @@
 /* Finish up struct info used by -Wc++-compat.  */

 static void
-warn_cxx_compat_finish_struct (tree fieldlist)
+warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
 {
   unsigned int ix;
   tree x;
   struct c_binding *b;

+  if (fieldlist == NULL_TREE)
+    {
+      warning_at (record_loc, OPT_Wc___compat,
+                  "empty %s has size 0 in C, 1 in C++",
+                  (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
+    }
+

I think this won't work well wrt translations, so you need to have an if here. See the pedwarns at the beginning of finish_struct.

Do keywords like struct/union really require translation?

-- Marc Glisse
Re: [PATCH][fortran] PR 63701 Make sure variable is always used initialised
2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com

 PR fortran/63701
 * trans-expr.c (gfc_get_tree_for_caf_expr): Initialise found to false.

init-found.patch

OK, thanks for the patch.

FX
[Patch AArch64] Fix up BSL expander for floating point types
Hi,

As Ramana hinted here:

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00607.html

There are two issues with the way we've defined our BSL pattern. We pun types around in a way that is scary and quite likely unsafe, and we haven't canonicalized the pattern so combine is unlikely to pick it up.

This patch fixes both of these issues and adds testcases to ensure we are picking up the combine opportunity.

I've bootstrapped and tested this on aarch64-none-linux-gnu and cross-tested it for aarch64-none-elf.

OK?

Cheers,
James

---
gcc/

2014-11-11 James Greenhalgh james.greenha...@arm.com

 * config/aarch64/aarch64-simd.md (aarch64_simd_bsl<mode>_internal): Remove float cases, canonicalize.
 (aarch64_simd_bsl<mode>): Add gen_lowpart expressions where we are punning between float vectors and integer vectors.

gcc/testsuite/

2014-11-11 James Greenhalgh james.greenha...@arm.com

 * gcc.target/aarch64/vbslq_f64_1.c: New.
 * gcc.target/aarch64/vbslq_f64_2.c: Likewise.
 * gcc.target/aarch64/vbslq_u64_1.c: Likewise.
 * gcc.target/aarch64/vbslq_u64_2.c: Likewise.
diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index ef196e4b6fb39c0d2fd9ebfee76abab8369b1e92..f7012ecab07c1b38836e949c2f4e5bd0c7939b5c 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -1924,15 +1924,15 @@ (define_insn aarch64_reduc_maxmin_uns ;; bif op0, op1, mask (define_insn aarch64_simd_bslmode_internal - [(set (match_operand:VALLDIF 0 register_operand =w,w,w) - (ior:VALLDIF - (and:VALLDIF - (match_operand:V_cmp_result 1 register_operand 0,w,w) - (match_operand:VALLDIF 2 register_operand w,w,0)) - (and:VALLDIF + [(set (match_operand:VSDQ_I_DI 0 register_operand =w,w,w) + (ior:VSDQ_I_DI + (and:VSDQ_I_DI (not:V_cmp_result - (match_dup:V_cmp_result 1)) - (match_operand:VALLDIF 3 register_operand w,0,w)) + (match_operand:V_cmp_result 1 register_operand 0,w,w)) + (match_operand:VSDQ_I_DI 3 register_operand w,0,w)) + (and:VSDQ_I_DI + (match_dup:V_cmp_result 1) + (match_operand:VSDQ_I_DI 2 register_operand w,w,0)) ))] TARGET_SIMD @ @@ -1950,9 +1950,21 @@ (define_expand aarch64_simd_bslmode TARGET_SIMD { /* We can't alias operands together if they have different modes. 
*/ + rtx tmp = operands[0]; + if (FLOAT_MODE_P (MODEmode)) +{ + operands[2] = gen_lowpart (V_cmp_resultmode, operands[2]); + operands[3] = gen_lowpart (V_cmp_resultmode, operands[3]); + tmp = gen_reg_rtx (V_cmp_resultmode); +} operands[1] = gen_lowpart (V_cmp_resultmode, operands[1]); - emit_insn (gen_aarch64_simd_bslmode_internal (operands[0], operands[1], - operands[2], operands[3])); + emit_insn (gen_aarch64_simd_bslv_cmp_result_internal (tmp, + operands[1], + operands[2], + operands[3])); + if (tmp != operands[0]) +emit_move_insn (operands[0], gen_lowpart (MODEmode, tmp)); + DONE; }) diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c new file mode 100644 index 000..7b0e8f9 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_1.c @@ -0,0 +1,20 @@ +/* Test vbslq_f64 can be folded. */ +/* { dg-do assemble } */ +/* { dg-options --save-temps -O3 } */ + +#include arm_neon.h + +/* Folds to ret. */ + +float32x4_t +fold_me (float32x4_t a, float32x4_t b) +{ + uint32x4_t mask = {-1, -1, -1, -1}; + return vbslq_f32 (mask, a, b); +} + +/* { dg-final { scan-assembler-not bsl\\tv } } */ +/* { dg-final { scan-assembler-not bit\\tv } } */ +/* { dg-final { scan-assembler-not bif\\tv } } */ + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c new file mode 100644 index 000..1dca90d --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vbslq_f64_2.c @@ -0,0 +1,23 @@ +/* Test vbslq_f64 can be folded. */ +/* { dg-do assemble } */ +/* { dg-options --save-temps -O3 } */ + +#include arm_neon.h + +/* Should fold out one half of the BSL, leaving just a BIC. 
*/ + +float32x4_t +half_fold_me (uint32x4_t mask) +{ + float32x4_t a = {0.0, 0.0, 0.0, 0.0}; + float32x4_t b = {2.0, 4.0, 8.0, 16.0}; + return vbslq_f32 (mask, a, b); + +} + +/* { dg-final { scan-assembler-not bsl\\tv } } */ +/* { dg-final { scan-assembler-not bit\\tv } } */ +/* { dg-final { scan-assembler-not bif\\tv } } */ +/* { dg-final { scan-assembler bic\\tv } } */ + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c new file mode 100644 index 000..9c61d1a --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vbslq_u64_1.c @@ -0,0 +1,16 @@ +/* Test if a BSL-like instruction can be generated from a C idiom. */ +/* { dg-do assemble } */ +/* { dg-options --save-temps -O3 }
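As a scalar model of what the BSL pattern and the two folding tests rely on, here is a rough C++ sketch. It works on a single 32-bit lane, uses memcpy rather than a union for the float/integer punning, and the function names are invented; it is not the aarch64-simd.md expansion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t),
              "sketch assumes a 32-bit float");

// Bitwise select on one lane: bits of a where mask is 1, bits of b elsewhere.
// Written with the (~mask & b) arm first to echo the canonical ordering.
inline std::uint32_t bit_select(std::uint32_t mask, std::uint32_t a,
                                std::uint32_t b)
{
    return (~mask & b) | (mask & a);
}

// Float flavour: do the selection in the integer domain, then convert back.
inline float bit_select_f32(std::uint32_t mask, float a, float b)
{
    std::uint32_t ia, ib;
    std::memcpy(&ia, &a, sizeof ia);
    std::memcpy(&ib, &b, sizeof ib);
    const std::uint32_t ir = bit_select(mask, ia, ib);
    float r;
    std::memcpy(&r, &ir, sizeof r);
    return r;
}
```

An all-ones mask returns a unchanged (the "folds to ret" case), and a zero a leaves only ~mask & b, which is the bit-clear form the second test scans for.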
[x86, 7/n] Replace builtins with vector extensions
Hello, last patch, extending == and to size 256. Regtested as usual. Is the branch ready to be merged into trunk? -- Marc GlisseIndex: ChangeLog.x86-intrinsics-ext === --- ChangeLog.x86-intrinsics-ext(revision 217319) +++ ChangeLog.x86-intrinsics-ext(working copy) @@ -1,10 +1,17 @@ +2014-11-11 Marc Glisse marc.gli...@inria.fr + + * config/i386/avx2intrin.h (_mm256_cmpeq_epi8, _mm256_cmpeq_epi16, + _mm256_cmpeq_epi32, _mm256_cmpeq_epi64, _mm256_cmpgt_epi8, + _mm256_cmpgt_epi16, _mm256_cmpgt_epi32, _mm256_cmpgt_epi64): + Use vector extensions instead of builtins. + 2014-11-10 Marc Glisse marc.gli...@inria.fr * config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16, _mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32, _mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector extensions instead of builtins. * config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64): Likewise. 2014-11-10 Marc Glisse marc.gli...@inria.fr Index: config/i386/avx2intrin.h === --- config/i386/avx2intrin.h(revision 217318) +++ config/i386/avx2intrin.h(working copy) @@ -223,73 +223,70 @@ _mm256_blend_epi16 (__m256i __X, __m256i #else #define _mm256_blend_epi16(X, Y, M)\ ((__m256i) __builtin_ia32_pblendw256 ((__v16hi)(__m256i)(X), \ (__v16hi)(__m256i)(Y), (int)(M))) #endif extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi8 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpeqb256 ((__v32qi)__A, (__v32qi)__B); + return (__m256i) ((__v32qi)__A == (__v32qi)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi16 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpeqw256 ((__v16hi)__A, (__v16hi)__B); + return (__m256i) ((__v16hi)__A == (__v16hi)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi32 (__m256i __A, __m256i __B) { - return 
(__m256i)__builtin_ia32_pcmpeqd256 ((__v8si)__A, (__v8si)__B); + return (__m256i) ((__v8si)__A == (__v8si)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpeq_epi64 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpeqq256 ((__v4di)__A, (__v4di)__B); + return (__m256i) ((__v4di)__A == (__v4di)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi8 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpgtb256 ((__v32qi)__A, -(__v32qi)__B); + return (__m256i) ((__v32qi)__A (__v32qi)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi16 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpgtw256 ((__v16hi)__A, -(__v16hi)__B); + return (__m256i) ((__v16hi)__A (__v16hi)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi32 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpgtd256 ((__v8si)__A, -(__v8si)__B); + return (__m256i) ((__v8si)__A (__v8si)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_cmpgt_epi64 (__m256i __A, __m256i __B) { - return (__m256i)__builtin_ia32_pcmpgtq256 ((__v4di)__A, (__v4di)__B); + return (__m256i) ((__v4di)__A (__v4di)__B); } extern __inline __m256i __attribute__ ((__gnu_inline__, __always_inline__, __artificial__)) _mm256_hadd_epi16 (__m256i __X, __m256i __Y) { return (__m256i) __builtin_ia32_phaddw256 ((__v16hi)__X, (__v16hi)__Y); }
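For readers who have not used the GCC/Clang vector extensions that this series switches to, here is a small sketch of element-wise comparisons on a generic vector type. It uses a 128-bit type to stay within baseline x86-64 and AArch64 support; the typedef and wrapper names are made up and are not the avx2intrin.h definitions.

```cpp
#include <cassert>

// GCC/Clang vector extension: two 64-bit lanes, 16 bytes in total.
typedef long long v2di __attribute__((vector_size(16)));

// Hypothetical wrappers: each comparison yields -1 in lanes where the
// relation holds and 0 elsewhere. The explicit cast mirrors the style of
// the patch and keeps the result type unambiguous.
inline v2di lanes_equal(v2di a, v2di b)   { return (v2di)(a == b); }
inline v2di lanes_greater(v2di a, v2di b) { return (v2di)(a > b); }
inline v2di lanes_less(v2di a, v2di b)    { return (v2di)(a < b); }
```

Written this way, a 64-bit "less than" needs no dedicated intrinsic; the compiler is free to pick whatever instruction sequence the target offers.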
Re: [C PATCH] warn for empty struct -Wc++-compat
On Tue, Nov 11, 2014 at 12:13:32PM +0100, Marc Glisse wrote:

On Tue, 11 Nov 2014, Marek Polacek wrote:

@@ -7506,12 +7509,19 @@
 /* Finish up struct info used by -Wc++-compat.  */

 static void
-warn_cxx_compat_finish_struct (tree fieldlist)
+warn_cxx_compat_finish_struct (tree fieldlist, location_t record_loc)
 {
   unsigned int ix;
   tree x;
   struct c_binding *b;

+  if (fieldlist == NULL_TREE)
+    {
+      warning_at (record_loc, OPT_Wc___compat,
+                  "empty %s has size 0 in C, 1 in C++",
+                  (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
+    }
+

I think this won't work well wrt translations, so you need to have an if here. See the pedwarns at the beginning of finish_struct.

Do keywords like struct/union really require translation?

C keywords don't require translation, but you always need to have complete sentences in diagnostics, so I thought I'd better point it out. Joseph would know better than me though.

Marek
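The "complete sentences" point can be sketched as follows: instead of splicing a keyword into one format string with %s, select between two whole messages. The enum and function below are hypothetical and stand in for the if that the review asks for; they are not the c-decl.c code.

```cpp
#include <cassert>
#include <cstring>

enum record_kind { KIND_STRUCT, KIND_UNION };

// Hypothetical helper: one complete message per case, so each string can
// be handed to translators as a whole sentence.
inline const char* empty_record_message(record_kind kind)
{
    if (kind == KIND_STRUCT)
        return "empty struct has size 0 in C, 1 in C++";
    else
        return "empty union has size 0 in C, 1 in C++";
}
```

The two strings differ by a single word, but keeping them separate lets a translation reorder or inflect the sentence around the keyword as its grammar requires.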
Re: [PATCH] libstdc++ - Add xmethods for associative containers (ordered and unordered)
On 10/11/14 21:49 +, Jonathan Wakely wrote: On 09/11/14 16:00 -0800, Siva Chandra wrote: Hello, Attached is a patch which adds xmethods for the associative containers (set, map, multiset and multimap) and their unordered versions. I think the GDB Python API is not rich enough to implement xmethods for the more interesting methods like find, count etc. The attached patch only implements xmethods for size and empty. That way, it is a fairly straightforward patch. This looks fine, I'll commit it soon. Thanks. Committed to trunk.
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On 11/11/2014 09:30 AM, Eric Botcazou wrote:

I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who have to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this.

IMO that's the sort of things some of us were afraid of when the C++ switch was being discussed and IIRC we were told this would not happen...

I'm with both of you on this.

Bernd
[PATCH][AArch64] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Hi all, This is the aarch64 implementation of the macro fusion hook, used to fuse mov+movk instructions together. A new field is declared in the tuning struct and as we add more fuseable ops in the future we will fill in more bits in the fuseable_ops field. Bootstrapped and tested on aarch64-none-linux-gnu. Ok for trunk? 2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64-protos.h (struct tune_params): Add fuseable_ops field. * config/aarch64/aarch64.c (generic_tunings): Specify fuseable_ops. (cortexa53_tunings): Likewise. (cortexa57_tunings): Likewise. (thunderx_tunings): Likewise. (aarch64_macro_fusion_p): New function. (aarch_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (AARCH64_FUSE_MOV_MOVK): Likewise. (AARCH64_FUSE_NOTHING): Likewise.commit 3181b0988eed091c8b1ead7a6381c6f9aee7774e Author: Kyrylo Tkachov kyrylo.tkac...@arm.com Date: Tue Oct 21 10:36:48 2014 +0100 [AArch64] Implement TARGET_MACRO_FUSION diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 810644c..d3d295d 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -170,6 +170,7 @@ struct tune_params const struct cpu_vector_cost *const vec_costs; const int memmov_cost; const int issue_rate; + const unsigned int fuseable_ops; }; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9aeac7c..96f6c47 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -299,6 +299,9 @@ static const struct cpu_vector_cost cortexa57_vector_cost = NAMED_PARAM (cond_not_taken_branch_cost, 1) }; +#define AARCH64_FUSE_NOTHING (0) +#define AARCH64_FUSE_MOV_MOVK (1 0) + #if HAVE_DESIGNATED_INITIALIZERS GCC_VERSION = 2007 __extension__ #endif @@ -309,7 +312,8 @@ static const struct tune_params generic_tunings = generic_regmove_cost, 
generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) }; static const struct tune_params cortexa53_tunings = @@ -319,7 +323,8 @@ static const struct tune_params cortexa53_tunings = cortexa53_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) }; static const struct tune_params cortexa57_tunings = @@ -329,7 +334,8 @@ static const struct tune_params cortexa57_tunings = cortexa57_regmove_cost, cortexa57_vector_cost, NAMED_PARAM (memmov_cost, 4), - NAMED_PARAM (issue_rate, 3) + NAMED_PARAM (issue_rate, 3), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_MOV_MOVK) }; static const struct tune_params thunderx_tunings = @@ -339,7 +345,8 @@ static const struct tune_params thunderx_tunings = thunderx_regmove_cost, generic_vector_cost, NAMED_PARAM (memmov_cost, 6), - NAMED_PARAM (issue_rate, 2) + NAMED_PARAM (issue_rate, 2), + NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING) }; /* A processor implementing AArch64. */ @@ -10017,6 +10024,48 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size, return default_use_by_pieces_infrastructure_p (size, align, op, speed_p); } +static bool +aarch64_macro_fusion_p (void) +{ + return aarch64_tune_params-fuseable_ops != AARCH64_FUSE_NOTHING; +} + +static bool +aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr) +{ + rtx set_dest; + rtx prev_set = single_set (prev); + rtx curr_set = single_set (curr); + + if (!prev_set + || !curr_set) +return false; + + if (any_condjump_p (curr)) +return false; + + if (!aarch64_macro_fusion_p ()) +return false; + + if (aarch64_tune_params-fuseable_ops AARCH64_FUSE_MOV_MOVK) +{ + /* We are trying to fuse + mov imm / movk imm + instructions as a group that gets scheduled together. 
*/ + + set_dest = SET_DEST (curr_set); + + return GET_CODE (set_dest) == ZERO_EXTRACT + CONST_INT_P (SET_SRC (curr_set)) + CONST_INT_P (SET_SRC (prev_set)) + REG_P (XEXP (set_dest, 0)) + REG_P (SET_DEST (prev_set)) + REGNO (XEXP (set_dest, 0)) == REGNO (SET_DEST (prev_set)); +} + + return false; +} + #undef TARGET_ADDRESS_COST #define TARGET_ADDRESS_COST aarch64_address_cost @@ -10273,6 +10322,12 @@ aarch64_use_by_pieces_infrastructure_p (unsigned int size, #define TARGET_USE_BY_PIECES_INFRASTRUCTURE_P \ aarch64_use_by_pieces_infrastructure_p +#undef TARGET_SCHED_MACRO_FUSION_P +#define TARGET_SCHED_MACRO_FUSION_P aarch64_macro_fusion_p + +#undef TARGET_SCHED_MACRO_FUSION_PAIR_P +#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p + struct gcc_target targetm =
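The fuseable_ops field is a plain bitmask, one bit per fusible instruction pair. A minimal sketch of that idea follows; the DEMO_ constants, the struct and the helper functions are invented for illustration (including the second, purely hypothetical pair) and do not reproduce the aarch64.c code.

```cpp
#include <cassert>

// Illustrative flag values in the style of the patch; the names are made up.
const unsigned int DEMO_FUSE_NOTHING  = 0;
const unsigned int DEMO_FUSE_MOV_MOVK = 1u << 0;
const unsigned int DEMO_FUSE_OTHER    = 1u << 1;  // hypothetical future pair

struct demo_tune_params {
    unsigned int fuseable_ops;   // OR of the DEMO_FUSE_* bits for this core
};

// Is any kind of fusion enabled for this tuning?
inline bool macro_fusion_enabled(const demo_tune_params& t)
{
    return t.fuseable_ops != DEMO_FUSE_NOTHING;
}

// Is one particular pair enabled? Test the bit with a bitwise AND.
inline bool can_fuse(const demo_tune_params& t, unsigned int pair)
{
    return (t.fuseable_ops & pair) != 0;
}
```

Adding a new fusible pair then means defining one more bit and OR-ing it into the tunings of the cores that support it, without touching the struct layout again.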
[PATCH][ARM] Implement TARGET_SCHED_MACRO_FUSION_PAIR_P
Hi all, This is the arm implementation of the macro fusion hook. It tries to fuse movw+movt operations together. It also tries to take lo_sum RTXs into account since those generate movt instructions as well. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? Thanks, Kyrill 2014-11-11 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm-protos.h (tune_params): Add fuseable_ops field. * config/arm/arm.c (arm_macro_fusion_p): New function. (arm_macro_fusion_pair_p): Likewise. (TARGET_SCHED_MACRO_FUSION_P): Define. (TARGET_SCHED_MACRO_FUSION_PAIR_P): Likewise. (ARM_FUSE_NOTHING): Likewise. (ARM_FUSE_MOVW_MOVT): Likewise. (arm_slowmul_tune, arm_fastmul_tune, arm_strongarm_tune, arm_xscale_tune, arm_9e_tune, arm_v6t2_tune, arm_cortex_tune, arm_cortex_a8_tune, arm_cortex_a7_tune, arm_cortex_a15_tune, arm_cortex_a53_tune, arm_cortex_a57_tune, arm_cortex_a9_tune, arm_cortex_a12_tune, arm_v7m_tune, arm_v6m_tune, arm_fa726te_tune arm_cortex_a5_tune): Specify fuseable_ops value.diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index a37aa80..98e3cf0 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -281,6 +281,8 @@ struct tune_params bool string_ops_prefer_neon; /* Maximum number of instructions to inline calls to memset. */ int max_insns_inline_memset; + /* Bitfield encoding the fuseable pairs of instructions. 
*/ + unsigned int fuseable_ops; }; extern const struct tune_params *current_tune; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 3f2ddd4..40df4c0 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -258,6 +258,7 @@ static tree arm_build_builtin_va_list (void); static void arm_expand_builtin_va_start (tree, rtx); static tree arm_gimplify_va_arg_expr (tree, tree, gimple_seq *, gimple_seq *); static void arm_option_override (void); +static bool arm_macro_fusion_p (void); static unsigned HOST_WIDE_INT arm_shift_truncation_mask (machine_mode); static bool arm_cannot_copy_insn_p (rtx_insn *); static int arm_issue_rate (void); @@ -296,6 +297,7 @@ static int arm_default_branch_cost (bool, bool); static int arm_cortex_a5_branch_cost (bool, bool); static int arm_cortex_m_branch_cost (bool, bool); +static bool aarch_macro_fusion_pair_p (rtx_insn*, rtx_insn*); static bool arm_vectorize_vec_perm_const_ok (machine_mode vmode, const unsigned char *sel); @@ -404,6 +406,12 @@ static const struct attribute_spec arm_attribute_table[] = #undef TARGET_COMP_TYPE_ATTRIBUTES #define TARGET_COMP_TYPE_ATTRIBUTES arm_comp_type_attributes +#undef TARGET_SCHED_MACRO_FUSION_P +#define TARGET_SCHED_MACRO_FUSION_P arm_macro_fusion_p + +#undef TARGET_SCHED_MACRO_FUSION_PAIR_P +#define TARGET_SCHED_MACRO_FUSION_PAIR_P aarch_macro_fusion_pair_p + #undef TARGET_SET_DEFAULT_TYPE_ATTRIBUTES #define TARGET_SET_DEFAULT_TYPE_ATTRIBUTES arm_set_default_type_attributes @@ -1710,6 +1718,9 @@ const struct cpu_cost_table v7m_extra_costs = } }; +#define ARM_FUSE_NOTHING (0) +#define ARM_FUSE_MOVW_MOVT (1 0) + const struct tune_params arm_slowmul_tune = { arm_slowmul_rtx_costs, @@ -1726,7 +1737,8 @@ const struct tune_params arm_slowmul_tune = false,/* Prefer Neon for 64-bits bitops. */ false, false, /* Prefer 32-bit encodings. */ false, /* Prefer Neon for stringops. */ - 8 /* Maximum insns to inline memset. */ + 8, /* Maximum insns to inline memset. 
*/ + ARM_FUSE_NOTHING/* Fuseable pairs of instructions. */ }; const struct tune_params arm_fastmul_tune = @@ -1745,7 +1757,8 @@ const struct tune_params arm_fastmul_tune = false,/* Prefer Neon for 64-bits bitops. */ false, false, /* Prefer 32-bit encodings. */ false, /* Prefer Neon for stringops. */ - 8 /* Maximum insns to inline memset. */ + 8, /* Maximum insns to inline memset. */ + ARM_FUSE_NOTHING/* Fuseable pairs of instructions. */ }; /* StrongARM has early execution of branches, so a sequence that is worth @@ -1767,7 +1780,8 @@ const struct tune_params arm_strongarm_tune = false,/* Prefer Neon for 64-bits bitops. */ false, false, /* Prefer 32-bit encodings. */ false, /* Prefer Neon for stringops. */ - 8 /* Maximum insns to inline memset. */ + 8, /* Maximum insns to inline memset. */ + ARM_FUSE_NOTHING/* Fuseable pairs of instructions. */ }; const struct tune_params arm_xscale_tune = @@ -1786,7 +1800,8 @@ const struct tune_params arm_xscale_tune = false,/* Prefer Neon for 64-bits bitops. */ false, false, /* Prefer 32-bit encodings. */ false, /* Prefer Neon for stringops. */
Re: [7/7] nvptx testsuite patches: Return addresses
On 11/10/2014 09:19 PM, H.J. Lu wrote: I checked in this patch to revert the accidental checkin. Sorry about that, and thanks for fixing it. Bernd
Fix PR ada/42978
This makes it so that gnatmake echoes the full command line passed to ranlib.

Tested on x86_64-suse-linux, applied on the mainline.

2014-11-11 Simon Wright si...@pushface.org

 PR ada/42978
 * mlib-utl.adb (ar): Output the options passed to ranlib.

--
Eric Botcazou

Index: mlib-utl.adb
===
--- mlib-utl.adb (revision 217259)
+++ mlib-utl.adb (working copy)
@@ -282,6 +282,10 @@ package body MLib.Utl is
          if not Opt.Quiet_Output then
             Write_Str (Ranlib_Name.all);
             Write_Char (' ');
+            for J in Ranlib_Options'Range loop
+               Write_Str (Ranlib_Options (J).all);
+               Write_Char (' ');
+            end loop;
             Write_Line (Arguments (Ar_Options'Length + 1).all);
          end if;
[PATCH][ARM/AArch64] Improve modeled latency between FP operations and FP-GP register moves
Hi all,

This patch models the latency of moves between FP and GP registers on the A15 and A57 a bit more accurately by splitting the reservations for FP-GP and GP-FP moves and adding an appropriate bypass.

Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2014-11-11  Kyrylo Tkachov  kyrylo.tkac...@arm.com

    * config/arm/cortex-a15-neon.md (cortex_a15_vfp_to_from_gp): Split into...
    (cortex_a15_gp_to_vfp): ...This.
    (cortex_a15_fp_to_gp): ...And this.
    Define and comment bypass from vfp operations to fp-gp moves.

commit c176d3e691f470598a02507fa75a8294da954c3f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:11:16 2014 +0100

    [ARM/AArch64] Model FP-GP move stalls

diff --git a/gcc/config/arm/cortex-a15-neon.md b/gcc/config/arm/cortex-a15-neon.md
index 02d4a53..bc09cd6 100644
--- a/gcc/config/arm/cortex-a15-neon.md
+++ b/gcc/config/arm/cortex-a15-neon.md
@@ -655,10 +655,20 @@ (define_insn_reservation "cortex_a15_vfp_cpys" 4
        (eq_attr "type" "fmov"))
   "ca15_issue1,ca15_cx_perm")

-(define_insn_reservation "cortex_a15_vfp_to_from_gp" 5
+(define_insn_reservation "cortex_a15_gp_to_vfp" 5
   (and (eq_attr "tune" "cortexa15")
-       (eq_attr "type" "f_mcr, f_mcrr, f_mrc, f_mrrc"))
-  "ca15_issue1,ca15_ls1+ca15_ls2")
+       (eq_attr "type" "f_mcr, f_mcrr"))
+  "ca15_issue1,ca15_ls")
+
+(define_insn_reservation "cortex_a15_mov_vfp_to_gp" 5
+  (and (eq_attr "tune" "cortexa15")
+       (eq_attr "type" "f_mrc, f_mrrc"))
+  "ca15_issue1,ca15_ls")
+
+;; Moves from floating point registers to general purpose registers
+;; induce additional latency.
+(define_bypass 10 "cortex_a15_vfp*, cortex_a15_neon*, cortex_a15_gp_to_vfp" "cortex_a15_mov_vfp_to_gp")
+
 (define_insn_reservation "cortex_a15_vfp_ariths" 7
   (and (eq_attr "tune" "cortexa15")
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:

On Tue, 11 Nov 2014, Jakub Jelinek wrote:

On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!

do $subject, and cleanup for always 64 bit hwi.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Index: system.h
===
--- system.h (revision 217338)
+++ system.h (working copy)
@@ -27,6 +27,7 @@
    event inttypes.h gets pulled in by another header it is already
    defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS

 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

We could also start using the standard C++ mechanism (numeric_limits).

Except int64_t does not have to be defined for a C++ implementation.

Thanks,
Andrew

(nothing wrong with INTTYPE_MAXIMUM, just an alternative)

-- 
Marc Glisse
[PATCH] Look through widening type conversions for possible edge assertions
This patch is a replacement for the 2nd VRP refactoring patch. It simply teaches VRP to look through widening type conversions when finding suitable edge assertions, e.g.

  bool p = x != y;
  int q = (int) p;
  if (q == 0) // new edge assert: p == 0 and therefore x == y

The new testcase requires that such an edge assertion be inserted.

Full bootstrap + regtest on x86_64-unknown-linux-gnu in progress. Does the patch look OK for trunk if no new regressions?

2014-11-11  Patrick Palka  ppa...@gcc.gnu.org

gcc/
    * tree-vrp.c (register_edge_assert_for): Look through widening type conversions for possible edge assertions.

gcc/testsuite/
    * gcc.dg/vrp-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vrp-1.c | 31 +++
 gcc/tree-vrp.c               | 22 ++
 2 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c

diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
new file mode 100644
index 000..df5334e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vrp-1.c
@@ -0,0 +1,31 @@
+/* { dg-options "-O2" } */
+
+void runtime_error (void) __attribute__ ((noreturn));
+void compiletime_error (void) __attribute__ ((noreturn, error ("")));
+
+static void
+compiletime_check_equals_1 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error ();
+  if (__p)
+    runtime_error ();
+}
+
+static void
+compiletime_check_equals_2 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error (); /* { dg-error "call to" } */
+  if (__p)
+    runtime_error ();
+}
+
+void
+foo (int *x)
+{
+  compiletime_check_equals_1 (x, 5);
+  compiletime_check_equals_2 (x, 10);
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f0a4382..979ab44 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5634,6 +5634,7 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
      the value zero or one, then we may be able to assert values
      for SSA_NAMEs which flow into COND.  */

+
   /* In the case of NAME == 1 or NAME != 0, for BIT_AND_EXPR defining
      statement of NAME we can assert both operands of the BIT_AND_EXPR
      have nonzero value.  */
@@ -5673,6 +5674,27 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
           register_edge_assert_for_1 (op1, EQ_EXPR, e, si);
         }
     }
+
+  /* In the case of NAME != 0 or NAME == 0, if NAME's defining statement
+     is a widening type conversion then we can assert that NAME's
+     RHS is accordingly nonzero or zero.  */
+  if ((comp_code == EQ_EXPR || comp_code == NE_EXPR)
+      && integer_zerop (val))
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (name);
+      if (is_gimple_assign (def_stmt))
+        {
+          enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
+          if (CONVERT_EXPR_CODE_P (def_code))
+            {
+              tree lhs = gimple_assign_lhs (def_stmt);
+              tree rhs = gimple_assign_rhs1 (def_stmt);
+              if (TYPE_PRECISION (TREE_TYPE (lhs))
+                  >= TYPE_PRECISION (TREE_TYPE (rhs)))
+                register_edge_assert_for_1 (rhs, comp_code, e, si);
+            }
+        }
+    }
 }
-- 
2.2.0.rc1.16.g6066a7e
Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code
On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch refactors the VRP edge-assertion code to make it always traverse SSA-name definitions in order to find suitable edge assertions to insert. Currently SSA-name definitions get traversed only when the LHS of the original conditional is a bitwise AND or OR operation which seems like a strange restriction. We should always try to traverse the SSA-name definitions inside the conditional, in particular for conditionals with the form:

  int p = x COMP y;
  if (p != 0) -- edge assertion: x COMP y

Of course this specific case should have been simplified to if (x COMP y) if that comparison cannot trap and -fnon-call-exceptions is in effect.

To achieve this the patch merges the mutually recursive functions register_edge_assert_for_1() and register_edge_assert_for_2() into a single recursive function, register_edge_assert_for_1(). In doing so, code duplication can be reduced and at the same time the more general logic allows VRP to detect more useful edge assertions.

The recursion of the function register_edge_assert_for_1() is bounded by a new 'limit' argument which is arbitrarily set to 4 so that at most 4 levels of SSA-name definitions will be traversed per conditional. (Incidentally this hard recursion limit makes the related fix for PR 57685 unnecessary.)

A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP (correctly) transforms the statement

  # prephitmp_35 = PHI <pretmp_9(8), _28(10)>

into

  # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

and the uninit pass doesn't properly handle such PHIs containing a constant value as one of its arguments -- so a bogus uninit warning is now emitted.

Did you try fixing that? It seems to me a constant should be easy to handle?

Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress. Is it OK to commit if testing finishes with no new regressions?

Ok.

Thanks,
Richard.
2014-11-11 Patrick Palka patr...@parcs.ath.cx gcc/ * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure that NAME always equals COND_OP0 or COND_OP1. (register_edge_assert_for, register_edge_assert_for_1, register_edge_assert_for_2): Refactor and consolidate edge-assertion logic into ... (register_edge_assert_for_2): ... here. Add LIMIT parameter. Rename to ... (register_edge_assert_for_1): ... this. gcc/testsuite/ * gcc.dg/vrp-1.c: New testcase. * gcc.dg/vrp-2.c: New testcase. * gcc.dg/uninit-pred-9_b.c: xfail test on line 24. --- gcc/testsuite/gcc.dg/uninit-pred-9_b.c | 2 +- gcc/testsuite/gcc.dg/vrp-1.c | 31 gcc/testsuite/gcc.dg/vrp-2.c | 78 ++ gcc/tree-vrp.c | 261 +++-- 4 files changed, 231 insertions(+), 141 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c index d9ae75e..555ec20 100644 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r) blah(v); /* { dg-bogus uninitialized bogus warning } */ if ( (n = 8) (m 99) (r 19) ) - blah(v); /* { dg-bogus uninitialized bogus warning } */ + blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* } } */ return 0; } diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c new file mode 100644 index 000..df5334e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vrp-1.c @@ -0,0 +1,31 @@ +/* { dg-options -O2 } */ + +void runtime_error (void) __attribute__ ((noreturn)); +void compiletime_error (void) __attribute__ ((noreturn, error ())); + +static void +compiletime_check_equals_1 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) +compiletime_error (); + if (__p) +runtime_error (); +} + +static void +compiletime_check_equals_2 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) +compiletime_error (); /* { 
dg-error call to } */ + if (__p) +runtime_error (); +} + +void +foo (int *x) +{ + compiletime_check_equals_1 (x, 5); + compiletime_check_equals_2 (x, 10); +} diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c new file mode 100644 index 000..5757c2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vrp-2.c @@ -0,0 +1,78 @@ +/* { dg-options -O2 } */ + +void runtime_error (void) __attribute__ ((noreturn)); +void compiletime_error (void) __attribute__ ((noreturn, error ())); + +void dummy (int x); + +void +bar (int x, int y, int z) +{ + int p = ~(x y z) == 37; + if (p) +{ + if (!x || !y || !z) + compiletime_error (); /* {
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 1:08 PM, Andrew Pinski pins...@gmail.com wrote:

On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:

On Tue, 11 Nov 2014, Jakub Jelinek wrote:

On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!

do $subject, and cleanup for always 64 bit hwi.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Index: system.h
===
--- system.h (revision 217338)
+++ system.h (working copy)
@@ -27,6 +27,7 @@
    event inttypes.h gets pulled in by another header it is already
    defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS

 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

We could also start using the standard C++ mechanism (numeric_limits).

Except int64_t does not have to be defined for a C++ implementation.

Also not through stdint.h / cstdint? Note that we should only care for what happens in practice here. I hope that at least for more recent standards than C++04 (which is what we require IIRC) they are on parity with C99.

Richard.

Thanks,
Andrew

(nothing wrong with INTTYPE_MAXIMUM, just an alternative)

-- 
Marc Glisse
Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code
On Tue, Nov 11, 2014 at 4:52 AM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch refactors the VRP edge-assertion code to make it always traverse SSA-name definitions in order to find suitable edge assertions to insert. Currently SSA-name definitions get traversed only when the LHS of the original conditional is a bitwise AND or OR operation which seems like a strange restriction. We should always try to traverse the SSA-name definitions inside the conditional, in particular for conditionals with the form:

  int p = x COMP y;
  if (p != 0) -- edge assertion: x COMP y

Of course this specific case should have been simplified to if (x COMP y) if that comparison cannot trap and -fnon-call-exceptions is in effect.

Except I have found that if p was used below also. We still have if(p != 0). I just saw that recently when I was working on enhancing PHI-opt.

Thanks,
Andrew Pinski

To achieve this the patch merges the mutually recursive functions register_edge_assert_for_1() and register_edge_assert_for_2() into a single recursive function, register_edge_assert_for_1(). In doing so, code duplication can be reduced and at the same time the more general logic allows VRP to detect more useful edge assertions.

The recursion of the function register_edge_assert_for_1() is bounded by a new 'limit' argument which is arbitrarily set to 4 so that at most 4 levels of SSA-name definitions will be traversed per conditional. (Incidentally this hard recursion limit makes the related fix for PR 57685 unnecessary.)

A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP (correctly) transforms the statement

  # prephitmp_35 = PHI <pretmp_9(8), _28(10)>

into

  # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

and the uninit pass doesn't properly handle such PHIs containing a constant value as one of its arguments -- so a bogus uninit warning is now emitted.

Did you try fixing that?
It seems to me a constant should be easy to handle? Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress. Is it OK to commit if testing finishes with no new regressions? Ok. Thanks, Richard. 2014-11-11 Patrick Palka patr...@parcs.ath.cx gcc/ * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure that NAME always equals COND_OP0 or COND_OP1. (register_edge_assert_for, register_edge_assert_for_1, register_edge_assert_for_2): Refactor and consolidate edge-assertion logic into ... (register_edge_assert_for_2): ... here. Add LIMIT parameter. Rename to ... (register_edge_assert_for_1): ... this. gcc/testsuite/ * gcc.dg/vrp-1.c: New testcase. * gcc.dg/vrp-2.c: New testcase. * gcc.dg/uninit-pred-9_b.c: xfail test on line 24. --- gcc/testsuite/gcc.dg/uninit-pred-9_b.c | 2 +- gcc/testsuite/gcc.dg/vrp-1.c | 31 gcc/testsuite/gcc.dg/vrp-2.c | 78 ++ gcc/tree-vrp.c | 261 +++-- 4 files changed, 231 insertions(+), 141 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c index d9ae75e..555ec20 100644 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r) blah(v); /* { dg-bogus uninitialized bogus warning } */ if ( (n = 8) (m 99) (r 19) ) - blah(v); /* { dg-bogus uninitialized bogus warning } */ + blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* } } */ return 0; } diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c new file mode 100644 index 000..df5334e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vrp-1.c @@ -0,0 +1,31 @@ +/* { dg-options -O2 } */ + +void runtime_error (void) __attribute__ ((noreturn)); +void compiletime_error (void) __attribute__ ((noreturn, error ())); + +static void +compiletime_check_equals_1 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) 
+compiletime_error (); + if (__p) +runtime_error (); +} + +static void +compiletime_check_equals_2 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) +compiletime_error (); /* { dg-error call to } */ + if (__p) +runtime_error (); +} + +void +foo (int *x) +{ + compiletime_check_equals_1 (x, 5); + compiletime_check_equals_2 (x, 10); +} diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c new file mode 100644 index 000..5757c2f --- /dev/null +++ b/gcc/testsuite/gcc.dg/vrp-2.c @@ -0,0 +1,78 @@ +/* { dg-options -O2 } */ + +void runtime_error (void) __attribute__
Re: [PATCH] c++ify sreal
On Tue, Nov 11, 2014 at 4:54 AM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 11, 2014 at 1:08 PM, Andrew Pinski pins...@gmail.com wrote:

On Tue, Nov 11, 2014 at 1:23 AM, Marc Glisse marc.gli...@inria.fr wrote:

On Tue, 11 Nov 2014, Jakub Jelinek wrote:

On Tue, Nov 11, 2014 at 08:51:41AM +0100, Uros Bizjak wrote:

Hello!

do $subject, and cleanup for always 64 bit hwi.

bootstrapped + regtested x86_64-unknown-linux-gnu, ok?

Ok. Can you please replace remaining HOST_WIDE_INT vestiges in there with [u]int64_t please?

This patch breaks the build on debian 6.0:

../../gcc/sreal.c: In member function ‘int64_t sreal::to_int() const’:
../../gcc/sreal.c:159: error: ‘INT64_MAX’ was not declared in this scope

Index: system.h
===
--- system.h (revision 217338)
+++ system.h (working copy)
@@ -27,6 +27,7 @@
    event inttypes.h gets pulled in by another header it is already
    defined.  */
 #define __STDC_FORMAT_MACROS
+#define __STDC_LIMIT_MACROS

 /* We must include stdarg.h before stdio.h.  */
 #include <stdarg.h>

Still, I don't believe it will be portable everywhere. Can't you use INTTYPE_MAXIMUM (int64_t) instead of INT64_MAX? We already use that in GCC...

We could also start using the standard C++ mechanism (numeric_limits).

Except int64_t does not have to be defined for a C++ implementation.

Also not through stdint.h / cstdint? Note that we should only care for what happens in practice here. I hope that at least for more recent standards than C++04 (which is what we require IIRC) they are on parity with C99.

C++03 did not add long long, only C++11 did.

Thanks,
Andrew Pinski

Richard.

Thanks,
Andrew

(nothing wrong with INTTYPE_MAXIMUM, just an alternative)

-- 
Marc Glisse
Re: [C PATCH] warn for empty struct -Wc++-compat
On Tue, 11 Nov 2014, Marek Polacek wrote:

+  if (fieldlist == NULL_TREE)
+    {
+      warning_at (record_loc, OPT_Wc___compat,
+                  "empty %s has size 0 in C, 1 in C++",
+                  (struct_parse_info->code == RECORD_TYPE) ? "struct" : "union");
+    }
+

I think this won't work well wrt translations, so you need to have an if here. See the pedwarns at the beginning of finish_struct.

Do keywords like struct/union really require translation?

C keywords don't require translation, but you always need to have complete sentences in diagnostics so I better pointed it out. Joseph would know better than me though.

The situation where the above code would cause problems for translation is a language where "struct" and "union" have different grammatical gender and the translation of some other bit of the sentence ("empty" or "has") needs to agree with that gender.

-- 
Joseph S. Myers
jos...@codesourcery.com
[PATCH][SPARC] default with_cpu to ultrasparc in sparc64-*-linux* targets
Hi.

If no --with-cpu is specified at configure time gcc/config.gcc sets the cpu option in configure_default_options to `v9' in sparc64 targets. This leads to the usage of the following spec by the driver:

  %{!m32:%{!mcpu=*:-mcpu=v9}}

Which in turn triggers the usage of -Av9 by default when invoking the assembler. This leads to failures when VIS instructions are used in inline assembly or .s files:

  [jemarch@install2 gcc]$ echo 'int main () { asm ("fzero %f0"); return 0; }' | gcc -xc -
  /tmp/cc1F9iJm.s: Assembler messages:
  /tmp/cc1F9iJm.s:11: Error: Architecture mismatch on fzero.
  /tmp/cc1F9iJm.s:11: (Requires v9a|v9b; requested architecture is v9.)

This prevents building upstream glibc with a gcc configured with no --with-cpu option, for example.

I think it would be reasonable to have gcc targeting ultrasparc extensions by default in sparc64-*-linux*. WDYT?

Thanks.

2014-11-11  Jose E. Marchesi  jose.march...@oracle.com

    * config.gcc: Use ultrasparc as the default with_cpu option in sparc64-*-linux* targets.

Index: gcc/config.gcc
===
--- gcc/config.gcc (revision 217346)
+++ gcc/config.gcc (working copy)
@@ -2709,6 +2709,7 @@
 	tm_file="sparc/biarch64.h ${tm_file} dbxelf.h elfos.h sparc/sysv4.h gnu-user.h linux.h glibc-stdint.h sparc/default-64.h sparc/linux64.h sparc/tso.h"
 	extra_options="${extra_options} sparc/long-double-switch.opt"
 	tmake_file="${tmake_file} sparc/t-sparc sparc/t-linux64"
+	test x$with_cpu != x || with_cpu=ultrasparc
 	;;
 sparc64-*-freebsd*|ultrasparc-*-freebsd*)
 	tm_file="${tm_file} ${fbsd_tm_file} dbxelf.h elfos.h sparc/sysv4.h sparc/freebsd.h"
Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code
On Tue, Nov 11, 2014 at 1:56 PM, Andrew Pinski pins...@gmail.com wrote:

On Tue, Nov 11, 2014 at 4:52 AM, Richard Biener richard.guent...@gmail.com wrote:

On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch refactors the VRP edge-assertion code to make it always traverse SSA-name definitions in order to find suitable edge assertions to insert. Currently SSA-name definitions get traversed only when the LHS of the original conditional is a bitwise AND or OR operation which seems like a strange restriction. We should always try to traverse the SSA-name definitions inside the conditional, in particular for conditionals with the form:

  int p = x COMP y;
  if (p != 0) -- edge assertion: x COMP y

Of course this specific case should have been simplified to if (x COMP y) if that comparison cannot trap and -fnon-call-exceptions is in effect.

Except I have found that if p was used below also. We still have if(p != 0). I just saw that recently when I was working on enhancing PHI-opt.

Yeah - one of forwprop's single-use restrictions. Definitely one we don't want to preserve though.

Richard.

Thanks,
Andrew Pinski

To achieve this the patch merges the mutually recursive functions register_edge_assert_for_1() and register_edge_assert_for_2() into a single recursive function, register_edge_assert_for_1(). In doing so, code duplication can be reduced and at the same time the more general logic allows VRP to detect more useful edge assertions.

The recursion of the function register_edge_assert_for_1() is bounded by a new 'limit' argument which is arbitrarily set to 4 so that at most 4 levels of SSA-name definitions will be traversed per conditional. (Incidentally this hard recursion limit makes the related fix for PR 57685 unnecessary.)

A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP (correctly) transforms the statement

  # prephitmp_35 = PHI <pretmp_9(8), _28(10)>

into

  # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

and the uninit pass doesn't properly handle such PHIs containing a constant value as one of its arguments -- so a bogus uninit warning is now emitted.

Did you try fixing that? It seems to me a constant should be easy to handle?

Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress. Is it OK to commit if testing finishes with no new regressions?

Ok.

Thanks,
Richard.

2014-11-11  Patrick Palka  patr...@parcs.ath.cx

gcc/
    * tree-vrp.c (extract_code_and_val_from_cond_with_ops): Ensure that NAME always equals COND_OP0 or COND_OP1.
    (register_edge_assert_for, register_edge_assert_for_1, register_edge_assert_for_2): Refactor and consolidate edge-assertion logic into ...
    (register_edge_assert_for_2): ... here. Add LIMIT parameter. Rename to ...
    (register_edge_assert_for_1): ... this.

gcc/testsuite/
    * gcc.dg/vrp-1.c: New testcase.
    * gcc.dg/vrp-2.c: New testcase.
    * gcc.dg/uninit-pred-9_b.c: xfail test on line 24.
--- gcc/testsuite/gcc.dg/uninit-pred-9_b.c | 2 +- gcc/testsuite/gcc.dg/vrp-1.c | 31 gcc/testsuite/gcc.dg/vrp-2.c | 78 ++ gcc/tree-vrp.c | 261 +++-- 4 files changed, 231 insertions(+), 141 deletions(-) create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c create mode 100644 gcc/testsuite/gcc.dg/vrp-2.c diff --git a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c index d9ae75e..555ec20 100644 --- a/gcc/testsuite/gcc.dg/uninit-pred-9_b.c +++ b/gcc/testsuite/gcc.dg/uninit-pred-9_b.c @@ -21,7 +21,7 @@ int foo (int n, int l, int m, int r) blah(v); /* { dg-bogus uninitialized bogus warning } */ if ( (n = 8) (m 99) (r 19) ) - blah(v); /* { dg-bogus uninitialized bogus warning } */ + blah(v); /* { dg-bogus uninitialized bogus warning { xfail *-*-* } } */ return 0; } diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c new file mode 100644 index 000..df5334e --- /dev/null +++ b/gcc/testsuite/gcc.dg/vrp-1.c @@ -0,0 +1,31 @@ +/* { dg-options -O2 } */ + +void runtime_error (void) __attribute__ ((noreturn)); +void compiletime_error (void) __attribute__ ((noreturn, error ())); + +static void +compiletime_check_equals_1 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) +compiletime_error (); + if (__p) +runtime_error (); +} + +static void +compiletime_check_equals_2 (int *x, int y) +{ + int __p = *x != y; + if (__builtin_constant_p (__p) __p) +compiletime_error (); /* { dg-error call to } */ + if (__p) +runtime_error (); +} + +void +foo (int *x) +{ + compiletime_check_equals_1 (x, 5); + compiletime_check_equals_2 (x, 10); +} diff --git a/gcc/testsuite/gcc.dg/vrp-2.c b/gcc/testsuite/gcc.dg/vrp-2.c
[Build, patch] Remove CLooG from the main configure.ac
Now that CLooG is no longer used by GCC, it makes sense to also remove it from the main configure file. Especially as the in-tree build currently only works if CLooG is also available.

Built on x86-64-gnu-linux - and tested that Graphite still works.*

OK for the trunk?

[* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be independent as I see it also with yesterday's GCC; for Sparc/arm/aarch64, it's PR62630.]

Tobias

2014-11-11  Tobias Burnus  bur...@net-b.de

    * config/cloog.m4: Remove.
    * Makefile.def: Remove CLooG.
    * Makefile.tpl: Ditto.
    * configure.ac: Ditto.
    * configure: Regenerate.
    * Makefile.in: Ditto.

 Makefile.def    |   8 ---
 Makefile.tpl    |   6 --
 config/cloog.m4 | 152 ---
 configure.ac    |  47 ++---
 4 files changed, 6 insertions(+), 207 deletions(-)

diff --git a/Makefile.def b/Makefile.def
index dcbcd08..24dfb0b 100644
--- a/Makefile.def
+++ b/Makefile.def
@@ -66,11 +66,6 @@ host_modules= { module= isl; lib_path=.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared @extra_isl_gmp_configure_flags@';
 		extra_make_flags='V=1';
 		no_install= true; };
-host_modules= { module= cloog; lib_path=.libs; bootstrap=true;
-		extra_configure_flags='--disable-shared --with-gmp=system --with-bits=gmp --with-isl=system';
-		extra_exports='CPPFLAGS=$(HOST_GMPINC) $(HOST_ISLINC) $$CPPFLAGS; export CPPFLAGS; LDFLAGS=-L$$r/$(HOST_SUBDIR)/gmp/.libs -L$$r/$(HOST_SUBDIR)/isl/.libs $$LDFLAGS; export LDFLAGS; ';
-		extra_make_flags='CPPFLAGS=$$CPPFLAGS LDFLAGS=$$LDFLAGS V=1';
-		no_install= true; };
 host_modules= { module= libelf; lib_path=.libs; bootstrap=true;
 		extra_configure_flags='--disable-shared';
 		no_install= true; };
@@ -319,7 +314,6 @@ dependencies = { module=all-gcc; on=all-libiberty; hard=true; };
 dependencies = { module=all-gcc; on=all-intl; };
 dependencies = { module=all-gcc; on=all-mpfr; };
 dependencies = { module=all-gcc; on=all-mpc; };
-dependencies = { module=all-gcc; on=all-cloog; };
 dependencies = { module=all-gcc; on=all-build-texinfo; };
 dependencies = { module=all-gcc; on=all-build-bison; };
 dependencies = { module=all-gcc; on=all-build-flex; };
@@ -365,8 +359,6 @@ dependencies = { module=all-utils; on=all-libiberty; };
 dependencies = { module=configure-mpfr; on=all-gmp; };
 dependencies = { module=configure-mpc; on=all-mpfr; };
 dependencies = { module=configure-isl; on=all-gmp; };
-dependencies = { module=configure-cloog; on=all-isl; };
-dependencies = { module=configure-cloog; on=all-gmp; };

 // Host modules specific to gdb.
 dependencies = { module=configure-gdb; on=all-intl; };
diff --git a/Makefile.tpl b/Makefile.tpl
index f7c7e38..884e02d 100644
--- a/Makefile.tpl
+++ b/Makefile.tpl
@@ -224,8 +224,6 @@ HOST_EXPORTS = \
 	GMPINC=$(HOST_GMPINC); export GMPINC; \
 	ISLLIBS=$(HOST_ISLLIBS); export ISLLIBS; \
 	ISLINC=$(HOST_ISLINC); export ISLINC; \
-	CLOOGLIBS=$(HOST_CLOOGLIBS); export CLOOGLIBS; \
-	CLOOGINC=$(HOST_CLOOGINC); export CLOOGINC; \
 	LIBELFLIBS=$(HOST_LIBELFLIBS) ; export LIBELFLIBS; \
 	LIBELFINC=$(HOST_LIBELFINC) ; export LIBELFINC; \
 @if gcc-bootstrap
@@ -318,10 +316,6 @@ HOST_GMPINC = @gmpinc@
 HOST_ISLLIBS = @isllibs@
 HOST_ISLINC = @islinc@

-# Where to find CLOOG
-HOST_CLOOGLIBS = @clooglibs@
-HOST_CLOOGINC = @clooginc@
-
 # Where to find libelf
 HOST_LIBELFLIBS = @libelflibs@
 HOST_LIBELFINC = @libelfinc@
diff --git a/config/cloog.m4 b/config/cloog.m4
deleted file mode 100644
index b80ac27..000
--- a/config/cloog.m4
+++ /dev/null
@@ -1,152 +0,0 @@
-# This file is part of GCC.
-#
-# GCC is free software; you can redistribute it and/or modify it under
-# the terms of the GNU General Public License as published by the Free
-# Software Foundation; either version 3, or (at your option) any later
-# version.
-#
-# GCC is distributed in the hope that it will be useful, but WITHOUT
-# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
-# FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
-# for more details.
-#
-# You should have received a copy of the GNU General Public License
-# along with GCC; see the file COPYING3. If not see
-# http://www.gnu.org/licenses/.
-#
-# Contributed by Andreas Simbuerger simbu...@fim.uni-passau.de
-
-# CLOOG_INIT_FLAGS ()
-# -
-# Provide configure switches for CLooG support.
-# Initialize clooglibs/clooginc according to the user input.
-AC_DEFUN([CLOOG_INIT_FLAGS],
-[
-  AC_ARG_WITH([cloog-include],
-    [AS_HELP_STRING(
-      [--with-cloog-include=PATH],
-      [Specify directory for installed CLooG include files])])
-  AC_ARG_WITH([cloog-lib],
-    [AS_HELP_STRING(
-      [--with-cloog-lib=PATH],
-      [Specify the directory for the installed CLooG library])])
-
-  AC_ARG_ENABLE(cloog-version-check,
-    [AS_HELP_STRING(
-      [--disable-cloog-version-check],
-      [disable check for CLooG
Re: [PATCH] libstdc++ - Add xmethods for associative containers (ordered and unordered)
On Tue, Nov 11, 2014 at 5:03 AM, Siva Chandra sivachan...@google.com wrote:

On Tue, Nov 11, 2014 at 3:38 AM, Jonathan Wakely jwak...@redhat.com wrote:

On 10/11/14 21:49 +, Jonathan Wakely wrote:

On 09/11/14 16:00 -0800, Siva Chandra wrote:

Hello,

Attached is a patch which adds xmethods for the associative containers (set, map, multiset and multimap) and their unordered versions. I think the GDB Python API is not rich enough to implement xmethods for the more interesting methods like find, count etc. The attached patch only implements xmethods for size and empty. That way, it is a fairly straightforward patch.

This looks fine, I'll commit it soon. Thanks.

Committed to trunk.

Thanks for the quick review and commit.

(Sorry for the premature Send earlier.)
Re: [Build, patch] Remove CLooG from the main configure.ac
On 11.11.2014 14:01, Tobias Burnus wrote:

Now that CLooG is no longer used by GCC, it makes sense to also remove it from the main configure file. Especially as the in-tree build currently only works if CLooG is also available.

Built on x86-64-gnu-linux - and tested that Graphite still works.*

OK for the trunk?

[* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be independent as I see it also with yesterday's GCC; for Sparc/arm/aarch64, it's PR62630.]

Conceptually that is the right way to go. This, however, requires the OK from an autoconf maintainer.

Tobias
Re: [C++ Patch] PR 63265
Hi,

On 11/10/2014 06:16 PM, Jason Merrill wrote:
I don't think we want to suppress this warning in general. The problem in this PR is that the warning code is failing to recognize that the first operand is constant false.

Thanks. Then, shall we do something like the below? Passes testing.

Thanks,
Paolo.

/////////////////////

/cp
2014-11-11  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/63265
	* pt.c (tsubst_copy_and_build, case COND_EXPR): Maybe fold to const
	the condition.

/testsuite
2014-11-11  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/63265
	* g++.dg/cpp0x/constexpr-63265.C: New.

Index: cp/pt.c
===================================================================
--- cp/pt.c	(revision 217342)
+++ cp/pt.c	(working copy)
@@ -15137,7 +15137,9 @@ tsubst_copy_and_build (tree t,
     case COND_EXPR:
       {
-	tree cond = RECUR (TREE_OPERAND (t, 0));
+	tree cond
+	  = maybe_constant_value (fold_non_dependent_expr_sfinae
+				  (RECUR (TREE_OPERAND (t, 0)), tf_none));
 	tree exp1, exp2;

 	if (TREE_CODE (cond) == INTEGER_CST)
Index: testsuite/g++.dg/cpp0x/constexpr-63265.C
===================================================================
--- testsuite/g++.dg/cpp0x/constexpr-63265.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-63265.C	(working copy)
@@ -0,0 +1,19 @@
+// PR c++/63265
+// { dg-do compile { target c++11 } }
+
+#define LSHIFT (sizeof(unsigned int) * __CHAR_BIT__)
+
+template <int lshift>
+struct SpuriouslyWarns1 {
+    static constexpr unsigned int v = lshift < LSHIFT ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns1<LSHIFT>::v == 0, "Impossible occurred");
+
+template <int lshift>
+struct SpuriouslyWarns2 {
+    static constexpr bool okay = lshift < LSHIFT;
+    static constexpr unsigned int v = okay ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns2<LSHIFT>::v == 0, "Impossible occurred");
Re: [patch] OpenACC fortran front end
Hi, On 11 Nov 08:10, Jakub Jelinek wrote: For the middle-end and libgomp changes, can you talk to the Intel folks to update their git branch to latest trunk (so that you have the nvptx bits in there) and send middle-end and libgomp diffs against that? As far as I remember, most of the changes from the branch are now approved, they are just waiting for review of the LTO related changes in the middle-end (please, correct me if I've missed something). The updated branch is here: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/kyukhin/gomp4-offload It contains 7 common patches. Patches 2-4 are waiting for LTO review, the others are approved. -- Ilya
Re: [x86, 7/n] Replace builtins with vector extensions
On Tue, Nov 11, 2014 at 12:35 PM, Marc Glisse marc.gli...@inria.fr wrote: last patch, extending == and to size 256. Regtested as usual. Is the branch ready to be merged into trunk? 2014-11-10 Marc Glisse marc.gli...@inria.fr * config/i386/emmintrin.h (_mm_cmpeq_epi8, _mm_cmpeq_epi16, _mm_cmpeq_epi32, _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32, _mm_cmpgt_epi8, _mm_cmpgt_epi16, _mm_cmpgt_epi32): Use vector extensions instead of builtins. * config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64): Likewise. OK. Please post the complete ChangeLog and complete patch for a review and merge. Thanks, Uros.
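For readers unfamiliar with the technique, here is a minimal sketch of what "vector extensions instead of builtins" means for a comparison. It assumes a compiler that supports the GNU vector_size attribute (GCC or Clang). The type and function names are made up for illustration; this is not the text of the emmintrin.h change.

```c
#include <stdint.h>

/* Four 32-bit lanes in one 16-byte vector (GNU vector extension).  */
typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Lane-wise equality.  Each result lane is -1 (all bits set) where the
   operands compare equal and 0 elsewhere, which is the same convention
   the SSE2 compare intrinsics use.  */
static v4si
cmpeq_epi32_sketch (v4si a, v4si b)
{
  return (v4si) (a == b);
}

/* Lane-wise signed greater-than, same -1/0 result convention.  */
static v4si
cmpgt_epi32_sketch (v4si a, v4si b)
{
  return (v4si) (a > b);
}
```

Because the comparison is written with ordinary operators, the middle end can see through it and fold or combine it like any other expression, which an opaque builtin call would not allow.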
Re: [C++ Patch] PR 63265
On 11/11/2014 08:04 AM, Paolo Carlini wrote:

-	tree cond = RECUR (TREE_OPERAND (t, 0));
+	tree cond
+	  = maybe_constant_value (fold_non_dependent_expr_sfinae
+				  (RECUR (TREE_OPERAND (t, 0)), tf_none));

I like this approach, but if the result of maybe_constant_value doesn't turn out to be an INTEGER_CST, we want to end up with the result of RECUR rather than the result of fold_non_dependent_expr, as the latter might not be suitable for subsequent tsubsting.

Jason
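The fallback described above can be pictured with a toy sketch. Everything here is a hypothetical stand-in (struct expr is not GCC's tree, and choose_condition is not a function in pt.c); it only shows the selection rule: use the folded value when it is a constant, otherwise keep the plainly substituted operand.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy expression node; a stand-in for a real tree.  */
struct expr
{
  bool is_constant;   /* plays the role of TREE_CODE == INTEGER_CST */
  long value;
};

/* SUBSTITUTED is the operand after plain substitution, FOLDED is the
   result of trying to fold it (may be NULL if folding did nothing).
   Only a constant fold result replaces the substituted operand.  */
static const struct expr *
choose_condition (const struct expr *substituted, const struct expr *folded)
{
  if (folded != NULL && folded->is_constant)
    return folded;
  return substituted;
}
```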
[PATCH][17/n] Merge from match-and-simplify, plus/minus association patterns
This merges patterns from associate_plusminus and adjusts them with details from their fold-const.c counterparts. It also fixes missing flag_sanitize checks on negate contraction on the way. This shows places where fold's STRIP_NOPS was important (but also shows where it may create wrong code, something the patch doesn't fix yet). Without the conditional convert handling on the negate contraction we regress quite a few GENERIC folding testcases.

Note that the other explicit reassociation patterns are handled by fold's associate: piece, which I am sure we don't implement fully by the few patterns (OTOH on GIMPLE we have a reassoc pass for that anyway). So not too many patterns were removed from fold.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2014-11-11  Richard Biener  rguent...@suse.de

	* match.pd: Implement patterns from associate_plusminus
	and factor in differences from the fold-const.c implementation.
	* fold-const.c (fold_binary_loc): Remove patterns here.
	* tree-ssa-forwprop.c (associate_plusminus): Remove.
	(pass_forwprop::execute): Don't call it.
	* tree.c (tree_nop_conversion_p): New function, factored
	from tree_nop_conversion.
	* tree.h (tree_nop_conversion_p): Declare.
Index: trunk/gcc/fold-const.c
===================================================================
*** trunk.orig/gcc/fold-const.c	2014-11-11 09:54:58.840824189 +0100
--- trunk/gcc/fold-const.c	2014-11-11 10:06:29.274793976 +0100
*************** fold_binary_loc (location_t loc,
*** 9939,9997 ****
        return NULL_TREE;

      case PLUS_EXPR:
-       /* A + (-B) -> A - B */
-       if (TREE_CODE (arg1) == NEGATE_EXPR
-           && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
-         return fold_build2_loc (loc, MINUS_EXPR, type,
-                                 fold_convert_loc (loc, type, arg0),
-                                 fold_convert_loc (loc, type,
-                                                   TREE_OPERAND (arg1, 0)));
-       /* (-A) + B -> B - A */
-       if (TREE_CODE (arg0) == NEGATE_EXPR
-           && reorder_operands_p (TREE_OPERAND (arg0, 0), arg1)
-           && (flag_sanitize & SANITIZE_SI_OVERFLOW) == 0)
-         return fold_build2_loc (loc, MINUS_EXPR, type,
-                                 fold_convert_loc (loc, type, arg1),
-                                 fold_convert_loc (loc, type,
-                                                   TREE_OPERAND (arg0, 0)));
-
        if (INTEGRAL_TYPE_P (type) || VECTOR_INTEGER_TYPE_P (type))
          {
-           /* Convert ~A + 1 to -A.  */
-           if (TREE_CODE (arg0) == BIT_NOT_EXPR
-               && integer_each_onep (arg1))
-             return fold_build1_loc (loc, NEGATE_EXPR, type,
-                                     fold_convert_loc (loc, type,
-                                                       TREE_OPERAND (arg0, 0)));
-
-           /* ~X + X is -1.  */
-           if (TREE_CODE (arg0) == BIT_NOT_EXPR
-               && !TYPE_OVERFLOW_TRAPS (type))
-             {
-               tree tem = TREE_OPERAND (arg0, 0);
-
-               STRIP_NOPS (tem);
-               if (operand_equal_p (tem, arg1, 0))
-                 {
-                   t1 = build_all_ones_cst (type);
-                   return omit_one_operand_loc (loc, type, t1, arg1);
-                 }
-             }
-
-           /* X + ~X is -1.  */
-           if (TREE_CODE (arg1) == BIT_NOT_EXPR
-               && !TYPE_OVERFLOW_TRAPS (type))
-             {
-               tree tem = TREE_OPERAND (arg1, 0);
-
-               STRIP_NOPS (tem);
-               if (operand_equal_p (arg0, tem, 0))
-                 {
-                   t1 = build_all_ones_cst (type);
-                   return omit_one_operand_loc (loc, type, t1, arg0);
-                 }
-             }
-
            /* X + (X / CST) * -CST is X % CST.  */
            if (TREE_CODE (arg1) == MULT_EXPR
                && TREE_CODE (TREE_OPERAND (arg1, 0)) == TRUNC_DIV_EXPR
--- 9939,9946 ----
***************
*** 10469,10479 ****
                return fold_build2_loc (loc, MINUS_EXPR, type, tmp, arg11);
              }
          }
-       /* A - (-B) -> A + B */
-       if (TREE_CODE (arg1) == NEGATE_EXPR)
-         return fold_build2_loc (loc, PLUS_EXPR, type, op0,
-                                 fold_convert_loc (loc, type,
-                                                   TREE_OPERAND (arg1, 0)));
        /* (-A) - B -> (-B) - A  where B is easily negated and we can swap.  */
        if (TREE_CODE (arg0) == NEGATE_EXPR
            && negate_expr_p (arg1)
--- 10418,10423 ----
Index: trunk/gcc/match.pd
===================================================================
*** trunk.orig/gcc/match.pd	2014-11-11 09:54:58.840824189 +0100
--- trunk/gcc/match.pd	2014-11-11 11:55:27.283507870 +0100
*************** along with GCC; see the file COPYING3.
*** 25,32 ****
  /* Generic tree predicates we inherit.  */
  (define_predicates
!    integer_onep integer_zerop integer_all_onesp
!    real_zerop real_onep
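The algebraic identities named in the comments of the hunks above can be checked with a small standalone sketch. The function names are invented for illustration, and unsigned arithmetic is used for the wrap-around cases so that the behaviour is well defined; this only checks the arithmetic and says nothing about how match.pd expresses the patterns.

```c
#include <stdbool.h>

/* A + (-B) -> A - B */
static bool
plus_negate_ok (unsigned a, unsigned b)
{
  return a + (-b) == a - b;
}

/* A - (-B) -> A + B */
static bool
minus_negate_ok (unsigned a, unsigned b)
{
  return a - (-b) == a + b;
}

/* ~A + 1 -> -A */
static bool
not_plus_one_ok (unsigned a)
{
  return ~a + 1u == -a;
}

/* ~X + X is -1 (all bits set).  */
static bool
not_plus_self_ok (unsigned x)
{
  return ~x + x == ~0u;
}

/* X + (X / CST) * -CST is X % CST, for nonzero CST and operands small
   enough that nothing overflows.  */
static bool
div_mul_is_mod_ok (int x, int cst)
{
  return x + (x / cst) * -cst == x % cst;
}
```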
Re: Add the latest C++ SD-6 additions.
On 11/11/2014 12:52 AM, Ed Smith-Rowland wrote: I might put this to the SD-6 list because it would be nice to have clarity - even if it's implementation defined. Sounds good. I was thinking that defining to 0 tells the user "this isn't supported", which seems more useful than "this may or may not be supported". Jason
Re: [Build, patch] Remove CLooG from the main configure.ac
On Tue, Nov 11, 2014 at 2:01 PM, Tobias Burnus tobias.bur...@physik.fu-berlin.de wrote:

Now that CLooG is no longer used by GCC, it makes sense to also remove it from the main configure file. Especially as the in-tree build currently only works if CLooG is also available. Built on x86-64-gnu-linux - and tested that Graphite still works.* OK for the trunk?

Ok.

Thanks,
Richard.

[* I did see a failure for gcc.dg/graphite/vect-pr43423.c, but that seems to be independent as I see it also with yesterday's GCC; for Sparc/arm/aarch64, it's PR62630.]

Yeah, that has been happening for quite some time for me as well.

Tobias

2014-11-11  Tobias Burnus  bur...@net-b.de

	* config/cloog.m4: Remove.
	* Makefile.def: Remove CLooG.
	* Makefile.tpl: Ditto.
	* configure.ac: Ditto.
	* configure: Regenerate.
	* Makefile.in: Ditto.
Re: [PATCH 2/2] Simplify and extend VRP edge-assertion code
On Tue, Nov 11, 2014 at 7:52 AM, Richard Biener richard.guent...@gmail.com wrote:
On Tue, Nov 11, 2014 at 4:52 AM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch refactors the VRP edge-assertion code to make it always traverse SSA-name definitions in order to find suitable edge assertions to insert. Currently SSA-name definitions get traversed only when the LHS of the original conditional is a bitwise AND or OR operation which seems like a strange restriction. We should always try to traverse the SSA-name definitions inside the conditional, in particular for conditionals with the form:

  int p = x COMP y;
  if (p != 0) -- edge assertion: x COMP y

Of course this specific case should have been simplified to if (x COMP y) if that comparison cannot trap and -fnon-call-exceptions is in effect.

Like Andrew said, I noticed that if p is shared then such comparisons don't get simplified. And like in the case of uninit-pred-9_b.c it seems that the compiler sometimes implicitly CSEs duplicate conditionals.

To achieve this the patch merges the mutually recursive functions register_edge_assert_for_1() and register_edge_assert_for_2() into a single recursive function, register_edge_assert_for_1(). In doing so, code duplication can be reduced and at the same time the more general logic allows VRP to detect more useful edge assertions.

The recursion of the function register_edge_assert_for_1() is bounded by a new 'limit' argument which is arbitrarily set to 4 so that at most 4 levels of SSA-name definitions will be traversed per conditional. (Incidentally this hard recursion limit makes the related fix for PR 57685 unnecessary.)

A test in uninit-pred-9_b.c now has to be marked xfail because in it VRP (correctly) transforms the statement

  # prephitmp_35 = PHI <pretmp_9(8), _28(10)>

into

  # prephitmp_35 = PHI <pretmp_9(8), 1(10)>

and the uninit pass doesn't properly handle such PHIs containing a constant value as one of its arguments -- so a bogus uninit warning is now emitted.

Did you try fixing that? It seems to me a constant should be easy to handle?

I tried a couple months ago and I failed. I might try again.

Full bootstrap + regtesting on x86_64-unknown-linux-gnu is in progress. Is it OK to commit if testing finishes with no new regressions?

Ok.

I decided to replace this refactoring patch with a simpler patch (sent to the ML) that just changes (adds) about 20LOC to tree-vrp.c. The patch is not as extensive (a few of the tests in vrp-2.c still fail) but I am more comfortable about the patch's correctness and its impact on compile time. Sorry, I should've been more clear about that.
Re: [PATCH] Look through widening type conversions for possible edge assertions
On Tue, Nov 11, 2014 at 1:10 PM, Patrick Palka patr...@parcs.ath.cx wrote:

This patch is a replacement for the 2nd VRP refactoring patch. It simply teaches VRP to look through widening type conversions when finding suitable edge assertions, e.g.

  bool p = x != y;
  int q = (int) p;
  if (q == 0) // new edge assert: p == 0 and therefore x == y

I think the proper fix is to forward x != y to q == 0 instead of this one. That said - the tree-ssa-forwprop.c restriction on only forwarding single-uses into conditions is clearly bogus here. I suggest to relax it for conversions and compares. Like with

Index: tree-ssa-forwprop.c
===================================================================
--- tree-ssa-forwprop.c	(revision 217349)
+++ tree-ssa-forwprop.c	(working copy)
@@ -476,7 +476,7 @@ forward_propagate_into_comparison_1 (gim
 	{
 	  rhs0 = rhs_to_tree (TREE_TYPE (op1), def_stmt);
 	  tmp = combine_cond_expr_cond (stmt, code, type,
-					rhs0, op1, !single_use0_p);
+					rhs0, op1, false);
 	  if (tmp)
 	    return tmp;
 	}

Thanks,
Richard.

The new testcase requires that such an edge assertion be inserted.

Full bootstrap + regtest on x86_64-unknown-linux-gnu in progress. Does the patch look OK for trunk if no new regressions?

2014-11-11  Patrick Palka  ppa...@gcc.gnu.org

gcc/
	* tree-vrp.c (register_edge_assert_for): Look through
	widening type conversions for possible edge assertions.

gcc/testsuite/
	* gcc.dg/vrp-1.c: New testcase.
---
 gcc/testsuite/gcc.dg/vrp-1.c | 31 +++
 gcc/tree-vrp.c               | 22 ++
 2 files changed, 53 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/vrp-1.c

diff --git a/gcc/testsuite/gcc.dg/vrp-1.c b/gcc/testsuite/gcc.dg/vrp-1.c
new file mode 100644
index 000..df5334e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vrp-1.c
@@ -0,0 +1,31 @@
+/* { dg-options "-O2" } */
+
+void runtime_error (void) __attribute__ ((noreturn));
+void compiletime_error (void) __attribute__ ((noreturn, error ("")));
+
+static void
+compiletime_check_equals_1 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error ();
+  if (__p)
+    runtime_error ();
+}
+
+static void
+compiletime_check_equals_2 (int *x, int y)
+{
+  int __p = *x != y;
+  if (__builtin_constant_p (__p) && __p)
+    compiletime_error (); /* { dg-error "call to" } */
+  if (__p)
+    runtime_error ();
+}
+
+void
+foo (int *x)
+{
+  compiletime_check_equals_1 (x, 5);
+  compiletime_check_equals_2 (x, 10);
+}
diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index f0a4382..979ab44 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -5634,6 +5634,7 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
      the value zero or one, then we may be able to assert values
      for SSA_NAMEs which flow into COND.  */

+
   /* In the case of NAME == 1 or NAME != 0, for BIT_AND_EXPR defining
      statement of NAME we can assert both operands of the BIT_AND_EXPR
      have nonzero value.  */
@@ -5673,6 +5674,27 @@ register_edge_assert_for (tree name, edge e, gimple_stmt_iterator si,
 	  register_edge_assert_for_1 (op1, EQ_EXPR, e, si);
 	}
     }
+
+  /* In the case of NAME != 0 or NAME == 0, if NAME's defining statement
+     is a widening type conversion then we can assert that NAME's
+     RHS is accordingly nonzero or zero.  */
+  if ((comp_code == EQ_EXPR || comp_code == NE_EXPR)
+      && integer_zerop (val))
+    {
+      gimple def_stmt = SSA_NAME_DEF_STMT (name);
+      if (is_gimple_assign (def_stmt))
+	{
+	  enum tree_code def_code = gimple_assign_rhs_code (def_stmt);
+	  if (CONVERT_EXPR_CODE_P (def_code))
+	    {
+	      tree lhs = gimple_assign_lhs (def_stmt);
+	      tree rhs = gimple_assign_rhs1 (def_stmt);
+	      if (TYPE_PRECISION (TREE_TYPE (lhs))
+		  >= TYPE_PRECISION (TREE_TYPE (rhs)))
+		register_edge_assert_for_1 (rhs, comp_code, e, si);
+	    }
+	}
+    }
 }
-- 
2.2.0.rc1.16.g6066a7e
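As a plain C illustration of the situation the patch targets, the sketch below spells out the chain of implications on the q == 0 edge. The function name is made up, and the comments describe what an edge assertion would record, not what any particular GCC version actually emits.

```c
#include <stdbool.h>

/* Returns 1 on the edge where x == y can be inferred, 0 otherwise.  */
static int
implied_equal (int x, int y)
{
  bool p = x != y;   /* comparison result, 0 or 1 */
  int q = (int) p;   /* widening conversion: q is zero exactly when p is */

  if (q == 0)
    {
      /* On this edge q == 0, so p == 0, so x == y.  Looking through the
         conversion is what lets the assertion reach x and y.  */
      return 1;
    }
  return 0;
}
```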
Pending LTO review for OpenACC trunk-merge patches (was: Re: [patch] OpenACC fortran front end)
Ilya Verbin wrote: On 11 Nov 08:10, Jakub Jelinek wrote: For the middle-end and libgomp changes, can you talk to the Intel folks to update their git branch to latest trunk (so that you have the nvptx bits in there) and send middle-end and libgomp diffs against that? As far as I remember, most of the changes from the branch are now approved, they are just waiting for review of the LTO related changes in the middle-end (please, correct me if I've missed something). The updated branch is here: https://gcc.gnu.org/git/?p=gcc.git;a=shortlog;h=refs/heads/kyukhin/gomp4-offload It contains 7 common patches. Patches 2-4 are waiting for LTO review, the others are approved. Those are: * [PATCH 2] OpenMP 4.0 offloading infrastructure: LTO streaming https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=5d1dfefd7cd529998968751a46f4daf87d8300a1 Which is identical except for re-diffing to: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00299.html * [PATCH 3] OpenMP 4.0 offloading infrastructure: Offload tables https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=06ffd7482ef4bf2b038a3a0d203b7bec586c6d17 Which has been posted at https://gcc.gnu.org/ml/gcc-patches/2014-11/msg00308.html * [PATCH 4] OpenMP 4.0 offloading infrastructure: lto-wrapper https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=41d2ad0d52fb7c6cc78f6ee4fbec7781fa226c70 See https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01535.html or rather: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01531.html Tobias
[PATCH 2/5] OpenACC 2.0 support for libgomp - temporarily work around missing __builtin_acc_on_device (repost)
On Tue, 23 Sep 2014 19:19:55 +0100 Julian Brown jul...@codesourcery.com wrote:

The patches implementing __builtin_acc_on_device are still in processing. For the time being this patch removes the dependency on that builtin in the OpenACC runtime.

Julian

-xx-xx  Julian Brown  jul...@codesourcery.com

libgomp/
	* oacc-init.c (acc_on_device): Temporarily hard-code for host
	instead of using __builtin_acc_on_device.

This patch remains unchanged from the last posting.

OK to apply?

Julian

From 99e76023ff0759925403b43e19612fb859c3759e Mon Sep 17 00:00:00 2001
From: Julian Brown jul...@codesourcery.com
Date: Fri, 19 Sep 2014 11:28:11 -0700
Subject: [PATCH 2/5] Work around lack of __builtin_acc_on_device for now

-xx-xx  Julian Brown  jul...@codesourcery.com

libgomp/
	* oacc-init.c (acc_on_device): Temporarily hard-code for host
	instead of using __builtin_acc_on_device.
---
 libgomp/oacc-init.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 8c91ea7..1cbb4d7 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -545,8 +545,20 @@ acc_on_device (acc_device_t dev)
       acc_device_type (thr->dev->type) == acc_device_host_nonshm)
     return dev == acc_device_host_nonshm || dev == acc_device_not_host;

+#if 1
+  /* Support for __builtin_acc_on_device comes in later patches.  */
+  switch (dev)
+    {
+    case acc_device_none:
+    case acc_device_host:
+      return 1;
+    default:
+      return 0;
+    }
+#else
   /* Just rely on the compiler builtin.  */
   return __builtin_acc_on_device (dev);
+#endif
 }

 ialias (acc_on_device)
-- 
1.7.10.4
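The temporary fallback in the hunk above can be tried out in isolation with the sketch below. The enum is a hypothetical stand-in (the real acc_device_t comes from openacc.h and has more members), and the function name is invented; only the switch mirrors the patch.

```c
/* Hypothetical stand-in for the OpenACC device-type enumeration.  */
typedef enum
{
  sketch_device_none,
  sketch_device_host,
  sketch_device_not_host
} sketch_device_t;

/* Host-only answer to "am I running on device DEV?": code compiled for
   the host is on the host (or on no device) and on nothing else.  */
static int
on_device_host_fallback (sketch_device_t dev)
{
  switch (dev)
    {
    case sketch_device_none:
    case sketch_device_host:
      return 1;
    default:
      return 0;
    }
}
```

Hard-coding the host answer is only correct while no code is actually offloaded, which is why the patch keeps the builtin call under #else for later.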
[PATCH 3/5] OpenACC 2.0 support for libgomp - outline documentation (repost)
On Tue, 23 Sep 2014 19:20:14 +0100 Julian Brown jul...@codesourcery.com wrote: This patch provides some documentation for the new OpenACC bits in libgomp. Julian -xx-xx Thomas Schwinge tho...@codesourcery.com James Norris jnor...@codesourcery.com libgomp/ * libgomp.texi: Outline documentation for OpenACC. This patch also remains unchanged from the last posting. OK to apply? JulianFrom 1f17beb70b5607d1884fad1cb4734857f0e7846f Mon Sep 17 00:00:00 2001 From: Julian Brown jul...@codesourcery.com Date: Mon, 22 Sep 2014 02:45:29 -0700 Subject: [PATCH 3/5] OpenACC documentation. -xx-xx Thomas Schwinge tho...@codesourcery.com James Norris jnor...@codesourcery.com libgomp/ * libgomp.texi: Outline documentation for OpenACC. --- libgomp/libgomp.texi | 661 -- 1 file changed, 636 insertions(+), 25 deletions(-) diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi index 254be57..9530a2b 100644 --- a/libgomp/libgomp.texi +++ b/libgomp/libgomp.texi @@ -31,10 +31,12 @@ texts being (a) (see below), and with the Back-Cover Texts being (b) @ifinfo @dircategory GNU Libraries @direntry -* libgomp: (libgomp).GNU OpenMP runtime library +* libgomp: (libgomp).GNU OpenACC and OpenMP runtime library @end direntry -This manual documents the GNU implementation of the OpenMP API for +This manual documents the GNU implementation of the OpenACC API for +offloading of code to accelerator devices in C/C++ and Fortran and +the GNU implementation of the OpenMP API for multi-platform shared-memory parallel programming in C/C++ and Fortran. 
Published by the Free Software Foundation @@ -48,7 +50,7 @@ Boston, MA 02110-1301 USA @setchapternewpage odd @titlepage -@title The GNU OpenMP Implementation +@title The GNU OpenACC and OpenMP Implementation @page @vskip 0pt plus 1filll @comment For the @value{version-GCC} Version* @@ -69,7 +71,10 @@ Boston, MA 02110-1301, USA@* @top Introduction @cindex Introduction -This manual documents the usage of libgomp, the GNU implementation of the +This manual documents the usage of libgomp, the GNU implementation of the +@uref{http://www.openacc.org/, OpenACC} Application Programming Interface (API) +for offloading of code to accelerator devices in C/C++ and Fortran, and +the GNU implementation of the @uref{http://www.openmp.org, OpenMP} Application Programming Interface (API) for multi-platform shared-memory parallel programming in C/C++ and Fortran. @@ -81,23 +86,619 @@ for multi-platform shared-memory parallel programming in C/C++ and Fortran. @comment better formatting. @comment @menu -* Enabling OpenMP::How to enable OpenMP for your applications. -* Runtime Library Routines:: The OpenMP runtime application programming - interface. -* Environment Variables:: Influencing runtime behavior with environment - variables. -* The libgomp ABI::Notes on the external ABI presented by libgomp. -* Reporting Bugs:: How to report bugs in GNU OpenMP. -* Copying::GNU general public license says - how you can copy and share libgomp. -* GNU Free Documentation License:: - How you can copy and share this manual. -* Funding::How to help assure continued work for free - software. -* Library Index:: Index of this documentation. +* Enabling OpenACC:: How to enable OpenACC for your + applications. +* OpenACC Runtime Library Routines:: The OpenACC runtime application + programming interface. +* OpenACC Environment Variables::Influencing OpenACC runtime behavior with + environment variables. 
+* OpenACC Library Interoperability:: OpenACC library interoperability with the + NVIDIA CUBLAS library. +* Enabling OpenMP:: How to enable OpenMP for your + applications. +* OpenMP Runtime Library Routines: Runtime Library Routines. + The OpenMP runtime application programming + interface. +* OpenMP Environment Variables: Environment Variables. + Influencing OpenMP runtime behavior with + environment variables. +* The libgomp ABI:: Notes on the external libgomp ABI. +* Reporting Bugs:: How to report bugs. +* Copying:: GNU general public license says how you + can copy and share libgomp. +* GNU Free Documentation License:: How you can copy and share this
[PATCH 5/5] OpenACC 2.0 support for libgomp - temporary test harness tweaks
Hi, As mentioned in the previous mail in this series, testing the OpenACC runtime support in libgomp is going to be awkward until the associated middle-end pieces are ready. This stop-gap patch helps to allow tests (that don't use any of the pragmas, only calling the run-time library directly) to run successfully. OK to apply? Thanks, Julian ChangeLog libgomp/ * testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily replace -fopenacc with -lgomp -lpthread, until -fopenacc support lands upstream. * testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Similar, but without -lpthread. From c70f2aca94bc306e4600282aa81bc1a758ad81fa Mon Sep 17 00:00:00 2001 From: Julian Brown jul...@codesourcery.com Date: Tue, 11 Nov 2014 02:54:09 -0800 Subject: [PATCH 5/5] Temporary testing tweaks libgomp/ * testsuite/libgomp.oacc-c++/c++.exp (ALWAYS_CFLAGS): Temporarily replace -fopenacc with -lgomp -lpthread, until -fopenacc support lands upstream. * testsuite/libgomp.oacc-c/c.exp (ALWAYS_CFLAGS): Likewise. * testsuite/libgomp.oacc-fortran/fortran.exp (ALWAYS_CFLAGS): Similar, but without -lpthread. --- libgomp/testsuite/libgomp.oacc-c++/c++.exp |4 +++- libgomp/testsuite/libgomp.oacc-c/c.exp |4 +++- libgomp/testsuite/libgomp.oacc-fortran/fortran.exp |4 +++- 3 files changed, 9 insertions(+), 3 deletions(-) diff --git a/libgomp/testsuite/libgomp.oacc-c++/c++.exp b/libgomp/testsuite/libgomp.oacc-c++/c++.exp index b8b3e85..1060344 100644 --- a/libgomp/testsuite/libgomp.oacc-c++/c++.exp +++ b/libgomp/testsuite/libgomp.oacc-c++/c++.exp @@ -23,7 +23,9 @@ dg-init # Turn on OpenACC. # XXX (TEMPORARY): Remove the -flto once that's properly integrated. -lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +#lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted. 
+lappend ALWAYS_CFLAGS additional_flags=-lgomp -flto -lpthread set blddir [lookfor_file [get_multilibs] libgomp] diff --git a/libgomp/testsuite/libgomp.oacc-c/c.exp b/libgomp/testsuite/libgomp.oacc-c/c.exp index 5558ec8..85528aa 100644 --- a/libgomp/testsuite/libgomp.oacc-c/c.exp +++ b/libgomp/testsuite/libgomp.oacc-c/c.exp @@ -28,7 +28,9 @@ dg-init # Turn on OpenACC. # XXX (TEMPORARY): Remove the -flto once that's properly integrated. -lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +#lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +# TODO: Revert temporary hack when OpenACC middle-end pieces are submitted. +lappend ALWAYS_CFLAGS additional_flags=-lgomp -flto -lpthread lappend libgomp_compile_options compiler=$GCC_UNDER_TEST diff --git a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp index 0ada038..27cf4d5 100644 --- a/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp +++ b/libgomp/testsuite/libgomp.oacc-fortran/fortran.exp @@ -23,7 +23,9 @@ dg-init # Turn on OpenACC. # XXX (TEMPORARY): Remove the -flto once that's properly integrated. -lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +#lappend ALWAYS_CFLAGS additional_flags=-fopenacc -flto +# TODO: Revert this temporary hack when OpenACC middle-end pieces are submitted. +lappend ALWAYS_CFLAGS additional_flags=-lgomp -flto if { $blddir != } { set lang_source_re {^.*\.[fF](|90|95|03|08)$} -- 1.7.10.4
[PATCH] Extend shift permutations on power of 2 cases
Hi, The patch extends shift permutations technique on power of 2 cases (previously even/odd transformations was used unconditionally). Basically the patch just add loop for load group of length 2, like it is done in vect_permute_load_chain function. For Silvermont it reduces insn sequence for load group of length 4 from 31 to 20 insns. Performance for the test in the patch improved by ~20%. Bootstrap passed. Make check in progress. Is it ok? 2014-11-11 Evgeny Stupachenko evstu...@gmail.com gcc/testsuite * gcc.target/i386/pr52252-atom-1.c: New. gcc/ * tree-vect-data-refs.c (vect_shift_permute_load_chain): Extend shift permutations on power of 2 cases. diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c new file mode 100644 index 000..1fbd258 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c @@ -0,0 +1,22 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target ssse3 } */ +/* { dg-options -O2 -ftree-vectorize -mssse3 -mtune=slm } */ +#define byte unsigned char + +void +pair_mul_sum(byte *in, byte *out, int size) +{ + int j; + for(j = 0; j size; j++) +{ + byte a = in[0]; + byte b = in[1]; + byte c = in[2]; + byte d = in[3]; + out[0] = (byte)(a * b) + (byte)(b * c) + (byte)(c * d) + (byte)(d * a); + in += 4; + out += 1; +} +} + +/* { dg-final { scan-assembler palignr } } */ diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c index 0bc0356..d2e0e93 100644 --- a/gcc/tree-vect-data-refs.c +++ b/gcc/tree-vect-data-refs.c @@ -5379,8 +5379,9 @@ vect_shift_permute_load_chain (vectree dr_chain, memcpy (result_chain-address (), dr_chain.address (), length * sizeof (tree)); - if (length == 2 LOOP_VINFO_VECT_FACTOR (loop_vinfo) 4) + if (exact_log2 (length) != -1 LOOP_VINFO_VECT_FACTOR (loop_vinfo) 4) { + unsigned int j, log_length = exact_log2 (length); for (i = 0; i nelt / 2; ++i) sel[i] = i * 2; for (i = 0; i nelt / 2; ++i) @@ -5441,37 +5442,44 @@ vect_shift_permute_load_chain (vectree 
dr_chain, select_mask = vect_gen_perm_mask (vectype, sel); gcc_assert (select_mask != NULL); - first_vect = dr_chain[0]; - second_vect = dr_chain[1]; - - data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - first_vect, first_vect, - perm2_mask1); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - vect[0] = data_ref; + for (i = 0; i log_length; i++) + { + for (j = 0; j length; j += 2) + { + first_vect = dr_chain[j]; + second_vect = dr_chain[j + 1]; - data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - second_vect, second_vect, - perm2_mask2); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - vect[1] = data_ref; + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, + first_vect, first_vect, + perm2_mask1); + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + vect[0] = data_ref; - data_ref = make_temp_ssa_name (vectype, NULL, vect_shift); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - vect[0], vect[1], - shift1_mask); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[1] = data_ref; + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, + second_vect, second_vect, + perm2_mask2); + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + vect[1] = data_ref; - data_ref = make_temp_ssa_name (vectype, NULL, vect_select); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - vect[0], vect[1], - select_mask); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[0] = data_ref; + data_ref = make_temp_ssa_name (vectype, NULL, vect_shift); + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, +
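The new guard in the patch relies on exact_log2 returning -1 for anything that is not a power of two, and on its result as the number of passes over the chain. A minimal standalone sketch of that behaviour follows; exact_log2_sketch and shift_permute_passes are illustrative names, not GCC functions, and GCC's own exact_log2 is implemented separately.

```c
/* Returns log2 (X) when X is a power of two, -1 otherwise.  */
static int
exact_log2_sketch (unsigned long x)
{
  int log = 0;

  /* Zero is not a power of two; a power of two has exactly one bit set,
     so clearing the lowest set bit must leave nothing.  */
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;

  while (x > 1)
    {
      x >>= 1;
      ++log;
    }
  return log;
}

/* Number of pairwise permutation passes for a load group of LENGTH
   vectors, or -1 when the power-of-two path does not apply.  */
static int
shift_permute_passes (unsigned int length)
{
  return exact_log2_sketch (length);
}
```

For a group of length 4 this gives two passes, each handling the chain in pairs, which matches the nested i/j loops the patch introduces.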
[PATCH] Fix for ipa/63795, ipa/63622
Hello.

The following patch adds a check for target aliasing support. The patch bootstraps on x86_64-apple-darwin1 and is one of the patches needed to restore bootstrap on that target. I plan to introduce an additional patch that will cover testsuite failures for the target.

Ready for trunk?

Thanks,
Martin

gcc/ChangeLog:

2014-11-11  Martin Liska  mli...@suse.cz

	* ipa-icf.c (sem_function::merge): Add new target aliasing
	support guide.
	(sem_variable::merge): Likewise.
	* ipa-icf.h (target_supports_aliasing_p): New function.

gcc/testsuite/ChangeLog:

2014-11-11  Martin Liska  mli...@suse.cz

	* g++.dg/ipa/ipa-icf-4.C: Add more precise dump scan.
	* g++.dg/ipa/ipa-icf-5.C: Add condition for targets with aliasing
	support.

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 84cc0ca..f19c3c1 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -191,6 +191,18 @@ sem_item::dump (void)
     }
 }

+/* Return true if target supports aliasing.  */
+
+bool
+sem_item::target_supports_aliasing_p (void)
+{
+#if !defined (ASM_OUTPUT_DEF) || (!defined(ASM_OUTPUT_WEAK_ALIAS) && !defined (ASM_WEAKEN_DECL))
+  return false;
+#else
+  return true;
+#endif
+}
+
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */

 sem_function::sem_function (bitmap_obstack *stack): sem_item (FUNC, stack),
@@ -589,7 +601,8 @@ sem_function::merge (sem_item *alias_item)
       redirect_callers = false;
     }

-  if (create_alias && DECL_COMDAT_GROUP (alias->decl))
+  if (create_alias && (DECL_COMDAT_GROUP (alias->decl)
+		       || !sem_item::target_supports_aliasing_p ()))
     {
       create_alias = false;
       create_thunk = true;
@@ -605,6 +618,14 @@ sem_function::merge (sem_item *alias_item)
     local_original
       = dyn_cast <cgraph_node *> (original->noninterposable_alias ());

+    if (!local_original)
+      {
+	if (dump_file)
+	  fprintf (dump_file, "Noninterposable alias cannot be created.\n\n");
+
+	return false;
+      }
+
   if (redirect_callers)
     {
       /* If alias is non-overwritable then
@@ -649,7 +670,7 @@ sem_function::merge (sem_item *alias_item)
       alias->resolve_alias (original);

       /* Workaround for PR63566 that forces equal calling convention
-       to be used.  */
+	 to be used.  */
       alias->local.local = false;
       original->local.local = false;

@@ -1155,6 +1176,13 @@ sem_variable::merge (sem_item *alias_item)
 {
   gcc_assert (alias_item->type == VAR);

+  if (!sem_item::target_supports_aliasing_p ())
+    {
+      if (dump_file)
+	fprintf (dump_file, "Aliasing is not supported by target\n\n");
+      return false;
+    }
+
   sem_variable *alias_var = static_cast<sem_variable *> (alias_item);

   varpool_node *original = get_node ();
diff --git a/gcc/ipa-icf.h b/gcc/ipa-icf.h
index d8e7b16..6e15166 100644
--- a/gcc/ipa-icf.h
+++ b/gcc/ipa-icf.h
@@ -138,9 +138,11 @@ public:

   /* Return base tree that can be used for compatible_types_p and
      contains_polymorphic_type_p comparison.  */
-
   static bool get_base_types (tree *t1, tree *t2);

+  /* Return true if target supports aliasing.  */
+  static bool target_supports_aliasing_p (void);
+
   /* Item type.  */
   sem_item_type type;

diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
index 9434289..67f2744 100644
--- a/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
+++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-4.C
@@ -43,6 +43,6 @@ int main()
   return 123;
 }

-/* { dg-final { scan-ipa-dump "Varpool alias has been created" "icf" } } */
+/* { dg-final { scan-ipa-dump "\(Varpool alias has been created\)|\(Aliasing is not supported by target\)" "icf" } } */
 /* { dg-final { scan-ipa-dump "Equal symbols: 6" "icf" } } */
 /* { dg-final { cleanup-ipa-dump "icf" } } */
diff --git a/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C b/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
index f835814..57dcb78 100644
--- a/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
+++ b/gcc/testsuite/g++.dg/ipa/ipa-icf-5.C
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-visibility "" } */
+/* { dg-require-alias "" } */
 /* { dg-options "-O2 -fdump-ipa-icf" } */

 struct test
Re: [PATCH] Extend shift permutations on power of 2 cases
On Tue, Nov 11, 2014 at 3:21 PM, Evgeny Stupachenko evstu...@gmail.com wrote:

Hi,

The patch extends the shift permutations technique to power of 2 cases (previously, even/odd transformations were used unconditionally). Basically, the patch just adds a loop for a load group of length 2, like it is done in the vect_permute_load_chain function.

For Silvermont it reduces the insn sequence for a load group of length 4 from 31 to 20 insns. Performance for the test in the patch improved by ~20%.

Bootstrap passed. Make check in progress.

Is it ok?

Ok.

Thanks,
Richard.

2014-11-11  Evgeny Stupachenko  evstu...@gmail.com

gcc/testsuite
  * gcc.target/i386/pr52252-atom-1.c: New.

gcc/
  * tree-vect-data-refs.c (vect_shift_permute_load_chain): Extend shift permutations on power of 2 cases.

diff --git a/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
new file mode 100644
index 000..1fbd258
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr52252-atom-1.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target ssse3 } */
+/* { dg-options "-O2 -ftree-vectorize -mssse3 -mtune=slm" } */
+#define byte unsigned char
+
+void
+pair_mul_sum(byte *in, byte *out, int size)
+{
+  int j;
+  for(j = 0; j < size; j++)
+    {
+      byte a = in[0];
+      byte b = in[1];
+      byte c = in[2];
+      byte d = in[3];
+      out[0] = (byte)(a * b) + (byte)(b * c) + (byte)(c * d) + (byte)(d * a);
+      in += 4;
+      out += 1;
+    }
+}
+
+/* { dg-final { scan-assembler "palignr" } } */
diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 0bc0356..d2e0e93 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -5379,8 +5379,9 @@ vect_shift_permute_load_chain (vec<tree> dr_chain,
   memcpy (result_chain->address (), dr_chain.address (),
           length * sizeof (tree));
 
-  if (length == 2 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
+  if (exact_log2 (length) != -1 && LOOP_VINFO_VECT_FACTOR (loop_vinfo) > 4)
     {
+      unsigned int j, log_length = exact_log2 (length);
       for (i = 0; i < nelt / 2; ++i)
         sel[i] =
i * 2; for (i = 0; i nelt / 2; ++i) @@ -5441,37 +5442,44 @@ vect_shift_permute_load_chain (vectree dr_chain, select_mask = vect_gen_perm_mask (vectype, sel); gcc_assert (select_mask != NULL); - first_vect = dr_chain[0]; - second_vect = dr_chain[1]; - - data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - first_vect, first_vect, - perm2_mask1); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - vect[0] = data_ref; + for (i = 0; i log_length; i++) + { + for (j = 0; j length; j += 2) + { + first_vect = dr_chain[j]; + second_vect = dr_chain[j + 1]; - data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - second_vect, second_vect, - perm2_mask2); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - vect[1] = data_ref; + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, + first_vect, first_vect, + perm2_mask1); + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + vect[0] = data_ref; - data_ref = make_temp_ssa_name (vectype, NULL, vect_shift); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - vect[0], vect[1], - shift1_mask); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[1] = data_ref; + data_ref = make_temp_ssa_name (vectype, NULL, vect_shuffle2); + perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, + second_vect, second_vect, + perm2_mask2); + vect_finish_stmt_generation (stmt, perm_stmt, gsi); + vect[1] = data_ref; - data_ref = make_temp_ssa_name (vectype, NULL, vect_select); - perm_stmt = gimple_build_assign_with_ops (VEC_PERM_EXPR, data_ref, - vect[0], vect[1], - select_mask); - vect_finish_stmt_generation (stmt, perm_stmt, gsi); - (*result_chain)[0] = data_ref;
Re: [PATCH] Fix for ipa/63795, ipa/63622
On Tue, Nov 11, 2014 at 3:22 PM, Martin Liška mli...@suse.cz wrote:

Hello. Following patch adds checking for aliasing support. Patch can bootstrap on x86_64-apple-darwin1 and is part of patches needed to restore bootstrap on the target. I plan to introduce additional patch that will cover testsuite failures for the target. Ready for trunk?

"Aliasing" sounds odd here. I'd expand it to "Symbol aliases", likewise rename target_supports_aliasing_p to target_supports_symbol_aliases_p.

Ok with that change.

Thanks,
Richard.

Thanks, Martin
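For reference, a minimal standalone sketch of the guard under discussion, using the name Richard suggests. This is only an illustration, not the committed code: in GCC the three macros come from the target headers, whereas in a standalone build they are undefined, so the function reports no alias support. The free-function form (instead of a sem_item member) and the helper choose_merge_kind are assumptions made to keep the example self-contained.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch only: return true if the target can emit symbol aliases.
   ASM_OUTPUT_DEF, ASM_OUTPUT_WEAK_ALIAS and ASM_WEAKEN_DECL are target
   macros in GCC; none of them is defined in this standalone file.  */
static bool
target_supports_symbol_aliases_p (void)
{
#if !defined (ASM_OUTPUT_DEF) \
    || (!defined (ASM_OUTPUT_WEAK_ALIAS) && !defined (ASM_WEAKEN_DECL))
  return false;
#else
  return true;
#endif
}

/* Hypothetical helper mirroring the decision in sem_function::merge:
   fall back from an alias to a thunk when the symbol is in a comdat
   group or the target cannot emit aliases.  */
static void
choose_merge_kind (bool *create_alias, bool *create_thunk,
                   bool in_comdat_group)
{
  assert (create_alias != NULL && create_thunk != NULL);

  if (*create_alias
      && (in_comdat_group || !target_supports_symbol_aliases_p ()))
    {
      *create_alias = false;
      *create_thunk = true;
    }
}
```

Keeping the preprocessor test inside one predicate means callers such as the merge routines only need a run-time check and stay free of target conditionals.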
Re: Fix libtool.m4 for Darwin >= 10.10
FX,

It looks like you missed patching a few configure files...

libjava/classpath/configure
libjava/configure

are definitely needed, while

libgo/configure
zlib/configure

should be added for completeness.

Jack

On Tue, Nov 11, 2014 at 4:15 AM, FX fxcoud...@gmail.com wrote:

Your patch contains lots of other changes, not just the libtool.m4 change. Please filter those out.

Sorry about that. The patch attached should be clean, and the ChangeLog entries formatted as they should. OK to commit?

This touches so many areas it probably needs a build maintainer or global maintainer to approve it.

FX
Re: [PATCH] Fix some ICF gimple_call handling issues
On 11/11/2014 12:11 AM, Jakub Jelinek wrote: On Mon, Nov 10, 2014 at 10:08:54PM +0100, Richard Biener wrote: @@ -662,9 +662,49 @@ func_checker::compare_gimple_call (gimpl t1 = gimple_call_fndecl (s1); t2 = gimple_call_fndecl (s2); Just drop these and compare gimple_call_fn only. + tree chain1 = gimple_call_chain (s1); + tree chain2 = gimple_call_chain (s2); + + if ((chain1 !chain2) || (!chain1 chain2)) +return return_false_with_msg (Tree call chains are different); I miss a compare_operands for the call chain. Otherwise OK. Here is what I've committed after another bootstrap/regtest. Note, I've tried: __attribute__ ((noinline, noclone)) int f1 (int x) { int y = 3, z = 4; __attribute__ ((noinline, noclone)) int f2 (int a) { return a + x + y + z; } return f2 (5); } __attribute__ ((noinline, noclone)) int f3 (int x) { int y = 3, z = 4; __attribute__ ((noinline, noclone)) int f4 (int a) { return a + x + y + z; } return f4 (5); } int main () { if (f1 (9) != 21 || f3 (9) != 21) __builtin_abort (); return 0; } but ICF doesn't optimize this with or without the patch, as the structs aren't the same type (supposedly different alias set?), even when they have the same members laid out the same. Hello Jakub. You are right, more precisely types_compatible_p return false for these two structures. I'll write this situation to my TODO list. Thank you for sending the patch. Martin 2014-11-11 Jakub Jelinek ja...@redhat.com Martin Liska mli...@suse.cz * ipa-icf-gimple.c (func_checker::compare_bb): Fix comment typo. (func_checker::compare_gimple_call): Compare gimple_call_fn, gimple_call_chain, gimple_call_fntype and call flags. testsuite/ * gcc.dg/ubsan/ipa-icf-1.c: New test. * gcc.dg/ipa/ipa-icf-31.c: New test. 
--- gcc/ipa-icf-gimple.c.jj 2014-10-30 14:42:20.0 +0100 +++ gcc/ipa-icf-gimple.c2014-11-10 19:08:38.339986360 +0100 @@ -554,7 +554,7 @@ func_checker::parse_labels (sem_bb *bb) In general, a collection of equivalence dictionaries is built for types like SSA names, declarations (VAR_DECL, PARM_DECL, ..). This infrastructure - is utilized by every statement-by-stament comparison function. */ + is utilized by every statement-by-statement comparison function. */ bool func_checker::compare_bb (sem_bb *bb1, sem_bb *bb2) @@ -659,12 +659,39 @@ func_checker::compare_gimple_call (gimpl if (gimple_call_num_args (s1) != gimple_call_num_args (s2)) return false; - t1 = gimple_call_fndecl (s1); - t2 = gimple_call_fndecl (s2); - - /* Function pointer variables are not supported yet. */ + t1 = gimple_call_fn (s1); + t2 = gimple_call_fn (s2); if (!compare_operand (t1, t2)) -return return_false(); +return return_false (); + + /* Compare flags. */ + if (gimple_call_internal_p (s1) != gimple_call_internal_p (s2) + || gimple_call_ctrl_altering_p (s1) != gimple_call_ctrl_altering_p (s2) + || gimple_call_tail_p (s1) != gimple_call_tail_p (s2) + || gimple_call_return_slot_opt_p (s1) != gimple_call_return_slot_opt_p (s2) + || gimple_call_from_thunk_p (s1) != gimple_call_from_thunk_p (s2) + || gimple_call_va_arg_pack_p (s1) != gimple_call_va_arg_pack_p (s2) + || gimple_call_alloca_for_var_p (s1) != gimple_call_alloca_for_var_p (s2) + || gimple_call_with_bounds_p (s1) != gimple_call_with_bounds_p (s2)) +return false; + + if (gimple_call_internal_p (s1) + gimple_call_internal_fn (s1) != gimple_call_internal_fn (s2)) +return false; + + tree fntype1 = gimple_call_fntype (s1); + tree fntype2 = gimple_call_fntype (s2); + if ((fntype1 !fntype2) + || (!fntype1 fntype2) + || (fntype1 !types_compatible_p (fntype1, fntype2))) +return return_false_with_msg (call function types are not compatible); + + tree chain1 = gimple_call_chain (s1); + tree chain2 = gimple_call_chain (s2); + if ((chain1 !chain2) + 
|| (!chain1 chain2) + || !compare_operand (chain1, chain2)) +return return_false_with_msg (static call chains are different); /* Checking of argument. */ for (i = 0; i gimple_call_num_args (s1); ++i) --- gcc/testsuite/gcc.dg/ubsan/ipa-icf-1.c.jj 2014-11-10 19:00:53.509525071 +0100 +++ gcc/testsuite/gcc.dg/ubsan/ipa-icf-1.c 2014-11-10 19:02:21.836925806 +0100 @@ -0,0 +1,23 @@ +/* { dg-do run } */ +/* { dg-skip-if { *-*-* } { * } { -O2 } } */ +/* { dg-options -fsanitize=undefined -fipa-icf } */ + +__attribute__ ((noinline, noclone)) +int f1 (int x, int y) +{ + return x + y; +} + +__attribute__ ((noinline, noclone)) +int f2 (int x, int y) +{ + return x - y; +} + +int +main () +{ + if (f1 (5, 6) != 11 || f2 (5, 6) != -1) +__builtin_abort (); + return 0; +} --- gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c.jj2014-11-10 18:59:16.604294652 +0100 +++ gcc/testsuite/gcc.dg/ipa/ipa-icf-31.c 2014-11-10 18:59:59.690519616 +0100 @@ -0,0 +1,41 @@ +/* {
Re: Fix libtool.m4 for Darwin >= 10.10
It looks like you missed patching a few configure files... libjava/classpath/configure libjava/configure Aren’t those under external control? i.e. maintained out of GCC tree? libgo/configure zlib/configure Those are maintained upstream, and we import them directly. I’ve filed a bug for libgo (https://code.google.com/p/go/issues/detail?id=9089). FX
[PATCH] Fix for mklog
Hi all!

I found another issue with mklog. Example:

--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -103,4 +103,14 @@ asan_intercepted_p (enum built_in_function fcode)
         || fcode == BUILT_IN_STRNCMP
         || fcode == BUILT_IN_STRNCPY;
 }
+
+/* Convert LEN to HOST_WIDE_INT if possible.
+   Returns -1 otherwise.  */
+
+static inline HOST_WIDE_INT
+maybe_tree_to_shwi (tree len)
+{
+  return tree_fits_shwi_p (len) ? tree_to_shwi (len) : -1;
+}
+
 #endif /* TREE_ASAN */

mklog output:

gcc/ChangeLog:

DATE

  * asan.h (asan_intercepted_p):
  (maybe_tree_to_shwi):

Currently mklog finds some changes for asan_intercepted_p which do not exist.

Patched mklog output:

gcc/ChangeLog:

DATE

  * asan.h (maybe_tree_to_shwi):

The attached patch makes mklog stop searching for changes inside a function once '}' occurs.

OK to commit?

--Marat

contrib/ChangeLog:

2014-11-06  Marat Zakirov  m.zaki...@samsung.com

  * mklog: Symbol '}' stops search for changes.

diff --git a/contrib/mklog b/contrib/mklog
index 6ed4c6e..7de485d 100755
--- a/contrib/mklog
+++ b/contrib/mklog
@@ -117,7 +117,7 @@ sub is_top_level {
   } else {
     $function =~ s/^.//;
   }
-  return $function && $function !~ /^[\s{}]/;
+  return $function && $function !~ /^[\s{]/;
 }
 
 # For every file in the .diff print all the function names in ChangeLog
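The effect of the one-character change can be sketched outside of Perl. The C function below is only an approximation written for illustration: mklog itself is a Perl script, the name is borrowed from it, and the sketch covers only the unified-diff branch (one marker character dropped). It also ignores Perl's rule that the string "0" is false.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Approximation of the patched is_top_level check for unified diffs.
   DIFF_LINE is one line of diff text including its leading marker
   (' ', '+' or '-').  The line counts as top level when, after the
   marker is dropped, it is non-empty and does not start with
   whitespace or '{'.  With the patch, a line starting with '}' now
   counts as top level, which ends the search inside the function.  */
static bool
is_top_level (const char *diff_line)
{
  assert (diff_line != NULL);

  if (diff_line[0] == '\0')
    return false;

  const char *text = diff_line + 1;   /* Drop the diff marker.  */
  if (text[0] == '\0')
    return false;

  return !isspace ((unsigned char) text[0]) && text[0] != '{';
}
```

In the asan.h example above, the context line holding the closing brace of asan_intercepted_p is now treated as top level, so the added lines after it are no longer attributed to that function.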
Re: [PATCH, aarch64] Add prefetch support
On 30 October 2014 08:54, Gopalasubramanian, Ganesh ganesh.gopalasubraman...@amd.com wrote:

2014-10-30 Ganesh Gopalasubramanian ganesh.gopalasubraman...@amd.com

Check the whitespace in your ChangeLog line.

* config/arm/types.md (define_attr "type"): Add prefetch.

The existing schedulers use 'load1'. We can of course split that into two, introducing prefetch, and update all of the existing schedulers to reflect the change. However, I suggest we do that as a separate activity when someone actually needs the distinction. Note that this change will require updating the schedulers for both the ARM and AArch64 backends, not just those relevant to AArch64. For this prefetch patch I suggest we go with the existing load1.

The inline patch has been munged by your mailer; I tried applying the patch to my tree but it is full of escape sequences. Can you either fix your mailer or submit patches as attachments?

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 74b554e..12a3f170 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -320,6 +320,38 @@
   [(set_attr "type" "no_insn")]
 )
+
+(define_insn "prefetch"
+  [(prefetch (match_operand:DI 0 "address_operand" "r")
+            (match_operand:QI 1 "const_int_operand" "")
+            (match_operand:QI 2 "const_int_operand" ""))]
+  ""
+  "*
+{

Use {} instead of "*{, then all of the extra quoting in the C below goes away.

+  const char * pftype[2][10]
+    = { {\"PLDL1STRM\", \"PLDL3KEEP\", \"PLDL2KEEP\", \"PLDL1KEEP\"},
+        {\"PSTL1STRM\", \"PSTL3KEEP\", \"PSTL2KEEP\", \"PSTL1KEEP\"},
+      };
+
+  int locality = INTVAL (operands[2]);
+  char pattern[100];
+
+  gcc_assert (IN_RANGE (locality, 0, 3));
+
+  strcpy (pattern, \"prfm\\t\");
+  strcat (pattern, (const char*)pftype[INTVAL(operands[1])][locality]);
+  strcat (pattern, \", %a0\");

Use sprintf() rather than multiple calls to cpy and cat. I suspect the cast in front of pftype is superfluous?

+
+  output_asm_insn (pattern,
+                   operands);

Unnecessary line break.

Cheers
/Marcus
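As an illustration of the sprintf suggestion, here is a standalone sketch that builds the same template string with a single formatted write. It is not the revised patch: format_prefetch and its parameters are hypothetical names, snprintf is used in place of sprintf to bound the write, and the GCC-specific pieces (INTVAL, operands, output_asm_insn) are replaced by plain integers and a caller-supplied buffer. The table is sized [2][4], since only four names per row are used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Prefetch operation names, indexed by [is_write][locality],
   in the same order as the table in the patch.  */
static const char *const pftype[2][4] = {
  { "PLDL1STRM", "PLDL3KEEP", "PLDL2KEEP", "PLDL1KEEP" },
  { "PSTL1STRM", "PSTL3KEEP", "PSTL2KEEP", "PSTL1KEEP" },
};

/* Write the assembler template for a prefetch into BUF (SIZE bytes).
   Returns the snprintf result, or -1 if an index is out of range.
   The doubled %% yields a literal '%' so that "%a0" is left for the
   operand substitution done later.  */
static int
format_prefetch (char *buf, size_t size, int is_write, int locality)
{
  assert (buf != NULL);

  if (is_write < 0 || is_write > 1 || locality < 0 || locality > 3)
    return -1;

  return snprintf (buf, size, "prfm\t%s, %%a0", pftype[is_write][locality]);
}
```

No cast is needed on the table lookup, which matches the remark that the cast in front of pftype looks superfluous.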
Re: [PATCH][AArch64] LR register not used in leaf functions
On 30 September 2014 16:00, Jiong Wang jiong.w...@arm.com wrote:

gcc/
  * config/aarch64/aarch64.h (CALL_USED_REGISTERS): Mark LR as caller-save.
  (EPILOGUE_USES): Guard the check by epilogue_completed.
  * config/aarch64/aarch64.c (aarch64_layout_frame): Explicitly check for LR.
  (aarch64_can_eliminate): Check LR_REGNUM liveness.

gcc/testsuite/
  * gcc.target/aarch64/lr_free_1.c: New testcase for -fomit-frame-pointer.
  * gcc.target/aarch64/lr_free_2.c: New testcase for leaf -fno-omit-frame-pointer.

OK
/Marcus
Re: [PATCH][AArch64] Properly guard CUMULATIVE_ARGS definition and remove 'enum' from machine_mode in aarch64.h
On 31 October 2014 11:21, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: Hi all, Following up from https://gcc.gnu.org/ml/gcc-patches/2014-10/msg03153.html this fixes up the aarch64 port accordingly to guard CUMULATIVE_ARGS properly so that we can remove the enum keyword from machine_mode. OK /Marcus
Re: [PATCH] Fix for mklog
On 11/11/14 09:46, Marat Zakirov wrote: Attached patch make mklog to stop search for changes inside function once '}' occur. Ok, to commit? OK. Thanks. Diego.
Re: [patch] OpenACC fortran front end
On Tue, 11 Nov 2014 08:10:29 +0100 Jakub Jelinek ja...@redhat.com wrote:

On Mon, Nov 10, 2014 at 02:43:38PM -0800, Cesar Philippidis wrote:

I'll post a separate patch with the fortran tests later. If anyone wants to test this patch, please use gomp-4_0-branch instead. You don't need a CUDA accelerator to use OpenACC, and some of the runtime tests will fail because that branch doesn't include the nvptx backend.

Now that the first series of PTX target patches have been committed: I assume it is still true that nvptx doesn't work because the libgomp bits aren't in yet, isn't it?

That's correct. The nvptx backend also depends on the offloading changes that a team from Intel is working on for the MIC target. But Julian should be posting the libgomp patches tomorrow, I think, since his changes are somewhat self-contained.

For the middle-end and libgomp changes, can you talk to the Intel folks to update their git branch to latest trunk (so that you have the nvptx bits in there) and send middle-end and libgomp diffs against that?

As far as I remember, most of the changes from the branch are now approved, they are just waiting for review of the LTO related changes in the middle-end (please, correct me if I've missed something).

We've been preparing new patches against trunk for the libgomp and middle-end bits: I've now posted the former, and the latter are on their way soon, I believe.

The middle-end bits are also present on the gomp-4_0-branch SVN branch (likewise, the libgomp pieces), and I believe we're planning to merge the PTX bits there also now they've been committed to trunk. Is it really worthwhile merging our patches to yet another branch at this stage?

Thanks,
Julian
[gomp4] Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end
Hi! On Thu, 24 Jul 2014 15:11:08 +0200, I wrote: On Wed, 23 Jul 2014 17:42:32 -0700, Cesar Philippidis ce...@codesourcery.com wrote: On 07/11/2014 03:29 AM, Jakub Jelinek wrote: On Fri, Jul 11, 2014 at 12:11:10PM +0200, Thomas Schwinge wrote: To avoid duplication of work: with Jakub's Fortran OpenMP 4 target changes recently committed to trunk, and now merged into gomp-4_0-branch, I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4 target changes removed, and TODO markers added to integrate into that. Resolving the TODO markers would be nice, indeed. This patch has the openacc data clauses use the new openmp maps. In the process of doing so, I removed a lot of the old OMP_LIST_ enums and added a few OMP_MAP enums to match what the c frontend currently supports. Thanks! OMP_LIST_DEVICEPTR remains to be converted, which can be done as a later follow-up patch. I have now committed the following to gomp-4_0-branch in r217352: commit 779291a1fe21b3c0b0c0c615a0557f070f495d14 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Nov 11 14:52:04 2014 + OpenACC deviceptr clause: Fix handling in Fortran. With two gcc_asserts restored, and not handling OpenACC deviceptr clauses in the same data paths as other OpenACC data clauses, we'd run into an internal compiler error, when the deviceptr clause is used with (non-offloaded) OpenACC data regions: FAIL: gfortran.dg/goacc/data-tree.f95 -O (internal compiler error) FAIL: gfortran.dg/goacc/data-tree.f95 -O (test for excess errors) gcc/fortran/ * gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead... (enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here. * dump-parse-tree.c (show_omp_clauses): Update. * openmp.c (gfc_match_omp_clauses, resolve_omp_clauses) (gfc_resolve_oacc_declare): Likewise. * trans-openmp.c (gfc_trans_omp_clauses): Likewise. gcc/ * omp-low.c (lower_omp_target): Restore two gcc_asserts. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217352 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog.gomp| 4 gcc/fortran/ChangeLog.gomp| 9 + gcc/fortran/dump-parse-tree.c | 1 - gcc/fortran/gfortran.h| 6 +++--- gcc/fortran/openmp.c | 38 ++ gcc/fortran/trans-openmp.c| 6 +++--- gcc/omp-low.c | 2 ++ 7 files changed, 43 insertions(+), 23 deletions(-) diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp index 9c997ce..dacfad8 100644 --- gcc/ChangeLog.gomp +++ gcc/ChangeLog.gomp @@ -1,3 +1,7 @@ +2014-11-11 Thomas Schwinge tho...@codesourcery.com + + * omp-low.c (lower_omp_target): Restore two gcc_asserts. + 2014-11-06 Thomas Schwinge tho...@codesourcery.com * gimple.h (is_gimple_omp_oacc_specifically): Return true for diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp index d10560e..1ae1d31 100644 --- gcc/fortran/ChangeLog.gomp +++ gcc/fortran/ChangeLog.gomp @@ -1,3 +1,12 @@ +2014-11-11 Thomas Schwinge tho...@codesourcery.com + + * gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead... + (enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here. + * dump-parse-tree.c (show_omp_clauses): Update. + * openmp.c (gfc_match_omp_clauses, resolve_omp_clauses) + (gfc_resolve_oacc_declare): Likewise. + * trans-openmp.c (gfc_trans_omp_clauses): Likewise. 
+ 2014-11-05 Thomas Schwinge tho...@codesourcery.com * openmp.c (OMP_CLAUSE_HOST, OMP_CLAUSE_SELF): Merge into the new diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c index 57af730..e7aff22 100644 --- gcc/fortran/dump-parse-tree.c +++ gcc/fortran/dump-parse-tree.c @@ -1252,7 +1252,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses) switch (list_type) { case OMP_LIST_COPY: type = COPY; break; - case OMP_LIST_DEVICEPTR: type = DEVICEPTR; break; case OMP_LIST_USE_DEVICE: type = USE_DEVICE; break; case OMP_LIST_DEVICE_RESIDENT: type = USE_DEVICE; break; case OMP_LIST_CACHE: type = ; break; diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h index 6bd131c..18adbee 100644 --- gcc/fortran/gfortran.h +++ gcc/fortran/gfortran.h @@ -1141,7 +1141,8 @@ typedef enum OMP_MAP_FORCE_TO, OMP_MAP_FORCE_FROM, OMP_MAP_FORCE_TOFROM, - OMP_MAP_FORCE_PRESENT + OMP_MAP_FORCE_PRESENT, + OMP_MAP_FORCE_DEVICEPTR } gfc_omp_map_op; @@ -1184,8 +1185,7 @@ enum OMP_LIST_REDUCTION, OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY, - OMP_LIST_DEVICEPTR, - OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DEVICEPTR, + OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DATA_CLAUSE_FIRST, OMP_LIST_DEVICE_RESIDENT, OMP_LIST_USE_DEVICE, OMP_LIST_CACHE, diff --git gcc/fortran/openmp.c gcc/fortran/openmp.c
Re: Fix libtool.m4 for Darwin >= 10.10
On Tue, Nov 11, 2014 at 9:45 AM, FX fxcoud...@gmail.com wrote: It looks like you missed patching a few configure files... libjava/classpath/configure libjava/configure Aren’t those under external control? i.e. maintained out of GCC tree? However these are maintained, the libjava configure files still need to be patched to prevent their associated shared libraries from being inappropriately linked with -flat_namespace on darwin14 and later. Since you are simply patching all the configure files, the question seems academic unless you switch to properly regenerating all of the configure files using a fixed libtool.m4. Jack libgo/configure zlib/configure Those are maintained upstream, and we import them directly. I’ve filed a bug for libgo (https://code.google.com/p/go/issues/detail?id=9089). FX
Re: [PATCH] __builtin_*_overflow builtins (PR c/59708)
Hello!

This patch implements what I understood from Joseph's https://gcc.gnu.org/ml/gcc/2013-10/msg00280.html and also adds clang compatible builtins (which implement a small subset of the type-generic ones).

Besides the clang compatibility builtins, there are 3 new type-generic builtins, __builtin_{add,sub,mul}_overflow, which have 3 arguments: two arbitrary integral arguments, and a pointer to some integer type. These builtins extend both arguments to infinite precision signed arguments, perform the {+,-,*} operation in infinite precision, and finally cast the result to the type pointed to by the third argument and store the result there (modulo 2^precision of the type). If the infinite precision result is equal to the stored value, the built-ins return false (no overflow), otherwise they return true.

The built-ins are folded immediately into internal functions that return both results (integer result and boolean overflow flag) as a _Complex integer result, so that the integer result doesn't have to be addressable. It partly reuses the code that emits the -fsanitize=signed-integer-overflow internal functions; for signed overflows, e.g. i?86 will use jo/jno/seto/setno instructions after the arithmetic instructions; for unsigned arithmetic overflow, the combiner manages to transform what is emitted into jc/jnc/setc/setnc where possible.

After discussions with Richard on IRC, the internal functions have arbitrary integral arguments, which can have different or same signs, different or same precisions, and the result type is a _Complex integer derived from the call's third argument. gimple-fold.c and tree-vrp.c are taught to perform some optimizations on these, and most of the smarts are performed during expansion (many of the 16 different +/- signarg1/signarg2/signresult cases require different code, and for * there are also a couple of different cases).
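A short usage sketch of the three type-generic built-ins described above, assuming a compiler that provides them (GCC with this patch applied, or a clang release that has the same built-ins). The wrapper names are made up for the example; the wrappers invert the built-ins' return value so that true means success.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Add A and B.  Returns true and stores the sum in *RES when it fits
   in int; returns false on overflow, in which case *RES holds the
   result reduced modulo 2^precision.  */
static bool
checked_add_int (int a, int b, int *res)
{
  assert (res != NULL);
  return !__builtin_add_overflow (a, b, res);
}

/* Multiply two sizes, e.g. NMEMB elements of SIZE bytes each, before
   passing the product to an allocator.  */
static bool
checked_mul_size (size_t nmemb, size_t size, size_t *res)
{
  assert (res != NULL);
  return !__builtin_mul_overflow (nmemb, size, res);
}

/* The operand types and the result type may differ: here the
   subtraction is done on long long values and the result is checked
   against the range of unsigned char.  */
static bool
checked_sub_to_uchar (long long a, long long b, unsigned char *res)
{
  assert (res != NULL);
  return !__builtin_sub_overflow (a, b, res);
}
```

For example, checked_add_int (INT_MAX, 1, &r) reports failure and leaves the wrapped value INT_MIN in r, while checked_sub_to_uchar (300, 100, &c) succeeds with c set to 200.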
If somebody can come up with some shorter sequence how to test for the less common cases, I'd appreciate hints (internal-fn.c has big comments which explain how it now computes the integral result and especially the overflow flag). Bootstrapped/regtested on x86_64-linux and i686-linux (on top of the ICF gimple_call fix I've mailed a few minutes ago), ok for trunk? 2014-11-10 Jakub Jelinek ja...@redhat.com PR c/59708 * builtin-attrs.def (ATTR_NOTHROW_TYPEGENERIC_LEAF): New attribute. * builtins.c (fold_builtin_arith_overflow): New function. (fold_builtin_3): Use it. * builtins.def (BUILT_IN_ADD_OVERFLOW, BUILT_IN_SUB_OVERFLOW, BUILT_IN_MUL_OVERFLOW, BUILT_IN_SADD_OVERFLOW, BUILT_IN_SADDL_OVERFLOW, BUILT_IN_SADDLL_OVERFLOW, BUILT_IN_SSUB_OVERFLOW, BUILT_IN_SSUBL_OVERFLOW, BUILT_IN_SSUBLL_OVERFLOW, BUILT_IN_SMUL_OVERFLOW, BUILT_IN_SMULL_OVERFLOW, BUILT_IN_SMULLL_OVERFLOW, BUILT_IN_UADDL_OVERFLOW, BUILT_IN_UADDLL_OVERFLOW, BUILT_IN_USUB_OVERFLOW, BUILT_IN_USUBL_OVERFLOW, BUILT_IN_USUBLL_OVERFLOW, BUILT_IN_UMUL_OVERFLOW, BUILT_IN_UMULL_OVERFLOW, BUILT_IN_UMULLL_OVERFLOW): New built-in functions. * builtin-types.def (BT_PTR_UINT, BT_PTR_ULONG, BT_PTR_LONGLONG, BT_FN_BOOL_INT_INT_INTPTR, BT_FN_BOOL_LONG_LONG_LONGPTR, BT_FN_BOOL_LONGLONG_LONGLONG_LONGLONGPTR, BT_FN_BOOL_UINT_UINT_UINTPTR, BT_FN_BOOL_ULONG_ULONG_ULONGPTR, BT_FN_BOOL_ULONGLONG_ULONGLONG_ULONGLONGPTR, BT_FN_BOOL_VAR): New. * expr.c (write_complex_part): Remove prototype, no longer static. * expr.h (write_complex_part): New prototype. * function.c (aggregate_value_p): For internal functions return 0. * gimple-fold.c (arith_overflowed_p, find_non_realpart_uses): New functions. (gimple_fold_call): Fold {ADD,SUB,MUL}_OVERFLOW internal calls. * gimple-fold.h (arith_overflowed_p): New prototype. * gimplify.c (gimplify_call_expr): Handle gimplification of internal calls with lhs. * internal-fn.c (get_range_pos_neg, get_min_precision, expand_arith_overflow_result_store): New functions. 
(ubsan_expand_si_overflow_addsub_check): Renamed to ... (expand_addsub_overflow): ... this. Add LOC, LHS, ARG0, ARG1, UNSR_P, UNS0_P, UNS1_P, IS_UBSAN arguments, remove STMT argument. Handle ADD_OVERFLOW and SUB_OVERFLOW expansion. (ubsan_expand_si_overflow_neg_check): Renamed to ... (expand_neg_overflow): ... this. Add LOC, LHS, ARG1, IS_UBSAN arguments, remove STMT argument. Handle SUB_OVERFLOW with 0 as first argument expansion. (ubsan_expand_si_overflow_mul_check): Renamed to ... (expand_mul_overflow): ... this. Add LOC, LHS, ARG0, ARG1, UNSR_P, UNS0_P, UNS1_P, IS_UBSAN arguments, remove STMT argument. Handle MUL_OVERFLOW expansion. (expand_UBSAN_CHECK_ADD): Use expand_addsub_overflow, prepare arguments for it. (expand_UBSAN_CHECK_SUB): Use expand_addsub_overflow or expand_neg_overflow, prepare arguments for it. (expand_UBSAN_CHECK_MUL): Use expand_mul_overflow, prepare arguments for it. (expand_arith_overflow, expand_ADD_OVERFLOW, expand_SUB_OVERFLOW, expand_MUL_OVERFLOW): New functions. * internal-fn.def (ADD_OVERFLOW, SUB_OVERFLOW,
Re: [C++ Patch] PR 63265
Hi,

On 11/11/2014 02:19 PM, Jason Merrill wrote:

On 11/11/2014 08:04 AM, Paolo Carlini wrote:

-tree cond = RECUR (TREE_OPERAND (t, 0));
+tree cond
+  = maybe_constant_value (fold_non_dependent_expr_sfinae
+                          (RECUR (TREE_OPERAND (t, 0)), tf_none));

I like this approach, but if the result of maybe_constant_value doesn't turn out to be an INTEGER_CST, we want to end up with the result of RECUR rather than the result of fold_non_dependent_expr, as the latter might not be suitable for subsequent tsubsting.

I see. Something like the below, then?

Thanks,
Paolo.

Index: cp/pt.c
===================================================================
--- cp/pt.c (revision 217342)
+++ cp/pt.c (working copy)
@@ -15138,11 +15138,13 @@ tsubst_copy_and_build (tree t,
     case COND_EXPR:
       {
         tree cond = RECUR (TREE_OPERAND (t, 0));
+        tree folded_cond = (maybe_constant_value
+                            (fold_non_dependent_expr_sfinae (cond, tf_none)));
         tree exp1, exp2;
 
-        if (TREE_CODE (cond) == INTEGER_CST)
+        if (TREE_CODE (folded_cond) == INTEGER_CST)
           {
-            if (integer_zerop (cond))
+            if (integer_zerop (folded_cond))
               {
                 ++c_inhibit_evaluation_warnings;
                 exp1 = RECUR (TREE_OPERAND (t, 1));
@@ -15156,6 +15158,7 @@ tsubst_copy_and_build (tree t,
                 exp2 = RECUR (TREE_OPERAND (t, 2));
                 --c_inhibit_evaluation_warnings;
               }
+            cond = folded_cond;
           }
         else
           {
Index: testsuite/g++.dg/cpp0x/constexpr-63265.C
===================================================================
--- testsuite/g++.dg/cpp0x/constexpr-63265.C (revision 0)
+++ testsuite/g++.dg/cpp0x/constexpr-63265.C (working copy)
@@ -0,0 +1,19 @@
+// PR c++/63265
+// { dg-do compile { target c++11 } }
+
+#define LSHIFT (sizeof(unsigned int) * __CHAR_BIT__)
+
+template <int lshift>
+struct SpuriouslyWarns1 {
+    static constexpr unsigned int v = lshift < LSHIFT ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns1<LSHIFT>::v == 0, "Impossible occurred");
+
+template <int lshift>
+struct SpuriouslyWarns2 {
+    static constexpr bool okay = lshift < LSHIFT;
+    static constexpr unsigned int v = okay ? 1U << lshift : 0;
+};
+
+static_assert(SpuriouslyWarns2<LSHIFT>::v == 0, "Impossible occurred");
Re: Fix libtool.m4 for Darwin >= 10.10
Since you are simply patching all the configure files, the question seems academic unless you switch to properly regenerating all of the configure files using a fixed libtool.m4.

I am actually proposing to fix libtool.m4 and regenerate the configure scripts (which gives the same result as patching, as expected).

However these are maintained, the libjava configure files still need to be patched to prevent their associated shared libraries from being inappropriately linked with -flat_namespace on darwin14 and later.

Yes, but I don’t know whether libjava and classpath should be patched in GCC, or whether I should report them to be patched somewhere else (like libgo and zlib, for example). It’s important to do it properly, otherwise codebases diverge and maintenance becomes difficult.

FX
[gomp4] Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end
Hi!

On Mon, 28 Jul 2014 10:00:46 -0700, Cesar Philippidis ce...@codesourcery.com wrote:
On 07/25/2014 09:01 AM, Thomas Schwinge wrote:

[...] you may directly fold in the following patch to nuke the unused OMP_LIST_COPY (or do that later).

--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -1257,7 +1257,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
        const char *type = NULL;
        switch (list_type)
          {
-         case OMP_LIST_COPY: type = "COPY"; break;
          case OMP_LIST_DEVICEPTR: type = "DEVICEPTR"; break;
          case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
          case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1157,9 +1157,8 @@ enum
   OMP_LIST_TO,
   OMP_LIST_FROM,
   OMP_LIST_REDUCTION,
-  OMP_LIST_COPY,
-  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY,
   OMP_LIST_DEVICEPTR,
+  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_DEVICEPTR,
   OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DEVICEPTR,
   OMP_LIST_DEVICE_RESIDENT,
   OMP_LIST_USE_DEVICE,

I'll take care of this separately.

I have now committed the following to gomp-4_0-branch in r217353:

commit 782a3dab5694d561f80bda7a29000250a681781a
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 11 14:52:16 2014 +

    Fortran OMP_LIST_* maintenance.

    gcc/fortran/
    * gfortran.h (OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST)
    (OMP_LIST_DATA_CLAUSE_LAST, OMP_LIST_LAST): Remove.
    * dump-parse-tree.c (show_omp_clauses): Update.
    * openmp.c (resolve_omp_clauses, gfc_resolve_oacc_declare):
    Likewise.
    * trans-openmp.c (gfc_trans_omp_clauses): Likewise.
    (gfc_trans_omp_map_clause_list): Remove.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217353 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/fortran/ChangeLog.gomp    |  8
 gcc/fortran/dump-parse-tree.c |  1 -
 gcc/fortran/gfortran.h        |  6 +-
 gcc/fortran/openmp.c          | 42 ++
 gcc/fortran/trans-openmp.c    | 31 ---
 5 files changed, 19 insertions(+), 69 deletions(-)

diff --git gcc/fortran/ChangeLog.gomp gcc/fortran/ChangeLog.gomp
index 1ae1d31..f846890 100644
--- gcc/fortran/ChangeLog.gomp
+++ gcc/fortran/ChangeLog.gomp
@@ -1,5 +1,13 @@
 2014-11-11  Thomas Schwinge  tho...@codesourcery.com
 
+       * gfortran.h (OMP_LIST_COPY, OMP_LIST_DATA_CLAUSE_FIRST)
+       (OMP_LIST_DATA_CLAUSE_LAST, OMP_LIST_LAST): Remove.
+       * dump-parse-tree.c (show_omp_clauses): Update.
+       * openmp.c (resolve_omp_clauses, gfc_resolve_oacc_declare):
+       Likewise.
+       * trans-openmp.c (gfc_trans_omp_clauses): Likewise.
+       (gfc_trans_omp_map_clause_list): Remove.
+
        * gfortran.h (OMP_LIST_DEVICEPTR): Remove, and instead...
        (enum gfc_omp_map_op): ... add OMP_MAP_FORCE_DEVICEPTR here.
        * dump-parse-tree.c (show_omp_clauses): Update.
diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c
index e7aff22..e9d04e7 100644
--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -1251,7 +1251,6 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
        const char *type = NULL;
        switch (list_type)
          {
-         case OMP_LIST_COPY: type = "COPY"; break;
          case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
          case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
          case OMP_LIST_CACHE: type = ""; break;
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index 18adbee..aed37d3 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1183,14 +1183,10 @@ enum
   OMP_LIST_TO,
   OMP_LIST_FROM,
   OMP_LIST_REDUCTION,
-  OMP_LIST_COPY,
-  OMP_LIST_DATA_CLAUSE_FIRST = OMP_LIST_COPY,
-  OMP_LIST_DATA_CLAUSE_LAST = OMP_LIST_DATA_CLAUSE_FIRST,
   OMP_LIST_DEVICE_RESIDENT,
   OMP_LIST_USE_DEVICE,
   OMP_LIST_CACHE,
-  OMP_LIST_NUM,
-  OMP_LIST_LAST = OMP_LIST_NUM
+  OMP_LIST_NUM
 };
 
 /* Because a symbol can belong to multiple namelists, they must be
diff --git gcc/fortran/openmp.c gcc/fortran/openmp.c
index 82726b8..47c146e 100644
--- gcc/fortran/openmp.c
+++ gcc/fortran/openmp.c
@@ -2870,11 +2870,8 @@ resolve_omp_clauses (gfc_code *code, locus *where,
   static const char *clause_names[]
     = { "PRIVATE", "FIRSTPRIVATE", "LASTPRIVATE", "COPYPRIVATE", "SHARED",
        "COPYIN", "UNIFORM", "ALIGNED", "LINEAR", "DEPEND", "MAP",
-       "TO", "FROM", "REDUCTION",
-       "COPY", "COPYIN", "COPYOUT", "CREATE", "DELETE", "PRESENT",
-       "PRESENT_OR_COPY", "PRESENT_OR_COPYIN", "PRESENT_OR_COPYOUT",
-       "PRESENT_OR_CREATE", "DEVICE_RESIDENT", "USE_DEVICE",
-       "HOST", "DEVICE", "CACHE" };
+       "TO", "FROM", "REDUCTION", "DEVICE_RESIDENT", "USE_DEVICE",
+       "CACHE" };
 
   if (omp_clauses == NULL)
     return;
@@ -3231,15 +3228,6 @@ resolve_omp_clauses (gfc_code *code, locus *where,
          break;
        }
 
-
Re: [PATCH] AIX: Filename-based shared library versioning for libgcc_s
Michael,

Why does the configure change match with p*-*-aix... instead of power* or powerpc*? Yes, it's unique and will match, but why make it as short as possible, which doesn't match other uses?

In your documentation, how are you distinguishing between Dynamic Linking and Runtime Linking?

Thanks, David

On Mon, Nov 10, 2014 at 12:41 PM, Michael Haubenwallner michael.haubenwall...@ssi-schaefer.com wrote:
On 2014-11-10 17:06, David Edelsohn wrote:
On Mon, Nov 10, 2014 at 4:59 AM, Michael Haubenwallner michael.haubenwall...@ssi-schaefer.com wrote:
On 2014-11-07 20:52, David Edelsohn wrote:

First, please explicitly copy me on AIX or PowerPC patches sent to gcc-patches. I don't have a fundamental objection to including this option, but note that Richi, Honza and I have discovered that using the AIX runtime linking option interacts badly with some GCC optimizations and can result in applications that hang in a loop.

Feels like adding the aix-soname linking procedure becomes more important:

All code on AIX is position independent (PIC) by default. Executables and shared libraries essentially are PIE. Because of this, AIX does not provide separate static libraries and one can link statically with a shared library. Creating a library enabled for runtime linking with -G (-brtl) causes a lot of problems, including a newly recognized failure mode. Without careful control over AIX symbol export, all global calls will use glink code (equivalent to ELF PLTs). This also creates a TOC entry for every global call, possibly overflowing the TOC.

About to define careful control over AIX symbol export: The symbols listed in the import file are those found in the object files only, not the ones created at link time (like __GLOBAL*) or from the static objects found in libc.a. While I have done this in libtool from the beginning here, I have had a helper script wrapping ld to support '--soname=' for non-libtool packages, where creating the import file from the final shared object also included static libc-provided symbols, which turned out to be a dependency pitfall.

AIX added ELF-like visibility support to XCOFF, which would be preferred. Except it was not added in a formal release, like AIX 8.1, and apparently was back-ported to at least AIX 6.1, so it's difficult to phase in the support. One would need to add a configure test for the feature and not all users are upgrading the system. So one cannot build and distribute GCC for AIX 7.1 and know the feature is available in the system tools. GCC builds would be incompatible and object files, libraries, executables created by GCC would be incompatible. Basically, a mess.

As I've seen the weak information on an older AIX 5.3 TL8 already: Is this visibility support something different than what nm -l or nm -P shows?

While I haven't focused on nor explicitly tested this, I do believe that this also solves problems with global C++ constructor/destructor call orders.

Why? There still is the problem of the AIX kernel runtime loader ordering dependent shared objects.

Feels like I indeed haven't dug deep enough into that topic yet: To be ignored here then.

But the main problem is GCC uses aliases and functions declared as weak to support some C++ features.

This is another reason why I do force runtime linking for our application, which uses these C++ features while its main target platform is Linux.

You have not explained how this has any fix / benefit affecting the problem, other than separate shared and static libraries. Forcing runtime linking seems irrelevant. It was linking shared before and linking shared after your patch (with runtime linking) so the net effect is zero.

My main reason here is to allow for *filename*-based sharedlib versioning, which I haven't been able to achieve without import files. In-archive versioning is a pita from a package manager's point of view.

For a second reason: Due to its Linux-centric history (well, HP-UX and Solaris before), our application architecture does rely on runtime linking in some corner cases. This is why I force that for AIX in our development and runtime platform, which is similar to /opt/freeware/, but based on Gentoo Prefix.

For a third reason (maybe I don't have deep enough insight as well): If I understand correctly, you switched to build libstdc++ without runtime linking, because of problems when linking statically against the rtl-enabled libstdc++, no? For this case, aix-soname incidentally does prevent shared objects built with runtime linking from being statically linked.

For another reason: I can imagine providing an rtl_enable'd libc.so as well, to allow for easier use of memory debuggers that intercept the malloc/free & co libc calls... But AFAICT these rely on every sharedlib being built with runtime linking enabled.

Again, runtime linking of all global symbols affects performance and bloats the TOC, making TOC overflow more
Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
Hi!

On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis cesar_philippi...@mentor.com wrote:
On 07/07/2014 02:55 AM, Thomas Schwinge wrote:
On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis cesar_philippi...@mentor.com wrote:

This patch is the first step to enabling parallel reductions in openacc.

I've committed this updated version of the patch.

In r217354, I just applied the following cleanup to gomp-4_0-branch:

commit 4fe8b3620b258ac904d9eade5f76dede69a80c98
Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4
Date:   Tue Nov 11 14:52:26 2014 +

    OpenACC reductions maintenance.

    gcc/
    * omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
    context.
    (lower_oacc_offload): Simplify use of maybe_lookup_reduction.

    gcc/
    * omp-low.c (delete_omp_context): Dispose of reduction_map.

    git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog.gomp |  6 ++
 gcc/omp-low.c      | 56 +-
 2 files changed, 36 insertions(+), 26 deletions(-)

diff --git gcc/ChangeLog.gomp gcc/ChangeLog.gomp
index dacfad8..94a7f8c 100644
--- gcc/ChangeLog.gomp
+++ gcc/ChangeLog.gomp
@@ -1,5 +1,11 @@
 2014-11-11  Thomas Schwinge  tho...@codesourcery.com
 
+       * omp-low.c (delete_omp_context): Dispose of reduction_map.
+
+       * omp-low.c (maybe_lookup_reduction): Don't require an OpenACC
+       context.
+       (lower_oacc_offload): Simplify use of maybe_lookup_reduction.
+
        * omp-low.c (lower_omp_target): Restore two gcc_asserts.
 
 2014-11-06  Thomas Schwinge  tho...@codesourcery.com
diff --git gcc/omp-low.c gcc/omp-low.c
index c63ec4e..5695ec3 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -938,7 +938,7 @@ get_base_type (tree decl)
   return type;
 }
 
-/* Lookup variables in the decl or field splay trees.  The "maybe" form
+/* Lookup variables.  The "maybe" form
    allows for the variable form to not have been entered, otherwise we
    assert that the variable must have been entered.  */
 
@@ -975,17 +975,6 @@ lookup_sfield (tree var, omp_context *ctx)
 }
 
 static inline tree
-lookup_reduction (const char *id, omp_context *ctx)
-{
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-                        (splay_tree_key) id);
-  return (tree) n->value;
-}
-
-static inline tree
 maybe_lookup_field (tree var, omp_context *ctx)
 {
   splay_tree_node n;
@@ -994,14 +983,22 @@ maybe_lookup_field (tree var, omp_context *ctx)
 }
 
 static inline tree
+lookup_reduction (const char *id, omp_context *ctx)
+{
+  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
+
+  splay_tree_node n;
+  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  return (tree) n->value;
+}
+
+static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
-  gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map,
-                        (splay_tree_key) var);
-  return n ?(tree) n->value : NULL_TREE;
+  splay_tree_node n = NULL;
+  if (ctx->reduction_map)
+    n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  return n ? (tree) n->value : NULL_TREE;
 }
 
 /* Return true if DECL should be copied by pointer.  SHARED_CTX is
@@ -1574,6 +1571,11 @@ delete_omp_context (splay_tree_value value)
     splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
     splay_tree_delete (ctx->sfield_map);
+  if (ctx->reduction_map
+      /* Shared over several omp_contexts.  */
+      && (ctx->outer == NULL
+         || ctx->reduction_map != ctx->outer->reduction_map))
+    splay_tree_delete (ctx->reduction_map);
 
   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
      it produces corrupt debug information.  */
@@ -10481,10 +10483,14 @@ lower_oacc_offload (gimple_stmt_iterator *gsi_p, omp_context *ctx)
                        || (OMP_CLAUSE_MAP_KIND (c) != OMP_CLAUSE_MAP_FORCE_DEVICEPTR)
                        || TREE_CODE (TREE_TYPE (ovar)) != ARRAY_TYPE);
 
-           if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
-               && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
-               && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
-               && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
+           if (maybe_lookup_reduction (var, ctx))
+             {
+               gimplify_assign (x, var, &ilist);
+             }
+           else if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
+                    && OMP_CLAUSE_MAP_KIND (c) == OMP_CLAUSE_MAP_POINTER
+                    && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
+                    && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
              {
                tree avar
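The disposal rule added to delete_omp_context above (delete reduction_map only in the context that does not share it with its outer context) can be sketched in isolation as follows. This is a simplified illustration, not GCC code: struct ctx, release_reduction_map and the use of a plain malloc'ed pointer in place of a splay tree are all assumptions made for the example.

```c
#include <stdlib.h>

/* Hypothetical stand-in for omp_context: only the two fields that
   matter for the ownership rule.  */
struct ctx
{
  struct ctx *outer;
  int *reduction_map;   /* stands in for the splay tree */
};

/* Free the map only if this context owns it, i.e. it is not the same
   object as the outer context's map.  Returns 1 if it was freed.  */
static int
release_reduction_map (struct ctx *c)
{
  if (c->reduction_map
      && (c->outer == NULL
          || c->reduction_map != c->outer->reduction_map))
    {
      free (c->reduction_map);
      c->reduction_map = NULL;
      return 1;
    }
  return 0;
}
```

With an inner context that merely copied the outer pointer, only the outermost holder frees the map, so a double free is avoided as long as inner contexts are released first or left untouched.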
Re: Fix libtool.m4 for Darwin >= 10.10
On Tue, Nov 11, 2014 at 03:59:03PM +0100, FX wrote:

Since you are simply patching all the configure files, the question seems academic unless you switch to properly regenerating all of the configure files using a fixed libtool.m4.

I am actually proposing to fix libtool.m4 and regenerate the configure scripts (which gives the same result as patching, as expected).

However these are maintained, the libjava configure files still need to be patched to prevent their associated shared libraries from being inappropriately linked with -flat_namespace on darwin14 and later.

Yes, but I don’t know whether libjava and classpath should be patched in GCC, or whether I should report them to be patched somewhere else (like libgo and zlib, for example). It’s important to do it properly, otherwise codebases diverge and maintenance becomes difficult.

libjava is maintained in GCC. libjava/classpath, while imported occasionally from upstream, would upon merge result in regenerating the generated files and thus should be patched too. For the latter, you should put it into libjava/classpath/ChangeLog.gcj.

Jakub
Re: RFC: Update ISL under gcc/infrastructure/ ? // Remove CLooG?
Tobias,

The only new regression seen in gcc trunk when using isl 0.14 with my mockup isl_0.14.diff patch is the failure...

UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor (int c1 = 0; c1 < n - 1; c1 += 1)\\n for (int c3 = 0; c3 < n; c3 += 1)\\nS_4(c1, c3); 1

at both -m32/-m64. Is this really a regression or simply detection of a change in the tree-dump generated by isl 0.14?

Jack

ps I was under the impression that these later versions of isl were supposed to have improved performance that potentially would result in changes in such tree-dumps.

On Mon, Nov 10, 2014 at 8:40 PM, Jack Howarth howarth.at@gmail.com wrote:

On x86_64-apple-darwin14, the attached patch allows gcc trunk to build against isl 0.14. I assume if we want to retain the...

  #if defined(__cplusplus)
  extern "C" {
  #endif

  #if defined(__cplusplus)
  }
  #endif

wrappers around the include of isl/val_gmp.h, to continue to support isl 0.12.2, isl.m4 will need to test for isl <= 0.12.2 and set a define in autohost.h that can be added to the conditional on __cplusplus. The same define would have to be used in a conditional for selecting code changes required for using...

  if (isl_band_member_is_zero_distance (Band, i))

in gcc/graphite-optimize-isl.c for isl <= 0.12.2 rather than...

  if (isl_band_member_is_coincident (Band, i))

and the other associated changes for isl 0.12.2.

Jack

ps The changes in gcc/graphite-optimize-isl.c are modelled on those in https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=191650#c6.

pps The test suite results for make -k check RUNTESTFLAGS="graphite.exp --target_board=unix'{-m32,-m64}'" are...

LAST_UPDATED: Obtained from SVN: trunk revision 217269

Native configuration is x86_64-apple-darwin13.4.0

        === g++ tests ===

Running target unix/-m32

        === g++ Summary for unix/-m32 ===

# of expected passes            27

Running target unix/-m64

        === g++ Summary for unix/-m64 ===

# of expected passes            27

        === g++ Summary ===

# of expected passes            54
/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/testsuite/g++/../../xg++ version 5.0.0 20141109 (experimental) (GCC)

        === gcc tests ===

Running target unix/-m32
FAIL: gcc.dg/graphite/vect-pr43423.c scan-tree-dump-times vect vectorized 2 loops 1
UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor (int c1 = 0; c1 < n - 1; c1 += 1)\\n for (int c3 = 0; c3 < n; c3 += 1)\\nS_4(c1, c3); 1

        === gcc Summary for unix/-m32 ===

# of expected passes            299
# of unexpected failures        1
# of expected failures          4
# of unresolved testcases       1
# of unsupported tests          5

Running target unix/-m64
FAIL: gcc.dg/graphite/vect-pr43423.c scan-tree-dump-times vect vectorized 2 loops 1
UNRESOLVED: gcc.dg/graphite/isl-codegen-loop-dumping.c scan-tree-dump-times graphite ISL AST generated by ISL: \\nfor (int c1 = 0; c1 < n - 1; c1 += 1)\\n for (int c3 = 0; c3 < n; c3 += 1)\\nS_4(c1, c3); 1

        === gcc Summary for unix/-m64 ===

# of expected passes            299
# of unexpected failures        1
# of expected failures          4
# of unresolved testcases       1
# of unsupported tests          5

        === gcc Summary ===

# of expected passes            598
# of unexpected failures        2
# of expected failures          8
# of unresolved testcases       2
# of unsupported tests          10
/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/xgcc version 5.0.0 20141109 (experimental) (GCC)

        === gfortran tests ===

Running target unix/-m32

        === gfortran Summary for unix/-m32 ===

# of expected passes            112
# of expected failures          14

Running target unix/-m64

        === gfortran Summary for unix/-m64 ===

# of expected passes            110
# of expected failures          14
# of unsupported tests          2

        === gfortran Summary ===

# of expected passes            222
# of expected failures          28
# of unsupported tests          2
/sw/src/fink.build/gcc50-5.0.0-1000/darwin_objdir/gcc/testsuite/gfortran/../../gfortran version 5.0.0 20141109 (experimental) (GCC)

        === libgomp tests ===

Running target unix/-m32

        === libgomp Summary for unix/-m32 ===

# of expected passes            49

Running target unix/-m64

        === libgomp Summary for unix/-m64 ===

# of expected passes            49

        === libgomp Summary ===

# of expected passes            98

Compiler version: 5.0.0 20141109 (experimental) (GCC)
Platform: x86_64-apple-darwin13.4.0
configure flags: --prefix=/sw --prefix=/sw/lib/gcc5.0 --mandir=/sw/share/man --infodir=/sw/lib/gcc5.0/info --enable-languages=c,c++,fortran,lto,objc,obj-c++,java --with-gmp=/sw --with-libiconv-prefix=/sw --with-isl=/sw --without-cloog --with-mpc=/sw --with-system-zlib --x-includes=/usr/X11R6/include --x-libraries=/usr/X11R6/lib --program-suffix=-fsf-5.0

On Mon, Nov 10, 2014 at 2:27 PM, Jack Howarth howarth.at@gmail.com wrote:

Is the current isl 0.12.2 in infrastructure
Re: [C++ Patch] PR 63265
OK. Jason
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
Many thanks. Here is the new patch that fixes the main problem of the previous one (i.e. separation of the loop after unroll and jam) as well as the problems raised by you (see comments below).

Now the code with the separation class option looks like:

ISL AST generated by ISL:
{
  for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
      for (int c2 = c0; c2 <= c0 + 3; c2 += 1)
        S_4(c2, c1);
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
    for (int c2 = -((HEIGHT - 1) % 4) + HEIGHT - 1; c2 < HEIGHT; c2 += 1)
      S_4(c2, c1);
}

I tried the unroll option for AST; two loops are unrolled and the code looks like:

ISL AST generated by ISL:
{
  for (int c0 = 0; c0 < HEIGHT - 4; c0 += 4)
    for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) {
      S_4(c0, c1);
      S_4(c0 + 1, c1);
      S_4(c0 + 2, c1);
      S_4(c0 + 3, c1);
    }
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1) {
    S_4(-((HEIGHT - 1) % 4) + HEIGHT - 1, c1);
    if ((HEIGHT - 1) % 4 >= 1) {
      S_4(-((HEIGHT - 1) % 4) + HEIGHT, c1);
      if ((HEIGHT - 1) % 4 >= 2) {
        S_4(-((HEIGHT - 1) % 4) + HEIGHT + 1, c1);
        if (HEIGHT % 4 == 0)
          S_4(HEIGHT - 1, c1);
      }
    }
  }
}

As I don't quite like the unrolling of the second loop, and the GCC standard unrolling is able to unroll the first one, I decided not to perform the unrolling within graphite. But if desirable, it could be done.

The patch remains basically the same: two maps are built, one for the regular unroll and jam (i.e. stride mining) and the other for computing the separating class (i.e. its image is the image of the full tiles on the strided dimension). In graphite-isl-ast-to-gimple.c, these two maps are used to build the separation class option and fix the scheduling. The main differences from the previous patch are that the separating class option is set on a different dimension and that a constraint was added to the map used to build the separating_class.

Now some comments to your message:

I'm not sure if Tobi or Albert have told you, but the separation_class option is going to be phased out since its design is fundamentally flawed. If you can't wait until isl-0.15, then I guess you have no choice but to use this option, but you should realize that it will remain frozen in its current broken state (until it is removed at some point).

No, I didn't know about the phase-out of the separation_class option. Anyway, for the time being it is the best solution available. My understanding is that this option should always generate correct code, as long as the scheduling is correct of course, but I think I had some cases where setting the separating_class led to incorrect code. For isl-0.15, do you intend to provide some option with similar functionality?

+  /* Extract the original and auxiliar maps from pbb->transformed.
+     Set pbb->transformed to the original map.  */
+  psmap = smap;
+  psmap->n = 0;
+  res = isl_map_foreach_basic_map (pbb->transformed, separate_map, (void *)psmap);
+  gcc_assert (res == 0);
+
+  isl_map_free(pbb->transformed);
+  pbb->transformed = isl_map_copy(psmap->map_arr[0]);
+

I have no idea what this pbb->transformed is supposed to represent, but you appear to be assuming that it has exactly two disjuncts and that they appear in some order. Now, perhaps you have explicitly checked that this map has two disjuncts, but then you should probably bring the check closer since any operation on sets that you perform could change the internal representation (even of other sets). However, in no way can you assume that isl_map_foreach_basic_map will iterate over these disjuncts in any specific order.

At this point pbb->transformed has two basic maps, one is the mapping for unroll and jam, and one for the full tile for the strided dimension. A check can be introduced that differentiates between them, since the image of one map should be included in the other. In fact, to prevent any isl side effects, I thought of introducing a new field pbb->transformed_full in the pbb structure to be on the safe side.
Index: gcc/toplev.c
===
--- gcc/toplev.c (revision 217013)
+++ gcc/toplev.c (working copy)
@@ -1302,11 +1302,12 @@
       || flag_loop_block
       || flag_loop_interchange
       || flag_loop_strip_mine
-      || flag_loop_parallelize_all)
+      || flag_loop_parallelize_all
+      || flag_loop_unroll_jam)
     sorry ("Graphite loop optimizations cannot be used (ISL is not available)"
           "(-fgraphite, -fgraphite-identity, -floop-block, "
           "-floop-interchange, -floop-strip-mine, -floop-parallelize-all, "
-          "and -ftree-loop-linear)");
+          "-floop-unroll-and-jam, and -ftree-loop-linear)");
 #endif
 
   /* One region RA really helps to decrease the code size.  */
Index: gcc/graphite-optimize-isl.c
===
---
Re: [GRAPHITE, PATCH] Loop unroll and jam optimization
Changed the option to -floop-unroll-and-jam as you suggested.

The patch takes advantage of the new isl based code generator introduced recently in GCC (in fact of the possible options for building the AST). The code generated for this optimization in the case of non-constant loop bounds initially looks as below. This is not very useful because the standard GCC unrolling doesn't succeed in unrolling the innermost loop.

ISL AST generated by ISL:
for (int c0 = 0; c0 < HEIGHT; c0 += 4)
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
    for (int c2 = c0; c2 <= min(HEIGHT - 1, c0 + 3); c2 += 1)

Hmm, so this iterates at most 4 times, right? Eventually the body is considered too large by GCC or it fails to compute an upper bound for the number of iterations. Is that (an upper bound for the number of iterations) available readily from ISL at code-generation time? If so you can transfer this knowledge to the GCC loop information.

The problem was not explained well. It is not only the unrolling, it is also the loop separation (which the latest version of the patch does). Even if the GCC unrolling succeeds in unrolling the inner loop, you will get code similar to that obtained by the previous version of this patch, which is not what is wanted. The last time I checked, GCC unrolling was not able to unroll the inner loop. In my opinion it is the min and max that prevent it (graphite emits such code for blocking, strip-mine, and unroll and jam). The bounds of the iteration domain are expressed in min, max terms.

I'm curious to see a testcase (and a way to generate the above form) to see what is actually the problem.

Of course. Take the code from the unroll-and-jam patch and the attached test case (but as said, other graphite options will generate similar code). But somehow it seems that the new isl based code generator could handle such transformations more easily.

Mircea

Thanks, Richard.

      S_4(c2, c1);

Now, the separating class option (set for unroll and jam) produces this nice loop structure:

ISL AST generated by ISL:
for (int c0 = 0; c0 < HEIGHT; c0 += 4)
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
    if (HEIGHT >= c0 + 4) {
      for (int c2 = c0; c2 <= c0 + 3; c2 += 1)
        S_4(c2, c1);
    } else
      for (int c2 = c0; c2 < HEIGHT; c2 += 1)
        S_4(c2, c1);

The unroll option (set for unroll and jam) produces:

ISL AST generated by ISL:
for (int c0 = 0; c0 < HEIGHT; c0 += 4)
  for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
    if (HEIGHT >= c0 + 4) {
      S_4(c0, c1);
      S_4(c0 + 1, c1);
      S_4(c0 + 2, c1);
      S_4(c0 + 3, c1);
    } else {
      S_4(c0, c1);
      if (HEIGHT >= c0 + 2) {
        S_4(c0 + 1, c1);
        if (4 * floord(HEIGHT - 3, 4) + 3 == HEIGHT && c0 + 3 == HEIGHT)
          S_4(HEIGHT - 1, c1);
      }
    }

The separate option (set by default for all dimensions for the new isl based code generator) doesn't succeed in removing the ifs from the loops and generating two loop structures (this would have been highly desirable).

As stage 1 is going to close soon, quick feedback on this patch is greatly appreciated.

Many thanks,
Mircea Namolaru

int f1(int v[1024][1024], int HEIGHT, int LENGTH)
{
  int i, j;
  for (i=0; i<HEIGHT; i++) {
    for (j=3; j < LENGTH; j++) {
      v[i][j] = v[i][j-3] + v[i][j-2] + v[i][j];
    }
  }
}
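For readers unfamiliar with the transformation, here is a hand-written sketch of what unroll and jam by a factor of 4 amounts to for the f1 test case above, with the full tiles separated from the remainder rows. It is only an illustration, not output of the patch or of isl: the names f1_ref and f1_unroll_jam, the reduced row width N, and the exact split point between the two loop nests are assumptions made for the example.

```c
#define N 16   /* hypothetical small row width; the test case uses 1024 */

/* Reference loop nest with the same body as the f1 test case.  */
static void
f1_ref (int v[][N], int height, int length)
{
  for (int i = 0; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
}

/* Outer loop unrolled by 4 and the copies jammed into one inner loop.
   This is legal here because each row only depends on itself.  */
static void
f1_unroll_jam (int v[][N], int height, int length)
{
  int i = 0;

  /* Full tiles: four consecutive rows per inner-loop iteration.  */
  for (; i + 4 <= height; i += 4)
    for (int j = 3; j < length; j++)
      {
        v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
        v[i + 1][j] = v[i + 1][j - 3] + v[i + 1][j - 2] + v[i + 1][j];
        v[i + 2][j] = v[i + 2][j - 3] + v[i + 2][j - 2] + v[i + 2][j];
        v[i + 3][j] = v[i + 3][j - 3] + v[i + 3][j - 2] + v[i + 3][j];
      }

  /* Remainder: the rows that do not fill a tile, in the original form.  */
  for (; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j - 3] + v[i][j - 2] + v[i][j];
}
```

The point of the separation is visible in the first loop nest: its inner body has no min/max bound and no conditional, which is the shape the discussion above is trying to obtain.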
[x86, merge] Replace builtins with vector extensions
Hello,

here is the combined patch+ChangeLog. I'll run a last regtest just before committing. Ok for trunk?

2014-11-12  Marc Glisse  marc.gli...@inria.fr

gcc/
        * config/i386/xmmintrin.h (_mm_add_ps, _mm_sub_ps, _mm_mul_ps,
        _mm_div_ps, _mm_store_ss, _mm_cvtss_f32): Use vector extensions
        instead of builtins.
        * config/i386/emmintrin.h (__v2du, __v4su, __v8hu, __v16qu): New
        typedefs.
        (_mm_sqrt_sd): Fix comment.
        (_mm_add_epi8, _mm_add_epi16, _mm_add_epi32, _mm_add_epi64,
        _mm_sub_epi8, _mm_sub_epi16, _mm_sub_epi32, _mm_sub_epi64,
        _mm_mullo_epi16, _mm_cmpeq_epi8, _mm_cmpeq_epi16, _mm_cmpeq_epi32,
        _mm_cmplt_epi8, _mm_cmplt_epi16, _mm_cmplt_epi32, _mm_cmpgt_epi8,
        _mm_cmpgt_epi16, _mm_cmpgt_epi32, _mm_and_si128, _mm_or_si128,
        _mm_xor_si128, _mm_store_sd, _mm_cvtsd_f64, _mm_storeh_pd,
        _mm_cvtsi128_si64, _mm_cvtsi128_si64x, _mm_add_pd, _mm_sub_pd,
        _mm_mul_pd, _mm_div_pd, _mm_storel_epi64, _mm_movepi64_pi64):
        Use vector extensions instead of builtins.
        * config/i386/smmintrin.h (_mm_cmpeq_epi64, _mm_cmpgt_epi64,
        _mm_mullo_epi32): Likewise.
        * config/i386/avxintrin.h (__v4du, __v8su, __v16hu, __v32qu):
        New typedefs.
        (_mm256_add_pd, _mm256_add_ps, _mm256_div_pd, _mm256_div_ps,
        _mm256_mul_pd, _mm256_mul_ps, _mm256_sub_pd, _mm256_sub_ps):
        Use vector extensions instead of builtins.
        * config/i386/avx2intrin.h (_mm256_cmpeq_epi8, _mm256_cmpeq_epi16,
        _mm256_cmpeq_epi32, _mm256_cmpeq_epi64, _mm256_cmpgt_epi8,
        _mm256_cmpgt_epi16, _mm256_cmpgt_epi32, _mm256_cmpgt_epi64,
        _mm256_and_si256, _mm256_or_si256, _mm256_xor_si256, _mm256_add_epi8,
        _mm256_add_epi16, _mm256_add_epi32, _mm256_add_epi64,
        _mm256_mullo_epi16, _mm256_mullo_epi32, _mm256_sub_epi8,
        _mm256_sub_epi16, _mm256_sub_epi32, _mm256_sub_epi64): Likewise.
        * config/i386/avx512fintrin.h (__v8du, __v16su, __v32hu, __v64qu):
        New typedefs.
        (_mm512_or_si512, _mm512_or_epi32, _mm512_or_epi64, _mm512_xor_si512,
        _mm512_xor_epi32, _mm512_xor_epi64, _mm512_and_si512,
        _mm512_and_epi32, _mm512_and_epi64, _mm512_mullo_epi32,
        _mm512_add_epi64, _mm512_sub_epi64, _mm512_add_epi32,
        _mm512_sub_epi32, _mm512_add_pd, _mm512_add_ps, _mm512_sub_pd,
        _mm512_sub_ps, _mm512_mul_pd, _mm512_mul_ps, _mm512_div_pd,
        _mm512_div_ps): Use vector extensions instead of builtins.
        * config/i386/avx512bwintrin.h (_mm512_mullo_epi16, _mm512_add_epi8,
        _mm512_sub_epi8, _mm512_sub_epi16, _mm512_add_epi16): Likewise.
        * config/i386/avx512dqintrin.h (_mm512_mullo_epi64): Likewise.
        * config/i386/avx512vldqintrin.h (_mm256_mullo_epi64,
        _mm_mullo_epi64): Likewise.

gcc/testsuite/
        * gcc.target/i386/intrinsics_opt-1.c: New testcase.
        * gcc.target/i386/intrinsics_opt-2.c: Likewise.
        * gcc.target/i386/intrinsics_opt-3.c: Likewise.
        * gcc.target/i386/intrinsics_opt-4.c: Likewise.

--
Marc Glisse

diff -ru -N -x .svn trunk/gcc/config/i386/avx2intrin.h intrin/gcc/config/i386/avx2intrin.h
--- trunk/gcc/config/i386/avx2intrin.h 2014-04-01 07:34:06.335878860 +0200
+++ intrin/gcc/config/i386/avx2intrin.h 2014-11-10 21:56:37.040719810 +0100
@@ -104,28 +104,28 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddb256 ((__v32qi)__A, (__v32qi)__B);
+  return (__m256i) ((__v32qu)__A + (__v32qu)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi16 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddw256 ((__v16hi)__A, (__v16hi)__B);
+  return (__m256i) ((__v16hu)__A + (__v16hu)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi32 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddd256 ((__v8si)__A, (__v8si)__B);
+  return (__m256i) ((__v8su)__A + (__v8su)__B);
 }
 
 extern __inline __m256i
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_add_epi64 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_paddq256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4du)__A + (__v4du)__B);
 }
 
 extern __inline __m256i
@@ -178,7 +178,7 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_and_si256 (__m256i __A, __m256i __B)
 {
-  return (__m256i) __builtin_ia32_andsi256 ((__v4di)__A, (__v4di)__B);
+  return (__m256i) ((__v4du)__A & (__v4du)__B);
 }
 
 extern __inline __m256i
@@ -230,59 +230,56 @@
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_cmpeq_epi8 (__m256i __A, __m256i __B)
 {
-  return (__m256i)__builtin_ia32_pcmpeqb256 ((__v32qi)__A, (__v32qi)__B);
+  return (__m256i) ((__v32qi)__A == (__v32qi)__B);
 }
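The pattern used throughout the patch can be tried outside the intrinsic headers with plain GNU C vector extensions. The sketch below is not taken from the patch: the my_* type and function names are hypothetical 128-bit stand-ins, and the remark about unsigned element types (wraparound being well defined for them) is my reading of why the additions go through the new unsigned typedefs.

```c
/* Hypothetical stand-ins for __m128i, __v4su and __v4si.
   Needs a compiler with GNU vector extensions (GCC or Clang).  */
typedef long long my_m128i __attribute__ ((vector_size (16)));
typedef unsigned int my_v4su __attribute__ ((vector_size (16)));
typedef int my_v4si __attribute__ ((vector_size (16)));

/* Element-wise 32-bit addition written with the generic + operator.
   The casts only reinterpret the 16 bytes; unsigned elements wrap
   around on overflow instead of invoking undefined behavior.  */
static inline my_m128i
my_add_epi32 (my_m128i a, my_m128i b)
{
  return (my_m128i) ((my_v4su) a + (my_v4su) b);
}

/* Element-wise comparison: each result element is -1 (all bits set)
   where the inputs are equal and 0 elsewhere.  */
static inline my_m128i
my_cmpeq_epi32 (my_m128i a, my_m128i b)
{
  return (my_m128i) ((my_v4si) a == (my_v4si) b);
}
```

Because these are ordinary operators rather than opaque builtins, the compiler's generic optimizers can fold and combine them, which appears to be what the new intrinsics_opt testcases check.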
Re: [PATCH] Fix PR56480 aka DR374. Allow explicit specialization in enclosing namespace.
On 11/08/2014 06:57 AM, Markus Trippelsdorf wrote:

+++ b/gcc/testsuite/g++.old-deja/g++.pt/explicit73.C
@@ -7,9 +7,9 @@
 // the template
 namespace N {
-  template <class T> class foo; // { dg-error "" } referenced below
+  template <class T> class foo; // { dg-error "" "" { target { ! c++11 } } } referenced below
 }
 using namespace N;
-template <> class foo<void>; // { dg-error "" } invalid specialization
+template <> class foo<void>; // { dg-error "" "" { target { ! c++11 } } } invalid specialization

This should still get an error in C++11 mode.

I think we also need to test this:

namespace A {
  namespace B {
    template <class T> void f();
  }
  using namespace B;
}
template void A::f(); // { dg-error "" }

I think your code won't catch this, because we need to know what the explicit namespace was, not just whether there was one.

Can we handle this in check_explicit_specialization rather than all the way down in register_specialization?

Jason
Re: [patch,gomp-4_0-branch] openacc parallel reduction part 1
Hi! On Tue, 11 Nov 2014 16:03:05 +0100, I wrote: On Tue, 8 Jul 2014 07:28:24 -0700, Cesar Philippidis cesar_philippi...@mentor.com wrote: On 07/07/2014 02:55 AM, Thomas Schwinge wrote: On Sun, 6 Jul 2014 16:10:56 -0700, Cesar Philippidis cesar_philippi...@mentor.com wrote: This patch is the first step to enabling parallel reductions in openacc. I've committed this updated version of the patch. In r217354, I just applied the following cleanup to gomp-4_0-branch: commit 4fe8b3620b258ac904d9eade5f76dede69a80c98 Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Nov 11 14:52:26 2014 + OpenACC reductions maintenance. gcc/ * omp-low.c (maybe_lookup_reduction): Don't require an OpenACC context. (lower_oacc_offload): Simplify use of maybe_lookup_reduction. gcc/ * omp-low.c (delete_omp_context): Dispose of reduction_map. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217354 138bc75d-0d04-0410-961f-82ee72b054a4 I further tried to tidy this up as follows -- but that is causing the reduction execution tests to fail; indeed -fdump-tree-all already shows unexpected changes during gimplification. (I first suspected that variables are added to a GIMPLE_OMP_FOR reduction_map, and then not found when reading from GIMPLE_OACC_PARALLEL one, but now I'm not at all sure about this theory.) Cesar, is cleanup like that useful at all, and if yes, could you look into that, later on? (Definitely not urgent.) commit 3ef04b65c1b5d3db5aa4b903a1ec0f693bb75ca8 Author: Thomas Schwinge tho...@codesourcery.com Date: Tue Nov 11 13:04:00 2014 +0100 [WIP] Make reduction_map per context. 
---
 gcc/omp-low.c | 41 +
 1 file changed, 29 insertions(+), 12 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 5695ec3..44ed9a0 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -987,8 +987,19 @@
 lookup_reduction (const char *id, omp_context *ctx)
 {
   gcc_assert (is_gimple_omp_oacc_specifically (ctx->stmt));
-  splay_tree_node n;
-  n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+  splay_tree_node n = NULL;
+  do
+    {
+      if (ctx->reduction_map != NULL)
+        n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) id);
+      if (n != NULL)
+        break;
+      /* If not found, recurse into outer context.  */
+      ctx = ctx->outer;
+    }
+  while (ctx != NULL
+         /* ctx->reduction_map != NULL */);
+  gcc_assert (n != NULL);
   return (tree) n->value;
 }

@@ -996,8 +1007,17 @@ static inline tree
 maybe_lookup_reduction (tree var, omp_context *ctx)
 {
   splay_tree_node n = NULL;
-  if (ctx->reduction_map)
-    n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+  do
+    {
+      if (ctx->reduction_map != NULL)
+        n = splay_tree_lookup (ctx->reduction_map, (splay_tree_key) var);
+      if (n != NULL)
+        break;
+      /* If not found, recurse into outer context.  */
+      ctx = ctx->outer;
+    }
+  while (ctx != NULL
+         /* ctx->reduction_map != NULL */);
   return n ? (tree) n->value : NULL_TREE;
 }

@@ -1498,8 +1518,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb = outer_ctx->cb;
       ctx->cb.block = NULL;
       ctx->depth = outer_ctx->depth + 1;
-      /* FIXME: handle reductions recursively.  */
-      ctx->reduction_map = outer_ctx->reduction_map;
     }
   else
     {
@@ -1513,7 +1531,6 @@ new_omp_context (gimple stmt, omp_context *outer_ctx)
       ctx->cb.eh_lp_nr = 0;
       ctx->cb.transform_call_graph_edges = CB_CGE_MOVE;
       ctx->depth = 1;
-      //TODO ctx->reduction_map = TODO;
     }

   ctx->cb.decl_map = new hash_map<tree, tree>;
@@ -1571,10 +1588,7 @@ delete_omp_context (splay_tree_value value)
     splay_tree_delete (ctx->field_map);
   if (ctx->sfield_map)
     splay_tree_delete (ctx->sfield_map);
-  if (ctx->reduction_map
-      /* Shared over several omp_contexts.  */
-      && (ctx->outer == NULL
-          || ctx->reduction_map != ctx->outer->reduction_map))
+  if (ctx->reduction_map)
     splay_tree_delete (ctx->reduction_map);

   /* We hijacked DECL_ABSTRACT_ORIGIN earlier.  We need to clear it before
@@ -1765,6 +1779,7 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
               install_var_local (array, c);

               /* Insert it into the current context.  */
+              //TODO
               splay_tree_insert (ctx->reduction_map,
                                  (splay_tree_key) omp_get_id(var),
                                  (splay_tree_value) array);
@@ -2394,8 +2409,8 @@ scan_oacc_offload (gimple stmt, omp_context *outer_ctx)
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx->record_type) = name;
-  create_omp_child_function (ctx, false);
   ctx->reduction_map = splay_tree_new (splay_tree_compare_pointers, 0, 0);
+  create_omp_child_function (ctx, false);
   gimple_omp_set_child_fn (stmt,
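The idea behind the WIP patch, looking a key up in the current context and then in each enclosing one, can be sketched outside of GCC as follows. This is only an illustration under assumptions: Context, its fields, and the use of std::map in place of a splay tree are hypothetical stand-ins, not the omp-low.c data structures.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for omp_context: every context owns its own
// (possibly empty) reduction map and points at its enclosing context.
struct Context {
  std::map<std::string, int> reduction_map;
  Context *outer = nullptr;
};

// "maybe" flavour: walk outwards until the key is found; returns
// nullptr if no context on the chain knows the key.
const int *maybe_lookup_reduction(const std::string &id, const Context *ctx) {
  for (; ctx != nullptr; ctx = ctx->outer) {
    auto it = ctx->reduction_map.find(id);
    if (it != ctx->reduction_map.end())
      return &it->second;
  }
  return nullptr;
}

// Strict flavour: the caller guarantees the key exists somewhere on
// the chain, mirroring the assertion in the patch.
int lookup_reduction(const std::string &id, const Context *ctx) {
  const int *value = maybe_lookup_reduction(id, ctx);
  assert(value != nullptr);
  return *value;
}
```

With per-context maps an inner entry shadows an outer one with the same key, which is one behaviour that differs from sharing a single map between contexts.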
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
Resending as text/plain

On 11 November 2014 15:14, Charles Baylis charles.bay...@linaro.org wrote:
> On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote:
> > This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane intrinsics, and also provides a single place to do bigendian lane-swapping for all those intrinsics (and others to follow in later patches). This allows us to remove many define_expands that just do a range-check and endian-swap before outputting the RTL for a corresponding _internal insn.
> >
> > Changes to aarch64-simd.md are not as big as they look, they are highly repetitive, like the code they are removing! Testcases are also repetitive, as unfortunately dg-error doesn't care *how many* errors there were matching its pattern, as long as at least 1, hence having to separate each into own file - the last 0 in the dg-error disables the line-number checking, as the line numbers in our error messages refer to lines within arm_neon.h rather than within the test case. (They do at least mention the user function containing the call to the intrinsic.)
> >
> > Ok for trunk?
>
> It looks like there are a few places where you have 8 spaces where a tab ought to be.
>
> Other than that, it looks good to me (but I can't approve)
>
> I am looking at making errors found in arm_neon.h a bit more user friendly, which depends on checking bounds on constant int parameters as you've done here. Do you plan to do similar changes for loads/stores/shifts, and also for the ARM back-end? I can help out if you don't already have patches in development.
>
> Charles
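To illustrate what a single check-and-swap point for lane indices might look like, here is a minimal sketch. It assumes the big-endian mapping of lane n to nunits - 1 - n; checked_lane and its parameters are hypothetical names, not the qualifier or helpers from the patch.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper: validate a lane index against the number of
// lanes, then map it to the in-register lane for the given endianness.
constexpr int checked_lane(int lane, int nunits, bool big_endian) {
  return (lane < 0 || lane >= nunits)
             ? throw std::out_of_range("lane out of range")
             : (big_endian ? nunits - 1 - lane : lane);
}

// With constant arguments the range check is evaluated at compile time,
// so an out-of-range lane would be a compile error here.
static_assert(checked_lane(1, 4, false) == 1, "little-endian keeps the lane");
static_assert(checked_lane(1, 4, true) == 2, "big-endian flips the lane");
```

Doing both steps in one place means a caller cannot apply the swap without the range check, or the other way round.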
[PATCH] Remove pedantic_lvalues
As pre-approved by Joseph the following removes pedantic_lvalues which the C FE now handles itself without help from fold-const.c.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

The C++ FE is still not happy without NON_LVALUE_EXPRs though.

Richard.

2014-11-11  Richard Biener  rguent...@suse.de

	* tree-core.h (pedantic_lvalues): Remove.
	* fold-const.c (pedantic_lvalues): Likewise.
	(pedantic_non_lvalue_loc): Remove conditional non_lvalue_loc call.

	c/
	* c-decl.c (c_init_decl_processing): Do not set pedantic_lvalues
	to true.

Index: trunk/gcc/c/c-decl.c
===================================================================
*** trunk.orig/gcc/c/c-decl.c	2014-10-29 13:34:20.979438627 +0100
--- trunk/gcc/c/c-decl.c	2014-11-11 14:16:04.605138651 +0100
*************** c_init_decl_processing (void)
*** 3947,3954 ****

    input_location = save_loc;

-   pedantic_lvalues = true;
-
    make_fname_decl = c_make_fname_decl;
    start_fname_decls ();
  }
--- 3947,3952 ----
Index: trunk/gcc/fold-const.c
===================================================================
*** trunk.orig/gcc/fold-const.c	2014-11-11 10:54:01.424669165 +0100
--- trunk/gcc/fold-const.c	2014-11-11 14:15:36.954139861 +0100
*************** non_lvalue_loc (location_t loc, tree x)
*** 2160,2179 ****
    return build1_loc (loc, NON_LVALUE_EXPR, TREE_TYPE (x), x);
  }

- /* Nonzero means lvalues are limited to those valid in pedantic ANSI C.
-    Zero means allow extended lvalues.  */
-
- int pedantic_lvalues;
-
  /* When pedantic, return an expr equal to X but certainly not valid as a
     pedantic lvalue.  Otherwise, return X.  */

  static tree
  pedantic_non_lvalue_loc (location_t loc, tree x)
  {
-   if (pedantic_lvalues)
-     return non_lvalue_loc (loc, x);
-
    return protected_set_expr_location_unshare (x, loc);
  }
--- 2160,2171 ----
Index: trunk/gcc/tree-core.h
===================================================================
*** trunk.orig/gcc/tree-core.h	2014-11-11 09:39:42.722864279 +0100
--- trunk/gcc/tree-core.h	2014-11-11 14:17:49.193134074 +0100
*************** extern GTY(()) builtin_info_type builtin
*** 1877,1886 ****
  /* If nonzero, an upper limit on alignment of structure fields, in bits,  */
  extern unsigned int maximum_field_alignment;

- /* Nonzero means lvalues are limited to those valid in pedantic ANSI C.
-    Zero means allow extended lvalues.  */
- extern int pedantic_lvalues;
-
  /* Points to the FUNCTION_DECL of the function whose body we are reading.  */
  extern GTY(()) tree current_function_decl;

--- 1877,1882 ----
Re: [RFC PATCH, AARCH64] Add support for -mlong-calls option
On 27/10/14 09:21, Yangfei (Felix) wrote:
> +/* Handle pragmas for compatibility with Intel's compilers.  */
> +#define REGISTER_TARGET_PRAGMAS() do { \
> +  c_register_pragma (0, "long_calls", aarch64_pr_long_calls); \
> +  c_register_pragma (0, "no_long_calls", aarch64_pr_no_long_calls); \
> +  c_register_pragma (0, "long_calls_off", aarch64_pr_long_calls_off); \
> +} while (0)
> +
>  #define FUNCTION_ARG_PADDING(MODE, TYPE) \
>    (aarch64_pad_arg_upward (MODE, TYPE) ? upward : downward)
>
> Hi,
>
> I updated the patch with the following two changes:
> 1. Add one entry in ChangeLog for this patch;
> 2. Enable this feature for sibling calls too.
>
> Assuming no issues pop up, OK for trunk?

Hi Felix,

Sorry for the delay responding, I've been out of the office recently and I'm only just catching up on a backlog of GCC related emails.

I'm in two minds about this; I can potentially see the need for attributes to enable long calls for specific calls, and maybe also for pragmas that can be used to efficiently mark a group of functions in that way; but I don't really see the value in adding a -mlong-calls option to do this globally.

The reasoning is as follows: long calls are generally very expensive and relatively few functions should need them in most applications (since code that needs to span more than a single block of 128Mbytes - the span of a BL or B instruction - will be very rare in reality).

The best way to handle very large branches for those rare cases where you do have a very large contiguous block of code more than 128MB is by having the linker insert veneers when needed; the code will branch to the veneer which will insert an indirect branch at that point (the ABI guarantees that at function call boundaries IP0 and IP1 will not contain live values, making them available for such purposes).

In a very small number of cases it might be desirable to mark specific functions as being too far away to reach; in those cases the attributes and pragma methods can be used to mark such calls as being far calls.
Aside: The reason -mlong-calls was added to GCC for ARM is that the code there pre-dates the EABI, which introduced the concept of link-time veneering of calls - the option should be unnecessary now that almost everyone uses the EABI as the basis for their platform ABI. We don't have such a legacy for AArch64 and I'd need to see strong justification for its use before adding the option there as well.

So please can you rework the patch to remove the -mlong-calls option and just leave the attribute and pragma interfaces.

R.

> Index: gcc/ChangeLog
> ===================================================================
> --- gcc/ChangeLog	(revision 216558)
> +++ gcc/ChangeLog	(working copy)
> @@ -1,3 +1,26 @@
> +2014-10-27  Felix Yang  felix.y...@huawei.com
> +	    Haijian Zhang  z.zhanghaij...@huawei.com
> +
> +	* config/aarch64/aarch64.opt (mlong-calls): New option.
> +	* config/aarch64/aarch64.h (REGISTER_TARGET_PRAGMAS): Define.
> +	* config/aarch64/aarch64.c (aarch64_set_default_type_attributes,
> +	aarch64_attribute_table, aarch64_comp_type_attributes,
> +	aarch64_decl_is_long_call_p, aarch64_function_in_section_p,
> +	aarch64_pr_long_calls, aarch64_pr_no_long_calls,
> +	aarch64_pr_long_calls_off): New functions.
> +	(TARGET_SET_DEFAULT_TYPE_ATTRIBUTES): Define as
> +	aarch64_set_default_type_attributes.
> +	(TARGET_ATTRIBUTE_TABLE): Define as aarch64_attribute_table.
> +	(TARGET_COMP_TYPE_ATTRIBUTES): Define as aarch64_comp_type_attribute.
> +	(aarch64_pragma_enum): New enum.
> +	(aarch64_attribute_table): New attribute table.
> +	* config/aarch64/aarch64-protos.h (aarch64_pr_long_calls,
> +	aarch64_pr_no_long_calls, aarch64_pr_long_calls_off): New declarations.
> +	* config/aarch64/aarch64.md (sibcall, sibcall_value): Modified to
> +	generate indirect call for sibling call when needed.
> +	* config/aarch64/predicate.md (aarch64_call_insn_operand): Modified to
> +	exclude a symbol_ref for an indirect call.
> +
>  2014-10-22  Richard Sandiford  richard.sandif...@arm.com
>
>  	* lra.c (lra): Remove call to recog_init.
> Index: gcc/config/aarch64/predicates.md
> ===================================================================
> --- gcc/config/aarch64/predicates.md	(revision 216558)
> +++ gcc/config/aarch64/predicates.md	(working copy)
> @@ -27,7 +27,8 @@
>  )
>
>  (define_predicate "aarch64_call_insn_operand"
> -  (ior (match_code "symbol_ref")
> +  (ior (and (match_code "symbol_ref")
> +	    (match_test "!aarch64_is_long_call_p (op)"))
>         (match_operand 0 "register_operand")))
>
>  (define_predicate "aarch64_simd_register"
> Index: gcc/config/aarch64/aarch64.md
> ===================================================================
> --- gcc/config/aarch64/aarch64.md	(revision 216558)
> +++ gcc/config/aarch64/aarch64.md	(working copy)
> @@ -581,11 +581,13 @@
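The 128 MB figure in the review follows from the B/BL encoding, which holds a signed 26-bit word offset. Below is a minimal sketch of the reachability test that decides whether a direct branch is enough or a veneer (or long call) is needed; reachable_by_bl and the two constants are hypothetical names used only for illustration, not anything from the patch or the linker.

```cpp
#include <cassert>
#include <cstdint>

// Byte offsets a direct B/BL can encode: a signed 26-bit count of
// 4-byte words, i.e. the range [-128 MiB, +128 MiB - 4].
constexpr std::int64_t kBranchMin = -(std::int64_t{1} << 27);
constexpr std::int64_t kBranchMax = (std::int64_t{1} << 27) - 4;

// Hypothetical helper: true if a direct BL at address `from` can reach
// `to`.  The offset must be word-aligned and inside the range above.
constexpr bool reachable_by_bl(std::uint64_t from, std::uint64_t to) {
  return (static_cast<std::int64_t>(to - from) & 3) == 0 &&
         static_cast<std::int64_t>(to - from) >= kBranchMin &&
         static_cast<std::int64_t>(to - from) <= kBranchMax;
}
```

Only calls for which such a test fails need the more expensive sequence, which is why doing it per call at link time is cheaper than turning every call into a long call with a global option.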
Re: [PATCH][AArch64] Add bounds checking to vqdm*_lane intrinsics via a qualifier that also flips endianness
[Resending in gcc-patches-accepted form]

I'm working on a patch for vget_lane (that removes the be_checked_get_lane thing which isn't an intrinsic). Other than that, no, not yet - loads and stores I was thinking to wait until David Sherwood + Alan Hayward's patches have been settled, but there's still ARM, indeed.

If you have any way/ideas to get better error messages (i.e. line numbers), that'd be particularly good, tho :)

Cheers, Alan

Charles Baylis wrote:
> On 6 November 2014 10:19, Alan Lawrence alan.lawre...@arm.com wrote:
> > This generates out-of-range errors at compile- (rather than assemble-)time for the vqdm*_lane intrinsics, and also provides a single place to do bigendian lane-swapping for all those intrinsics (and others to follow in later patches). This allows us to remove many define_expands that just do a range-check and endian-swap before outputting the RTL for a corresponding _internal insn.
> >
> > Changes to aarch64-simd.md are not as big as they look, they are highly repetitive, like the code they are removing! Testcases are also repetitive, as unfortunately dg-error doesn't care *how many* errors there were matching its pattern, as long as at least 1, hence having to separate each into own file - the last 0 in the dg-error disables the line-number checking, as the line numbers in our error messages refer to lines within arm_neon.h rather than within the test case. (They do at least mention the user function containing the call to the intrinsic.)
> >
> > Ok for trunk?
>
> It looks like there are a few places where you have 8 spaces where a tab ought to be.
>
> Other than that, it looks good to me (but I can't approve)
>
> I am looking at making errors found in arm_neon.h a bit more user friendly, which depends on checking bounds on constant int parameters as you've done here. Do you plan to do similar changes for loads/stores/shifts, and also for the ARM back-end? I can help out if you don't already have patches in development.
>
> Charles