Re: ARM: VFPv3-D16 vs. VFPv3-D32
Which Cortex-R are you targeting that supports both D16 and D32?

Thanks,
Joey

On Thu, Oct 17, 2013 at 3:13 PM, Sebastian Huber <sebastian.hu...@embedded-brains.de> wrote:
> Hello,
>
> it seems that it is not possible to deduce from GCC built-in defines
> whether we compile for the VFPv3-D16 or the VFPv3-D32 floating-point unit:
>
>   touch empty.c
>   arm-eabi-gcc -march=armv7-r -mfpu=vfpv3-d16 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d16.txt
>   arm-eabi-gcc -march=armv7-r -mfpu=vfpv3 -mfloat-abi=hard -E -P -v -dD empty.c > vfpv3-d32.txt
>   diff vfpv3-d16.txt vfpv3-d32.txt
>
> Is it possible to add a built-in define for this? Alternatively, is it
> possible to use a target-specific multilib define in the GCC configuration
> that indicates it? I want to use such a compiler-provided define to decide
> what the context-switch support in an operating system should look like.
>
> --
> Sebastian Huber, embedded brains GmbH
> Address : Dornierstr. 4, D-82178 Puchheim, Germany
> Phone   : +49 89 189 47 41-16
> Fax     : +49 89 189 47 41-09
> E-Mail  : sebastian.hu...@embedded-brains.de
> PGP     : Public key available on request.
>
> This message is not a business communication within the meaning of the EHUG.
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 2013-10-17 09:28, Joey Ye wrote:
> Which Cortex-R are you targeting that supports both D16 and D32?

I have a Cortex-R variant which supports D16 only, and now I want to add a
multilib to our GCC target configuration and use a compiler built-in to
adjust the context-switch code for this multilib.

-- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
There is no macro to indicate VFP variants. You can probably check the CPU
variant instead. As far as I know, Cortex-R only supports D16.

Joey

On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
<sebastian.hu...@embedded-brains.de> wrote:
> On 2013-10-17 09:28, Joey Ye wrote:
>> Which Cortex-R are you targeting that supports both D16 and D32?
>
> I have a Cortex-R variant which supports D16 only, and now I want to add a
> multilib to our GCC target configuration and use a compiler built-in to
> adjust the context-switch code for this multilib.
>
> -- Sebastian Huber, embedded brains GmbH
Re: Compilation flags in libgfortran
On Wed, Oct 16, 2013 at 12:22 PM, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote:
> On 16/10/13 10:37, pins...@gmail.com wrote:
>> On Oct 15, 2013, at 6:58 AM, Igor Zamyatin <izamya...@gmail.com> wrote:
>>> Hi All!
>>>
>>> Is there any particular reason that the matmul* modules in libgfortran
>>> are compiled with -O2 -ftree-vectorize? I see some regressions on the
>>> Atom processor after r202980
>>> (http://gcc.gnu.org/ml/gcc-cvs/2013-09/msg00846.html). Why not just use
>>> -O3 for those modules?
>>
>> -O3 and -O2 -ftree-vectorize won't give much performance difference. What
>> you are seeing is that the cost model needs improvement, at least for
>> Atom.
>
> Hi all,
>
> I think http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01908.html introduced
> the new "cheap" vectorizer cost model that favors compilation time over
> runtime performance and is set as the default for -O2. -O3 uses the
> "dynamic" model, which potentially gives better runtime performance in
> exchange for longer compile times (if I understand the new rules
> correctly). Therefore, I'd expect -O3 to give better vector performance
> than -O2...
>
> Kyrill

But this suggests compiling with -O2 -ftree-vectorize
-fvect-cost-model=dynamic, not building with -O3.

Richard.
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 17/10/13 08:56, Joey Ye wrote:
> There is no macro to indicate VFP variants. You can probably check the CPU
> variant instead. As far as I know, Cortex-R only supports D16.

Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
and therefore only 16 regs.

R.

> Joey
>
> On Thu, Oct 17, 2013 at 3:47 PM, Sebastian Huber
> <sebastian.hu...@embedded-brains.de> wrote:
>> On 2013-10-17 09:28, Joey Ye wrote:
>>> Which Cortex-R are you targeting that supports both D16 and D32?
>>
>> I have a Cortex-R variant which supports D16 only, and now I want to add
>> a multilib to our GCC target configuration and use a compiler built-in
>> to adjust the context-switch code for this multilib.
>>
>> -- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 2013-10-17 14:24, Richard Earnshaw wrote:
> On 17/10/13 08:56, Joey Ye wrote:
>> There is no macro to indicate VFP variants. You can probably check the
>> CPU variant instead. As far as I know, Cortex-R only supports D16.
>
> Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
> and therefore only 16 regs.

Thanks for this information. I was not aware that no existing Cortex-R
variant has a D32 VFP unit. Is the converse true for Cortex-A variants, or
are there Cortex-A cores with both VFP-D16 and VFP-D32 units?

-- Sebastian Huber, embedded brains GmbH
Re: ARM: VFPv3-D16 vs. VFPv3-D32
On 17/10/13 13:46, Sebastian Huber wrote:
> On 2013-10-17 14:24, Richard Earnshaw wrote:
>> On 17/10/13 08:56, Joey Ye wrote:
>>> There is no macro to indicate VFP variants. You can probably check the
>>> CPU variant instead. As far as I know, Cortex-R only supports D16.
>>
>> Checking __ARM_ARCH_PROFILE == 'R' would tell you that it's the R profile
>> and therefore only 16 regs.
>
> Thanks for this information. I was not aware that no existing Cortex-R
> variant has a D32 VFP unit. Is the converse true for Cortex-A variants, or
> are there Cortex-A cores with both VFP-D16 and VFP-D32 units?

No, the converse is not true. In general (though I don't think there's a
guarantee that this will be the case) if you don't have Neon you won't have
D32. However, if you have Neon you must have D32. Not all Cortex-A cores
have Neon, though many do these days.

R.
Re: [RFC] By default if-convert only basic blocks that will be vectorized
Jakub, Richard,

I believe this patch is a good opportunity to improve the vectorization
capabilities. I have the following question related to it: do we plan to
treat #pragma omp simd as a directive to vectorize the underlying loop,
hence dropping any assessment regarding profitability?

Regards,
Sergos

On Tue, Oct 15, 2013 at 4:32 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> Especially on i?86/x86_64 the if-conversion pass often seems to be a
> pessimization, but vectorization relies on it, and without it we can't
> vectorize a lot of the loops. Here is a prototype of a patch that will by
> default (unless -ftree-loop-if-convert is given explicitly) only
> if-convert loops internally for vectorization, so the COND_EXPRs actually
> only appear as VEC_COND_EXPRs in the vectorized basic blocks, but will not
> appear if vectorization fails, or in the scalar loop if vectorization is
> conditional, or in the prologue or epilogue loops around the vectorized
> loop.
>
> Instead of moving the ifcvt pass inside the vectorizer, this patch makes
> ifcvt perform loop versioning depending on a special internal call; only
> if the internal call returns true do we go to the if-converted original
> loop, otherwise the non-if-converted copy of the original loop is
> executed. The vectorizer is taught to fold this internal call into true
> resp. false depending on whether the loop was vectorized or not, and
> vectorizer loop versioning, peeling for alignment and peeling for bound
> are adjusted to also copy from the non-if-converted loop rather than the
> if-converted one.
>
> Besides fixing the various PRs where if-conversion pessimizes code, I'd
> also like to move forward with this for conditional loads and stores,
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html, where the
> if-unconversion approach looked like a failure.
This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj 2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj 2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP- loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, + struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge-dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e-dest; basic_block orig_bb = loop-header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. 
*/ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop-header); + scalar_e = EDGE_SUCC (scalar_loop-latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) + { + scalar_phi = gsi_stmt (gsi_scalar); + gsi_next (gsi_scalar); + } /** 1. Handle new-merge-point phis **/ @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge current_new_name = loop_arg; else { - current_new_name = get_current_def (loop_arg); + if (scalar_e) + { + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e); + current_new_name =
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Oct 15, 2013, at 5:32 AM, Jakub Jelinek <ja...@redhat.com> wrote:
> Hi!
>
> Especially on i?86/x86_64 the if-conversion pass often seems to be a
> pessimization, but vectorization relies on it, and without it we can't
> vectorize a lot of the loops.

I think on many other targets it actually helps. I know for one that it
helps on Octeon, even though Octeon has no vector instructions. I think it
helps on most ARM targets too.

Thanks,
Andrew

> Here is a prototype of a patch that will by default (unless
> -ftree-loop-if-convert is given explicitly) only if-convert loops
> internally for vectorization, so the COND_EXPRs actually only appear as
> VEC_COND_EXPRs in the vectorized basic blocks, but will not appear if
> vectorization fails, or in the scalar loop if vectorization is
> conditional, or in the prologue or epilogue loops around the vectorized
> loop.
>
> Instead of moving the ifcvt pass inside the vectorizer, this patch makes
> ifcvt perform loop versioning depending on a special internal call; only
> if the internal call returns true do we go to the if-converted original
> loop, otherwise the non-if-converted copy of the original loop is
> executed. The vectorizer is taught to fold this internal call into true
> resp. false depending on whether the loop was vectorized or not, and
> vectorizer loop versioning, peeling for alignment and peeling for bound
> are adjusted to also copy from the non-if-converted loop rather than the
> if-converted one.
>
> Besides fixing the various PRs where if-conversion pessimizes code, I'd
> also like to move forward with this for conditional loads and stores,
> http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html, where the
> if-unconversion approach looked like a failure.
This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP- loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, +struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge-dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e-dest; basic_block orig_bb = loop-header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. 
*/ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop-header); + scalar_e = EDGE_SUCC (scalar_loop-latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) +{ + scalar_phi = gsi_stmt (gsi_scalar); + gsi_next (gsi_scalar); +} /** 1. Handle new-merge-point phis **/ @@ -460,7 +472,13 @@ slpeel_update_phi_nodes_for_guard1 (edge current_new_name = loop_arg; else { - current_new_name = get_current_def (loop_arg); + if (scalar_e) +{ + current_new_name = PHI_ARG_DEF_FROM_EDGE (scalar_phi, scalar_e); + current_new_name = get_current_def (current_new_name); +} + else +current_new_name = get_current_def (loop_arg); /* current_def is not available only if the variable does
Re: Enable building of libatomic on AArch64
Ping?

Michael Hudson-Doyle <michael.hud...@linaro.org> writes:
> Marcus Shawcroft <marcus.shawcr...@gmail.com> writes:
>> On 3 October 2013 23:43, Michael Hudson-Doyle <michael.hud...@linaro.org> wrote:
>>> Hi,
>>>
>>> As libatomic builds for AArch64 and the tests pass (built on x86_64 but
>>> tested on a Foundation Model; logs and summary:
>>> http://people.linaro.org/~mwhudson/libatomic.sum.txt
>>> http://people.linaro.org/~mwhudson/runtest-log-v-2.txt )
>>> this patch enables the build.
>>>
>>> Cheers,
>>> mwh
>>> (first time posting to this list, let me know if I'm doing it wrong)
>>>
>>> 2013-10-04  Michael Hudson-Doyle  <michael.hud...@linaro.org>
>>>
>>>     * configure.tgt: Add AArch64 support.
>>
>> Hi, The patch looks fine to me.
>
> Thanks for looking!
>
>> The ChangeLog entry should reflect the code that was removed rather than
>> the functionality added. Perhaps:
>>
>>     * configure.tgt (aarch64*): Remove.
>
> There are a few too many negatives going on to make a pithy explanation
> easy...
>
>> Did you investigate whether or not the 10 UNSUPPORTED results in the
>> testsuite are sane?
>
> I did not, but have now. I think that 5 look legitimate since they require
> 128-bit sync ops. The other 5 look superficially like they should be
> supported on aarch64.
>
>> We may just be missing aarch64 target supports wiring in
>> check_effective_target_sync_long_long_runtime?
>
> Yes, that was it; with appropriate changes the -4 tests all pass. However,
> just out of a sense of curiosity, I added wiring to claim aarch64*
> supports 128-bit sync ops and all the -5 tests pass too. Is that just
> luck, or because the reservation granule on the Foundation Model is big
> enough, or something else? In any case, I'll attach a patch that just
> claims support for long long sync ops for now...
>
> Cheers,
> mwh
>
>> /Marcus

2013-10-04  Michael Hudson-Doyle  <michael.hud...@linaro.org>

	* libatomic/configure.tgt (aarch64*): Remove code preventing build.
	* gcc/testsuite/lib/target-supports.exp
	(check_effective_target_sync_long_long): AArch64 supports atomic
	operations on long long.
(check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 7eb4dfe..5557c06 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } { proc check_effective_target_sync_long_long { } { if { [istarget x86_64-*-*] || [istarget i?86-*-*]) + || [istarget aarch64*-*-*] || [istarget arm*-*-*] || [istarget alpha*-*-*] || ([istarget sparc*-*-*] [check_effective_target_lp64]) } { @@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } { } } }] +} elseif { [istarget aarch64*-*-*] } { + return 1 } elseif { [istarget arm*-*-linux-*] } { return [check_runtime sync_longlong_runtime { #include stdlib.h diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index b9e5d6c..7eaab38 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -95,11 +95,6 @@ fi # Other system configury case ${target} in - aarch64*) - # This is currently not supported in AArch64. - UNSUPPORTED=1 - ;; - arm*-*-linux*) # OS support for atomic primitives. config_path=${config_path} linux/arm posix
PR libstdc++/58729 - tr2::dynamic_bitset::resize fails
This patch bootstraps and tests clean on x86-64-linux. Truthfully, dynamic_bitset needs some more love wrt C++11 and a testsuite. It got put in before it was baked really. That will be later. 2013-10-16 Edward Smith-Rowland 3dw...@verizon.net PR libstdc++/58729 * include/tr2/dynamic_bitset (_M_resize, resize): Use input value to set bits; (_M_do_left_shift, _M_do_right_shift, _M_do_to_ulong, _M_do_to_ullong, _M_do_find_first, _M_do_find_next, _M_copy_from_ptr, operator): Move long methods outline to... * include/tr2/dynamic_bitset.tcc: New. * include/Makefile.am: Add dynamic_bitset.tcc. * include/Makefile.in: Add dynamic_bitset.tcc. * testsuite/tr2/dynamic_bitset/pr58729.cc: New. Index: include/tr2/dynamic_bitset === --- include/tr2/dynamic_bitset (revision 203739) +++ include/tr2/dynamic_bitset (working copy) @@ -137,7 +137,12 @@ if (__nbits % _S_bits_per_block 0) ++__sz; if (__sz != this-_M_w.size()) - this-_M_w.resize(__sz); + { + block_type __val = 0; + if (__value) + __val = std::numeric_limitsblock_type::max(); + this-_M_w.resize(__sz, __val); + } } allocator_type @@ -246,7 +251,7 @@ bool _M_is_equal(const __dynamic_bitset_base __x) const { - if (__x.size() == this-size()) + if (__x._M_w.size() == this-_M_w.size()) { for (size_t __i = 0; __i this-_M_w.size(); ++__i) if (this-_M_w[__i] != __x._M_w[__i]) @@ -260,7 +265,7 @@ bool _M_is_less(const __dynamic_bitset_base __x) const { - if (__x.size() == this-size()) + if (__x._M_w.size() == this-_M_w.size()) { for (size_t __i = this-_M_w.size(); __i 0; --__i) { @@ -297,9 +302,9 @@ bool _M_is_subset_of(const __dynamic_bitset_base __b) { - if (__b.size() == this-size()) + if (__b._M_w.size() == this-_M_w.size()) { - for (size_t __i = 0; __i _M_w.size(); ++__i) + for (size_t __i = 0; __i this-_M_w.size(); ++__i) if (this-_M_w[__i] != (this-_M_w[__i] | __b._M_w[__i])) return false; return true; @@ -364,140 +369,6 @@ } }; - // Definitions of non-inline functions from __dynamic_bitset_base. 
- templatetypename _WordT, typename _Alloc -void -__dynamic_bitset_base_WordT, _Alloc::_M_do_left_shift(size_t __shift) -{ - if (__builtin_expect(__shift != 0, 1)) - { - const size_t __wshift = __shift / _S_bits_per_block; - const size_t __offset = __shift % _S_bits_per_block; - - if (__offset == 0) - for (size_t __n = this-_M_w.size() - 1; __n = __wshift; --__n) - this-_M_w[__n] = this-_M_w[__n - __wshift]; - else - { - const size_t __sub_offset = _S_bits_per_block - __offset; - for (size_t __n = _M_w.size() - 1; __n __wshift; --__n) - this-_M_w[__n] = ((this-_M_w[__n - __wshift] __offset) -| (this-_M_w[__n - __wshift - 1] __sub_offset)); - this-_M_w[__wshift] = this-_M_w[0] __offset; - } - - std::fill(this-_M_w.begin(), this-_M_w.begin() + __wshift, - static_cast_WordT(0)); - } -} - - templatetypename _WordT, typename _Alloc -void -__dynamic_bitset_base_WordT, _Alloc::_M_do_right_shift(size_t __shift) -{ - if (__builtin_expect(__shift != 0, 1)) - { - const size_t __wshift = __shift / _S_bits_per_block; - const size_t __offset = __shift % _S_bits_per_block; - const size_t __limit = this-_M_w.size() - __wshift - 1; - - if (__offset == 0) - for (size_t __n = 0; __n = __limit; ++__n) - this-_M_w[__n] = this-_M_w[__n + __wshift]; - else - { - const size_t __sub_offset = (_S_bits_per_block - - __offset); - for (size_t __n = 0; __n __limit; ++__n) - this-_M_w[__n] = ((this-_M_w[__n + __wshift] __offset) -| (this-_M_w[__n + __wshift + 1] __sub_offset)); - this-_M_w[__limit] = this-_M_w[_M_w.size()-1] __offset; - } - - std::fill(this-_M_w.begin() + __limit + 1, this-_M_w.end(), - static_cast_WordT(0)); - } -} - - templatetypename _WordT, typename _Alloc -unsigned long -__dynamic_bitset_base_WordT, _Alloc::_M_do_to_ulong() const -{ - size_t __n = sizeof(unsigned long) / sizeof(block_type); - for (size_t __i = __n; __i this-_M_w.size(); ++__i) - if (this-_M_w[__i]) - __throw_overflow_error(__N(__dynamic_bitset_base::_M_do_to_ulong)); - unsigned
Re: [Patch] Fix undefined behaviors in regex
On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
> To be honest, I was thinking something much smaller than the whole regex ;)
> But let's add Marek in CC.
>
>   int work() { }
>
>   int main() {
>     int a = work();
>     return a;
>   }
>
>   /* This is a smaller case to test the sanitizer. It seems that the
>      undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
>      is that too old? */

No, that's not too old. The thing is that -fsanitize=undefined isn't
complete: we currently sanitize shifts, division by zero, and
__builtin_unreachable calls. VLA sanitization is done but not committed,
because I'm waiting for a review of the C++ FE part of that patch, and I'm
working on NULL pointer checking now. Detection of a missing return
statement will definitely be added, too (quite easy, I should think), and
that would detect the bug in your test case.

Still, thanks for letting me know.

	Marek
Re: [Patch] Fix undefined behaviors in regex
On Thu, Oct 17, 2013 at 09:12:41AM +0200, Marek Polacek wrote:
> On Wed, Oct 16, 2013 at 07:02:03PM -0400, Tim Shen wrote:
>> To be honest, I was thinking something much smaller than the whole
>> regex ;) But let's add Marek in CC.
>>
>>   int work() { }
>>
>>   int main() {
>>     int a = work();
>>     return a;
>>   }
>>
>>   /* This is a smaller case to test the sanitizer. It seems that the
>>      undefined sanitizer is not merged? I use `g++ (GCC) 4.9.0 20131003`,
>>      is that too old? */
>
> No, that's not too old. The thing is that -fsanitize=undefined isn't
> complete: we currently sanitize shifts, division by zero, and
> __builtin_unreachable calls. VLA sanitization is done but not committed,
> because I'm waiting for a review of the C++ FE part of that patch, and I'm
> working on NULL pointer checking now. Detection of a missing return
> statement will definitely be added, too (quite easy, I should think), and
> that would detect the bug in your test case.

Though, in the above case, the question is why people ignore warnings from
the compiler and need special runtime instrumentation to remind them
instead. I'm not objecting to that sanitization, I only find it weird.

	Jakub
Re: [PATCH][i386]Fix PR 57756
On Wed, 2013-10-16 19:40:21 -0700, Xinliang David Li <davi...@google.com> wrote:
> On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn <dje@gmail.com> wrote:
>> On Wed, Oct 16, 2013 at 7:23 PM, Sriraman Tallam <tmsri...@google.com> wrote:
>>> I was unable to build a native powerpc compiler. I checked for
>>> build_target_node and build_optimization_node throughout and changed
>>> rs6000 because it had references. I did not realize
>>> function_specific_save and function_specific_restore have to be
>>> changed. Sorry for breaking it.
>>
>> As Mike replied, gcc110 is available. Richard Biener's approval was
>> dependent upon successful bootstrap and passing the regression
>> testsuite, which you did not even attempt, nor did you try to build a
>> cross-compiler.
>
> This is an oversight. I agree that it is better to test on multiple
> platforms for large changes like this. In the past, Sri has been very
> attentive to any fallout due to his changes, so is this time.

Speaking of multiple platforms... This patch didn't only produce fallout on
rs6k, but also on quite a number of other architectures. I already sent one
message (it can be found in the archives at
http://gcc.gnu.org/ml/gcc-patches/2013-10/msg01156.html) listing quite a
number of targets that are broken right now. The situation hasn't changed
significantly since they started to fail. Please have a look at my build
robot [1]. All targets (except nds32, nios2 and arceb) used to build.

Don't get me wrong, I don't want to overly blame Sriraman for breaking it
in the first place. Shit happens. But please keep an eye on fixing the
fallout in a timely manner.

MfG, JBG

[1] http://toolchain.lug-owl.de/buildbot/?limit=2000

--
Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
Signature of: Lately I've had a bit too much of a reality check.
the second  : I'd slowly like to be able to dream on again.
  -- Maximilian Wilhelm (18 May 2005, #lug-owl.de)
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:
> +/* Return true if target platform supports macro-fusion.  */
> +
> +static bool
> +ix86_macro_fusion_p ()
> +{
> +  if (TARGET_FUSE_CMP_AND_BRANCH)
> +    return true;
> +  else
> +    return false;
> +}

That looks weird, why not just

static bool
ix86_macro_fusion_p (void)
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}

?

	Marek
Re: [PATCH][i386]Fix PR 57756
What about all the other targets you broke? Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 And now for something completely different.
Re: [PATCH] Fix PR58143 and dups
On Tue, 15 Oct 2013, Richard Biener wrote:
> This is an alternate fix (see
> http://gcc.gnu.org/ml/gcc-patches/2013-09/msg00234.html for the other one)
> for the various PRs that show that LIM exposes undefined signed overflow
> on paths where it wasn't executed before LIM, ultimately leading to a
> miscompilation. For this fix we rewrite invariant stmts to use unsigned
> arithmetic when the operation is one of those that SCEV and niter
> analysis handle (thus not division or absolute). The other fix instead
> disables invariant motion for those expressions.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
>
> The issue is also present on the 4.8 branch, so either patch should be
> backported in the end.
>
> I will try to benchmark both tomorrow (unless somebody beats me to it).

Comparing both patches doesn't give conclusive results. A single-run SPEC
CPU2006 on a Sandy Bridge machine gives (base is Bernd's patch, peak is
mine; -Ofast -funroll-loops -march=native):

                   Estimated                    Estimated
                 Base     Base      Base      Peak     Peak      Peak
Benchmarks       Ref.  Run Time    Ratio      Ref.  Run Time    Ratio
--------------  -----  --------  -------    -----  --------  -------
400.perlbench    9770       332     29.4 *   9770       325     30.0 *
401.bzip2        9650       450     21.4 *   9650       449     21.5 *
403.gcc          8050       282     28.5 *   8050       282     28.6 *
429.mcf          9120       225     40.6 *   9120       226     40.4 *
445.gobmk       10490       411     25.6 *  10490       408     25.7 *
456.hmmer        9330       364     25.6 *   9330       365     25.6 *
458.sjeng       12100       433     27.9 *  12100       432     28.0 *
462.libquantum  20720       318     65.1 *  20720       325     63.8 *
464.h264ref     22130       525     42.2 *  22130       527     42.0 *
471.omnetpp      6250       263     23.7 *   6250       264     23.7 *
473.astar        7020       347     20.2 *   7020       346     20.3 *
483.xalancbmk    6900       188     36.7 *   6900       189     36.5 *
 Est. SPECint_base2006               --
 Est. SPECint2006                                               --

410.bwaves      13590       307     44.2 *  13590       303     44.9 *
416.gamess                           NR                          NR
433.milc         9180       356     25.8 *   9180       356     25.8 *
434.zeusmp       9100       295     30.8 *   9100       297     30.7 *
435.gromacs      7140       283     25.3 *   7140       282     25.3 *
436.cactusADM   11950       380     31.4 *  11950       383     31.2 *
437.leslie3d     9400       288     32.7 *   9400       284     33.1 *
444.namd         8020       401     20.0 *   8020       402     20.0 *
447.dealII      11440       277     41.3 *  11440       278     41.1 *
450.soplex       8340       208     40.1 *   8340       206     40.5 *
453.povray       5320       180     29.6 *   5320       178     29.9 *
454.calculix     8250       393     21.0 *   8250       392     21.0 *
459.GemsFDTD    10610       316     33.6 *  10610       308     34.4 *
465.tonto        9840       294     33.5 *   9840       294     33.5 *
470.lbm         13740       245     56.1 *  13740       245     56.1 *
481.wrf         11170       259     43.1 *  11170       259     43.2 *
482.sphinx3     19490       396     49.3 *  19490       397     49.1 *
 Est. SPECfp_base2006                --
 Est. SPECfp2006                                                --

Off-noise (more than 5s difference) may be 462.libquantum and 459.GemsFDTD.
I didn't include unpatched trunk in the comparison (not fixing the bug
isn't an option after all). Conceptually I like the rewriting into unsigned
arithmetic more, so I'm going to apply that variant later today (re-testing
with 3 runs of 462.libquantum and 459.GemsFDTD, this time with
address-space randomization turned off).

Richard.

> Any preference or other suggestions?
>
> Thanks,
> Richard.
>
> 2013-10-15  Richard Biener  <rguent...@suse.de>
>
> 	PR tree-optimization/58143
> 	* tree-ssa-loop-im.c (arith_code_with_undefined_signed_overflow):
> 	New function.
> 	(rewrite_to_defined_overflow): Likewise.
> 	(move_computations_dom_walker::before_dom): Rewrite stmts with
> 	undefined signed overflow that are not always
Re: [PATCH] Fix PR58143 and dups
On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote:
> Off-noise (more than 5s difference) may be 462.libquantum and
> 459.GemsFDTD. I didn't include unpatched trunk in the comparison (not
> fixing the bug isn't an option after all). Conceptually I like the
> rewriting into unsigned arithmetic more, so I'm going to apply that
> variant later today (re-testing with 3 runs of 462.libquantum and
> 459.GemsFDTD, this time with address-space randomization turned off).

Can't we rewrite for the selected arithmetic operations and punt (is that
what Bernd's patch did?) on moving other arithmetics? Well, there are
operations that are safe even for signed types, e.g. rotates; isn't
RSHIFT_EXPR also safe? Can't you handle LSHIFT_EXPR as well (or do we treat
it, even signed, as never undefined in the middle-end/backend)?

	Jakub
Re: Patch: Add #pragma ivdep support to the ME and C FE
On Wed, 16 Oct 2013, Tobias Burnus wrote:
> Frederic Riss wrote:
>> Just one question. You describe the pragma in the doc patch as:
>>
>> +This pragma tells the compiler that the immediately following @code{for}
>> +loop can be executed in any loop index order without affecting the result.
>> +The pragma aids optimization and in particular vectorization as the
>> +compiler can then assume a vectorization safelen of infinity.
>>
>> I'm not a specialist, but I was always told that the 'original' meaning
>> of ivdep (which I believe was introduced by Cray) was that the compiler
>> could assume that there are only forward dependencies in the loop, but
>> not that it can be executed in any order.
>
> The nice thing about #pragma ivdep is that there is no real standard. And
> the explanations from the different vendors are also not completely
> clear. An overview is given in the following file on pages 13-14 for Cray
> Research PVP, MIPSpro/Open64, Intel ICC and Multiflow:
> http://sysrun.haifa.il.ibm.com/hrl/greps2007/papers/GREPS2007-Benoit.pdf
>
> That's summarized as:
> - vector: ignore lexically upward dependencies (Cray PVP, Intel ICC)
> - parallel: ignore loop-carried dependencies (MIPSpro, Open64)
> - liberal: ignore loop-variant dependencies (Multiflow)
>
> The quotes for Cray and Intel are below.
>
> Cray:
> http://docs.cray.com/books/004-2179-001/html-004-2179-001/brtlrwh.html#EKZ5MRWH
> "The ivdep directive tells the compiler to ignore vector dependencies for
> the loop immediately following the directive. Conditions other than
> vector dependencies can inhibit vectorization. If these conditions are
> satisfactory, the loop vectorizes. This directive is useful for some
> loops that contain pointers and indirect addressing. The format of this
> directive is as follows: #pragma _CRI ivdep"

Which suggests we use #pragma GCC ivdep to not collide with possibly
different semantics in existing programs that use variants of this pragma?
Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore. This suggests that _known_ dependences are still treated as dependences. But what is known obviously depends on the implementation, which may not know that a[i] and a[i+1] depend but merely assume it. Not a standard-proof definition of the pragma ;) That said, safelen even overrides known dependences (but with unknown distance vector)! (that looks like a bug to me, or at least a QOI issue) The Intel docs give this example: ... Given your description, this loop wouldn't be a candidate for ivdep, as reversing the loop index order changes the semantics. I believe that the way you interpret it (i.e. setting the vectorization safe length to INT_MAX) is correct with respect to this other definition, though. Do you have a suggestion for a better wording? My idea was to interpret this part similarly to OpenMP's simd with safelen=infinity. (Actually, I believe loop-safelen was added for OpenMPv4's and/or Cilk Plus's simd.) OpenMPv4.0, http://www.openmp.org/mp-documents/OpenMP4.0.0.pdf , states for this (excerpt from page 70): A SIMD loop has logical iterations numbered 0, 1,...,N-1 where N is the number of loop iterations, and the logical numbering denotes the sequence in which the iterations would be executed if the associated loop(s) were executed with no SIMD instructions. If the safelen clause is used then no two iterations executed concurrently with SIMD instructions can have a greater distance in the logical iteration space than its value. 
The parameter of the safelen clause must be a constant positive integer expression. The number of iterations that are executed concurrently at any given time is implementation defined. Each concurrent iteration will be executed by a different SIMD lane. Each set of concurrent iterations is a SIMD chunk. OTOH, if we are mapping ivdep to safelen why not simply allow #pragma GCC safelen 4 ? Oh, and are there any plans to maintain this information in some way till the back-end? Software pipelining could be another huge winner for that kind of dependency analysis simplification. I don't know until when loop-safelen is kept. As it is late in the middle-end, providing the backend with this information should be simple. It's kept as long as we preserve loops which at the moment is until after RTL loop optimizations are finished. Extending this isn't hard, I just didn't see a reason to do that. Richard.
Re: [wide-int] int_traits tree
Kenneth Zadeck zad...@naturalbridge.com writes: As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision > xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that. the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. Thanks, Richard
Re: patch to canonize unsigned tree-csts
Kenneth Zadeck zad...@naturalbridge.com writes: On 10/15/2013 02:30 PM, Richard Sandiford wrote: Richard Sandiford rdsandif...@googlemail.com writes: if (small_prec) ; else if (precision == xprecision) while (len >= 0 && val[len - 1] == -1) len--; Err, len > 0 obviously. you were only close. Bah. patch tested on ppc and committed as revision 203739. Thanks. So that should have got rid of the last use of scratch in tree->addr_wide_int and tree->max_wide_int. Richard
RE: FW: MAX_PATH problems with mingw gcc
-Original Message- From: Joey Ye [mailto:joey.ye...@gmail.com] There is an issue on file systems with symbolic links, like Linux ext2/3. It could vitally change the behavior of GCC. The issue is described as follows. Such logic cannot be deduced simply from the name string, so your patch can do nothing about it. For your patch IMHO the way out could be to just define a real function for DOS FS, and leave a blank function body otherwise. Like: filename_normalize (char *fn) { #if DOS ... #endif } Thank you for pointing out this problem. So, on file systems with symlink support, playing with filenames as strings is impossible. This means that the filename_normalize name is too pretentious - it will do nothing for most of gcc's consumers. On the other hand, it is useful for OSes with a small MAX_PATH. Do you think it should be renamed to filename_shortcut/filename_simplify or something like that? So readers won't expect too much from it even without looking at its body. Is it allowed to write # ifdef HAVE_DOS_BASED_FILE_SYSTEM extern void filename_normalize (char *f); #else #define filename_normalize (char *f) #endif directly in include/filenames.h? This way we'll avoid an unnecessary empty call on platforms with symlinks. And a more general question. I can imagine that different filenames produced after a cross build on different OSes for the same target may confuse some upper-level tools, like code analyzers, code coverage, etc... Does it make sense to push such a solution to gcc mainstream? Maybe it is better to keep the patch for cross-toolchain builders... Regards Vladimir
Re: Patch: Add #pragma ivdep support to the ME and C FE
On Thu, Oct 17, 2013 at 10:07:43AM +0200, Richard Biener wrote: Which suggests we use #pragma GCC ivdep to not collide with possibly different semantics in existing programs that use variants of this pragma? Yeah, perhaps. Intel: http://software.intel.com/sites/products/documentation/doclib/iss/2013/compiler/cpp-lin/GUID-B25ABCC2-BE6F-4599-AEDF-2434F4676E1B.htm The ivdep pragma instructs the compiler to ignore assumed vector dependencies. To ensure correct code, the compiler treats an assumed dependence as a proven dependence, which prevents vectorization. This pragma overrides that decision. Use this pragma only when you know that the assumed loop dependencies are safe to ignore. This suggests that _known_ dependences are still treated as dependences. But what is known obviously depends on the implementation which may not know that a[i] and a[i+1] depend but merely assume it. Not a standard-proof definition of the pragma ;) Very bad definition indeed. That said, safelen even overrides known dependences (but with unknown distance vector)! (that looks like a bug to me, or at least a QOI issue) safelen is whatever the OpenMP 4.0 standard requires (and Cilk+ is I believe just deferring the description to the OpenMP 4.0 definition), and fortunately the OpenMP 4.0 standard doesn't contain any such badly worded definition. The actual wording is not that the <= safelen consecutive iterations can be run in any order, but that they can be performed all together using (possibly emulated) SIMD instructions. Thus I think it is correct if we use it for decisions about whether we can vectorize a loop (without versioning it for alias), regardless of known vs. unknown dependencies (if we have known dependencies that would result in known broken code, perhaps we should warn?), but probably not for anything further; it should not affect aliasing of scalar or vector loads/stores in the loop, etc. Then I think we'd handle forward but not backward dependencies. 
Given the void ignore_vec_dep(int *a, int k, int c, int m) { #pragma omp simd for (int i = 0; i < m; i++) a[i] = a[i + k] * c; } testcase, I think we'll handle it fine for k <= -m and k >= 0. For k >= m obviously, there is no overlap and even runtime versioning for alias would handle it right, for smaller k because the load (vector or non-vector) will be before the store and we don't tell aliasing the two don't alias. We don't vectorize any load+store operations (expressed as one stmt in GIMPLE; struct copies or atomic stmts), do we? Those would be a problem with the above. If for anything else we place all the VF loads where the original load was in the IL and all the VF stores where the original store was in the IL, and just leave it to other passes to reorder if they can prove it doesn't alias, we should be fine. Jakub
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes: As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right? i believe this is correct. If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs. say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s in the tree world it is 3 hwis 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision > xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion. It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that. It's len = MIN (len, max_len) which looked suspicious to me, but with precision >= xprecision, precision can only be zero if xprecision is zero, which looked to me like it cannot happen - or rather it should be fixed. the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. 
AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is inline wi::storage_ref wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already). Btw, code such as tree wide_int_to_tree (tree type, const wide_int_ref &pcst) { ... unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1); bool recanonize = sgn == UNSIGNED && small_prec && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len; definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. [This function shows another optimization issue: case BOOLEAN_TYPE: /* Cache false or true. */ limit = 2; if (wi::leu_p (cst, 1)) ix = cst.to_uhwi (); I would have expected cst <= 1 to be optimized to cst.len == 1 && cst.val[0] <= 1. 
It expands to <L27>: MEM[(long int *)D.50698 + 16B] = 1; MEM[(struct wide_int_ref_storage *)D.50698] = &MEM[(struct wide_int_ref_storage *)D.50698].scratch; MEM[(struct wide_int_ref_storage *)D.50698 + 8B] = 1; MEM[(struct wide_int_ref_storage *)D.50698 + 12B] = 32; _277 = MEM[(const struct wide_int_storage *)cst + 260B]; if (_277 <= 64) goto <bb 42>; else goto <bb 43>; <bb 42>: xl_491 = zext_hwi (1, 32); // ok, checking enabled and thus out-of-line _494 = MEM[(const long int *)cst]; _495 = (long unsigned int) _494; yl_496 = zext_hwi (_495, _277); _497 = xl_491 < yl_496; goto <bb 44>; <bb 43>: _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)cst].val, len_274, _277); this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is 1, no? <bb 44>: # _504 = PHI <_497(42), _503(43)> D.50698 ={v} {CLOBBER}; if (_504 != 0) goto <bb 45>; else goto <bb 46>; <bb 45>: pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B]; goto <bb 229> (<L131>); <bb 46>: _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0); ix_66 = (int) _65; goto <bb 91>; The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () <= 1) ? Thanks, Richard -- Richard Biener rguent...@suse.de SUSE / SUSE
RE: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper)
-Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Mike Stump Sent: 16 October 2013 23:06 To: Paulo J. Matos Cc: gcc-patches@gcc.gnu.org Subject: Re: Teaching emacs about GCC coding conventions (was Re: [PATCH] tree_code_name wrapper) First, we like to wait and let patches bake on mainline before considering back porting; this has no bake time yet in our tree. Second, only patches that impact quality enough should be back ported. I tend to think that this one does not impact quality enough to worry about. Also, you should develop on trunk, not 4_8. Arguably, I would say no. Now, a release manager can always review and approve it; it should be very, very low risk. Makes sense. Thanks for the clarification. -- Paulo Matos
Re: [Patch] Fix undefined behaviors in regex
On 10/17/2013 09:17 AM, Jakub Jelinek wrote: Though, in the above case, the question is why people ignore warnings from the compiler and need to have special runtime instrumentation to remind them instead. I'm not objecting to that sanitization, only find it weird. I had the same thought. Paolo.
Re: PR libstdc++/58729 - tr2::dynamic_bitset::resize fails
Hi, On 10/17/2013 09:04 AM, Ed Smith-Rowland wrote: This patch bootstraps and tests clean on x86-64-linux. Truthfully, dynamic_bitset needs some more love wrt C++11 and a testsuite. It got put in before it was baked really. That will be later. Patch is Ok with me. Before committing you may want to test -m32 too. Thanks, Paolo.
RE: FW: MAX_PATH problems with mingw gcc
Thank you for the review. Please see inline. From: Joey Ye [mailto:joey.ye...@gmail.com] Sent: Thursday, October 17, 2013 7:36 AM The macro FILENAME_NORMALIZE doesn't look necessary. I would prefer to use filename_normalize directly, just as filename_cmp is used in GCC rather than FILENAME_CMP. The patch is quite old (from gcc 4.2, maybe 4.3). As I remember, in those days filenames.h had only extern int filename_cmp (const char *s1, const char *s2); #define FILENAME_CMP(s1, s2) filename_cmp(s1, s2) at the end. So I just followed its style. Nowadays the style has changed. I'll use filename_normalize everywhere. Also it normalizes a filename like c:\abc\ into c:/abc\ - the ending \ isn't normalized. Need to refine the logic here to handle this case. Thank you. I was concentrating on filename-shortening correctness and missed this case. In fact it is not too bad - the patch doesn't guarantee that all filenames inside the gcc process will be as short as possible and will have forward slashes. It just simplifies filenames in some key places. Anyway, I'll add this case to my tests and fix it. Other comments are accepted without notes :) I'll redo the patch in several days. Thank you Vladimir
Re: [PATCH] Hoist loop invariant statements containing data refs with zero-step during loop-versioning in vectorization.
On Wed, 16 Oct 2013, Cong Hou wrote: On Wed, Oct 16, 2013 at 2:02 AM, Richard Biener rguent...@suse.de wrote: On Tue, 15 Oct 2013, Cong Hou wrote: Thank you for your reminder, Jeff! I just noticed Richard's comment. I have modified the patch according to that. The new patch is attached. (posting patches inline is easier for review, now you have to deal with no quoting markers ;)) Comments inline. diff --git a/gcc/ChangeLog b/gcc/ChangeLog index 8a38316..2637309 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2013-10-15 Cong Hou co...@google.com + + * tree-vect-loop-manip.c (vect_loop_versioning): Hoist loop invariant + statement that contains data refs with zero-step. + 2013-10-14 David Malcolm dmalc...@redhat.com * dumpfile.h (gcc::dump_manager): New class, to hold state diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 075d071..9d0f4a5 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2013-10-15 Cong Hou co...@google.com + + * gcc.dg/vect/pr58508.c: New test. + 2013-10-14 Tobias Burnus bur...@net-b.de PR fortran/58658 diff --git a/gcc/testsuite/gcc.dg/vect/pr58508.c b/gcc/testsuite/gcc.dg/vect/pr58508.c new file mode 100644 index 000..cb22b50 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr58508.c @@ -0,0 +1,20 @@ +/* { dg-do compile } */ +/* { dg-options -O2 -ftree-vectorize -fdump-tree-vect-details } */ + + +/* The GCC vectorizer generates loop versioning for the following loop + since there may exist aliasing between A and B. The predicate checks + if A may alias with B across all iterations. Then for the loop in + the true body, we can assert that *B is a loop invariant so that + we can hoist the load of *B before the loop body. 
*/ + +void foo (int* a, int* b) +{ + int i; + for (i = 0; i < 10; ++i) +a[i] = *b + 1; +} + + +/* { dg-final { scan-tree-dump-times "hoist" 2 "vect" } } */ +/* { dg-final { cleanup-tree-dump "vect" } } */ diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c index 574446a..f4fdec2 100644 --- a/gcc/tree-vect-loop-manip.c +++ b/gcc/tree-vect-loop-manip.c @@ -2477,6 +2477,92 @@ vect_loop_versioning (loop_vec_info loop_vinfo, adjust_phi_and_debug_stmts (orig_phi, e, PHI_RESULT (new_phi)); } Note that applying this kind of transform at this point invalidates some of the earlier analysis the vectorizer performed (namely the def-kind which now effectively gets vect_external_def from vect_internal_def). In this case it doesn't seem to cause any issues (we re-compute the def-kind every time we need it (how wasteful)). + /* Extract load and store statements on pointers with zero-stride + accesses. */ + if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) +{ + /* In the loop body, we iterate each statement to check if it is a load +or store. Then we check the DR_STEP of the data reference. If +DR_STEP is zero, then we will hoist the load statement to the loop +preheader, and move the store statement to the loop exit. */ We don't move the store yet. Micha has a patch pending that enables vectorization of zero-step stores. + for (gimple_stmt_iterator si = gsi_start_bb (loop->header); + !gsi_end_p (si);) While technically ok now (vectorized loops contain a single basic block) please use LOOP_VINFO_BBS () to get at the vector of basic-blocks and iterate over them like other code does. Have done it. 
+ { + gimple stmt = gsi_stmt (si); + stmt_vec_info stmt_info = vinfo_for_stmt (stmt); + struct data_reference *dr = STMT_VINFO_DATA_REF (stmt_info); + + if (dr && integer_zerop (DR_STEP (dr))) + { + if (DR_IS_READ (dr)) + { + if (dump_enabled_p ()) + { + dump_printf_loc + (MSG_NOTE, vect_location, + "hoist the statement to outside of the loop "); "hoisting out of the vectorized loop: " + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt, 0); + dump_printf (MSG_NOTE, "\n"); + } + + gsi_remove (&si, false); + gsi_insert_on_edge_immediate (loop_preheader_edge (loop), stmt); Note that this will result in a bogus VUSE on the stmt at this point which will be only fixed because of implementation details of loop versioning. Either get the correct VUSE from the loop header virtual PHI node preheader edge (if there is none then the current VUSE is the correct one to use) or clear it. I just cleared the VUSE since I noticed that after the vectorization pass the correct VUSE
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Wed, 16 Oct 2013, pins...@gmail.com wrote: On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote: Hi! Especially on i?86/x86_64 the if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops. I think on many other targets it actually helps. I know for one it helps on octeon even though octeon has no vector instructions. I think it helps most arm targets too. The main issue is that it has no cost model - the only cost model being that it can successfully if-convert all conditional code in a loop, resulting in a single-BB loop. So it is clearly vectorization targeted. Its infrastructure may be useful to do a more sensible if-conversion on the GIMPLE level on scalar code. Of course even the infrastructure needs some TLC (and some better generic machinery for keeping track of and simplifying a predicate combination). Thanks, Richard. Thanks, Andrew Here is a prototype of a patch that will by default (unless explicit -ftree-loop-if-convert) only if-convert loops internally for vectorization, so the COND_EXPRs actually only appear as VEC_COND_EXPRs in the vectorized basic blocks, but will not appear if vectorization fails, or in the scalar loop if vectorization is conditional, or in the prologue or epilogue loops around the vectorized loop. Instead of moving the ifcvt pass inside of the vectorizer, this patch makes ifcvt perform loop versioning depending on a special internal call; only if the internal call returns true do we go to the if-converted original loop, otherwise the non-if-converted copy of the original loop is performed. And the vectorizer is taught to fold this internal call into true resp. false depending on whether the loop was vectorized or not, and vectorizer loop versioning, peeling for alignment and for bound are adjusted to also copy from the non-if-converted loop rather than the if-converted one. 
Besides fixing the various PRs where if-conversion pessimizes code I'd like to also move forward with this with conditional loads and stores, http://gcc.gnu.org/ml/gcc-patches/2012-11/msg00202.html where the if-unconversion approach looked like a failure. This patch doesn't yet handle if-converted inner loop in outer loop vectorization, something on my todo list (so several vect-cond-*.c tests FAIL because they are no longer vectorized) plus I had to change two SLP vectorization tests that silently relied on loop if-conversion being performed to actually optimize the basic block (if the same thing didn't appear in a loop, it wouldn't be optimized at all). On the newly added testcase on x86_64, there are before this patch 18 scalar conditional moves, with the patch just 2 (both in the checking routine). Comments? --- gcc/internal-fn.def.jj 2013-10-11 14:32:57.079909782 +0200 +++ gcc/internal-fn.def 2013-10-11 17:23:58.705526840 +0200 @@ -43,3 +43,4 @@ DEF_INTERNAL_FN (STORE_LANES, ECF_CONST DEF_INTERNAL_FN (GOMP_SIMD_LANE, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_VF, ECF_CONST | ECF_LEAF | ECF_NOTHROW) DEF_INTERNAL_FN (GOMP_SIMD_LAST_LANE, ECF_CONST | ECF_LEAF | ECF_NOTHROW) +DEF_INTERNAL_FN (LOOP_VECTORIZED, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW) --- gcc/tree-vect-loop-manip.c.jj 2013-09-30 22:13:47.0 +0200 +++ gcc/tree-vect-loop-manip.c 2013-10-15 12:57:54.854970913 +0200 @@ -374,24 +374,31 @@ LOOP-> loop1 static void slpeel_update_phi_nodes_for_guard1 (edge guard_edge, struct loop *loop, +struct loop *scalar_loop, bool is_new_loop, basic_block *new_exit_bb) { - gimple orig_phi, new_phi; + gimple orig_phi, new_phi, scalar_phi = NULL; gimple update_phi, update_phi2; tree guard_arg, loop_arg; basic_block new_merge_bb = guard_edge->dest; edge e = EDGE_SUCC (new_merge_bb, 0); basic_block update_bb = e->dest; basic_block orig_bb = loop->header; - edge new_exit_e; + edge new_exit_e, scalar_e = NULL; tree current_new_name; - gimple_stmt_iterator gsi_orig, 
gsi_update; + gimple_stmt_iterator gsi_orig, gsi_update, gsi_scalar = gsi_none (); /* Create new bb between loop and new_merge_bb. */ *new_exit_bb = split_edge (single_exit (loop)); new_exit_e = EDGE_SUCC (*new_exit_bb, 0); + if (scalar_loop != NULL && !is_new_loop) +{ + gsi_scalar = gsi_start_phis (scalar_loop->header); + scalar_e = EDGE_SUCC (scalar_loop->latch, 0); +} + for (gsi_orig = gsi_start_phis (orig_bb), gsi_update = gsi_start_phis (update_bb); !gsi_end_p (gsi_orig) && !gsi_end_p (gsi_update); @@ -401,6 +408,11 @@ slpeel_update_phi_nodes_for_guard1 (edge tree new_res; orig_phi = gsi_stmt (gsi_orig); update_phi = gsi_stmt (gsi_update); + if (scalar_e != NULL) +{ +
Re: [PATCH] Fix PR58143 and dups
On Thu, 17 Oct 2013, Jakub Jelinek wrote: On Thu, Oct 17, 2013 at 09:56:31AM +0200, Richard Biener wrote: off-noise (more than 5s difference) may be 462.libquantum and 459.GemsFDTD. I didn't include unpatched trunk in the comparison (not fixing the bug isn't an option after all). Conceptually I like the rewriting into unsigned arithmetic more so I'm going to apply that variant later today (re-testing 3 runs of 462.libquantum and 459.GemsFDTD, this time with address-space randomization turned off). Can't we rewrite for the selected arithmetic operations and punt (is that what Bernd's patch did) on moving other arithmetics? Well, there are operations that are safe even for signed types, e.g. rotates, isn't RSHIFT_EXPR also safe? Can't you handle also LSHIFT_EXPR (or do we treat it even signed as never undefined in the middle-end/backend?). Sure, we can. I concentrated on fixing the cases that later cause issues with aggressively treating the ops during niter analysis which means ops that SCEV handles (see tree-scalar-evolution.c:interpret_rhs_expr). But I'd rather have testcases here ... Teaching LIM how to move conditional code would also be nice. Btw, I don't think we'd lose many optimization opportunities if we for example don't treat signed shifts or division as invoking undefined behavior on overflow. All the overflow business is important for loop optimizations and thus important only for the cases that SCEV can handle ... Richard.
[RESEND] Enable building of libatomic on AArch64
Resending as the previous attempt went missing... 2013-10-04 Michael Hudson-Doyle michael.hud...@linaro.org * libatomic/configure.tgt (aarch64*): Remove code preventing build. * gcc/testsuite/lib/target-supports.exp (check_effective_target_sync_long_long): AArch64 supports atomic operations on long long. (check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index 7eb4dfe..5557c06 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -4508,6 +4508,7 @@ proc check_effective_target_sync_int_128_runtime { } { proc check_effective_target_sync_long_long { } { if { ([istarget x86_64-*-*] || [istarget i?86-*-*]) + || [istarget aarch64*-*-*] || [istarget arm*-*-*] || [istarget alpha*-*-*] || ([istarget sparc*-*-*] && [check_effective_target_lp64]) } { @@ -4537,6 +4538,8 @@ proc check_effective_target_sync_long_long_runtime { } { } } }] +} elseif { [istarget aarch64*-*-*] } { + return 1 } elseif { [istarget arm*-*-linux-*] } { return [check_runtime sync_longlong_runtime { #include <stdlib.h> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt index b9e5d6c..7eaab38 100644 --- a/libatomic/configure.tgt +++ b/libatomic/configure.tgt @@ -95,11 +95,6 @@ fi # Other system configury case ${target} in - aarch64*) - # This is currently not supported in AArch64. - UNSUPPORTED=1 - ;; - arm*-*-linux*) # OS support for atomic primitives. config_path="${config_path} linux/arm posix"
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On 19/09/13 18:21, Charles Baylis wrote: Hi Here is an updated version. Changelog: * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with hardfloat, and test is not thumb-specific * gcc.target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI by requiring arm_thumb1_ok * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache): don't force -mfloat-abi=soft when building for hardfloat target ChangeLogs should be formatted to 80 columns. Otherwise OK. R. On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote: On 15/08/13 15:10, Charles Baylis wrote: Hi The attached patch fixes some tests which fail when testing gcc for an arm-none-linux-gnueabihf target because they do not expect to be built with a hard float ABI. The change in target-supports.exp fixes arm-fp16-ops-5.c and arm-fp16-ops-6.c. Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause any other tests to break. Comments? This is my first patch, so please point out anything wrong. 2013-08-15 Charles Baylis charles.bay...@linaro.org * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * target-supports.exp: don't force -mfloat-abi=soft when building for hardfloat target hf-fixes.txt Index: gcc/testsuite/gcc.dg/builtin-apply2.c === --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726) +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-skip-if Variadic funcs have all args on stack. Normal funcs have args in registers. { aarch64*-*-* avr-*-* } { * } { } } */ /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* } { -mfloat-abi=hard } { } } */ +/* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. 
{ arm*-*-gnueabihf } { * } { -mfloat-abi=soft* } } */ As you've noticed, basing the test's behaviour on the config variant doesn't work reliably. The builtin-apply2 test really should be skipped if the current test variant is not soft-float. We already have check_effective_target_arm_hf_eabi in target-supports.exp that checks whether __ARM_PCS_VFP is defined during a compilation. So we can replace both arm related lines in builtin-apply2 with /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* && arm_hf_eabi } { * } { } } */ /* PR target/12503 */ /* Origin: pierre.nguyen-tu...@asim.lip6.fr */ Index: gcc/testsuite/gcc.dg/tls/pr42894.c === --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726) +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy) @@ -1,6 +1,7 @@ /* PR target/42894 */ /* { dg-do compile } */ /* { dg-options -march=armv5te -mthumb { target arm*-*-* } } */ +/* { dg-options -march=armv5te -mthumb -mfloat-abi=soft { target arm*-*-*hf } } */ /* { dg-require-effective-target tls } */ Although the original PR was for Thumb1, this is a generic test. I'm not convinced that on ARM it should try to force thumb1. Removing the original dg-options line should solve the problem and we then get better multi-lib testing as well. extern __thread int t; Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c === --- gcc/testsuite/gcc.target/arm/thumb-ltu.c (revision 201726) +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if incompatible options { arm*-*-* } { -march=* } { -march=armv6 -march=armv6j -march=armv6z } } */ -/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 } */ +/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft } */ This won't work if there's an explicit -mfloat-abi={softfp,hard} on the multilib options. Probably the best thing to do here is to skip the test if arm_thumb1_ok is not true. 
void f(unsigned a, unsigned b, unsigned c, unsigned d) { Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 201726) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2445,6 +2445,11 @@ # Must generate floating-point instructions. return 0 } +if [check-flags [list { *-*-gnueabihf } { * } { } ]] { +# Use existing float-abi and force an fpu which supports fp16 This should use arm_hf_eabi as described above. + set et_arm_fp16_flags -mfpu=vfpv4 + return 1; +
[Ada] Clean ups in SPARK mode
1) Remove special expansion of NOT IN operator in SPARK verification

The special expansion for the NOT IN operator in the SPARK formal
verification mode is not needed anymore. It is now removed.

2) Document additional requirements on tree for SPARK verification

Formal verification of SPARK code is done in a special mode, in which the
tree generated by the frontend must respect additional requirements. These
are now described in a special section of sinfo.ads.

2013-10-17  Yannick Moy  m...@adacore.com

	* exp_spark.adb (Expand_SPARK): Remove special case for NOT IN
	operation.
	* sinfo.ads: Add special comment section to describe SPARK mode
	effect on tree.
	* exp_spark.ads: Remove comments, moved to sinfo.ads.

Index: exp_spark.adb
===
--- exp_spark.adb (revision 203568)
+++ exp_spark.adb (working copy)
@@ -25,7 +25,6 @@
 with Atree;    use Atree;
 with Einfo;    use Einfo;
-with Exp_Ch4;  use Exp_Ch4;
 with Exp_Dbug; use Exp_Dbug;
 with Exp_Util; use Exp_Util;
 with Sem_Aux;  use Sem_Aux;
@@ -80,12 +79,6 @@
          when N_Identifier =>
             Expand_Potential_Renaming (N);

-         -- A NOT IN B gets transformed to NOT (A IN B). This is the same
-         -- expansion used in the normal case, so share the code.
-
-         when N_Not_In =>
-            Expand_N_Not_In (N);
-
          when N_Object_Renaming_Declaration =>
             Expand_SPARK_N_Object_Renaming_Declaration (N);

Index: exp_spark.ads
===
--- exp_spark.ads (revision 203568)
+++ exp_spark.ads (working copy)
@@ -30,54 +30,6 @@

 --  Expand_SPARK is called directly by Expander.Expand.

---  SPARK expansion has three main objectives:
--
---    1. Perform limited expansion to explicit some Ada rules and constructs
---       (translate 'Old and 'Result, replace renamings by renamed, insert
---       conversions, expand actuals in calls to introduce temporaries, expand
---       generics instantiations)
--
---    2. Facilitate treatment for the formal verification back-end (fully
---       qualify names, expand set membership, compute data dependences)
--
---    3. Avoid the introduction of low-level code that is difficult to analyze
---       formally, as typically done in the full expansion for high-level
---       constructs (tasking, dispatching)
--
---  To fulfill objective 1, Expand_SPARK selectively expands some constructs.
--
---  To fulfill objective 2, the tree after SPARK expansion should be fully
---  analyzed semantically. In particular, all expressions must have their
---  proper type, and semantic links should be set between tree nodes (partial
---  to full view, etc.) Some kinds of nodes should be either absent, or can
---  be ignored by the formal verification backend:
--
---     N_Object_Renaming_Declaration: can be ignored safely
---     N_Expression_Function: absent (rewritten)
---     N_Expression_With_Actions: absent (not generated)
--
---  SPARK cross-references are generated from the regular cross-references
---  (used for browsing and code understanding) and additional references
---  collected during semantic analysis, in particular on all
---  dereferences. These SPARK cross-references are output in a separate
---  section of ALI files, as described in spark_xrefs.adb. They are the basis
---  for the computation of data dependences in the formal verification
---  backend. This implies that all cross-references should be generated in
---  this mode, even those that would not make sense from a user
---  point-of-view, and that cross-references that do not lead to data
---  dependences for subprograms can be safely ignored.
--
---  To support the formal verification of units parameterized by data, the
---  value of deferred constants should not be considered as a compile-time
---  constant at program locations where the full view is not visible.
--
---  To fulfill objective 3, Expand_SPARK does not expand features that are
---  not formally analyzed (tasking), or for which formal analysis relies on
---  the source level representation (dispatching, aspects, pragmas). However,
---  these should be semantically analyzed, which sometimes requires the
---  insertion of semantic pre-analysis, for example for subprogram contracts
---  and pragma check/assert.
-
 with Types; use Types;

 package Exp_SPARK is

Index: sinfo.ads
===
--- sinfo.ads (revision 203568)
+++ sinfo.ads (working copy)
@@ -508,6 +508,48 @@
    -- simply ignore these nodes, since they are not relevant to the task
    -- of back annotating representation information.

+   ----------------
+   -- SPARK Mode --
+   ----------------
+
+   --  When a file is compiled in SPARK mode (-gnatd.F), a very light
+   --  expansion is performed and the analysis must generate a tree in a form
[Ada] Fix generation of references for SPARK formal verification
The generation of references for SPARK formal verification was missing some
write references through renamings. This is now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Yannick Moy  m...@adacore.com

	* sem_ch8.adb (Find_Direct_Name): Keep track of assignments for
	renamings in SPARK mode.

Index: sem_ch8.adb
===
--- sem_ch8.adb (revision 203568)
+++ sem_ch8.adb (working copy)
@@ -5073,9 +5073,14 @@
          -- Entity is unambiguous, indicate that it is referenced here

          -- For a renaming of an object, always generate simple reference,
-         -- we don't try to keep track of assignments in this case.
+         -- we don't try to keep track of assignments in this case, except
+         -- in SPARK mode where renamings are traversed for generating
+         -- local effects of subprograms.

-         if Is_Object (E) and then Present (Renamed_Object (E)) then
+         if Is_Object (E)
+           and then Present (Renamed_Object (E))
+           and then not SPARK_Mode
+         then
             Generate_Reference (E, N);

             -- If the renamed entity is a private protected component,
[Ada] Remove useless special handling for SPARK verification
The use of a specific light expansion for SPARK verification has rendered obsolete a number of special handling cases only triggered in the normal full expansion. Remove these useless cases now. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Yannick Moy m...@adacore.com * exp_ch3.adb (Expand_Freeze_Class_Wide_Type, Expand_Freeze_Class_Wide_Type, Expand_Freeze_Class_Wide_Type): Remove useless special cases. * exp_ch4.adb (Expand_Allocator_Expression, Expand_N_Allocator, Expand_N_Op_Expon): Remove useless special cases. * exp_ch6.adb (Is_Build_In_Place_Function_Call): Disable build-in-place in SPARK mode by testing Full_Expander_Active instead of Expander_Active. (Make_Build_In_Place_Call_In_Allocator): Remove useless special case. * exp_util.adb (Build_Allocate_Deallocate_Proc): Remove useless special case. * sem_eval.adb (Compile_Time_Known_Value): Remove special handling of deferred constant. Index: exp_util.adb === --- exp_util.adb(revision 203568) +++ exp_util.adb(working copy) @@ -560,13 +560,6 @@ -- Start of processing for Build_Allocate_Deallocate_Proc begin - -- Do not perform this expansion in SPARK mode because it is not - -- necessary. - - if SPARK_Mode then - return; - end if; - -- Obtain the attributes of the allocation / deallocation if Nkind (N) = N_Free_Statement then Index: exp_ch4.adb === --- exp_ch4.adb (revision 203568) +++ exp_ch4.adb (working copy) @@ -1268,14 +1268,10 @@ --* .NET/JVM - these targets do not support address arithmetic --and unchecked conversion, key elements of Finalize_Address. ---* SPARK mode - the call is useless and results in unwanted ---expansion. - --* CodePeer mode - TSS primitive Finalize_Address is not --created in this mode. if VM_Target = No_VM - and then not SPARK_Mode and then not CodePeer_Mode and then Present (Finalization_Master (PtrT)) and then Present (Temp_Decl) @@ -4295,16 +4291,13 @@ end if; -- The finalization master must be inserted and analyzed as part of - -- the current semantic unit. 
This form of expansion is not carried - -- out in SPARK mode because it is useless. Note that the master is - -- updated when analysis changes current units. + -- the current semantic unit. Note that the master is updated when + -- analysis changes current units. - if not SPARK_Mode then -if Present (Rel_Typ) then - Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ)); -else - Set_Finalization_Master (PtrT, Current_Anonymous_Master); -end if; + if Present (Rel_Typ) then +Set_Finalization_Master (PtrT, Finalization_Master (Rel_Typ)); + else +Set_Finalization_Master (PtrT, Current_Anonymous_Master); end if; end if; @@ -4839,15 +4832,11 @@ --Set_Finalize_Address -- (PtrTFM, TFD'Unrestricted_Access); - -- Do not generate this call in the following cases: - -- - --* SPARK mode - the call is useless and results in - --unwanted expansion. - -- - --* CodePeer mode - TSS primitive Finalize_Address is - --not created in this mode. + -- Do not generate this call in CodePeer mode, as TSS + -- primitive Finalize_Address is not created in this + -- mode. - elsif not (SPARK_Mode or CodePeer_Mode) then + elsif not CodePeer_Mode then Insert_Action (N, Make_Set_Finalize_Address_Call (Loc = Loc, @@ -7321,9 +7310,9 @@ begin Binary_Op_Validity_Checks (N); - -- CodePeer and GNATprove want to see the unexpanded N_Op_Expon node + -- CodePeer wants to see the unexpanded N_Op_Expon node - if CodePeer_Mode or SPARK_Mode then + if CodePeer_Mode then return; end if; Index: exp_ch6.adb === --- exp_ch6.adb (revision 203568) +++ exp_ch6.adb (working copy) @@ -9599,7 +9599,11 @@ -- disabled (such as with -gnatc) since those would trip over the raise -- of Program_Error below. - if not Expander_Active then + -- In SPARK mode, build-in-place calls are not
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On 17/10/13 11:32, Richard Earnshaw wrote: On 19/09/13 18:21, Charles Baylis wrote: Hi Here is an updated version. Changelog: * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with hardfloat, and test is not thumb-specific * gcc,target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI by requiring arm_thumb1_ok * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache): don't force -mfloat-abi=soft when building for hardfloat target ChangeLogs should be formatted to 80 columns. Oh, and they should be complete sentences. So start with a capital letter, and end with a full stop. R. Otherwise OK. R. On 19 August 2013 16:34, Richard Earnshaw rearn...@arm.com wrote: On 15/08/13 15:10, Charles Baylis wrote: Hi The attached patch fixes some tests which fail when testing gcc for a arm-none-linux-gnueabihf target because they do not expect to be built with a hard float ABI. The change in target-supports.exp fixes arm-fp16-ops-5.c and arm-fp16-ops-6.c. Tested on arm-none-linux-gnueabihf using qemu-arm, and does not cause any other tests to break. Comments? This is my first patch, so please point out anything wrong. 2013-08-15 Charles Baylis charles.bay...@linaro.org * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets * gcc.dg/tls/pr42894.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * arm/thumb-ltu.c: Use -mfloat-abi=soft as Thumb1 does not support hardfloat ABI * target-supports.exp: don't force -mfloat-abi=soft when building for hardfloat target hf-fixes.txt Index: gcc/testsuite/gcc.dg/builtin-apply2.c === --- gcc/testsuite/gcc.dg/builtin-apply2.c (revision 201726) +++ gcc/testsuite/gcc.dg/builtin-apply2.c (working copy) @@ -1,6 +1,7 @@ /* { dg-do run } */ /* { dg-skip-if Variadic funcs have all args on stack. Normal funcs have args in registers. { aarch64*-*-* avr-*-* } { * } { } } */ /* { dg-skip-if Variadic funcs use Base AAPCS. 
Normal funcs use VFP variant. { arm*-*-* } { -mfloat-abi=hard } { } } */ +/* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-gnueabihf } { * } { -mfloat-abi=soft* } } */ As you've noticed, basing the test's behaviour on the config variant doesn't work reliably. The builtin-apply2 test really should be skipped if the current test variant is not soft-float. We already have check_effective_target_arm_hf_eabi in target-supports.exp that checks whether __ARM_PCS_VFP is defined during a compilation. So can replace both arm related lines in builtin-apply2 with /* { dg-skip-if Variadic funcs use Base AAPCS. Normal funcs use VFP variant. { arm*-*-* arm_hf_eabi} { * } { } } */ /* PR target/12503 */ /* Origin: pierre.nguyen-tu...@asim.lip6.fr */ Index: gcc/testsuite/gcc.dg/tls/pr42894.c === --- gcc/testsuite/gcc.dg/tls/pr42894.c(revision 201726) +++ gcc/testsuite/gcc.dg/tls/pr42894.c(working copy) @@ -1,6 +1,7 @@ /* PR target/42894 */ /* { dg-do compile } */ /* { dg-options -march=armv5te -mthumb { target arm*-*-* } } */ +/* { dg-options -march=armv5te -mthumb -mfloat-abi=soft { target arm*-*-*hf } } */ /* { dg-require-effective-target tls } */ Although the original PR was for Thumb1, this is a generic test. I'm not convinced that on ARM it should try to force thumb1. Removing the original dg-options line should solve the problem and we then get better multi-lib testing as well. extern __thread int t; Index: gcc/testsuite/gcc.target/arm/thumb-ltu.c === --- gcc/testsuite/gcc.target/arm/thumb-ltu.c (revision 201726) +++ gcc/testsuite/gcc.target/arm/thumb-ltu.c (working copy) @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-skip-if incompatible options { arm*-*-* } { -march=* } { -march=armv6 -march=armv6j -march=armv6z } } */ -/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 } */ +/* { dg-options -mcpu=arm1136jf-s -mthumb -O2 -mfloat-abi=soft } */ This won't work if there's an explict -mfloat-abi={softfp,hard} on the multilib options. 
Probably the best thing to do here is to skip the test if arm_thumb1_ok is not true. void f(unsigned a, unsigned b, unsigned c, unsigned d) { Index: gcc/testsuite/lib/target-supports.exp === --- gcc/testsuite/lib/target-supports.exp (revision 201726) +++ gcc/testsuite/lib/target-supports.exp (working copy) @@ -2445,6 +2445,11 @@ # Must generate floating-point instructions. return 0 } +if [check-flags [list { *-*-gnueabihf } { * } { } ]] { +# Use existing
[Ada] Add switch to output reason why spec requires body
The warning switch -gnatw.y (deactivated by -gnatw.Y) activates a mode in
which information messages are given that show why a package spec requires a
body. This can be useful if you have a large package which unexpectedly
requires a body.

  1. package ReqBody is
        |
        info: ReqBody requires body (Elaborate_Body)
  2.    pragma Elaborate_Body;
  3.    A : Integer;
  4.    B : Integer;
  5.    procedure K;
        |
        info: ReqBody requires body (K requires completion)
  6. end ReqBody;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

	* gnat_ugn.texi: Document -gnatw.y/-gnatw.Y.
	* opt.ads (List_Body_Required_Info): New flag.
	* prep.adb: Minor reformatting.
	* sem_ch7.adb (Unit_Requires_Body_Info): New procedure.
	(Analyze_Package_Specification): Add call to Unit_Requires_Body_Info.
	* ug_words: Add entries for -gnatw.y and -gnatw.Y.
	* usage.adb: Add line for new warning switch -gnatw.y/.Y.
	* vms_data.ads: Add entry for [NO_]WHY_SPEC_NEEDS_BODY warning
	qualifier.
	* warnsw.ads, warnsw.adb: Implement new warning switch -gnatw.y/.Y.

Index: sem_ch7.adb
===
--- sem_ch7.adb (revision 203598)
+++ sem_ch7.adb (working copy)
@@ -136,6 +136,11 @@
    -- inherited private operation has been overridden, then it's replaced by
    -- the overriding operation.

+   procedure Unit_Requires_Body_Info (P : Entity_Id);
+   -- Outputs info messages showing why package specification P requires a
+   -- body. Caller has checked that the switch requesting this information
+   -- is set, and that the package does indeed require a body.
+ -- -- Analyze_Package_Body -- -- @@ -1515,6 +1520,15 @@ (\pragma Elaborate_Body is required in this case, P); end; end if; + + -- If switch set, output information on why body required + + if List_Body_Required_Info +and then In_Extended_Main_Source_Unit (Id) +and then Unit_Requires_Body (Id) + then + Unit_Requires_Body_Info (Id); + end if; end Analyze_Package_Specification; -- @@ -1686,8 +1700,8 @@ and then No (Interface_Alias (Node (Op_Elmt_2))) then -- The private inherited operation has been - -- overridden by an explicit subprogram: replace - -- the former by the latter. + -- overridden by an explicit subprogram: + -- replace the former by the latter. New_Op := Node (Op_Elmt_2); Replace_Elmt (Op_Elmt, New_Op); @@ -2748,4 +2762,135 @@ return False; end Unit_Requires_Body; + - + -- Unit_Requires_Body_Info -- + - + + procedure Unit_Requires_Body_Info (P : Entity_Id) is + E : Entity_Id; + + begin + -- Imported entity never requires body. Right now, only subprograms can + -- be imported, but perhaps in the future we will allow import of + -- packages. + + if Is_Imported (P) then + return; + + -- Body required if library package with pragma Elaborate_Body + + elsif Has_Pragma_Elaborate_Body (P) then + Error_Msg_N + (?Y?info: requires body (Elaborate_Body), P); + + -- Body required if subprogram + + elsif Is_Subprogram (P) or else Is_Generic_Subprogram (P) then + Error_Msg_N (?Y?info: requires body (subprogram case), P); + + -- Body required if generic parent has Elaborate_Body + + elsif Ekind (P) = E_Package +and then Nkind (Parent (P)) = N_Package_Specification +and then Present (Generic_Parent (Parent (P))) + then + declare +G_P : constant Entity_Id := Generic_Parent (Parent (P)); + begin +if Has_Pragma_Elaborate_Body (G_P) then + Error_Msg_N + (?Y?info: requires body (generic parent Elaborate_Body), + P); +end if; + end; + + -- A [generic] package that introduces at least one non-null abstract + -- state requires completion. 
However, there is a separate rule that + -- requires that such a package have a reason other than this for a + -- body being required (if necessary a pragma Elaborate_Body must be + -- provided). If Ignore_Abstract_State is True, we don't do this check + -- (so we can use Unit_Requires_Body to check for some other reason). + + elsif Ekind_In (P, E_Generic_Package, E_Package) +and then Present (Abstract_States (P)) +and then + not
Re: [PATCH,ARM] fix testsuite failures for arm-none-linux-gnueabihf
On Thu, Oct 17, 2013 at 11:32:42AM +0100, Richard Earnshaw wrote:
> On 19/09/13 18:21, Charles Baylis wrote:
> > Here is an updated version.
> >
> > Changelog:
> > * gcc.dg/builtin-apply2.c: skip test on arm hardfloat ABI targets
> > * gcc.dg/tls/pr42894.c: Remove options, forcing -mthumb fails with
> >   hardfloat, and test is not thumb-specific
> > * gcc.target/arm/thumb-ltu.c: Avoid test failure with hardfloat ABI
> >   by requiring arm_thumb1_ok
> > * lib/target-supports.exp (check_effective_target_arm_fp16_ok_nocache):
> >   don't force -mfloat-abi=soft when building for hardfloat target
>
> ChangeLogs should be formatted to 80 columns.

Not only that. The descriptions should start with a capital letter and end
with a dot. For pr42894.c, you are not removing options, you are removing
dg-options, and the rest is why, not what, so it doesn't belong in the
ChangeLog description. Similarly, for thumb-ltu.c, what you are doing is
removing dg-skip-if and requiring effective target arm_thumb1_ok.

	Jakub
Re: RFC: Add of type-demotion pass
On Wed, Oct 16, 2013 at 6:33 PM, Jeff Law l...@redhat.com wrote: On 10/16/13 03:31, Richard Biener wrote: I see two primary effects of type sinking. Note it was called type demotion ;) ;) It's a mental block of mine; it's been called type hoisting/sinking in various contexts and I see parallels between the code motion algorithms and how the type promotion/demotion exposes unnecessary type conversions. So I keep calling them hoisting/sinking. I'll try to use promotion/demotion. First and probably the most important in my mind is by sinking a cast through its uses the various transformations we already perform are more likely to apply *without* needing to handle optimizing through typecasts explicitly. I would say it is desirable to express arithmetic in the smallest possible types (see what premature optimization the C family frontends do to narrow operations again after C integer promotion applied). I don't see this as the major benefit of type demotion. Yes, there is some value in shrinking constants and the like, but in my experience the benefits are relatively small and often get lost in things like partial register stalls on x86, the PA and probably others (yes, the PA has partial register stalls, it's just that nobody used that term). What I really want to get at here is avoiding having a large number of optimizers looking back through the use-def chains and attempting to elide typecasts in the middle of a chain of statements of interest. Hmm, off the top of my head only forwprop and VRP look back through use-def chains to elide typecasts. And they do that to optimize those casts, thus it is their job ...? Other cases are around, but those are of the sorts of is op1 available in type X and/or can I safely cast it to type X? that code isn't going to be simplified by generic promotion / demotion because that code isn't going to know what type pass Y in the end wants. 
Abstracting functions that can answer those questions instead of repeating N variants of it would of course be nice. Likewise reducing the number of places we perform promotion / demotion (remove it from frontend code and fold, add it in the GIMPLE combiner). Also making the GIMPLE combiner available as an utility to apply to a single statement (see my very original GIMPLE-fold proposal) would be very useful. As for promotion / demotion (if you are not talking about applying PROMOTE_MODE which rather forces promotion of variables and requires inserting compensation code), you want to optimize op1 = (T) op1'; op2 = (T) op2'; x = op1 OP op2; (*) y = (T2) x; to either carry out OP in type T2 or in a type derived from the types of op1' and op2'. For the simple case combine-like pattern matching is ok. It gets more complicated if there are a series of statements here (*), but even that case is handled by iteratively applying the combiner patterns (which forwprop does). If you split out promotion / demotion into a separate pass then you introduce pass ordering issues as combining may introduce promotion / demotion opportunities and the other way around. That wouldn't apply to a pass lowering GIMPLE to fully honor PROMOTE_MODE. You need some kind of range information to do this, thus either integrate it into VRP (there is already code that does this there) or use range information from VRP which we now preserve. If the primary goal is to shrink types, then yes, you want to use whatever information you can, including VRP. But that's not the primary goal in my mind, at least not at this stage. There's no reason why this pass couldn't utilize VRP information to provide more opportunities to demote types and achieve the goal you want. But I'd consider that a follow-on opportunity. 
The second primary effect is, given two casts where the first indirectly feeds the second (ie, the first feeds some statement, which then feeds the second cast), if we're able to sink the first cast, we end up with the first cast directly feeding the second cast. When this occurs one of the two casts can often be eliminated. Sadly, I didn't keep any of those test files, but I regularly saw them in GCC bootstraps. This transformation is applied both by fold-const.c and by SSA forwprop (our GIMPLE combiner). Doing it in yet another pass looks wrong (and it isn't type demotion but also can be promotion). Yes, I know. And we need to get this back down to a single implementation. I don't much care which of the 3 implementations we keep, but it really should just be one and it needs to be reusable. I probably should have stated this differently -- the second primary effect is to expose more cases where type conversions can be eliminated via type promotion/demotion. I don't much care which of the 3 blobs of code to eliminate the conversions we use -- I do care that we've got a consistent way to promote/demote conversions to expose the unnecessary type conversions. Sure.
[Ada] Scalar_Storage_Order consistency for nested composites
Scalar_Storage_Order must be consistent between any component of a composite
type (array or record) and the composite type itself. We already enforced
this in the case where the enclosing type has a Scalar_Storage_Order
attribute definition and the component type has none. We now do so also in
the opposite case, when the enclosing type has no Scalar_Storage_Order
clause and any component does have one.

The following compilation must be rejected with the indicated error messages:

$ gcc -c nat_comp_rev_el.ads
nat_comp_rev_el.ads:26:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:27:04: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:29:07: composite type must have explicit scalar storage order
nat_comp_rev_el.ads:34:07: composite type must have explicit scalar storage order

with System; use System;
package Nat_Comp_Rev_El is

   type U32 is mod 2**32;

   type LE_Record is record
      X : U32;
   end record;
   for LE_Record use record
      X at 0 range 0 .. 31;
   end record;
   for LE_Record'Bit_Order use Low_Order_First;
   for LE_Record'Scalar_Storage_Order use Low_Order_First;

   type BE_Record is record
      X : U32;
   end record;
   for BE_Record use record
      X at 0 range 0 .. 31;
   end record;
   for BE_Record'Bit_Order use High_Order_First;
   for BE_Record'Scalar_Storage_Order use High_Order_First;

   -- Reject the below declarations: the component type has an explicit SSO,
   -- so we also require one on the enclosing composite type.

   type Two_LE is array (1 .. 2) of LE_Record;
   type Two_BE is array (1 ..
2) of BE_Record; type Rec_LE is record Comp : LE_Record; X: Integer; end record; type Rec_BE is record Comp : BE_Record; X: Integer; end record; end Nat_Comp_Rev_El; Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Thomas Quinot qui...@adacore.com * freeze.adb (Check_Component_Storage_Order): Reject a record or array type that does not have an explicit Scalar_Storage_Order attribute definition if a component of the record, or the elements of the array, have one. * gnat_rm.texi (attribute Scalar_Storage_Order): Document the above rule. Index: gnat_rm.texi === --- gnat_rm.texi(revision 203568) +++ gnat_rm.texi(working copy) @@ -8727,6 +8727,10 @@ if the component does not start on a byte boundary, then the scalar storage order specified for S and for the nested component type shall be identical. +If @var{S} appears as the type of a record or array component, the enclosing +record or array shall also have a @code{Scalar_Storage_Order} attribute +definition clause. + No component of a type that has a @code{Scalar_Storage_Order} attribute definition may be aliased. Index: freeze.adb === --- freeze.adb (revision 203568) +++ freeze.adb (working copy) @@ -92,11 +92,15 @@ procedure Check_Component_Storage_Order (Encl_Type : Entity_Id; - Comp : Entity_Id); + Comp : Entity_Id; + ADC : Node_Id); -- For an Encl_Type that has a Scalar_Storage_Order attribute definition - -- clause, verify that the component type is compatible. For arrays, - -- Comp is Empty; for records, it is the entity of the component under - -- consideration. + -- clause, verify that the component type has an explicit and compatible + -- attribute/aspect. For arrays, Comp is Empty; for records, it is the + -- entity of the component under consideration. For an Encl_Type that + -- does not have a Scalar_Storage_Order attribute definition clause, + -- verify that the component also does not have such a clause. + -- ADC is the attribute definition clause if present (or Empty). 
procedure Check_Strict_Alignment (E : Entity_Id); -- E is a base type. If E is tagged or has a component that is aliased @@ -1068,11 +1072,12 @@ procedure Check_Component_Storage_Order (Encl_Type : Entity_Id; - Comp : Entity_Id) + Comp : Entity_Id; + ADC : Node_Id) is Comp_Type : Entity_Id; + Comp_ADC : Node_Id; Err_Node : Node_Id; - ADC : Node_Id; Comp_Byte_Aligned : Boolean; -- Set True for the record case, when Comp starts on a byte boundary @@ -1113,11 +1118,24 @@ -- the attribute definition clause is attached to the first subtype. Comp_Type := Base_Type (Comp_Type); - ADC := Get_Attribute_Definition_Clause - (First_Subtype (Comp_Type), -Attribute_Scalar_Storage_Order); + Comp_ADC := Get_Attribute_Definition_Clause +(First_Subtype (Comp_Type), + Attribute_Scalar_Storage_Order); - if Is_Record_Type (Comp_Type) or else Is_Array_Type
[Ada] Check for illegal global refs to abstract state when refinement visible
This implements the rule in SPARK RM 6.1.5(4): a global item shall not denote
a state abstraction whose refinement is visible (a state abstraction cannot
be named within its enclosing package's body other than in its refinement).
The following is compiled with -gnatd.V:

  1. package Depends_Illegal
  2.   with Abstract_State => A
  3. is
  4.    pragma Elaborate_Body (Depends_Illegal);
  5. end Depends_Illegal;

  1. package body Depends_Illegal
  2.   with Refined_State => (A => (X, Y))
  3. is
  4.    X, Y : Natural := 0;
  5.    function F1 (Par1 : Natural) return Natural
  6.      with Global => A,
          |
          global reference to A not allowed (SPARK RM 6.1.5(4))
          refinement of A is visible at line 2
  7.      Depends => (F1'Result => A,
          |
          global reference to A not allowed (SPARK RM 6.1.5(4))
          refinement of A is visible at line 2
  8.                  null => Par1)
  9.    is
 10.    begin
 11.       return X;
 12.    end F1;
 13. end Depends_Illegal;

Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17  Robert Dewar  de...@adacore.com

	* einfo.ads, einfo.adb (Has_Body_References): New flag.
	(Body_References): New field.
	* sem_prag.adb (Record_Possible_Body_Reference): New procedure.
	(Analyze_Input_Output): Call Record_Possible_Body_Reference.
	(Analyze_Global_Item): Call Record_Possible_Body_Reference.
	(Analyze_Refinement_Clause): Output messages if illegal global refs.
Index: einfo.adb
===================================================================
--- einfo.adb	(revision 203568)
+++ einfo.adb	(working copy)
@@ -132,6 +132,7 @@
 --    String_Literal_Low_Bound        Node15
 
 --    Access_Disp_Table               Elist16
+--    Body_References                 Elist16
 --    Cloned_Subtype                  Node16
 --    DTC_Entity                      Node16
 --    Entry_Formal                    Node16
@@ -552,8 +553,8 @@
 --    Has_Delayed_Rep_Aspects         Flag261
 --    May_Inherit_Delayed_Rep_Aspects Flag262
 --    Has_Visible_Refinement          Flag263
+--    Has_Body_References             Flag264
 
---    (unused)                        Flag264
 --    (unused)                        Flag265
 --    (unused)                        Flag266
 --    (unused)                        Flag267
@@ -733,6 +734,12 @@
       return Flag40 (Id);
    end Body_Needed_For_SAL;
 
+   function Body_References (Id : E) return L is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      return Elist16 (Id);
+   end Body_References;
+
    function C_Pass_By_Copy (Id : E) return B is
    begin
       pragma Assert (Is_Record_Type (Id));
@@ -1294,6 +1301,12 @@
       return Flag139 (Id);
    end Has_Biased_Representation;
 
+   function Has_Body_References (Id : E) return B is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      return Flag264 (Id);
+   end Has_Body_References;
+
    function Has_Completion (Id : E) return B is
    begin
       return Flag26 (Id);
@@ -3336,6 +3349,12 @@
       Set_Flag40 (Id, V);
    end Set_Body_Needed_For_SAL;
 
+   procedure Set_Body_References (Id : E; V : L) is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      Set_Elist16 (Id, V);
+   end Set_Body_References;
+
    procedure Set_C_Pass_By_Copy (Id : E; V : B := True) is
    begin
       pragma Assert (Is_Record_Type (Id) and then Is_Base_Type (Id));
@@ -3909,6 +3928,12 @@
       Set_Flag139 (Id, V);
    end Set_Has_Biased_Representation;
 
+   procedure Set_Has_Body_References (Id : E; V : B := True) is
+   begin
+      pragma Assert (Ekind (Id) = E_Abstract_State);
+      Set_Flag264 (Id, V);
+   end Set_Has_Body_References;
+
    procedure Set_Has_Completion (Id : E; V : B := True) is
    begin
       Set_Flag26 (Id, V);
@@ -7984,6 +8009,7 @@
     W ("Has_Anonymous_Master",          Flag253 (Id));
     W ("Has_Atomic_Components",         Flag86 (Id));
     W ("Has_Biased_Representation",     Flag139 (Id));
+    W ("Has_Body_References",           Flag264 (Id));
     W ("Has_Completion",                Flag26 (Id));
     W ("Has_Completion_In_Body",        Flag71 (Id));
     W ("Has_Complex_Representation",    Flag140 (Id));
@@ -8672,6 +8698,10 @@
    procedure Write_Field16_Name (Id : Entity_Id) is
    begin
       case Ekind (Id) is
+         when E_Abstract_State =>
+            Write_Str ("Body_References");
+
          when E_Record_Type | E_Record_Type_With_Private =>
            Write_Str ("Access_Disp_Table");
Index: einfo.ads
===================================================================
--- einfo.ads	(revision 203568)
+++ einfo.ads	(working copy)
@@ -493,6 +493,12 @@
 --
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello,
On 16 Oct 09:59, Richard Henderson wrote:
> On 10/16/2013 09:07 AM, Kirill Yukhin wrote:
>> I suspect gen_lowpart is a bad turn when reload is completed, as far
>> as it can create a new pseudo.  gen_lowpart () may call gen_reg_rtx (),
>> which contains the corresponding gcc_assert ().
>
> False.  gen_lowpart is perfectly safe post-reload.  Indeed, taking the
> subreg of a hard register should arrive at
>
>   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);
>
> in simplify_subreg.
>
> Have you encountered some specific problem with gen_lowpart?

Yes.  Patch [8/8] contains the testsuite for AVX-512.  This pattern is
covered as well.  When trying to do so:

(define_insn_and_split "vec_extract_lo_v32hi"
  [(set (match_operand:V16HI 0 "nonimmediate_operand" "=v,m")
	(vec_select:V16HI
	  (match_operand:V32HI 1 "nonimmediate_operand" "vm,v")
	  (parallel [(const_int 0) (const_int 1)
		     (const_int 2) (const_int 3)
		     (const_int 4) (const_int 5)
		     (const_int 6) (const_int 7)
		     (const_int 8) (const_int 9)
		     (const_int 10) (const_int 11)
		     (const_int 12) (const_int 13)
		     (const_int 14) (const_int 15)])))]
  "TARGET_AVX512F && !(MEM_P (operands[0]) && MEM_P (operands[1]))"
  "#"
  "&& reload_completed"
  [(const_int 0)]
{
  rtx op1 = operands[1];
  op1 = gen_lowpart (V16HImode, op1);
  emit_move_insn (operands[0], op1);
  DONE;
})

I've got an ICE, with the following bt:

#1  0x006f28d6 in gen_reg_rtx (mode=V32HImode)
    at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:866
#2  0x0070243a in copy_to_reg (x=(reg:V32HI 21 xmm0 [163]))
    at /export/users/kyukhin/gcc/git/gcc/gcc/explow.c:606
#3  0x0091dfb8 in gen_lowpart_general (mode=V16HImode, x=<optimized out>)
    at /export/users/kyukhin/gcc/git/gcc/gcc/rtlhooks.c:50
#4  0x00ce16e8 in gen_split_4943 (curr_insn=<optimized out>, operands=0x16f6320)
    at /export/users/kyukhin/gcc/git/gcc/gcc/config/i386/sse.md:6329
#5  0x006f7865 in try_split (pat=(set (reg:V16HI 23 xmm2 [164])
        (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
            (parallel [(const_int 0 [0]) (const_int 1 [0x1])
                       (const_int 2 [0x2]) (const_int 3 [0x3])
                       (const_int 4 [0x4]) (const_int 5 [0x5])
                       (const_int 6 [0x6]) (const_int 7 [0x7])
                       (const_int 8 [0x8]) (const_int 9 [0x9])
                       (const_int 10 [0xa]) (const_int 11 [0xb])
                       (const_int 12 [0xc]) (const_int 13 [0xd])
                       (const_int 14 [0xe]) (const_int 15 [0xf])]))),
    trial=(insn 48 46 49 6 (set (reg:V16HI 23 xmm2 [164])
        (vec_select:V16HI (reg:V32HI 21 xmm0 [163])
            (parallel [(const_int 0 [0]) (const_int 1 [0x1])
                       (const_int 2 [0x2]) (const_int 3 [0x3])
                       (const_int 4 [0x4]) (const_int 5 [0x5])
                       (const_int 6 [0x6]) (const_int 7 [0x7])
                       (const_int 8 [0x8]) (const_int 9 [0x9])
                       (const_int 10 [0xa]) (const_int 11 [0xb])
                       (const_int 12 [0xc]) (const_int 13 [0xd])
                       (const_int 14 [0xe]) (const_int 15 [0xf])])))
    /export/users/kyukhin/gcc/git/gcc/gcc/testsuite/gcc.target/i386/avx512f-vec-unpack.c:24
    2151 {vec_extract_lo_v32hi} (nil)), last=<optimized out>)
    at /export/users/kyukhin/gcc/git/gcc/gcc/emit-rtl.c:3467

So, we have:
  [rtlhooks.c:50]  gen_lowpart_general () ->
  [explow.c:606]   copy_to_reg () ->
  [emit-rtl.c:866] gen_reg_rtx ():
    gcc_assert (can_create_pseudo_p ());

Maybe the code in the pattern is buggy?  Or is it gen_lowpart?

--
Thanks, K
Re: alias fix for PR58685
On Wed, Oct 16, 2013 at 7:53 PM, Bernd Schmidt <ber...@codesourcery.com> wrote:

The sequence of events here can be summarized as "shrink-wrapping causes
the i386 backend to do something that confuses alias analysis".  The
miscompilation is that two instructions are swapped by the scheduler
when they shouldn't be, due to an incorrect reg_base_value.

The miscompiled function has the following parts:

 * a loop which uses %rdi as an incoming argument register
 * following that, a decrement of the stack pointer (shrink-wrapped to
   this point rather than the start of the function)
 * the rest of the function, which has one set of %rdi to
   (symbol_ref "LC0").

The argument register %rdi is dead by the point where we decrement the
stack pointer.  The i386 backend splits the sub into a sequence of two
insns, a clobber of %rdi and a push of it (which register is chosen
appears to be somewhat random).  When called from the scheduler,
init_alias_analysis incorrectly deduces that %rdi has a base value of
(symbol_ref "LC0").  Although it tries to track which registers are
argument registers that may hold a pointer, this information is lost
when we encounter the clobber.

The main part of the following patch is to modify that piece of code to
also set reg_seen for argument registers when encountering a clobber.

There are other problems in this area, one of which showed up while
testing an earlier patch.  i386/pr57003.c demonstrates that return
values of FUNCTION_ARG_REGNO are not constant across the whole
compilation; they are affected by the ms_abi attribute.  This
necessitates changing from a static_reg_base_value array to a
per-function one.  Once that is done, it's better to look at
DECL_ARGUMENTS in a similar way to what combine.c does to identify the
arguments of the specific function we're compiling, rather than using
the less specific FUNCTION_ARG_REGNO.  Lastly, I modified a test for
Pmode to use the valid_pointer_mode hook, which should be a little more
correct.
Bootstrapped and tested on x86_64-linux.  Ok?

+	  if (!bitmap_bit_p (reg_seen, regno))
+	    {
+	      /* We have to make an exception here for an argument
+		 register, in case it is used before the clobber.  */
+	      gcc_assert (new_reg_base_value[regno] == arg_base_value);
+	      bitmap_set_bit (reg_seen, regno);
+	    }

please use gcc_checking_assert.  (bah, I hate that subtle difference in
sbitmap vs. bitmap not returning whether the bit changed in the
modifying operations ...)

+	      /* We have to make an exception here for an argument
+		 register, in case it is used before the clobber.  */

can you elaborate here _why_ we have to make that exception?

Otherwise looks good to me (with my limited knowledge of this area).

Thanks,
Richard.

Bernd
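Bernd's reg_seen fix can be stated as a toy model: a clobber of an argument register must count as "seeing" the register, so its provisional argument base value is never combined with a later, unrelated set. The sketch below is purely illustrative; the structure and names are mine, not alias.c's.

```c
#include <assert.h>

/* Toy model of the bug and fix.  init_alias_analysis gives argument
   registers a provisional "argument" base value; reg_seen records
   which registers have had a real set.  Before the fix, a clobber left
   reg_seen clear, so a later set of that register could wrongly be
   treated as its first (argument-defining) value.  */
#define NREGS 8

struct toy_alias
{
  int reg_seen[NREGS];     /* register has been set or clobbered */
  int base_is_arg[NREGS];  /* provisional base is the argument base */
};

/* The patch's change: a clobber now marks the register as seen, so the
   stale argument base value is not reused afterwards.  */
static void
note_clobber (struct toy_alias *st, int regno)
{
  st->reg_seen[regno] = 1;
}

/* After a clobber, the register must no longer be assumed to hold its
   incoming argument.  */
static int
base_still_argument (const struct toy_alias *st, int regno)
{
  return st->base_is_arg[regno] && !st->reg_seen[regno];
}
```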
Re: [wide-int] int_traits tree
Richard Biener <rguent...@suse.de> writes:
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Kenneth Zadeck <zad...@naturalbridge.com> writes:

As mentioned in my message yesterday, I thought your new way of
canonising unsigned tree constants meant that there was always an upper
zero bit.  Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has
the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1
HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's).
if i have a 16 bit unsigned number that is all 1s, in the tree world it
is 3 hwis: 0x00 0xff 0xff.  but inside regular wide int, it would take
1 wide int whose value is 0xff.  inside of max it would be the same as
the tree, but then the test precision > xprecision + hbpwi never kicks
in because precision is guaranteed to be huge.  inside of
addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2
HWIs.  The initial length is greater than the maximum length of an
addr_wide_int, but your "len = MAX (len, max_len)" deals with that.

It's

  len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision,
precision can only be zero if xprecision is zero, which looked to me
like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for
the zero-precision case.  I think it's there to handle the new tree
representation.  The new tree representation can have a length greater
than max_len for an unsigned tree constant that occupies a whole number
of HWIs.  The tree representation of an unsigned 0x8000 is 0x00 0x80
0x00.  When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.
the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in
addr_wide_int.  The justification for the assertion was that we should
explicitly truncate to addr_wide_int when deliberately ignoring upper
bits, beyond bit or byte address width.  128 bits definitely falls into
that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case?  This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again
to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff is still there
in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                          unsigned int precision,
                                          const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

  (a) doing N-bit arithmetic in cases where the inputs have N bits
  (b) doing N-bit arithmetic in cases where the inputs have fewer than
      N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if
it's extended to 32 bits you get two different numbers, 0x00008000 and
0xffff8000, depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the
tree constant has a single representation.

So I think the possibilities are:

  (1) Use the representation of an N-bit wide_int to store N-bit tree
      constants.  Do work when extending them to wider wide_ints.

  (2) Use the representation of max_wide_int to store N-bit tree
      constants.  Do work when creating an N-bit wide_int.

  (3) Store both representations in the tree constant.

  (4) Require all tree arithmetic to be done in the same way as rtl
      arithmetic, with explicit extensions.  This gets rid of case (b).

  (5) Require all tree arithmetic to be done in wider wide_ints than
      the inputs, which
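The sign-dependence described above (the same 16-bit pattern 0x8000 extending to two different 32-bit numbers) is easy to check in plain C. A minimal sketch; the helper names are mine, not wide-int's.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: an unsigned 16-bit value keeps its numeric value.  */
static uint32_t
extend_unsigned (uint16_t x)
{
  return (uint32_t) x;
}

/* Sign-extension, written portably: replicate bit 15 into the upper
   half when it is set.  */
static uint32_t
extend_signed (uint16_t x)
{
  return (x & 0x8000u) ? (0xffff0000u | x) : x;
}
```

With 0x8000 the two extensions disagree, which is exactly why a single N-bit representation cannot serve both interpretations at wider precisions.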
Re: [PATCH] Enhance ifcombine to recover non short circuit branches
On Thu, Oct 17, 2013 at 4:14 AM, Andrew Pinski <pins...@gmail.com> wrote:
On Wed, Oct 16, 2013 at 2:12 AM, Zhenqiang Chen <zhenqiang.c...@linaro.org> wrote:

Hi,

The patch enhances the ifcombine pass to recover some non short circuit
branches.  Basically, it will do the following transformations:

Case 1:
  if (cond1) if (cond2)            ==>  if (cond1 && cond2)

Case 2:
  if (cond1) goto L1
  if (cond2) goto L1               ==>  if (cond1 || cond2) goto L1

Case 3:
  if (cond1) goto L1 else goto L2
  L1: if (cond2) goto L2           ==>  if (invert (cond1) || cond2) goto L2

Case 4:
  if (cond1) goto L1
  if (cond2) goto L2
  L1:                              ==>  if (invert (cond1) && cond2) goto L2

Bootstrap on X86-64 and ARM.

Two new FAILs in regression tests:
  gcc.dg/uninit-pred-8_b.c
  gcc.dg/uninit-pred-9_b.c
The uninit pass should be enhanced to handle more complex conditions.
Will submit a bug to track it and fix it later.

Is it OK for trunk?

I had a much simpler change which did basically the same from 4.7 (I
can update it if people think this is a better approach).

I like that more (note you can now use is_gimple_condexpr as predicate
for force_gimple_operand).  With that we should be able to kill the
fold-const.c transform?

Thanks,
Richard.

Thanks,
Andrew Pinski

2012-09-29  Andrew Pinski  <apin...@cavium.com>

	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
	(ifcombine_ifandif): Handle cases where maybe_fold_and_comparisons
	fails, combining the branches anyways.
	(tree_ssa_ifcombine): Inverse the order of the basic block walk,
	increases the number of combinings.
	* Makefile.in (tree-ssa-ifcombine.o): Update dependencies.
	* testsuite/gcc.dg/tree-ssa/phi-opt-2.c: Expect zero ifs as the
	compiler produces "a && b" now.
	* testsuite/gcc.dg/tree-ssa/phi-opt-9.c: Use a function call to
	prevent conditional move to be used.
	* testsuite/gcc.dg/tree-ssa/ssa-dom-thread-3.c: Remove check for
	"one or more intermediate".

Thanks!
-Zhenqiang

ChangeLog:
2013-10-16  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* fold-const.c (simple_operand_p_2): Make it global.
	* tree.h (simple_operand_p_2): Declare it.
	* tree-ssa-ifcombine.c: Include rtl.h and tm_p.h.
	(bb_has_overhead_p, generate_condition_node, ifcombine_ccmp): New
	functions.
	(ifcombine_fold_ifandif): New function, extracted from
	ifcombine_ifandif.
	(ifcombine_ifandif): Call ifcombine_ccmp.
	(tree_ssa_ifcombine_bb): Skip optimized bb.

testsuite/ChangeLog
2013-10-16  Zhenqiang Chen  <zhenqiang.c...@linaro.org>

	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-1.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-2.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-3.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-4.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-5.c: New test case.
	* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-6.c: New test case.
	* gcc.dg/tree-ssa/ssa-dom-thread-3.c: Updated.
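The four rewrites above are plain source-level equivalences and can be verified exhaustively over boolean inputs. A sketch in C; the function names and goto shapes are illustrative of the CFG patterns in the patch description, not the pass's internals.

```c
#include <assert.h>

/* Each "before" function encodes the original control flow of one
   case; each "after" function encodes the combined condition.  */

static int case1_before (int c1, int c2)
{ int r = 0; if (c1) { if (c2) r = 1; } return r; }
static int case1_after (int c1, int c2)
{ return (c1 && c2) ? 1 : 0; }

static int case2_before (int c1, int c2)
{ if (c1) goto L1; if (c2) goto L1; return 0; L1: return 1; }
static int case2_after (int c1, int c2)
{ return (c1 || c2) ? 1 : 0; }

static int case3_before (int c1, int c2)
{ if (c1) goto L1; else goto L2; L1: if (c2) goto L2; return 0; L2: return 1; }
static int case3_after (int c1, int c2)
{ return (!c1 || c2) ? 1 : 0; }   /* invert (cond1) || cond2 */

static int case4_before (int c1, int c2)
{ if (c1) goto L1; if (c2) goto L2; L1: return 0; L2: return 1; }
static int case4_after (int c1, int c2)
{ return (!c1 && c2) ? 1 : 0; }   /* invert (cond1) && cond2 */

/* Check all four equivalences over every boolean input.  */
static int all_cases_agree (void)
{
  for (int c1 = 0; c1 <= 1; c1++)
    for (int c2 = 0; c2 <= 1; c2++)
      if (case1_before (c1, c2) != case1_after (c1, c2)
          || case2_before (c1, c2) != case2_after (c1, c2)
          || case3_before (c1, c2) != case3_after (c1, c2)
          || case4_before (c1, c2) != case4_after (c1, c2))
        return 0;
  return 1;
}
```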
Re: patch to canonize unsigned tree-csts
On Wed, 16 Oct 2013, Kenneth Zadeck wrote:

On 10/15/2013 02:30 PM, Richard Sandiford wrote:
Richard Sandiford <rdsandif...@googlemail.com> writes:

  if (small_prec)
    ;
  else if (precision == xprecision)
    while (len >= 0 && val[len - 1] == -1)
      len--;

Err, len > 0 obviously.

you were only close.  patch tested on ppc and committed as revision
203739.

Index: gcc/tree.c
===================================================================
--- gcc/tree.c	(revision 203701)
+++ gcc/tree.c	(working copy)
@@ -1204,7 +1204,7 @@ wide_int_to_tree (tree type, const wide_
     }
 
   wide_int cst = wide_int::from (pcst, prec, sgn);
-  unsigned int len = int (cst.get_len ());
+  unsigned int len = cst.get_len ();
   unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
   bool recanonize = sgn == UNSIGNED && small_prec
Index: gcc/tree.h
===================================================================
--- gcc/tree.h	(revision 203701)
+++ gcc/tree.h	(working copy)
@@ -5204,13 +5204,13 @@ wi::int_traits <const_tree>::decompose (
 	  scratch[len - 1] = sext_hwi (val[len - 1], precision);
 	  return wi::storage_ref (scratch, len, precision);
 	}
-    }
-
-  if (precision > xprecision + HOST_BITS_PER_WIDE_INT)
-    {
-      len = wi::force_to_size (scratch, val, len, xprecision, precision,
-			       UNSIGNED);
-      return wi::storage_ref (scratch, len, precision);
-    }
+    }
+  /* We have to futz here because a large unsigned int with
+     precision 128 may look like (0x0 0xF...F 0xF...F) as a
+     tree-cst and as (0xF...F) as a wide-int.  */
+  else if (precision == xprecision && len == max_len)
+    while (len > 1 && val[len - 1] == (HOST_WIDE_INT)-1)
+      len--;
 }

Err, that now undoes the extra zero word thing?  Or was I confused
about the previous code, and this appends the extra zero word for MSB
set unsigned constants?

Richard.
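The recanonicalization above (trim the tree's extra zero word, then peel upper all-ones words) can be sketched with the thread's 8-bit-HWI example. This is a simplified model, not GCC's code; in particular it ignores the sgn == UNSIGNED and precision == xprecision guards the real patch applies.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Toy model with an 8-bit "HWI": an unsigned 16-bit all-ones constant
   arrives from the tree as three words {0xff, 0xff, 0x00} (LSB first)
   and must canonicalize down to the single word 0xff.  The function
   name is illustrative.  */
static size_t
canonize_len (const int8_t *val, size_t len, unsigned precision,
              unsigned bits_per_word)
{
  size_t max_len = (precision + bits_per_word - 1) / bits_per_word;

  if (len > max_len)                 /* drop the tree's extra zero word */
    len = max_len;

  /* Peel upper words that are pure sign-extension of an all-ones
     value; the canonical form keeps only one of them.  */
  while (len > 1 && val[len - 1] == (int8_t) -1)
    len--;

  return len;
}
```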
[PATCH][AArch64] Implement %c output template
Hi all,

This patch implements the %c output template for inline asm.  The code
for it is almost identical to the support in arm, so it's pretty
straightforward.  I've added a few compile tests for it as well.

Tested aarch64-none-elf on a model.

Ok for trunk?

Thanks,
Kyrill

[gcc/]
2013-10-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

	* config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'.

[gcc/testsuite]
2013-10-17  Kyrylo Tkachov  <kyrylo.tkac...@arm.com>

	* gcc.target/aarch64/c-output-template.c: New testcase.
	* gcc.target/aarch64/c-output-template-2.c: Likewise.
	* gcc.target/aarch64/c-output-template-3.c: Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b61b453..d2a3d49 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3426,6 +3426,32 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 {
   switch (code)
     {
+    /* An integer or symbol address without a preceding # sign.  */
+    case 'c':
+      switch (GET_CODE (x))
+	{
+	case CONST_INT:
+	  fprintf (f, HOST_WIDE_INT_PRINT_DEC, INTVAL (x));
+	  break;
+
+	case SYMBOL_REF:
+	  output_addr_const (f, x);
+	  break;
+
+	case CONST:
+	  if (GET_CODE (XEXP (x, 0)) == PLUS
+	      && GET_CODE (XEXP (XEXP (x, 0), 0)) == SYMBOL_REF)
+	    {
+	      output_addr_const (f, x);
+	      break;
+	    }
+	  /* Fall through.  */
+
+	default:
+	  output_operand_lossage ("Unsupported operand for code '%c'", code);
+	}
+      break;
+
     case 'e':
       /* Print the sign/zero-extend size as a character 8->b, 16->h, 32->w.  */
       {
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
new file mode 100644
index 000..16ff58d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+    int dummy;
+    int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (&tp));
+}
+
+/* { dg-final { scan-assembler "@ tp" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
new file mode 100644
index 000..e332fe1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+struct tracepoint {
+    int dummy;
+    int state;
+};
+static struct tracepoint tp;
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (&tp.state));
+}
+
+/* { dg-final { scan-assembler "@ tp\\+4" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/c-output-template.c b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
new file mode 100644
index 000..1b67c91
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/c-output-template.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+
+void
+test (void)
+{
+    __asm__ ("@ %c0" : : "i" (42));
+}
+
+/* { dg-final { scan-assembler "@ 42" } } */
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin <kirill.yuk...@gmail.com> wrote:
>>> I suspect gen_lowpart is a bad turn when reload is completed, as far
>>> as it can create a new pseudo.  gen_lowpart () may call
>>> gen_reg_rtx (), which contains the corresponding gcc_assert ().
>>
>> False.  gen_lowpart is perfectly safe post-reload.  Indeed, taking
>> the subreg of a hard register should arrive at
>>   x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset);
>> in simplify_subreg.
>>
>> Have you encountered some specific problem with gen_lowpart?
>
> Yes.  Patch [8/8] contains the testsuite for AVX-512.  This pattern is
> covered as well.  When trying to do so:
>
>   (define_insn_and_split "vec_extract_lo_v32hi" [...])
>
> I've got an ICE, with the following bt:
>
>   [...]
>
> So, we have:
>   [rtlhooks.c:50]  gen_lowpart_general () ->
>   [explow.c:606]   copy_to_reg () ->
>   [emit-rtl.c:866] gen_reg_rtx ():
>     gcc_assert (can_create_pseudo_p ());
>
> Maybe the code in the pattern is buggy?  Or is it gen_lowpart?

I think that the original approach with gen_rtx_REG is correct and
follows established practice in sse.md (please grep for gen_rtx_REG in
sse.md).  If this approach is necessary due to a deficiency of
gen_lowpart, then the fix to gen_lowpart should be proposed in a
follow-up patch.

Uros.
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Richard Biener <rguent...@suse.de> writes:
On Thu, 17 Oct 2013, Richard Sandiford wrote:
Kenneth Zadeck <zad...@naturalbridge.com> writes:

As mentioned in my message yesterday, I thought your new way of
canonising unsigned tree constants meant that there was always an upper
zero bit.  Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has
the right form for wider precisions.  The only difficult case is
xprecision == precision, since then we need to peel off any upper -1
HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's).
if i have a 16 bit unsigned number that is all 1s, in the tree world it
is 3 hwis: 0x00 0xff 0xff.  but inside regular wide int, it would take
1 wide int whose value is 0xff.  inside of max it would be the same as
the tree, but then the test precision > xprecision + hbpwi never kicks
in because precision is guaranteed to be huge.  inside of
addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too.  The precision still fits 2
HWIs.  The initial length is greater than the maximum length of an
addr_wide_int, but your "len = MAX (len, max_len)" deals with that.

It's

  len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision,
precision can only be zero if xprecision is zero, which looked to me
like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for
the zero-precision case.  I think it's there to handle the new tree
representation.  The new tree representation can have a length greater
than max_len for an unsigned tree constant that occupies a whole number
of HWIs.  The tree representation of an unsigned 0x8000 is 0x00 0x80
0x00.  When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero.
The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed?  _Why_ was it changed?  We
still cannot use it directly from wide-int, and the extra word is
redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

the case actually comes up on the ppc because they do a lot of 128 bit
math.  I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in
addr_wide_int.  The justification for the assertion was that we should
explicitly truncate to addr_wide_int when deliberately ignoring upper
bits, beyond bit or byte address width.  128 bits definitely falls into
that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid
wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff.  0x00 0xff 0xff is correct
for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case?  This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again
to 1, so the 0x00 0xff 0xff becomes 0xff.  The 0x00 0xff is still there
in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
                                          unsigned int precision,
                                          const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the
INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases:

  (a) doing N-bit arithmetic in cases where the inputs have N bits
  (b) doing N-bit arithmetic in cases where the inputs have fewer than
      N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs.  The 16-bit (4-hex-digit) constant 0x8000 is
0x8000 regardless of whether the type is signed or unsigned.  But if
it's extended to 32 bits you get two different numbers, 0x00008000 and
0xffff8000, depending on the sign.

So for one value of the precision parameter (i.e. xprecision), signed
and unsigned constants produce the same number.  But for another value
of the precision parameter (those greater than xprecision), signed and
unsigned constants produce different numbers.  Yet at the moment the
tree constant has a single representation.

But a correctly extended one, up to its len!  (as opposed to RTL)

So I think the possibilities are:

  (1) Use the representation of an N-bit wide_int to store N-bit tree
      constants.  Do work when
Re: Ping Re: [gomp4] Dumping gimple for offload.
Ping.
On 09 Oct 19:12, Ilya Tocar wrote:
Ping.
On 03 Oct 20:05, Ilya Tocar wrote:
On 26 Sep 21:21, Ilya Tocar wrote:
On 25 Sep 15:48, Richard Biener wrote:
On Wed, Sep 25, 2013 at 3:29 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:
On 24 Sep 11:02, Richard Biener wrote:
On Mon, Sep 23, 2013 at 3:29 PM, Ilya Tocar <tocarip.in...@gmail.com> wrote:

thus consider assigning the section name in a different place.

Richard.

What do you mean by different place?  I can add a global
dumping_omp_target variable to choose the correct name, depending on
its value (patch below).  Is it better?

More like passing down a different abstraction, like for

@@ -907,9 +907,15 @@ output_symtab (void)
     {
       symtab_node node = lto_symtab_encoder_deref (encoder, i);
       if (cgraph_node *cnode = dyn_cast <cgraph_node> (node))
-        lto_output_node (ob, cnode, encoder);
+        {
+          if (!dumping_omp_target || lookup_attribute ("omp declare target",
+                                       DECL_ATTRIBUTES (node->symbol.decl)))
+            lto_output_node (ob, cnode, encoder);
+        }
       else
-        lto_output_varpool_node (ob, varpool (node), encoder);
+        if (!dumping_omp_target || lookup_attribute ("omp declare target",
+                                     DECL_ATTRIBUTES (node->symbol.decl)))
+          lto_output_varpool_node (ob, varpool (node), encoder);
     }

have the symtab encoder already not contain the varpool nodes you
don't need.  And instead of looking up attributes, mark the symtab
node with a flag.

Good idea!  I've tried creating 2 encoders, and adding only nodes with
the "omp declare target" attribute in the omp case.  There is still
some is_omp passing to control lto_set_symtab_encoder_in_partition
behavior, because i think it's better than a global var.  What do you
think?

Updated version of the patch.  I've checked that it doesn't break lto
on SPEC 2006.  Streaming for omp is enabled by the -fopenmp flag.
Works with and without enabled lto.  Ok for gomp4 branch?

---
 gcc/cgraphunit.c          | 15 +--
 gcc/ipa-inline-analysis.c |  2 +-
 gcc/lto-cgraph.c          | 15 ++-
 gcc/lto-streamer.c        |  5 +++--
 gcc/lto-streamer.h        | 10 --
 gcc/lto/lto-partition.c   |  4 ++--
 gcc/passes.c              | 12 ++--
 gcc/tree-pass.h           |  2 +-
 8 files changed, 44 insertions(+), 21 deletions(-)

diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
index 1644ca9..d595475 100644
--- a/gcc/cgraphunit.c
+++ b/gcc/cgraphunit.c
@@ -2016,7 +2016,18 @@ ipa_passes (void)
				      passes->all_lto_gen_passes);
 
   if (!in_lto_p)
-    ipa_write_summaries ();
+    {
+      if (flag_openmp)
+	{
+	  section_name_prefix = OMP_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (true);
+	}
+      if (flag_lto)
+	{
+	  section_name_prefix = LTO_SECTION_NAME_PREFIX;
+	  ipa_write_summaries (false);
+	}
+    }
 
   if (flag_generate_lto)
     targetm.asm_out.lto_end ();
@@ -2107,7 +2118,7 @@ compile (void)
   cgraph_state = CGRAPH_STATE_IPA;
 
   /* If LTO is enabled, initialize the streamer hooks needed by GIMPLE.  */
-  if (flag_lto)
+  if (flag_lto || flag_openmp)
     lto_streamer_hooks_init ();
 
   /* Don't run the IPA passes if there was any error or sorry messages.  */
diff --git a/gcc/ipa-inline-analysis.c b/gcc/ipa-inline-analysis.c
index ba6221e..4420213 100644
--- a/gcc/ipa-inline-analysis.c
+++ b/gcc/ipa-inline-analysis.c
@@ -3721,7 +3721,7 @@ inline_generate_summary (void)
   /* When not optimizing, do not bother to analyze.  Inlining is
      still done because edge redirection needs to happen there.  */
-  if (!optimize && !flag_lto && !flag_wpa)
+  if (!optimize && !flag_lto && !flag_wpa && !flag_openmp)
     return;
 
   function_insertion_hook_holder =
diff --git a/gcc/lto-cgraph.c b/gcc/lto-cgraph.c
index 952588d..4a7d179 100644
--- a/gcc/lto-cgraph.c
+++ b/gcc/lto-cgraph.c
@@ -236,8 +236,13 @@ lto_symtab_encoder_in_partition_p (lto_symtab_encoder_t encoder,
 
 void
 lto_set_symtab_encoder_in_partition (lto_symtab_encoder_t encoder,
-				     symtab_node node)
+				     symtab_node node, bool is_omp)
 {
+  /* Ignore non omp target nodes for omp case.  */
+  if (is_omp && !lookup_attribute ("omp declare target",
+				   DECL_ATTRIBUTES (node->symbol.decl)))
+    return;
+
   int index = lto_symtab_encoder_encode (encoder, (symtab_node) node);
   encoder->nodes[index].in_partition = true;
 }
@@ -760,7 +765,7 @@ add_references (lto_symtab_encoder_t encoder,
 ignored by the
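The filtering added to lto_set_symtab_encoder_in_partition can be modeled in a few lines: when streaming for an OpenMP target, nodes without the "omp declare target" attribute are simply skipped. The struct and the strstr-based attribute check below are simplified stand-ins for GCC's symtab node and lookup_attribute, not the real API.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for a symtab node: a flat attribute string plus the
   in_partition flag the encoder sets.  */
struct toy_node
{
  const char *attrs;
  int in_partition;
};

/* Mirror of the patch's early return: in the omp case, nodes lacking
   the "omp declare target" attribute never enter the partition.  */
static void
set_in_partition (struct toy_node *node, int is_omp)
{
  if (is_omp
      && (node->attrs == 0
          || strstr (node->attrs, "omp declare target") == 0))
    return;
  node->in_partition = 1;
}

/* Convenience wrapper for testing a single node.  */
static int
partitioned (const char *attrs, int is_omp)
{
  struct toy_node n = { attrs, 0 };
  set_in_partition (&n, is_omp);
  return n.in_partition;
}
```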
Re: [wide-int] int_traits tree
On 10/17/2013 04:46 AM, Richard Biener wrote: the case actually comes up on the ppc because they do a lot of 128 bit math.I think i got thru the x86-64 without noticing this. Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc. My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16. AFAIK that is what the code produces, but now Kenny says this is only for some kind of wide-ints but not all? That is, why is The issue is not that the rules are different between the different flavors of wide int, it is that the circumstances are different. The rule is that the only bits above the precision that exist are if the precision is not an even multiple of the HBPWI. In that case, the bits are always an extension of the sign bits. max_wide_int and addr_wide_int have large enough precisions so that it is impossible to ever generate an unsigned number on the target that is large enough to ever run against the precision.However, for the fixed precision case, you can, and at least on the ppc, you do, generate unsigned numbers that are big enough to have to be recanonicalized like this. inline wi::storage_ref wi::int_traits const_tree::decompose (HOST_WIDE_INT *scratch, unsigned int precision, const_tree x) { unsigned int len = TREE_INT_CST_NUNITS (x); const HOST_WIDE_INT *val = (const HOST_WIDE_INT *) TREE_INT_CST_ELT (x, 0); return wi::storage_ref (val, len, precision); } not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required? I would like to see us move in that direction (given that the tree rep of INTEGER_CST has transitioned to variable-length already). 
Btw, code such as

  tree
  wide_int_to_tree (tree type, const wide_int_ref &pcst)
  {
  ...
    unsigned int small_prec = prec & (HOST_BITS_PER_WIDE_INT - 1);
    bool recanonize = sgn == UNSIGNED
      && small_prec
      && (prec + HOST_BITS_PER_WIDE_INT - 1) / HOST_BITS_PER_WIDE_INT == len;

definitely needs a comment. I would have thought _all_ unsigned numbers need re-canonicalization. Well, maybe only if we're forcing the extra zeros. It will get a comment.

[This function shows another optimization issue:

    case BOOLEAN_TYPE:
      /* Cache false or true.  */
      limit = 2;
      if (wi::leu_p (cst, 1))
        ix = cst.to_uhwi ();

I would have expected cst <= 1 to be optimized to cst.len == 1 && cst.val[0] <= 1. It expands to

  <L27>:
  MEM[(long int *)&D.50698 + 16B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698] = &MEM[(struct wide_int_ref_storage *)&D.50698].scratch;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 8B] = 1;
  MEM[(struct wide_int_ref_storage *)&D.50698 + 12B] = 32;
  _277 = MEM[(const struct wide_int_storage *)cst + 260B];
  if (_277 <= 64)
    goto <bb 42>;
  else
    goto <bb 43>;

  <bb 42>:
  xl_491 = zext_hwi (1, 32);  // ok, checking enabled and thus out-of-line
  _494 = MEM[(const long int *)cst];
  _495 = (long unsigned int) _494;
  yl_496 = zext_hwi (_495, _277);
  _497 = xl_491 < yl_496;
  goto <bb 44>;

  <bb 43>:
  _503 = wi::ltu_p_large (&MEM[(struct wide_int_ref_storage *)&D.50698].scratch, 1, 32, &MEM[(const struct wide_int_storage *)cst].val, len_274, _277);

this keeps D.50698 and cst un-SRAable - inline storage is problematic for this reason. But the representation should guarantee the compare with a low precision (32 bit) constant is evaluatable at compile-time if len of the larger value is 1, no?
  <bb 44>:
  # _504 = PHI <_497(42), _503(43)>
  D.50698 ={v} {CLOBBER};
  if (_504 != 0)
    goto <bb 45>;
  else
    goto <bb 46>;

  <bb 45>:
  pretmp_563 = MEM[(const struct wide_int_storage *)cst + 256B];
  goto <bb 229> (<L131>);

  <bb 46>:
  _65 = generic_wide_int<wide_int_storage>::to_uhwi (cst, 0);
  ix_66 = (int) _65;
  goto <bb 91>;

The question is whether we should try to optimize wide-int for such cases or simply not use wi::leu_p (cst, 1) but rather

  if (cst.fits_uhwi_p () == 1 && cst.to_uhwi () < 1)

? i find this ugly, but i see where you are coming from. The problem is that both you and i know that the len has to be 1, but the optimizer does not. This is a case where I think that we made a mistake getting rid of the wi::one_p, wi::zero_p and wi::minus_one_p. The internals of one_p were return (len == 1 && val[0] == 1) and i think that is much nicer than what you put there. On the other hand, it seems that a person more skilled than i am with c++ could specialize the comparisons with an integer constant, since i believe that that constant must fit in one hwi (I am a little concerned about large unsigned constants). Thanks, Richard
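The recanonicalization rule Kenny states — only a precision that is not a whole multiple of HWI bits leaves bits above it, and those bits are always sign extension — can be sketched in a toy model. This is illustrative Python only (a hypothetical `canonize` helper, with 8-bit "HWIs" as in the thread's examples; real HOST_BITS_PER_WIDE_INT is 32 or 64), not GCC code:

```python
# Toy model of wide_int canonicalization with 8-bit "HWIs".
HWI_BITS = 8
HWI_MAX = (1 << HWI_BITS) - 1
SIGN_BIT = 1 << (HWI_BITS - 1)

def canonize(blocks, precision):
    """Clamp a least-significant-first block array to the blocks that fit
    in `precision`, then drop high blocks that are pure sign extension of
    the block below -- the canonical wide_int form."""
    max_len = (precision + HWI_BITS - 1) // HWI_BITS
    blocks = blocks[:max_len]           # the MIN (len, max_len) step
    while len(blocks) > 1:              # the trimming loop
        top, below = blocks[-1], blocks[-2]
        if (top == 0 and not below & SIGN_BIT) or \
           (top == HWI_MAX and below & SIGN_BIT):
            blocks = blocks[:-1]        # top block is redundant sign extension
        else:
            break
    return blocks

# Unsigned 16-bit all-ones: tree rep 0x00 0xff 0xff (msb first), i.e.
# [0xff, 0xff, 0x00] lsb first.  For a 16-bit wide_int it collapses to
# the single block 0xff, exactly the trimming discussed in the thread.
print([hex(b) for b in canonize([0xff, 0xff, 0x00], 16)])  # -> ['0xff']
```

For a precision wider than 16 bits the leading zero block is no longer redundant, so the same array stays at len 3 — matching Richard's point that 0x00 0xff 0xff is correct for any wide_int wider than 16 bits.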
Re: [wide-int] int_traits <tree>
Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case? This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again to 1, so the 0x00 0xff 0xff becomes 0xff. The 0x00 0xff is still there in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all?
That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
					  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required?

The fundamental problem here is that we're trying to support two cases: (a) doing N-bit arithmetic in cases where the inputs have N bits, and (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits and are extended according to TYPE_SIGN.

Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is 0x8000 regardless of whether the type is signed or unsigned. But if it's extended to 32 bits you get two different numbers 0x00008000 and 0xffff8000, depending on the sign. So for one value of the precision parameter (i.e. xprecision), signed and unsigned constants produce the same number. But for another value of the precision parameter (those greater than xprecision), signed and unsigned constants produce different numbers. Yet at the moment the tree constant has a single representation.

But a correctly extended one, up to its len! (as opposed to RTL)

But extending the precision can change the right value of len. Take the same example with 16-bit HWIs. In wide_int terms, and with the original tree representation, the constant is a single HWI: 0x8000 with len 1. And in case (a) -- where we're asking for a 16-bit wide_int -- this single HWI is all we want. The signed and unsigned constants give the same wide_int. But the same constant extended to 32 bits and left uncompressed would be two HWIs:

  0x0000 0x8000 for unsigned constants
  0xffff 0x8000 for signed constants

Compressed according to the sign scheme they are:

  0x0000 0x8000 (len == 2) for unsigned constants
  0x8000 (len == 1) for signed constants

which is also the new tree representation.
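Richard's 0x8000 example can be worked through mechanically. The sketch below is a toy Python model (hypothetical `extend` helper, 16-bit HWIs as in the example; not GCC code) that reproduces the compressed signed and unsigned forms:

```python
# Toy model: extend a block array to a wider precision, then compress
# per the sign scheme (drop a top block that merely repeats the sign).
HWI_BITS = 16
HWI_MAX = (1 << HWI_BITS) - 1
SIGN_BIT = 1 << (HWI_BITS - 1)

def extend(blocks, old_prec, new_prec, signed):
    """blocks is least-significant HWI first."""
    value = 0
    for i, b in enumerate(blocks):
        value |= b << (i * HWI_BITS)
    if signed and value >> (old_prec - 1):   # negative: sign-extend
        value -= 1 << old_prec
    value &= (1 << new_prec) - 1             # two's complement in new_prec
    n = (new_prec + HWI_BITS - 1) // HWI_BITS
    out = [(value >> (i * HWI_BITS)) & HWI_MAX for i in range(n)]
    while len(out) > 1:                      # compress redundant top blocks
        top, below = out[-1], out[-2]
        if (top == 0 and not below & SIGN_BIT) or \
           (top == HWI_MAX and below & SIGN_BIT):
            out.pop()
        else:
            break
    return out

# 16-bit constant 0x8000 extended to 32 bits:
print([hex(b) for b in extend([0x8000], 16, 32, signed=False)])  # -> ['0x8000', '0x0'], len 2
print([hex(b) for b in extend([0x8000], 16, 32, signed=True)])   # -> ['0x8000'], len 1
```

The unsigned result keeps the explicit zero block (len 2) while the signed result compresses to a single HWI (len 1), matching the two forms quoted above.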
So the unsigned case is different for (a) and (b). So I think the possibilities are:

(1) Use the representation of an N-bit wide_int to store N-bit tree constants. Do work when extending them to wider wide_ints.

(2) Use the representation of max_wide_int to store N-bit tree constants. Do work when creating an N-bit wide_int.

(3) Store both representations in the tree constant.

(4) Require all tree arithmetic to be done in the same way as rtl arithmetic, with explicit extensions. This gets rid of case (b).

(5) Require all tree arithmetic to be done in wider wide_ints than the inputs, which I think is what you preferred. This gets rid of case (a).

(6) Allow the same wide_int constant to have several different representations. Remember that this is to some extent what Kenny's original
Re: [wide-int] int_traits <tree>
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers.
There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision >= xprecision (maybe we should assert that).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds.

What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged.

Richard.
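For concreteness, the "64 + 4 bits rounded up to a multiple of HOST_BITS_PER_WIDE_INT" rule for addr_wide_int's precision works out as follows. This is a toy Python calculation (hypothetical helper name, not GCC code):

```python
def fixed_precision(addr_bits, extra_bits, hwi_bits):
    """Round addr_bits + extra_bits up to the next multiple of the
    host-wide-int width, as the addr_wide_int description states."""
    needed = addr_bits + extra_bits
    return -(-needed // hwi_bits) * hwi_bits   # ceiling division

print(fixed_precision(64, 4, 64))  # -> 128: addr_wide_int precision on a 64-bit-HWI host
print(fixed_precision(64, 4, 32))  # -> 96: the same rule with 32-bit HWIs
```

With 64-bit HWIs this gives the 128-bit fixed precision that makes the extension of any tree constant a no-op, which is the point of Richard's "Aha" above.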
RFA: Change behaviour of %* in spec strings
Hi Guys, I would like to make a change to the way %* behaves in spec strings. Currently whenever %* is encountered the text that was previously matched is substituted, followed by a space. Thus %{foo=*:bar%*baz} would match -foo=4 and insert 'bar4 baz'. I would like to change this so that the space is only inserted if %* is at the end of a spec sequence. (In all of the strings currently built in to gcc this is the case, so my patch will not change current behaviour). The motivation is that I want to construct a pathname based on a command line option using a spec string like this:

  %{mmcu=*:--script=%*/memory.ld}

So that if the user invokes -mmcu=msp430f2617 then gcc will generate:

  --script=msp430f2617/memory.ld

A spec string like this however:

  %{mmcu=*:--script=%*}/memory.ld

would match the current behaviour and generate:

  --script=msp430f2617 /memory.ld

As a secondary feature of the patch I have also updated the documentation to explicitly state when a space will be inserted into the generated text. I have tested the patch and found no regressions using an i686-pc-linux-gnu toolchain and an msp430-elf toolchain. OK to apply ? Cheers Nick

gcc/ChangeLog
2013-10-17  Nick Clifton  ni...@redhat.com

	* gcc.c (do_spec_1): Do not insert a space after a %* substitution
	unless it is the last part of a spec substring.
	* doc/invoke.texi (Spec Files): Document space insertion
	behaviour of %*.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 203750)
+++ gcc/doc/invoke.texi (working copy)
@@ -10929,6 +10929,22 @@
 for each matching switch, with the @code{%*} replaced by the part of
 that switch matching the @code{*}.
 
+If @code{%*} appears as the last part of a spec sequence then a space
+will be added after the end of the last substitution.  If there is more
+text in the sequence however then a space will not be generated.  This
+allows the @code{%*} substitution to be used as part of a larger
+string.
+For example, a spec string like this:
+
+@smallexample
+%@{mcu=*:--script=%*/memory.ld@}
+@end smallexample
+
+when matching an option like @code{-mcu=newchip} will produce:
+
+@smallexample
+--script=newchip/memory.ld
+@end smallexample
+
 @item %@{.@code{S}:@code{X}@}
 Substitutes @code{X}, if processing a file with suffix @code{S}.

Index: gcc/gcc.c
===
--- gcc/gcc.c (revision 203750)
+++ gcc/gcc.c (working copy)
@@ -388,7 +388,8 @@
  %2	process CC1PLUS_SPEC as a spec.
  %*	substitute the variable part of a matched option.  (See below.)
 	Note that each comma in the substituted string is replaced by
-	a single space.
+	a single space.  A space is appended after the last substitution
+	unless there is more text in the current sequence.
  %<S	remove all occurrences of -S from the command line.  Note - this
 	command is position dependent.  %<S commands in the spec string
 	before this one will see -S, %<S commands in the
@@ -422,7 +423,9 @@
 	once, no matter how many such switches appeared.  However, if %*
 	appears somewhere in X, then X will be substituted once for each
 	matching switch, with the %* replaced by the
-	part of that switch that matched the '*'.
+	part of that switch that matched the '*'.  A space will be
+	appended after the last substitution unless there is more
+	text in the current sequence.
 %{.S:X} substitutes X, if processing a file with suffix S.
 %{!.S:X} substitutes X, if NOT processing a file with suffix S.
 %{,S:X} substitutes X, if processing a file which will use spec S.
@@ -5375,7 +5378,17 @@
 	{
 	  if (soft_matched_part[0])
 	    do_spec_1 (soft_matched_part, 1, NULL);
-	  do_spec_1 (" ", 0, NULL);
+	  /* Only insert a space after the substitution if it is at the
+	     end of the current sequence.  So if:
+
+	       %{foo=*:bar%*}%{foo=*:one%*two}
+
+	     matches -foo=hello then it will produce:
+
+	       barhello onehellotwo
+	  */
+	  if (*p == 0 || *p == '}')
+	    do_spec_1 (" ", 0, NULL);
 	}
       else
 	/* Catch the case where a spec string contains something like
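Nick's proposed rule — substitute the matched tail for %*, appending a space only when %* ends the sequence — can be modelled outside GCC. The sketch below is a toy Python simulation (hypothetical `expand_spec` handling only the %{name=*:X} form; it is not GCC's actual spec parser):

```python
import re

def expand_spec(spec, options):
    """Toy model of the proposed %* behaviour: expand %{name=*:X},
    substituting the matched option tail for %*, and append a space
    only when %* is the last thing in the sequence."""
    def repl(m):
        name, body = m.group(1), m.group(2)
        for opt in options:
            if opt.startswith("-" + name + "="):
                tail = opt[len(name) + 2:]          # text matched by '*'
                out = body.replace("%*", tail)
                if body.endswith("%*"):             # %* at end of sequence
                    out += " "
                return out
        return ""                                   # option absent: drop sequence
    return re.sub(r"%\{(\w+)=\*:([^}]*)\}", repl, spec)

# %* inside a larger string: no space is inserted ...
print(expand_spec("%{mmcu=*:--script=%*/memory.ld}", ["-mmcu=msp430f2617"]))
# -> --script=msp430f2617/memory.ld
# ... but a trailing %* still gets the separating space:
print(repr(expand_spec("%{mmcu=*:--script=%*}", ["-mmcu=msp430f2617"])))
# -> '--script=msp430f2617 '
```

The second call shows the retained compatibility behaviour: a sequence ending in %* still emits the trailing argument separator.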
Re: [wide-int] int_traits <tree>
On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int.

2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers.
There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha.

it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision >= xprecision (maybe we should assert that).

It is now asserted for (as of a few days ago).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary).

this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out.
I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.
Re: [wide-int] int_traits <tree>
On 10/17/2013 07:30 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Kenneth Zadeck zad...@naturalbridge.com writes:

As mentioned in my message yesterday, I thought your new way of canonising unsigned tree constants meant that there was always an upper zero bit. Is that right?

i believe this is correct.

If so, xprecision < precision is a no-op, because the number always has the right form for wider precisions. The only difficult case is xprecision == precision, since then we need to peel off any upper -1 HWIs.

say my HWI size is 8 bits (just to keep from typing a million 'f's). if i have a 16 bit unsigned number that is all 1s, in the tree world it is 3 hwis: 0x00 0xff 0xff. but inside regular wide int, it would take 1 wide int whose value is 0xff. inside of max it would be the same as the tree, but then the test precision < xprecision + hbpwi never kicks in because precision is guaranteed to be huge. inside of addr_wide_int, i think we tank with the assertion.

It should be OK for addr_wide_int too. The precision still fits 2 HWIs. The initial length is greater than the maximum length of an addr_wide_int, but your len = MAX (len, max_len) deals with that.

It's len = MIN (len, max_len)

Oops, yeah, I meant MIN, sorry.

which looked suspicious to me, but with precision >= xprecision, precision can only be zero if xprecision is zero, which looked to me like it cannot happen - or rather it should be fixed.

Despite the comment above the code, I don't think this MIN is there for the zero-precision case. I think it's there to handle the new tree representation. The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same.
But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case.

Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()).

It was changed because you asked me to change it. However, you end up with special cases either way.

the case actually comes up on the ppc because they do a lot of 128 bit math. I think i got thru the x86-64 without noticing this.

Well, it'd be suspicious if we're directly using 128-bit numbers in addr_wide_int. The justification for the assertion was that we should explicitly truncate to addr_wide_int when deliberately ignoring upper bits, beyond bit or byte address width. 128 bits definitely falls into that category on powerpc.

My question is whether with 8-bit HWI 0x00 0xff 0xff is a valid wide-int value if it has precision 16.

No, for a 16-bit wide_int it should be 0xff. 0x00 0xff 0xff is correct for any wide_int wider than 16 bits though.

AFAIK that is what the code produces,

In which case? This is:

  precision == 16
  xprecision == 16
  len == 3
  max_len == 2

The MIN trims the len to 2 and then the loop Kenny added trims it again to 1, so the 0x00 0xff 0xff becomes 0xff. The 0x00 0xff is still there in the array, but not used.

but now Kenny says this is only for some kind of wide-ints but not all? That is, why is

  inline wi::storage_ref
  wi::int_traits <const_tree>::decompose (HOST_WIDE_INT *scratch,
					  unsigned int precision, const_tree x)
  {
    unsigned int len = TREE_INT_CST_NUNITS (x);
    const HOST_WIDE_INT *val
      = (const HOST_WIDE_INT *) &TREE_INT_CST_ELT (x, 0);
    return wi::storage_ref (val, len, precision);
  }

not a valid implementation together with making sure that the INTEGER_CST tree rep has that extra word of zeros if required?
The fundamental problem here is that we're trying to support two cases: (a) doing N-bit arithmetic in cases where the inputs have N bits (b) doing N-bit arithmetic in cases where the inputs have fewer than N bits and are extended according to TYPE_SIGN. Let's assume 32-bit HWIs. The 16-bit (4-hex-digit) constant 0x8000 is 0x8000 regardless of whether the type is signed or unsigned. But if it's extended to 32 bits you get two different numbers 0xffff8000 and 0x00008000, depending on the sign. So for one value of the precision parameter (i.e. xprecision), signed and unsigned constants produce the same number. But for another value of the precision parameter (those greater than xprecision), signed and unsigned constants produce different numbers. Yet at the moment the tree constant has a single representation. But a correctly extended one, up to its len! (as opposed to RTL) So I think the possibilities are: (1) Use the representation of an N-bit wide_int to store N-bit tree constants. Do work when extending them to wider
RE: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
On Wed, 16 Oct 2013 22:50:20, DJ Delorie wrote: Not all of them can work, because they describe something that can't be done in hardware. For example, the first test has an incomplete bitfield - the fields do not completely describe an int so the structure is smaller (one byte, according to sizeof()) than the access type (2-8 bytes). Where in the C standard did you read the requirement that every bit field must be complete? (This is a serious question). extern struct { volatile unsigned int b : 1; } bf3; on my compiler this structure occupies 4 bytes. and it is aligned at 4 bytes. That is OK for me and AAPCS. But the access bf3.b=1 is SI mode with Sandra's patch (part 1, which just obeys the AAPCS and does nothing else) and QI mode without this patch, which is just a BUG. I am quite surprised how your target manages to avoid it? It is as Sandra said, at least on ARM -fstrict-volatile-bitfields does not function at all. And the C++11 memory model wins all the time. Looking through the tests, most of them combine packed with mismatched types. IMHO, those tests are invalid. I don't think so. They are simply packed. And volatile just says that the value may be changed by a different thread. It has a great impact on loop optimizations. Either way, if -fstrict-volatile-bitfields does not do what it's supposed to do, the correct action is to fix it - not to disable it on targets that rely on it for correct operation. Agreed. That is the number one priority here. ... I've not objected to fixing -fstrict-volatile-bitfields, or making the -fno-strict-volatile-bitfields case match the standard. I've only objected to breaking my targets by making that flag not the default. Fine. Why can't we just get this fixed? Bernd.
Re: [PATCH][i386]Fix PR 57756
On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego.
[PATCH] Fixup handling of zero-precision integer types
These two patches try to fix handling of zero-precision integer types (created by struct { int : 0; };). Currently they get assigned TYPE_MIN/MAX_VALUEs but clearly a zero-precision integer type does not have a single valid value. Of course for these nothing should look at TYPE_MIN/MAX_VALUE you'd think ... Clearly you are wrong. See the patch below which bootstrapped and tested on x86_64-unknown-linux-gnu for all languages in the attempt to apply the 2nd patch (still testing). Objections? Thanks, Richard.

Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c (revision 203744)
+++ gcc/stor-layout.c (working copy)
@@ -2054,7 +2054,7 @@ layout_type (tree type)
     case BOOLEAN_TYPE: /* Used for Java, Pascal, and Chill. */
       if (TYPE_PRECISION (type) == 0)
-	TYPE_PRECISION (type) = 1; /* default to one byte/boolean. */
+	gcc_unreachable ();

       /* ... fall through ... */
@@ -2062,7 +2062,7 @@ layout_type (tree type)
     case ENUMERAL_TYPE:
       if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
	   && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-	TYPE_UNSIGNED (type) = 1;
+	gcc_assert (TYPE_UNSIGNED (type));

       SET_TYPE_MODE (type, smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));

2013-10-17 Richard Biener rguent...@suse.de

	* stor-layout.c (layout_type): Do not change TYPE_PRECISION
	or TYPE_UNSIGNED of integral types.
	(set_min_and_max_values_for_integral_type): Leave
	TYPE_MIN/MAX_VALUE NULL_TREE for zero-precision integral types.

Index: gcc/stor-layout.c
===================================================================
--- gcc/stor-layout.c (revision 203744)
+++ gcc/stor-layout.c (working copy)
@@ -2052,18 +2052,9 @@ layout_type (tree type)
	  of the language-specific code. */
       gcc_unreachable ();

-    case BOOLEAN_TYPE: /* Used for Java, Pascal, and Chill. */
-      if (TYPE_PRECISION (type) == 0)
-	TYPE_PRECISION (type) = 1; /* default to one byte/boolean. */
-
-      /* ... fall through ... */
-
+    case BOOLEAN_TYPE:
     case INTEGER_TYPE:
     case ENUMERAL_TYPE:
-      if (TREE_CODE (TYPE_MIN_VALUE (type)) == INTEGER_CST
-	  && tree_int_cst_sgn (TYPE_MIN_VALUE (type)) >= 0)
-	TYPE_UNSIGNED (type) = 1;
-
       SET_TYPE_MODE (type, smallest_mode_for_size (TYPE_PRECISION (type), MODE_INT));
       TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
@@ -2520,6 +2511,12 @@ set_min_and_max_values_for_integral_type
   tree min_value;
   tree max_value;

+  /* For bitfields with zero width we end up creating integer types
+     with zero precision.  Don't assign any minimum/maximum values
+     to those types, they don't have any valid value.  */
+  if (precision < 1)
+    return;
+
   if (is_unsigned)
     {
       min_value = build_int_cst (type, 0);
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). It is now asserted for (as of a few days ago). Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out.
I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 07:30 AM, Richard Biener wrote: Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It was changed because you asked me to change it. However, you end up with special cases either way. It was a mis-communication - I thought you added this to the wide-int rep to be able to use both tree and RTX reps directly. Richard.
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). No, we allow precision == xprecision for addr_wide_int and max_wide_int too. But all we do in that case is trim the length. Requiring precision > xprecision was option (5) from my message. Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } Are you saying that: max_wide_int x = (tree) y; should just copy a pointer? Then what about: max_wide_int z = x; Should that just copy a pointer too, or is it OK for the second constructor to do a copy? In most cases we only use the fixed_wide_int to store the result of the computation.
But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. The branch uses wide_int for RTL quite naturally, and changing it wouldn't help do anything with the tree decompose routine. Thanks, Richard
Re: [wide-int] int_traits tree
Kenneth Zadeck zad...@naturalbridge.com writes: it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary). this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out. Hmm, you mean the patch to introduce all the wi:: stuff? Which cases do you mean? That patch didn't change the HWI representation (and therefore when a copy was needed by the to_shwi*/decompose routines). And it got rid of quite a few *_wide_int constructions, things like: - wide_int wi = (max_wide_int (prev_value) - .add (1, sgn, &overflowed)); + wide_int wi = wi::add (prev_value, 1, sgn, &overflowed); Thanks, Richard
[Ada] Handle constraint error in vector when large index type
To compute the range of values in the generic actual Index_Type for a vector, there is a series of compile-time tests to determine which of Index_Type or Count_Type to use for intermediate values. In the case of an Index_Type comprising a range of values larger than in Count_Type, the number of values was computed using Index_Type, and then converted to Count_Type. However, even though the computation of the result does not overflow, the conversion of the result to type Count_Type will fail when the value is greater than Count_Type'Last. The solution is to first test whether the result is less than Count_Type'Last, and only then convert the result. Tested on x86_64-pc-linux-gnu, committed on trunk

2013-10-17 Matthew Heaney hea...@adacore.com

	* a-convec.adb, a-coinve.adb, a-cobove.adb (Insert, Insert_Space):
	Inspect value range before converting type.

Index: a-coinve.adb
===================================================================
--- a-coinve.adb (revision 203568)
+++ a-coinve.adb (working copy)
@@ -1734,7 +1734,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then
@@ -2504,7 +2519,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then

Index: a-cobove.adb
===================================================================
--- a-cobove.adb (revision 203568)
+++ a-cobove.adb (working copy)
@@ -1227,7 +1227,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         end if;
       end if;

    elsif Index_Type'First = 0 then
@@ -1685,7 +1700,22 @@
          --  worry about if No_Index were less than 0, but that case is
          --  handled above).

-         Max_Length := Count_Type'Base (Index_Type'Last - No_Index);
+         if Index_Type'Last - No_Index >=
+              Count_Type'Pos (Count_Type'Last)
+         then
+            --  We have determined that range of Index_Type has at least as
+            --  many values as in Count_Type, so Count_Type'Last is the
+            --  maximum number of items that are allowed.
+
+            Max_Length := Count_Type'Last;
+
+         else
+            --  The range of Index_Type has fewer values than in Count_Type,
+            --  so the maximum number of items is computed from the range of
+            --  the Index_Type.
+
+            Max_Length := Count_Type'Base (Index_Type'Last -
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. In the addr_wide_int, all numbers are represented as signed numbers. 
There are enough bits in the internal representation so that no information is lost by representing them this way. so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha. For max_wide_int the same rules apply, just its size is larger. Ok. So the reps are only canonical wide-int because we only ever use them with precision > xprecision (maybe we should assert that). No, we allow precision == xprecision for addr_wide_int and max_wide_int too. But all we do in that case is trim the length. Requiring precision > xprecision was option (5) from my message. Btw, we are not using them directly, but every time we actually build a addr_wide_int / max_wide_int we copy them anyway: /* Initialize the storage from integer X, in precision N. */ template <int N> template <typename T> inline fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x) { /* Check for type compatibility. We don't want to initialize a fixed-width integer from something like a wide_int. */ WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED; wide_int_ref xi (x, N); len = xi.len; for (unsigned int i = 0; i < len; ++i) val[i] = xi.val[i]; } Are you saying that: max_wide_int x = (tree) y; should just copy a pointer? Then what about: No, it should do a copy. But only one - which it does now with the append-extra-zeros-in-tree-rep, I was just thinking how to avoid it when not doing that which means adjusting the rep in the copy we need to do anyway. If fixed_wide_int_storage is really the only reason to enlarge the tree rep. max_wide_int z = x; Should that just copy a pointer too, or is it OK for the second constructor to do a copy? In most cases we only use the fixed_wide_int to store the result of the computation.
We don't use the above constructor for things like: max_wide_int x = wi::add ((tree) y, (int) z); wi::add uses y and z without going through max_wide_int. Yes, I am aware of that. What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider
[Ada] Illegal constituent in state refinement
This patch modifies the logic in the analysis of aspect/pragma Refined_State to catch a case where a visible variable is used as a constituent in a state refinement. -- Source -- -- pack.ads package Pack with Abstract_State => State is Var : Integer; procedure Proc (Formal : Integer) with Global => (Output => State); end Pack; -- pack.adb package body Pack with Refined_State => (State => Var) is procedure Proc (Formal : Integer) with Refined_Global => (Output => Var) is begin null; end Proc; end Pack; -- Compilation and output -- $ gcc -c -gnatd.V pack.adb pack.adb:2:35: cannot use Var in refinement, constituent is not a hidden state of package Pack pack.adb:5:11: useless refinement, subprogram Proc does not mention abstract state with visible refinement Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Hristian Kirtchev kirtc...@adacore.com * sem_prag.adb (Analyze_Constituent): Move the check concerning option Part_Of to routine Check_Matching_Constituent. (Check_Matching_Constituent): Verify that an abstract state that acts as a constituent has the proper Part_Of option in its aspect/pragma Abstract_State. Account for the case when a constituent comes from a private child or private sibling. * sem_util.ads, sem_util.adb (Is_Child_Or_Sibling): New routine. Index: sem_prag.adb === --- sem_prag.adb(revision 203755) +++ sem_prag.adb(working copy) @@ -21439,52 +21439,75 @@ Error_Msg_NE (duplicate use of constituent , Constit, Constit_Id); return; - end if; -- The related package has no hidden states, nothing to match. -- This case arises when the constituents are states coming -- from a private child. + -- A state can act as a constituent only when it is part of + -- another state. This relation is expressed by option Part_Of + -- of pragma Abstract_State.
- if No (Hidden_States) then - return; + elsif Ekind (Constit_Id) = E_Abstract_State then + if not Is_Part_Of (Constit_Id, State_Id) then + Error_Msg_Name_1 := Chars (State_Id); + Error_Msg_NE + (state is not a valid constituent of ancestor + state %, Constit, Constit_Id); + return; + + -- The constituent has the proper Part_Of option, but may + not appear in the immediate hidden state of the related + package. This case arises when the constituent comes from + a private child or a private sibling. Recognize these + scenarios to avoid generating a bogus error message. + + elsif Is_Child_Or_Sibling + (Pack_1 => Scope (State_Id), + Pack_2 => Scope (Constit_Id), + Private_Child => True) + then + return; + end if; end if; -- Inspect the hidden states of the related package looking for -- a match. - State_Elmt := First_Elmt (Hidden_States); - while Present (State_Elmt) loop + if Present (Hidden_States) then + State_Elmt := First_Elmt (Hidden_States); + while Present (State_Elmt) loop - -- A valid hidden state or variable participates in a - -- refinement. Add the constituent to the list of processed - -- items to aid with the detection of duplicate constituent - -- use. Remove the constituent from Hidden_States to signal - -- that it has already been used. + -- A valid hidden state or variable acts as a constituent - if Node (State_Elmt) = Constit_Id then - Add_Item (Constit_Id, Constituents_Seen); - Remove_Elmt (Hidden_States, State_Elmt); + if Node (State_Elmt) = Constit_Id then - -- Collect the constituent in the list of refinement - -- items. Establish a relation between the refined state - -- and its constituent. +-- Add the constituent to the list of processed items +-- to aid with the detection of duplicates. Remove the +-- constituent from Hidden_States to signal that it +-- has already been matched. -
[Ada] Containment of finalization actions in short circuit operators
This change ensures that any finalization action required within an expression that appears as the left operand of a short circuit operator remains contained within the code fragment that evaluates that operand (and not scattered after the evaluation of the complete Boolean expression). This is helpful for static analysis tools, in particular for coverage analysis. This is achieved by enclosing the left operand in an Expression_With_Actions, which will also capture the required finalization actions. A number of adjustments are made so that the rest of the compiler knows to deal appropriately with the new occurrences of these expressions with actions. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Thomas Quinot qui...@adacore.com * exp_util.adb (Get_Current_Value_Condition, Set_Current_Value_Condition): Handle the case of expressions with actions * exp_util.adb (Insert_Actions): Handle the case of an expression with actions whose Actions list is empty. * exp_util.adb (Remove_Side_Effects.Side_Effect_Free): An expression with actions that has no Actions and whose Expression is side effect free is itself side effect free. * exp_util.adb (Remove_Side_Effects): Do not set an incorrect etype on temporary 'R' (Def_Id), which is in general an access to Exp_Type, not an Exp_Type. * sem_res.adb (Resolve): For an expression with actions, resolve the expression early. * sem_res.adb (Resolve_Expression_With_Actions): Rewrite an expression with actions whose value is compile time known and which has no actions into just its expression, so that its constant value is available downstream. * sem_res.adb (Resolve_Short_Circuit): Wrap the left operand in an expression with actions to contain any required finalization actions. * exp_ch4.adb (Expand_Expression_With_Actions): For an expression with actions returning a Boolean expression, ensure any finalization action is kept within the Actions list. 
* sem_warn.adb (Check_References, Check_Unset_Reference): Add missing circuitry to handle expressions with actions. * checks.adb (Ensure_Valid): For an expression with actions, insert the validity check on the Expression. * sem_ch13.adb (Build_Static_Predicate.Get_RList): An expression with actions that has a non-empty Actions list is not static. An expression with actions that has an empty Actions list has the static ranges of its Expression. * sem_util.adb (Has_No_Obvious_Side_Effects): An expression with actions with an empty Actions list has no obvious side effects if its Expression itself has no obvious side effects. Index: exp_util.adb === --- exp_util.adb(revision 203762) +++ exp_util.adb(working copy) @@ -2706,18 +2706,36 @@ (N : Node_Id; S : Boolean) is - Cond : Node_Id; - Sens : Boolean; + Cond : Node_Id; + Prev_Cond : Node_Id; + Sens : Boolean; begin Cond := N; Sens := S; - -- Deal with NOT operators, inverting sense + loop +Prev_Cond := Cond; - while Nkind (Cond) = N_Op_Not loop -Cond := Right_Opnd (Cond); -Sens := not Sens; +-- Deal with NOT operators, inverting sense + +while Nkind (Cond) = N_Op_Not loop + Cond := Right_Opnd (Cond); + Sens := not Sens; +end loop; + +-- Deal with conversions, qualifications, and expressions with +-- actions. + +while Nkind_In (Cond, +N_Type_Conversion, +N_Qualified_Expression, +N_Expression_With_Actions) +loop + Cond := Expression (Cond); +end loop; + +exit when Cond = Prev_Cond; end loop; -- Deal with AND THEN and AND cases @@ -2798,9 +2816,16 @@ return; --- Case of Boolean variable reference, return as though the --- reference had said var = True. + elsif Nkind_In (Cond, + N_Type_Conversion, + N_Qualified_Expression, + N_Expression_With_Actions) + then +Cond := Expression (Cond); + -- Case of Boolean variable reference, return as though the + -- reference had said var = True.
+ else if Is_Entity_Name (Cond) and then Ent = Entity (Cond) then Val := New_Occurrence_Of (Standard_True, Sloc (Cond)); @@ -3406,8 +3431,13 @@ when N_Expression_With_Actions = if N = Expression (P) then -
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then signed is up to you). They have a precision. This is how they are valid reps for wide-ints, and that doesn't change. I was saying that if we make not _all_ wide-ints sign-extended then we can use the tree rep as-is. We'd then have the wide-int rep being either zero or sign extended but not arbitrary random bits outside of the precision (same as the tree rep). OK, but does that have any practical value over leaving them arbitrary, as in Kenny's original implementation? Saying that upper bits can be either signs or zeros means that readers wouldn't be able to rely on one particular extension (so would have to do the work that they did in Kenny's implementation). At the same time it means that writers can't leave the upper bits in an arbitrary state, so would have to make sure that they are signs or zeros (presumably having a free choice of which). Thanks, Richard
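The convention being debated can be illustrated with a small C sketch. The helper names below are hypothetical, and GCC's actual wide-int code differs in detail; the point is only that a signed comparison on N-bit values is well-defined solely when the bits above the precision follow a known extension convention.

```c
#include <assert.h>
#include <stdint.h>

/* Canonicalize an N-bit value held in a 64-bit "HWI" by sign-extending
   from bit (prec - 1), the convention described for RTL constants.
   Illustrative names, not GCC's real API.  */
static int64_t
sext_hwi (uint64_t val, unsigned int prec)
{
  unsigned int shift = 64 - prec;
  return (int64_t) (val << shift) >> shift;
}

/* A signed less-than such as lts_p gives consistent answers only if
   both operands use the same convention for the bits above PREC.  */
static int
lts_p (uint64_t a, uint64_t b, unsigned int prec)
{
  return sext_hwi (a, prec) < sext_hwi (b, prec);
}
```

With the 16-bit value 0x8000, sext_hwi yields -32768, so lts_p (0x8000, 1, 16) is true; if the upper bits were instead left in an arbitrary state, every reader would have to perform this canonicalization itself, which is exactly the cost under discussion.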
[Ada] Warnings on unused with_clauses in subunits
When generating code, subunits are inserted into the tree of the parent unit, and their context is added to the context of the parent. This makes it hard to determine whether any with_clause on the proper body is superfluous. This patch adds circuitry to detect these superfluous with_clauses when the subunit is compiled by itself, in -gnatc mode. Executing: gcc -c -gnatc -gnatwu pkg-proc.adb must yield: pkg-proc.adb:1:06: warning: unit Unused_Pkg is not referenced pkg-proc.adb:2:06: warning: unit Unused_Pkg_2 is not referenced pkg.adb:3:03: warning: variable Variable is never read and never assigned --- with Unused_Pkg; with Unused_Pkg_2; with Ada.Text_Io; separate (Pkg) procedure Proc is begin Ada.Text_Io.Put_Line (Hello); end Proc; --- with Unused_Pkg; package body pkg is Variable : Unused_Pkg.T; procedure Proc is separate; end Pkg; --- package Pkg is procedure Proc; end Pkg; --- package body Unused_Pkg is procedure Empty is begin null; end Empty; end Unused_Pkg; -- package Unused_Pkg is type T is (Tbd); procedure Empty; end Unused_Pkg; -- package Unused_Pkg_2 is type T is (Tbd); end Unused_Pkg_2; Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Ed Schonberg schonb...@adacore.com * sem_warn.adb (Check_Unused_Withs): If the main unit is a subunit, apply the check to the units mentioned in its context only. This provides additional warnings on with_clauses that are superfluous. Index: sem_warn.adb === --- sem_warn.adb(revision 203763) +++ sem_warn.adb(working copy) @@ -2545,13 +2545,16 @@ return; end if; - -- Flag any unused with clauses, but skip this step if we are compiling - -- a subunit on its own, since we do not have enough information to - -- determine whether with's are used. We will get the relevant warnings - -- when we compile the parent. This is the normal style of GNAT - -- compilation in any case. + -- Flag any unused with clauses. 
For a subunit, check only the units + -- in its context, not those of the parent, which may be needed by other + -- subunits. We will get the full warnings when we compile the parent, + -- but the following is helpful when compiling a subunit by itself. if Nkind (Unit (Cunit (Main_Unit))) = N_Subunit then + if Current_Sem_Unit = Main_Unit then +Check_One_Unit (Main_Unit); + end if; + return; end if;
[Ada] Non-SPARK package bodies and state refinement
This patch modifies the analysis of package bodies to suppress an error message prompting for state refinement when the body's SPARK mode is Off. -- Source -- -- pack.ads package Pack with Abstract_State => State, Initializes => State is procedure Proc with Global => (In_Out => State), Depends => (State => State); end Pack; -- pack.adb package body Pack is pragma SPARK_Mode (Off); X : Integer := 0; procedure Proc is begin X := X + 1; end Proc; end Pack; - -- Compilation -- - $ gcc -c -gnatd.V pack.adb Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Hristian Kirtchev kirtc...@adacore.com * sem_ch3.adb (Analyze_Declarations): Emit an error message concerning state refinement when the spec defines at least one non-null abstract state and the body's SPARK mode is On. (Requires_State_Refinement): New routine. Index: sem_ch3.adb === --- sem_ch3.adb (revision 203762) +++ sem_ch3.adb (working copy) @@ -2071,6 +2071,12 @@ -- If the states have visible refinement, remove the visibility of each -- constituent at the end of the package body declarations. + function Requires_State_Refinement +(Spec_Id : Entity_Id; + Body_Id : Entity_Id) return Boolean; + -- Determine whether a package denoted by its spec and body entities + -- requires refinement of abstract states. 
+ - -- Adjust_Decl -- - @@ -2100,6 +2106,82 @@ end if; end Remove_Visible_Refinements; + --- + -- Requires_State_Refinement -- + --- + + function Requires_State_Refinement +(Spec_Id : Entity_Id; + Body_Id : Entity_Id) return Boolean + is + function Mode_Is_Off (Prag : Node_Id) return Boolean; + -- Given pragma SPARK_Mode, determine whether the mode is Off + + - + -- Mode_Is_Off -- + - + + function Mode_Is_Off (Prag : Node_Id) return Boolean is +Mode : Node_Id; + + begin +-- The default SPARK mode is On + +if No (Prag) then + return False; +end if; + +Mode := + Get_Pragma_Arg (First (Pragma_Argument_Associations (Prag))); + +-- When the pragma lacks an argument, the default mode is On + +if No (Mode) then + return False; +else + return Chars (Mode) = Name_Off; +end if; + end Mode_Is_Off; + + -- Start of processing for Requires_State_Refinement + + begin + -- A package that does not define at least one abstract state cannot + -- possibly require refinement. + + if No (Abstract_States (Spec_Id)) then +return False; + + -- The package introduces a single null state which does not merit + -- refinement. + + elsif Has_Null_Abstract_State (Spec_Id) then +return False; + + -- Check whether the package body is subject to pragma SPARK_Mode. If + -- it is and the mode is Off, the package body is considered to be in + -- regular Ada and does not require refinement. + + elsif Mode_Is_Off (SPARK_Mode_Pragmas (Body_Id)) then +return False; + + -- The body's SPARK_Mode may be inherited from a similar pragma that + -- appears in the private declarations of the spec. The pragma we are + -- interested in appears as the second entry in SPARK_Mode_Pragmas. + + elsif Present (SPARK_Mode_Pragmas (Spec_Id)) + and then Mode_Is_Off (Next_Pragma (SPARK_Mode_Pragmas (Spec_Id))) + then +return False; + + -- The spec defines at least one abstract state and the body has no + -- way of circumventing the refinement. 
+ + else +return True; + end if; + end Requires_State_Refinement; + -- Local variables Body_Id : Entity_Id; @@ -2264,9 +2346,7 @@ -- State refinement is required when the package declaration has -- abstract states. Null states are not considered. -elsif Present (Abstract_States (Spec_Id)) - and then not Has_Null_Abstract_State (Spec_Id) -then +elsif Requires_State_Refinement (Spec_Id, Body_Id) then Error_Msg_NE (package requires state refinement, Context, Spec_Id); end if;
[Ada] Remove special handling for package declaring abstract state
On the basis of further discussion, we decided not to implement the rule saying that a package body must be required for some other reason if an abstract state is declared. Now we just say a package body is required if there is a non-null abstract state, and that's it! This change undoes the error message. So if we compile the following with -gnatd.V -gnatld7 (but without -gnatc), we get "cannot generate code for file nnas.ads (package spec)" Compiling: nnas.ads 1. package NNAS 2.with Abstract_State => State 3. is 4.-- package declarations with non-null 5.-- Abstract State shall have bodies. 6. end NNAS; 6 lines: No errors The "cannot generate" line reflects the fact that this package spec requires a body, so the spec cannot be compiled alone. Tested on x86_64-pc-linux-gnu, committed on trunk 2013-10-17 Robert Dewar de...@adacore.com * sem_ch7.adb (Analyze_Package_Specification): Remove circuit for ensuring that a package spec requires a body for some other reason than that it contains the declaration of an abstract state. Index: sem_ch7.adb === --- sem_ch7.adb (revision 203755) +++ sem_ch7.adb (working copy) @@ -1493,34 +1493,6 @@ Check_One_Tagged_Type_Or_Extension_At_Most; - -- Issue an error if a package that is a library unit does not require a - -- body, and we have a non-null abstract state (SPARK LRM 7.1.5(4)). - - if not Unit_Requires_Body (Id, Ignore_Abstract_State => True) -and then Present (Abstract_States (Id)) - --- We use Scope_Depth of 1 to identify library units, which seems a --- bit ugly, but there doesn't seem to be an easier way. -and then Scope_Depth (Id) = 1 - --- A null abstract state always appears as the sole element of the --- state list. 
- -and then not Is_Null_State (Node (First_Elmt (Abstract_States (Id - then - declare -P : constant Node_Id := Get_Pragma (Id, Pragma_Abstract_State); - begin -Error_Msg_NE - (package specifies a non-null abstract state, P, Id); -Error_Msg_N - (\but package does not otherwise require a body, P); -Error_Msg_N - (\pragma Elaborate_Body is required in this case, P); - end; - end if; - -- If switch set, output information on why body required if List_Body_Required_Info
Re: [PATCH][1/3] Re-submission of Altera Nios II port, gcc parts
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote: Hi, the last ping of the Nios II patches was: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01416.html After assessing the state, we feel it would be better to post a re-submission of the newest patches. Since this hasn't attracted attention for months I'll have a go at a review. I was not involved in this project and haven't seen this code before. @@ -4196,6 +4203,7 @@ esac # version to the per-target configury. case $cpu_type in alpha | arm | avr | bfin | cris | i386 | m32c | m68k | microblaze | mips \ + | nios2 \ | pa | rs6000 | score | sparc | spu | tilegx | tilepro | xstormy16 | xtensa) insn=nop Could maybe format this nicer to fill up all but the last line. +;; These are documented for use in inline asm. +(define_register_constraint D00 D00_REG Hard register 0.) +(define_register_constraint D01 D01_REG Hard register 1.) +(define_register_constraint D02 D02_REG Hard register 2.) [...] This doesn't strike me as a good idea. To really make this work for cases where people write things like D28D13 you'd have to provide all the union classes as well which is obviously infeasible. There are alternative mechanisms to get assembly to use the registers you want, and I'll note that libgcc seems to be using those instead of these constraints. This probably also slows down the compiler, and it might be worth an experiment to see whether deleting these has any effect on register allocation. +;; We use the following constraint letters for constants +;; +;; I: -32768 to -32767 +;; J: 0 to 65535 +(define_insn movhi_internal + [(set (match_operand:HI 0 nonimmediate_operand =m, r,r, r,r) +(match_operand:HI 1 general_operand rM,m,rM,I,J))] The J alternative is unnecessary. The range 32768...65535 doesn't give a valid HImode constant, the RTL representation of integer constants always sign extends. Probably you can just use i here and in movqi_internal. 
+(define_insn movsi_internal + [(set (match_operand:SI 0 nonimmediate_operand =m, r,r, r,r,r,r,r) +(match_operand:SI 1 general_operand rM,m,rM,I,J,K,S,i))] Not required, but as an experiment you might want to reorder the alternatives and sprinkle extra rs to get optional reloads and see whether that improves code generation. You'd have (match_operand:SI 0 nonimmediate_operand =r,rm,r ,r ,r ,r ,r ,r) (match_operand:SI 1 general_operand rM,rM,rm,rI,rJ,rK,rS,ri))] with the register-register move first. It might not make a difference. +;; Split patterns for register alternative cases. +(define_split + [(set (match_operand:SI 0 register_operand ) +(sign_extend:SI (match_operand:HI 1 register_operand )))] Could merge into a define_insn_and_split? + reload_completed + [(set (match_dup 0) +(and:SI (match_dup 1) (const_int 65535))) + (set (match_dup 0) +(xor:SI (match_dup 0) (const_int 32768))) + (set (match_dup 0) +(plus:SI (match_dup 0) (const_int -32768)))] + operands[1] = gen_lowpart (SImode, operands[1]);) Is this really faster than two shifts (either as expander or splitter)? +;; Arithmetic Operations + +(define_insn addsi3 + [(set (match_operand:SI 0 register_operand =r) +(plus:SI (match_operand:SI 1 register_operand %r) + (match_operand:SI 2 arith_operand rIT)))] + + add%i2\\t%0, %1, %z2 + [(set_attr type alu)]) [...] +(define_insn mulsi3 + [(set (match_operand:SI 0 register_operand =r) +(mult:SI (match_operand:SI 1 register_operand %r) + (match_operand:SI 2 arith_operandrI)))] + TARGET_HAS_MUL + mul%i2\\t%0, %1, %z2 + [(set_attr type mul)]) There seems to be a slight mismatch here between arith_operand and the constraints. Unlike in addsi3, T (which calls nios2_unspec_reloc_p) isn't allowed here and in the other uses of arith_operand. That suggests that arith_operand should be split into two different predicates, one for addsi3 and one for all the other uses. 
+(define_expand divsi3 + [(set (match_operand:SI 0 register_operand =r) +(div:SI (match_operand:SI 1 register_operand r) +(match_operand:SI 2 register_operand r)))] + +{ + if (!TARGET_HAS_DIV) +{ + if (!TARGET_FAST_SW_DIV) +FAIL; + else +{ + if (nios2_emit_expensive_div (operands, SImode)) +DONE; +} +} +}) Shouldn't this FAIL if !nios2_emit_expensive_div (ok so that function never returns 0 but still...)? +;; Integer logical Operations + +(define_code_iterator LOGICAL [and ior xor]) +(define_code_attr logical_asm [(and and) (ior or) (xor xor)]) + +(define_insn codesi3 + [(set (match_operand:SI 0 register_operand =r,r,r) +(LOGICAL:SI (match_operand:SI 1 register_operand %r,r,r) +(match_operand:SI 2 logical_operand rM,J,K)))] + + @ +
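The equivalence the reviewer questions above (the and/xor/plus split versus two shifts for HImode sign extension) is easy to check with a small C model. This is an illustrative sketch, not the port's actual code; both helpers compute sign_extend:SI of a 16-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* The split expands the extension as three ALU operations:
   mask the low 16 bits, flip the sign bit, then subtract it back.  */
static int32_t
sext16_alu (uint32_t x)
{
  return (int32_t) ((x & 0xFFFF) ^ 0x8000) - 0x8000;
}

/* The alternative suggested in the review: shift left, then
   arithmetic shift right.  */
static int32_t
sext16_shift (uint32_t x)
{
  return (int32_t) (x << 16) >> 16;
}
```

The two agree on every input; which is faster on hardware depends on the relative cost of shifts and ALU operations in the pipeline, which is exactly the question raised.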
[PATCH, SH] Add support for inlined builtin-strcmp (1/2)
Hello, This patch just reorganizes the SH code used for memory builtins into its own file, in preparation of the RTL strcmp hoisting in the next part. OK for trunk ? Thanks Christian 2013-10-17 Christian Bruel christian.br...@st.com * config.gcc (sh-*): Add sh-mem.o to extra_obj. * gcc/config/sh/t-sh (sh-mem.o): New rule. * gcc/config/sh/sh-mem (expand_block_move): Moved here. * gcc/config/sh/sh.c (force_into, expand_block_move): Move to sh-mem.c Index: gcc/config/sh/sh-mem.c === --- gcc/config/sh/sh-mem.c (revision 0) +++ gcc/config/sh/sh-mem.c (working copy) @@ -0,0 +1,176 @@ +/* Helper routines for memory move and comparison insns. + Copyright (C) 2013 Free Software Foundation, Inc. + +This file is part of GCC. + +GCC is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 3, or (at your option) +any later version. + +GCC is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with GCC; see the file COPYING3. If not see +http://www.gnu.org/licenses/. */ + +#include config.h +#include system.h +#include coretypes.h +#include tm.h +#include expr.h +#include tm_p.h + +/* Like force_operand, but guarantees that VALUE ends up in TARGET. */ +static void +force_into (rtx value, rtx target) +{ + value = force_operand (value, target); + if (! rtx_equal_p (value, target)) +emit_insn (gen_move_insn (target, value)); +} + +/* Emit code to perform a block move. Choose the best method. + + OPERANDS[0] is the destination. + OPERANDS[1] is the source. + OPERANDS[2] is the size. + OPERANDS[3] is the alignment safe to use. 
*/ +bool +expand_block_move (rtx *operands) +{ + int align = INTVAL (operands[3]); + int constp = (CONST_INT_P (operands[2])); + int bytes = (constp ? INTVAL (operands[2]) : 0); + + if (! constp) +return false; + + /* If we could use mov.l to move words and dest is word-aligned, we + can use movua.l for loads and still generate a relatively short + and efficient sequence. */ + if (TARGET_SH4A_ARCH align 4 + MEM_ALIGN (operands[0]) = 32 + can_move_by_pieces (bytes, 32)) +{ + rtx dest = copy_rtx (operands[0]); + rtx src = copy_rtx (operands[1]); + /* We could use different pseudos for each copied word, but + since movua can only load into r0, it's kind of + pointless. */ + rtx temp = gen_reg_rtx (SImode); + rtx src_addr = copy_addr_to_reg (XEXP (src, 0)); + int copied = 0; + + while (copied + 4 = bytes) + { + rtx to = adjust_address (dest, SImode, copied); + rtx from = adjust_automodify_address (src, BLKmode, + src_addr, copied); + + set_mem_size (from, 4); + emit_insn (gen_movua (temp, from)); + emit_move_insn (src_addr, plus_constant (Pmode, src_addr, 4)); + emit_move_insn (to, temp); + copied += 4; + } + + if (copied bytes) + move_by_pieces (adjust_address (dest, BLKmode, copied), + adjust_automodify_address (src, BLKmode, + src_addr, copied), + bytes - copied, align, 0); + + return true; +} + + /* If it isn't a constant number of bytes, or if it doesn't have 4 byte + alignment, or if it isn't a multiple of 4 bytes, then fail. */ + if (align 4 || (bytes % 4 != 0)) +return false; + + if (TARGET_HARD_SH4) +{ + if (bytes 12) + return false; + else if (bytes == 12) + { + rtx func_addr_rtx = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + + function_symbol (func_addr_rtx, __movmemSI12_i4, SFUNC_STATIC); + force_into (XEXP (operands[0], 0), r4); + force_into (XEXP (operands[1], 0), r5); + emit_insn (gen_block_move_real_i4 (func_addr_rtx)); + return true; + } + else if (! 
optimize_size) + { + const char *entry_name; + rtx func_addr_rtx = gen_reg_rtx (Pmode); + int dwords; + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + rtx r6 = gen_rtx_REG (SImode, 6); + + entry_name = (bytes 4 ? __movmem_i4_odd : __movmem_i4_even); + function_symbol (func_addr_rtx, entry_name, SFUNC_STATIC); + force_into (XEXP (operands[0], 0), r4); + force_into (XEXP (operands[1], 0), r5); + + dwords = bytes 3; + emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1))); + emit_insn (gen_block_lump_real_i4 (func_addr_rtx)); + return true; + } + else + return false; +} + if (bytes 64) +{ + char entry[30]; + rtx func_addr_rtx = gen_reg_rtx (Pmode); + rtx r4 = gen_rtx_REG (SImode, 4); + rtx r5 = gen_rtx_REG (SImode, 5); + + sprintf (entry, __movmemSI%d, bytes); + function_symbol
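The strategy of expand_block_move's SH4A path can be modelled in portable C: copy a word at a time while at least four bytes remain, then finish byte by byte, standing in for the move_by_pieces tail. This sketches the shape of the generated code, not the expander itself.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Word-at-a-time copy with a byte tail, mirroring the movua.l loop
   plus the move_by_pieces fall-back in expand_block_move.  */
static void
block_move_model (unsigned char *dst, const unsigned char *src,
                  size_t bytes)
{
  size_t copied = 0;

  while (copied + 4 <= bytes)
    {
      uint32_t temp;                    /* stands in for the r0 temp  */
      memcpy (&temp, src + copied, 4);  /* movua.l: unaligned load    */
      memcpy (dst + copied, &temp, 4);  /* mov.l: word-aligned store  */
      copied += 4;
    }

  for (; copied < bytes; copied++)      /* move_by_pieces tail */
    dst[copied] = src[copied];
}
```

As in the patch, only the destination needs word alignment; the unaligned-load instruction absorbs arbitrary source alignment.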
[PATCH, SH] Add support for inlined builtin-strcmp (2/2)
Hello, This patch adds support to inline an optimized version of strcmp when not optimizing for size. The generated code makes use of the cmp/str instruction to test 4 bytes at a time when correctly aligned. Note that a new pattern was added to match the cmp/str instruction, but no attempt was made to catch it from combine. This results in general cycle improvements (against both newlib and glibc implementations), one of which is a 10% cycle improvement for a famous strcmp-biased benchmark starting with a D, but still standard. This optimization can be disabled with -fno-builtin-strcmp. No regressions on sh4 in big and little endian, and sh2 (sh3 and sh4a are still running for big and little endian for sanity). OK for trunk? Thanks, Christian 2013-10-17 Christian Bruel christian.br...@st.com * gcc/config/sh/sh-mem.c (sh4_expand_cmpstr): New function. * gcc/config/sh/sh-protos.h (sh4_expand_cmpstr): Declare. * gcc/config/sh/sh.md (cmpstrsi, cmpstr_t): New patterns. (rotlhi3_8): Rename. --- gcc/config/sh/sh.md 2013-10-17 15:14:18.0 +0200 +++ gcc-new/config/sh/sh.md 2013-10-16 16:13:49.0 +0200 @@ -31,9 +31,6 @@ ;; ??? The MAC.W and MAC.L instructions are not supported. There is no ;; way to generate them. -;; ??? The cmp/str instruction is not supported. Perhaps it can be used -;; for a str* inline function. - ;; BSR is not generated by the compiler proper, but when relaxing, it ;; generates .uses pseudo-ops that allow linker relaxation to create ;; BSR. 
This is actually implemented in bfd/{coff,elf32}-sh.c @@ -4037,7 +4034,7 @@ DONE; }) -(define_insn *rotlhi3_8 +(define_insn rotlhi3_8 [(set (match_operand:HI 0 arith_reg_dest =r) (rotate:HI (match_operand:HI 1 arith_reg_operand r) (const_int 8)))] @@ -11912,6 +11909,41 @@ jsr @%0%# [(set_attr type sfunc) (set_attr needs_delay_slot yes)]) + +;; byte compare pattern +;; temp = a ^ b; +;; !((temp 0xF000) (temp 0x0F00) (temp 0x00F0) (temp 0x000F)) +(define_insn cmpstr_t + [(set (reg:SI T_REG) + (eq:SI (and:SI + (and:SI + (and:SI + (zero_extract:SI (xor:SI (match_operand:SI 0 arith_reg_operand r) + (match_operand:SI 1 arith_reg_operand r)) + (const_int 8) (const_int 0)) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 8))) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 16))) + (zero_extract:SI (xor:SI (match_dup 0) (match_dup 1)) + (const_int 8) (const_int 24))) (const_int 0)))] + TARGET_SH1 + cmp/str %0,%1 + [(set_attr type mt_group)]) + +(define_expand cmpstrsi + [(set (match_operand:SI 0 register_operand ) + (compare:SI (match_operand:BLK 1 memory_operand ) + (match_operand:BLK 2 memory_operand ))) + (use (match_operand 3 immediate_operand ))] + TARGET_SH1 + +{ + if (! optimize_insn_for_size_p () sh4_expand_cmpstr(operands)) + DONE; + else FAIL; +}) + ;; - ;; Floating point instructions. diff -ru gcc/config/sh/sh-mem.c gcc-new/config/sh/sh-mem.c --- gcc/config/sh/sh-mem.c 2013-10-17 14:59:02.0 +0200 +++ gcc-new/config/sh/sh-mem.c 2013-10-17 14:57:57.0 +0200 @@ -23,6 +23,7 @@ #include tm.h #include expr.h #include tm_p.h +#include basic-block.h /* Like force_operand, but guarantees that VALUE ends up in TARGET. */ static void @@ -174,3 +175,130 @@ return false; } + +/* Emit code to perform a strcmp. + + OPERANDS[0] is the destination. + OPERANDS[1] is the first string. + OPERANDS[2] is the second string. + OPERANDS[3] is the align. 
*/ +bool +sh4_expand_cmpstr (rtx *operands) +{ + rtx s1 = copy_rtx (operands[1]); + rtx s2 = copy_rtx (operands[2]); + rtx s1_addr = copy_addr_to_reg (XEXP (s1, 0)); + rtx s2_addr = copy_addr_to_reg (XEXP (s2, 0)); + rtx tmp0 = gen_reg_rtx (SImode); + rtx tmp1 = gen_reg_rtx (SImode); + rtx tmp2 = gen_reg_rtx (SImode); + rtx tmp3 = gen_reg_rtx (SImode); + + rtx L_return = gen_label_rtx (); + rtx L_loop_byte = gen_label_rtx (); + rtx L_end_loop_byte = gen_label_rtx (); + rtx L_loop_long = gen_label_rtx (); + rtx L_end_loop_long = gen_label_rtx (); + + rtx jump, addr1, addr2; + int prob_unlikely = REG_BR_PROB_BASE / 10; + int prob_likely = REG_BR_PROB_BASE / 4; + + emit_insn (gen_iorsi3 (tmp1, s1_addr, s2_addr)); + emit_move_insn (tmp0, GEN_INT (3)); + + emit_insn (gen_tstsi_t (tmp0, tmp1)); + + emit_move_insn (tmp0, const0_rtx); + + jump = emit_jump_insn (gen_branch_false (L_loop_byte)); + add_int_reg_note (jump, REG_BR_PROB, prob_likely); + + addr1 = adjust_automodify_address (s1, SImode, s1_addr, 0); + addr2 = adjust_automodify_address (s2, SImode, s2_addr, 0); + + /* tmp2 is aligned, OK to load. */ + emit_move_insn (tmp3, addr2); + emit_move_insn (s2_addr, plus_constant (Pmode, s2_addr, 4)); + + /*start long loop. */ + emit_label (L_loop_long); + + emit_move_insn
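The cmp/str instruction compares its two register operands byte-wise and sets the T bit when they agree in at least one byte position, i.e. when their xor contains a zero byte; comparing a loaded word against zero therefore also detects a NUL terminator. A small C model of that test (illustrative only, not SH code):

```c
#include <assert.h>
#include <stdint.h>

/* Model of SH cmp/str: "true" (T set) when operands A and B are equal
   in at least one byte position, i.e. when A ^ B has a zero byte.
   The masks select the four byte fields of the xor.  */
static int
any_byte_equal (uint32_t a, uint32_t b)
{
  uint32_t t = a ^ b;
  return !((t & 0xFF000000u) && (t & 0x00FF0000u)
           && (t & 0x0000FF00u) && (t & 0x000000FFu));
}
```

An inlined strcmp can then compare a word per iteration, dropping to a byte loop as soon as the words differ or any_byte_equal (word, 0) reports a NUL byte, which is the role of cmp/str in the long loop above.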
Re: [PATCH i386 3/8] [AVX512] [19/n] Add AVX-512 patterns: Extracts and converts.
Hello, On 17 Oct 13:14, Uros Bizjak wrote: On Thu, Oct 17, 2013 at 12:47 PM, Kirill Yukhin kirill.yuk...@gmail.com wrote: I suspect gen_lowpart is bad turn when reload is completed, as far as it can create new pseudo. gen_lowpart () may call gen_reg_rtx (), which contain corresponging gcc_assert (). False. gen_lowpart is perfectly safe post-reload. Indeed, taking the subreg of a hard register should arrive x = gen_rtx_REG_offset (op, outermode, final_regno, final_offset); in simplify_subreg. Have you encountered some specific problem with gen_lowpart? Maybe the code in the pattern is buggy? Or is it a gen_lowpart? I think that original approach with gen_rtx_REG is correct and follows established practice in sse.md (please grep for gen_reg_RTX in sse.md). If this approach is necessary due to the deficiency of gen_lowpart, then the fix to gen_lowpart should be proposed in a follow-up patch. So, I've reverted changes in mult_vect patterns and added % to constraints. I've also reverted vec_extract_* (with slight update): ... + reload_completed + [(set (match_dup 0) (match_dup 1))] +{ + if (REG_P (operands[1])) +operands[1] = gen_rtx_REG (V16HImode, REGNO (operands[1])); + else +operands[1] = adjust_address (operands[1], V16HImode, 0); +}) Bootastrapped. AVX* tests pass (including new AVX-512) Is it ok now? -- Thanks, K --- gcc/config/i386/i386.md | 5 + gcc/config/i386/predicates.md | 40 ++ gcc/config/i386/sse.md| 873 +- 3 files changed, 912 insertions(+), 6 deletions(-) diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 10ca6cb..e7e9f2d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -831,6 +831,11 @@ (define_code_attr s [(sign_extend s) (zero_extend u)]) (define_code_attr u_bool [(sign_extend false) (zero_extend true)]) +;; Used in signed and unsigned truncations. +(define_code_iterator any_truncate [ss_truncate truncate us_truncate]) +;; Instruction suffix for truncations. 
+(define_code_attr trunsuffix [(ss_truncate s) (truncate ) (us_truncate us)]) + ;; Used in signed and unsigned fix. (define_code_iterator any_fix [fix unsigned_fix]) (define_code_attr fixsuffix [(fix ) (unsigned_fix u)]) diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index 06b2914..999d8ab 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -752,6 +752,11 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 6, 7 +;; Match 8 to 9. +(define_predicate const_8_to_9_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 8, 9 + ;; Match 8 to 11. (define_predicate const_8_to_11_operand (and (match_code const_int) @@ -762,16 +767,51 @@ (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 8, 15 +;; Match 10 to 11. +(define_predicate const_10_to_11_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 10, 11 + +;; Match 12 to 13. +(define_predicate const_12_to_13_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 12, 13 + ;; Match 12 to 15. (define_predicate const_12_to_15_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 12, 15 +;; Match 14 to 15. +(define_predicate const_14_to_15_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 14, 15 + +;; Match 16 to 19. +(define_predicate const_16_to_19_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 16, 19 + ;; Match 16 to 31. (define_predicate const_16_to_31_operand (and (match_code const_int) (match_test IN_RANGE (INTVAL (op), 16, 31 +;; Match 20 to 23. +(define_predicate const_20_to_23_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 20, 23 + +;; Match 24 to 27. +(define_predicate const_24_to_27_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 24, 27 + +;; Match 28 to 31. 
+(define_predicate const_28_to_31_operand + (and (match_code const_int) + (match_test IN_RANGE (INTVAL (op), 28, 31 + ;; True if this is a constant appropriate for an increment or decrement. (define_predicate incdec_operand (match_code const_int) diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 623e919..e21cba3 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -87,6 +87,7 @@ ;; For AVX512F support UNSPEC_VPERMI2 UNSPEC_VPERMT2 + UNSPEC_UNSIGNED_FIX_NOTRUNC UNSPEC_UNSIGNED_PCMP UNSPEC_TESTM UNSPEC_TESTNM @@ -2994,6 +2995,34 @@ (set_attr prefix maybe_vex) (set_attr mode DI)]) +(define_insn cvtusi2ssescalarmodesuffix32 + [(set (match_operand:VF_128 0 register_operand =v) + (vec_merge:VF_128 +
Re: [wide-int] int_traits tree
On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. Just you have to also sign-extend in functions like lts_p as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged. The answer's the same as always. RTL constants don't have a sign. Any time we extend an RTL constant, we need to say whether the extension should be sign extension or zero extension. So the natural model for RTL is that an SImode addition is done to SImode width, not some arbitrary wider width. RTL constants are sign-extended (whether you call them then signed is up to you). They have a precision. This is how they are valid reps for wide-ints, and that doesn't change. I was saying that if we make not _all_ wide-ints sign-extended then we can use the tree rep as-is. We'd then have the wide-int rep being either zero or sign extended but not arbitrary random bits outside of the precision (same as the tree rep). OK, but does that have any practical value over leaving them arbitrary, as in Kenny's original implementation? Saying that upper bits can be either signs or zeros means that readers wouldn't be able to rely on one particular extension (so would have to do the work that they did in Kenny's implementation). At the same time it means that writers can't leave the upper bits in an arbitrary state, so would have to make sure that they are signs or zeros (presumably having a free choice of which). At least you can easily optimize for existing zero / sign extension. What I see is really bad code for the simple integer-cst predicates in tree.c. I don't mind in what way we fix it, but avoiding the copy on every tree integer constant read looks required to me. Richard.
Re: [PATCH][2/3] Re-submission of Altera Nios II port, testsuite parts
On 07/14/2013 09:54 AM, Chung-Lin Tang wrote: These are nios2 patches for the gcc testsuite. Some new testcases were added since the last posting. Index: gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c === --- gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(revision 200946) +++ gcc/testsuite/gcc.c-torture/execute/builtins/lib/chk.c(working copy) @@ -124,16 +124,17 @@ __memmove_chk (void *dst, const void *src, __SIZE_ void * memset (void *dst, int c, __SIZE_TYPE__ n) { + while (n-- != 0) +n[(char *) dst] = c; + /* Single-byte memsets should be done inline when optimisation - is enabled. */ + is enabled. Do this after the copy in case we're being called to + initialize bss. */ #ifdef __OPTIMIZE__ if (memset_disallowed inside_main n 2) abort (); #endif - while (n-- != 0) -n[(char *) dst] = c; - return dst; } I'm not sure I understand this change. Is nios2 the only target calling memset to initialize bss, and memset_disallowed is nonzero at the start of execution? Index: gcc/testsuite/gcc.target/nios2/nios2-int-types.c === --- gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) +++ gcc/testsuite/gcc.target/nios2/nios2-int-types.c (revision 0) @@ -0,0 +1,34 @@ +/* Test that various types are all derived from int. */ +/* { dg-do compile { target nios2-*-* } } */ I think you can lose the { target nios2-*-* } for everything inside gcc.target/nios2. Bernd
Re: [wide-int] int_traits tree
Richard Biener rguent...@suse.de writes: What I see is really bad code for the simple integer-cst predicates in tree.c. I don't mind in what way we fix it, but avoiding the copy on every tree integer constant read looks required to me. But we got rid of the copy with yesterday's patch. Are you talking about the code after that patch or before? Richard
Re: [wide-int] int_traits tree
On 10/17/2013 09:16 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Kenneth Zadeck wrote: On 10/17/2013 08:29 AM, Richard Biener wrote: On Thu, 17 Oct 2013, Richard Sandiford wrote: Richard Biener rguent...@suse.de writes: The new tree representation can have a length greater than max_len for an unsigned tree constant that occupies a whole number of HWIs. The tree representation of an unsigned 0x8000 is 0x00 0x80 0x00. When extended to max_wide_int the representation is the same. But a 2-HWI addr_wide_int would be 0x80 0x00, without the leading zero. The MIN trims the length from 3 to 2 in the last case. Oh, so it was the tree rep that changed? _Why_ was it changed? We still cannot use it directly from wide-int and the extra word is redundant because we have access to TYPE_UNSIGNED (TREE_TYPE ()). It means that we can now use the tree's HWI array directly, without any copying, for addr_wide_int and max_wide_int. The only part of decompose () that does a copy is the small_prec case, which is trivially compiled out for addr_wide_int and max_wide_int. 2) addr_wide_int. This is a fixed size representation that is guaranteed to be large enough to compute any bit or byte sized address calculation on the target. Currently the value is 64 + 4 bits rounded up to the next number even multiple of HOST_BITS_PER_WIDE_INT (but this can be changed when the first port needs more than 64 bits for the size of a pointer). This flavor can be used for all address math on the target. In this representation, the values are sign or zero extended based on their input types to the internal precision. All math is done in this precision and then the values are truncated to fit in the result type. Unlike most gimple or rtl intermediate code, it is not useful to perform the address arithmetic at the same precision in which the operands are represented because there has been no effort by the front ends to convert most addressing arithmetic to canonical types. 
In the addr_wide_int, all numbers are represented as signed numbers. There are enough bits in the internal representation so that no information is lost by representing them this way.

so I guess from that that addr_wide_int.get_precision is always that 64 + 4 rounded up. Thus decompose gets that constant precision input and the extra zeros make the necessary extension always a no-op. Aha.

it is until someone comes up with a port that this will not work for, then they will have to add some machinery to sniff the port and make this bigger. I am hoping to be retired by the time this happens.

For max_wide_int the same rules apply, just its size is larger.

Ok. So the reps are only canonical wide-int because we only ever use them with precision xprecision (maybe we should assert that).

It is now asserted for (as of a few days ago).

Btw, we are not using them directly, but every time we actually build an addr_wide_int / max_wide_int we copy them anyway:

  /* Initialize the storage from integer X, in precision N.  */
  template <int N>
  template <typename T>
  inline
  fixed_wide_int_storage <N>::fixed_wide_int_storage (const T &x)
  {
    /* Check for type compatibility.  We don't want to initialize a
       fixed-width integer from something like a wide_int.  */
    WI_BINARY_RESULT (T, FIXED_WIDE_INT (N)) *assertion ATTRIBUTE_UNUSED;
    wide_int_ref xi (x, N);
    len = xi.len;
    for (unsigned int i = 0; i < len; ++i)
      val[i] = xi.val[i];
  }

it avoids a 2nd copy though, which shows nicely what was rummaging in my head for the last two days - that the int_traits abstraction was somehow at the wrong level - it should have been traits that are specific to the storage model? or the above should use int_traits::decompose manually with it always doing the copy (that would also optimize away one copy and eventually would make the extra zeros not necessary).

this came in with richard's storage manager patch. In my older code, we tried and succeeded many times to just borrow the underlying rep. I think that richard needs to work this out. I originally thought that extra zeros get rid of all copying from trees to all wide-int kinds.

What's the reason again to not use my original proposed encoding of the MSB being the sign bit? RTL constants simply are all signed then. You just have to also sign-extend in functions like lts_p, as not all constants are sign-extended. But we can use both tree (with the now appended zero) and RTL constants representation unchanged.

I am not following you here. In trees, the msb is effectively a sign bit, even for unsigned numbers, because we add that extra block. but inside of wide int, we do not add extra blocks beyond the precision. That would be messy for a lot of other reasons.

Can you elaborate? It would make tree and RTX reps directly usable, only wide-int-to-tree and wide-int-to-rtx need special
Re: [RFC] By default if-convert only basic blocks that will be vectorized
On Thu, Oct 17, 2013 at 11:26:56AM +0200, Richard Biener wrote:

On Wed, 16 Oct 2013, pins...@gmail.com wrote:

On Oct 15, 2013, at 5:32 AM, Jakub Jelinek ja...@redhat.com wrote: Especially on i?86/x86_64 the if-conversion pass seems to be often a pessimization, but the vectorization relies on it and without it we can't vectorize a lot of the loops.

I think on many other targets it actually helps. I know for one it helps on octeon even though octeon has no vector instructions. I think it helps most arm targets too.

The main issue is that it has no cost model - the only cost model being that it can successfully if-convert all conditional code in a loop, resulting in a single-BB loop. So it is clearly vectorization targeted. Its infrastructure may be useful to do a more sensible if-conversion on the GIMPLE level on scalar code. Of course even the infrastructure needs some TLC (and some better generic machinery for keeping track of and simplifying a predicate combination).

Yeah, or, if tree if-conversion is a net win for some port even when not vectorizing, supposedly such a port should, at least until the above is implemented, just enable flag_tree_loop_if_convert by default, at least at some -O* levels. Of course the question is why it would be beneficial only in inner loops and not elsewhere (other loops, or straight-line code), and when it is desirable and when not (for vectorization we of course try hard to if-convert the whole loop, because otherwise we are not able to vectorize it; but otherwise, is it beneficial just for a couple of conditionalized stmts at most, or is it fine to conditionalize say 1000 arithmetic statements?).

Jakub
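For readers unfamiliar with the transform under discussion: if-conversion replaces control flow inside a loop body with data flow (selects/conditional moves), producing the single-basic-block loop the vectorizer needs. A minimal before/after sketch:

```cpp
#include <cassert>
#include <cstddef>

// Branchy form: a conditional branch per element, so the loop body
// has multiple basic blocks and cannot be vectorized as-is.
int sum_positive_branchy (const int *a, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (a[i] > 0)
      s += a[i];
  return s;
}

// If-converted form: the control dependence becomes a data dependence
// (a select), leaving a single-BB loop body the vectorizer can turn
// into masked/blended vector operations.
int sum_positive_ifconverted (const int *a, std::size_t n)
{
  int s = 0;
  for (std::size_t i = 0; i < n; ++i)
    s += a[i] > 0 ? a[i] : 0;
  return s;
}
```

Whether the second form is also faster in *scalar* code depends on branch predictability and the target's conditional-move cost — which is exactly the missing cost model the thread complains about.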
Re: [patch] omp-low.h
On Tue, Oct 15, 2013 at 10:46:43PM -0400, Andrew MacLeod wrote:

Bootstraps on x86_64-unknown-linux-gnu and no new regressions. OK? Andrew

	* tree-flow.h (struct omp_region): Move to omp-low.c

Missing dot at the end of line.

	Remove omp_ prototypes and variables.
	* gimple.h (omp_reduction_init): Move prototype to omp-low.h.
	(copy_var_decl): Relocate prototype from tree-flow.h.
	* gimple.c (copy_var_decl): Relocate from omp-low.c.
	* tree.h: Move prototype to omp-low.h.
	* omp-low.h: New File. Relocate prototypes here.

Missing space after ".".

	* omp-low.c (struct omp_region): Make local here.
	(root_omp_region): Make static.
	(copy_var_decl) Move to gimple.c.
	(new_omp_region): Make static.
	(make_gimple_omp_edges): New.  Refactored from tree-cfg.c make_edges.
	* tree-cfg.c: Include omp-low.h.
	(make_edges): Factor out OMP specific bits to make_gimple_omp_edges.
	* gimplify.c: Include omp-low.h.
	* tree-parloops.c: Include omp-low.h.

Use "Likewise." for the second.

	c
	* c-parser.c: Include omp-low.h.
	* c-typeck.c: Include omp-low.h.

Likewise.

	cp
	* parser.c: Include omp-low.h.
	* semantics.c: Include omp-low.h.

Likewise.

	fortran
	* trans-openmp.c: Include omp-low.h.

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

--- 912,917 ----
*************** debug_all_omp_regions (void)
*** 1219,1225 ****
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {
--- 1238,1244 ----
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! static struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {

Likewise.

+	case GIMPLE_OMP_CONTINUE:
...
+
+	default:
+	  gcc_unreachable ();

Bad indentation here, default: should be indented just by 4 spaces and gcc_unreachable () by 6.

Otherwise LGTM.

	Jakub
Re: [SKETCH] Refactor implicit function template implementation and fix 58534, 58536, 58548, 58549 and 58637.
On 10/15/2013 05:21 PM, Adam Butcher wrote:

On Wed, 25 Sep 2013 11:01:26 -0500, Jason Merrill wrote: 1) Build up the type as normal and use tsubst to replace the non-pack template parameter with a pack if needed.

The problem I've hit with this (and other hacks I've tried that involve finishing the type first and rewriting afterward) is that, with parm types such as pair<auto, auto...>, the specialization pair<auto, auto> is indexed in pt.c by hash_specialization into type_specializations before the '...' is seen.

I'm not sure why that's a problem; we just produce a new specialization that uses different template arguments in tsubst. Specifically, the pack 'auto' needs to be a different TEMPLATE_TYPE_PARM from the non-pack 'auto' produced during parsing.

Jason
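The feature being implemented is the implicit function template: each auto in a generic lambda's parameter list becomes an invented template parameter of the closure's call operator, and an auto... must become a parameter *pack* distinct from any non-pack auto. A small sketch of the user-visible behavior (C++14 generic lambdas assumed):

```cpp
#include <cassert>
#include <cstddef>

// 'first' and the pack 'rest' are both invented template parameters
// of the closure's operator(); the pack must be a distinct
// TEMPLATE_TYPE_PARM from the non-pack parameter -- the distinction
// the thread above is about.
auto count_args = [] (auto first, auto... rest)
{
  (void) first;                  // silence unused-parameter warnings
  return 1 + sizeof...(rest);    // one for 'first' plus the pack size
};
```

So count_args (1, 2.5, 'c') instantiates operator() with one non-pack parameter and a two-element pack, giving 3.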
Go patch committed: Don't deref unknown type when importing anon field
This patch fixes the Go frontend to not dereference an unknown type when importing an anonymous field. This fixes a bug in a recent patch I committed. I have a test case ready to commit to the master repository after the Go 1.2 release is made. This patch bootstrapped and ran the Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline and 4.8 branch.

Ian

diff -r 742104f0d4c7 go/types.cc
--- a/go/types.cc	Wed Oct 16 06:37:07 2013 -0700
+++ b/go/types.cc	Thu Oct 17 08:39:50 2013 -0700
@@ -5263,11 +5263,25 @@
 	      // that an embedded builtin type is accessible from another
 	      // package (we know that all the builtin types are not
 	      // exported).
-	      if (name.empty() && ftype->deref()->named_type() != NULL)
+	      // This is called during parsing, before anything is
+	      // lowered, so we have to be careful to avoid dereferencing
+	      // an unknown type name.
+	      if (name.empty())
 		{
-		  const std::string& fn(ftype->deref()->named_type()->name());
-		  if (fn[0] >= 'a' && fn[0] <= 'z')
-		    name = '.' + imp->package()->pkgpath() + '.' + fn;
+		  Type* t = ftype;
+		  if (t->classification() == Type::TYPE_POINTER)
+		    {
+		      // Very ugly.
+		      Pointer_type* ptype = static_cast<Pointer_type*>(t);
+		      t = ptype->points_to();
+		    }
+		  std::string tname;
+		  if (t->forward_declaration_type() != NULL)
+		    tname = t->forward_declaration_type()->name();
+		  else if (t->named_type() != NULL)
+		    tname = t->named_type()->name();
+		  if (!tname.empty() && tname[0] >= 'a' && tname[0] <= 'z')
+		    name = '.' + imp->package()->pkgpath() + '.' + tname;
 		}

 	      Struct_field sf(Typed_identifier(name, ftype, imp->location()));
Re: [patch] omp-low.h
On Thu, Oct 17, 2013 at 11:52:21AM -0400, Andrew MacLeod wrote:

On 10/17/2013 11:15 AM, Jakub Jelinek wrote:

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

It should be static now since it is no longer exported outside the file... and can't be now that struct omp_region is declared in omp-low.c

Ah, just misread your change, thought you were adding the struct keyword while you are actually adding static. Sorry.

	Jakub
Merge from 4.8 branch to gccgo branch
I've merged revision 203772 from the GCC 4.8 branch to the gccgo branch. Ian
Re: [patch] omp-low.h
On 10/17/2013 11:15 AM, Jakub Jelinek wrote:

*************** struct omp_for_data
*** 135,141 ****
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);
--- 175,181 ----
  static splay_tree all_contexts;
  static int taskreg_nesting_level;
  static int target_nesting_level;
! static struct omp_region *root_omp_region;
  static bitmap task_shared_vars;
  static void scan_omp (gimple_seq *, omp_context *);

Why?

It should be static now since it is no longer exported outside the file... and can't be now that struct omp_region is declared in omp-low.c

--- 912,917 ----
*************** debug_all_omp_regions (void)
*** 1219,1225 ****
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {
--- 1238,1244 ----
  /* Create a new parallel region starting at STMT inside region PARENT.  */

! static struct omp_region *
  new_omp_region (basic_block bb, enum gimple_code type,
		  struct omp_region *parent)
  {

Likewise.

again, because it is no longer an exported function... only tree-cfg.c::make_edges needed it, and it no longer does.

Thanks.

Andrew
Re: [PATCH] reimplement -fstrict-volatile-bitfields v4, part 2/2
where in the C standard did you read the requirement that every bit field must be complete? (This is a serious question).

The spec doesn't say each field must be complete, but neither does it say that the structure must be as big as the type used. If you specify int foo:1 then the compiler is allowed to make the struct smaller than int.

  extern struct { volatile unsigned int b : 1; } bf3;

on my compiler this structure occupies 4 bytes, and it is aligned at 4 bytes. That is OK for me and the AAPCS.

On my target, the structure occupies 1 byte. I suspect gcc is rounding up to WORD_MODE, which is 4 bytes for you and 1 for me. If so, you're relying on a coincidence, not a standard.

But the access bf3.b=1 is SImode with Sandra's patch (part 1, which just obeys the AAPCS and does nothing else) and QImode without this patch, which is just a BUG. I am quite surprised how your target manages to avoid it?

It doesn't, but I wouldn't expect it to, because IMHO that test is invalid - you have not given a precise enough test to expect consistent results.

It is as Sandra said: at least on ARM, -fstrict-volatile-bitfields does not function at all. And the C++11 memory model wins all the time.

Are we talking about N2429? I read through the changes and it didn't preclude honoring the user's types. Looking through the tests, most of them combine packed with mismatched types. IMHO, those tests are invalid.

I don't think so. They are simply packed. And volatile just says that the value may be changed by a different thread. It has a great impact on loop optimizations.

Nothing you've said so far precludes honoring the user's types when the user gives you a consistent structure.

Consider:

  typedef struct {
    unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
  } Bits;

  extern volatile struct {
    Bits reg1; /* read status without resetting them */
    Bits reg2; /* read status and atomically reset them */
  } interrupt_handler;

Without -fstrict-volatile-bitfields, gcc is allowed to use a 16-bit access to read reg1, but this causes an unexpected read of volatile reg2, which may have unintended consequences. The spec doesn't say the compiler *must* read reg2, it just *doesn't* say that it *can't*. The -fstrict-volatile-bitfields flag tells the compiler that it *can't*. IMHO this doesn't violate the spec, it just limits what the compiler can do within the spec.

If ARM wants fast word-sized volatile bitfields, use int and structure your bitfields so that no int field overlaps a natural int boundary. When I added the option, I considered making it an error() to define a strict volatile bitfield that spanned the allowed boundary of the type specified, but I figured that would be too much.

I've not objected to fixing -fstrict-volatile-bitfields, or making the -fno-strict-volatile-bitfields case match the standard. I've only objected to breaking my targets by making that flag not the default.

Fine. Why can't we just get this fixed?

Dunno. I'm only opposing the patch that disabled it on my targets.
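The layout claim the example relies on can be checked directly. A small sketch follows; note the sizes are ABI-dependent, not guaranteed by the C or C++ standard, though common ABIs (AAPCS, x86-64 psABI) give the values asserted here:

```cpp
#include <cstddef>

// Layout sketch for the interrupt_handler example above.  With
// 'unsigned char' 1-bit fields, the whole register fits in one byte,
// so reg1 and reg2 are adjacent bytes and a byte-wide
// (-fstrict-volatile-bitfields) access to reg1 can never touch reg2.
struct Bits {
  unsigned char b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

struct Regs {
  Bits reg1;  // read status without resetting them
  Bits reg2;  // read status and atomically reset them
};

// On common ABIs: one byte for eight 1-bit unsigned char fields,
// two bytes for the pair.  These values are ABI assumptions.
constexpr std::size_t bits_size = sizeof (Bits);
constexpr std::size_t regs_size = sizeof (Regs);
```

Had the fields been declared unsigned int instead, sizeof (Bits) would typically grow to 4 and the "wider access spills into the neighbor" hazard disappears — which is the "consistent structure" point made above.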
Re: Fwd: [PATCH] Scheduling result adjustment to enable macro-fusion
On Thu, Oct 17, 2013 at 12:35 AM, Marek Polacek pola...@redhat.com wrote:

On Wed, Oct 16, 2013 at 04:25:58PM -0700, Wei Mi wrote:

+/* Return true if target platform supports macro-fusion.  */
+
+static bool
+ix86_macro_fusion_p ()
+{
+  if (TARGET_FUSE_CMP_AND_BRANCH)
+    return true;
+  else
+    return false;
+}

That looks weird, why not just

static bool
ix86_macro_fusion_p (void)
{
  return TARGET_FUSE_CMP_AND_BRANCH;
}

?

	Marek

Thanks, fixed.

Wei Mi.
Re: [RESEND] Enable building of libatomic on AArch64
On 17 October 2013 10:43, Michael Hudson-Doyle michael.hud...@linaro.org wrote: Resending as the previous attempt went missing... 2013-10-04 Michael Hudson-Doyle michael.hud...@linaro.org * libatomic/configure.tgt (aarch64*): Remove code preventing build. * gcc/testsuite/lib/target-supports.exp (check_effective_target_sync_long_long): AArch64 supports atomic operations on long long. (check_effective_target_sync_long_long_runtime): AArch64 can execute atomic operations on long long. OK, and committed. /Marcus
[C++ Patch] PR 58596
Hi, in this ICE on valid, 4.8/4.9 Regression, the problem is that, for a lambda in an NSDMI context, the early return of lambda_expr_this_capture:

  /* In unevaluated context this isn't an odr-use, so just return the
     nearest 'this'.  */
  if (cp_unevaluated_operand)
    return lookup_name (this_identifier);

fails to find a this_identifier. This is expected, in an NSDMI context, as also clarified by the comment a few lines below. Thus I'm handling the issue the same way, that is by way of scope_chain->x_current_class_ptr.

Tested x86_64-linux.

Thanks! Paolo.

/cp
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58596
	* lambda.c (lambda_expr_this_capture): Handle NSDMIs in the
	cp_unevaluated_operand case.

/testsuite
2013-10-17  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/58596
	* g++.dg/cpp0x/lambda/lambda-nsdmi5.C: New.

Index: cp/lambda.c
===================================================================
--- cp/lambda.c	(revision 203744)
+++ cp/lambda.c	(working copy)
@@ -628,7 +628,14 @@ lambda_expr_this_capture (tree lambda)
   /* In unevaluated context this isn't an odr-use, so just return the
      nearest 'this'.  */
   if (cp_unevaluated_operand)
-    return lookup_name (this_identifier);
+    {
+      /* In an NSDMI the fake 'this' pointer that we're using for
+	 parsing is in scope_chain.  */
+      if (LAMBDA_EXPR_EXTRA_SCOPE (lambda)
+	  && TREE_CODE (LAMBDA_EXPR_EXTRA_SCOPE (lambda)) == FIELD_DECL)
+	return scope_chain->x_current_class_ptr;
+      return lookup_name (this_identifier);
+    }

   /* Try to default capture 'this' if we can.  */
   if (!this_capture

Index: testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C
===================================================================
--- testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/lambda/lambda-nsdmi5.C	(working copy)
@@ -0,0 +1,7 @@
+// PR c++/58596
+// { dg-do compile { target c++11 } }
+
+struct A
+{
+  int i = [] { return decltype(i)(); }();
+};
Re: [PATCH][i386]Fix PR 57756
On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote: On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego. FYI: I just want to make sure everyone is aware that there are still broken targets. rs6000 may be fixed but mips still doesn't work and I presume other platforms like sparc are also still broken. /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In function 'void cpp_atomic_builtins(cpp_reader*)': /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: error: 'target_flags_explicit' was not declared in this scope make[1]: *** [c-family/c-cppbuiltin.o] Error 1 Sriraman, are you looking at how to fix this globally or are the target maintainers expected to fix this? Currently I am using this one line patch to get things building, but I presume this is not what we want long term. 
% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@
 print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flags"
 print "#endif"
 print "#endif"
 print ""

Steve Ellcey
sell...@mips.com
[AArch64] Fix types for vcvtsd_n intrinsics.
Hi,

I spotted that the types of arguments to these intrinsics are wrong, which results in all sorts of fun issues! Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf with no issues.

OK?

Thanks, James

---
2013-10-17  James Greenhalgh  james.greenha...@arm.com

	* config/aarch64/arm_neon.h (vcvtds_n_fsu32,64_fsu32,64):
	Correct argument types.

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index f7c9db6..55aa742 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -5442,7 +5442,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      int64_t a_ = (a);					\
-     int64_t result;					\
+     float64_t result;					\
     __asm__ ("scvtf %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5454,7 +5454,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      uint64_t a_ = (a);				\
-     uint64_t result;					\
+     float64_t result;					\
     __asm__ ("ucvtf %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5466,7 +5466,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float64_t a_ = (a);				\
-     float64_t result;					\
+     int64_t result;					\
     __asm__ ("fcvtzs %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5478,7 +5478,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float64_t a_ = (a);				\
-     float64_t result;					\
+     uint64_t result;					\
     __asm__ ("fcvtzu %d0,%d1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5586,7 +5586,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      int32_t a_ = (a);					\
-     int32_t result;					\
+     float32_t result;					\
     __asm__ ("scvtf %s0,%s1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5598,7 +5598,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      uint32_t a_ = (a);				\
-     uint32_t result;					\
+     float32_t result;					\
     __asm__ ("ucvtf %s0,%s1,%2"			\
              : "=w"(result)				\
              : "w"(a_), "i"(b)				\
@@ -5610,7 +5610,7 @@ static float32x2_t vdup_n_f32 (float32_t);
 __extension__						\
   ({							\
      float32_t a_ = (a);				\
-     float32_t result;					\
+     int32_t result;					\
     __asm__ ("fcvtzs %s0,%s1,%2"
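The "all sorts of fun issues" come from the statement-expression macros returning a temporary of the wrong type. The following is an illustration of the bug class only (it uses memcpy as a stand-in for the asm writing an FP bit pattern; it is not the intrinsic itself):

```cpp
#include <cstdint>
#include <cstring>

// When the temporary carrying a floating-point bit pattern is declared
// with an integer type, returning it performs a numeric int->double
// conversion instead of preserving the bits.  That is why each vcvt*_n_*
// macro must declare 'result' with the type the instruction produces.
double take_bits_wrongly (double d)
{
  int64_t result;                        // wrong temporary type
  std::memcpy (&result, &d, sizeof d);   // stand-in for the asm's FP write
  return result;                         // value conversion: garbage
}

double take_bits_correctly (double d)
{
  double result;                         // matching temporary type
  std::memcpy (&result, &d, sizeof d);
  return result;                         // bit pattern preserved
}
```

For 1.0, the wrong version returns roughly 4.6e18 (the numeric value of 0x3FF0000000000000) rather than 1.0 — the same shape of corruption the mismatched vcvt temporaries caused.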
Re: [PATCH][i386]Fix PR 57756
On Thu, Oct 17, 2013 at 9:28 AM, Steve Ellcey sell...@mips.com wrote: On Thu, 2013-10-17 at 06:03 -0700, Diego Novillo wrote: On Wed, Oct 16, 2013 at 6:06 PM, David Edelsohn dje@gmail.com wrote: How is Google going to change its patch commit policies to ensure that this does not happen again? There is nothing to change. Google follows http://gcc.gnu.org/contribute.html, like everyone else. Sri just fixed the oversight and if there is any other fallout from his patch, he will address it. I don't see anything out of the ordinary here. Diego. FYI: I just want to make sure everyone is aware that there are still broken targets. rs6000 may be fixed but mips still doesn't work and I presume other platforms like sparc are also still broken. /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c: In function 'void cpp_atomic_builtins(cpp_reader*)': /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:594:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:606:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:618:7: error: 'target_flags_explicit' was not declared in this scope /local/home/sellcey/nightly_full/src/gcc/gcc/c-family/c-cppbuiltin.c:630:7: error: 'target_flags_explicit' was not declared in this scope make[1]: *** [c-family/c-cppbuiltin.o] Error 1 Sriraman, are you looking at how to fix this globally or are the target maintainers expected to fix this? Currently I am using this one line patch to get things building, but I presume this is not what we want long term. Yes, I am on it. I will get back asap. 
Sri

% git diff opth-gen.awk
diff --git a/gcc/opth-gen.awk b/gcc/opth-gen.awk
index 01c5ab6..46bd570 100644
--- a/gcc/opth-gen.awk
+++ b/gcc/opth-gen.awk
@@ -114,6 +114,7 @@
 print "};"
 print "extern struct gcc_options global_options;"
 print "extern const struct gcc_options global_options_init;"
 print "extern struct gcc_options global_options_set;"
+print "#define target_flags_explicit global_options_set.x_target_flags"
 print "#endif"
 print "#endif"
 print ""

Steve Ellcey
sell...@mips.com
Re: [C++ Patch] PR 58596
OK. Jason
[AArch64,PATCH] Adjust preferred_reload_class of SP+C
Hi,

This patch addresses an issue in reload triggered by the gfortran.dg/loc_2.f90 regression test at -O3 with LRA disabled. The patch is based on work done by Ian Bolton here at ARM which I've dusted down and submitted.

Following SFP elimination and under heavy register pressure, reload attempts the reload of SP+offset into a V register. The AArch64 instruction set does not support such an operation. We considered two solutions to this issue:

1) Detect the SP+offset pattern in secondary reload and use an intermediate X register.

2) Detect the SP+offset->V pattern in preferred_reload_class and return NO_REGS before secondary reload gets involved.

The latter looks like a simpler and more intuitive solution to me than the first. I also note that the i386 backend implementation of preferred_reload_class contains equivalent code. I intend to leave this patch on the list for a few days before committing to give folks knowledgeable about reload and the associated target hooks the opportunity to comment.

Thanks
/Marcus

2013-10-17  Ian Bolton  ian.bol...@arm.com
	    Marcus Shawcroft  marcus.shawcr...@arm.com

	* config/aarch64/aarch64.c (aarch64_preferred_reload_class):
	Special case reloads of SP+C into non GENERAL_REGS.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7fce7a0..cc9ecdd 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4237,6 +4237,24 @@ aarch64_preferred_reload_class (rtx x, reg_class_t regclass)
       && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
     return NO_REGS;

+  /* Register elimination can result in a request for
+     SP+constant->FP_REGS.  We cannot support such operations which
+     use SP as source and an FP_REG as destination, so reject out
+     right now.  */
+  if (! reg_class_subset_p (regclass, GENERAL_REGS) && GET_CODE (x) == PLUS)
+    {
+      rtx lhs = XEXP (x, 0);
+
+      /* Look through a possible SUBREG introduced by ILP32.  */
+      if (GET_CODE (lhs) == SUBREG)
+	lhs = SUBREG_REG (lhs);
+
+      gcc_assert (REG_P (lhs));
+      gcc_assert (reg_class_subset_p (REGNO_REG_CLASS (REGNO (lhs)),
+				      POINTER_REGS));
+      return NO_REGS;
+    }
+
   return regclass;
 }
Re: [PATCH][AArch64] Implement %c output template
On 17 October 2013 12:13, Kyrill Tkachov kyrylo.tkac...@arm.com wrote: [gcc/] 2013-10-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/aarch64/aarch64.c (aarch64_print_operand): Handle 'c'. [gcc/testsuite] 2013-10-17 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/aarch64/c-output-template.c: New testcase. * gcc.target/aarch64/c-output-template-2.c: Likewise. * gcc.target/aarch64/c-output-template-3.c: Likewise. OK /Marcus
Re: [AArch64] Fix types for vcvtsd_n intrinsics.
On 17 October 2013 17:27, James Greenhalgh james.greenha...@arm.com wrote: Hi, I spotted that the types of arguments to these intrinsics are wrong, which results in all sorts of fun issues! Fixed thusly, regression tested with aarch64.exp on aarch64-none-elf with no issues. OK? Thanks, James --- 2013-10-17 James Greenhalgh james.greenha...@arm.com * config/aarch64/arm_neon.h (vcvtds_n_fsu32,64_fsu32,64): Correct argument types. OK /Marcus