Re: [PATCH/AARCH64] Add vulcan -mcpu support
Please find gcc ChangeLog in the correct format. ChangeLog: Virendra Pathak * config/aarch64/aarch64-cores.def (vulcan): New core * config/aarch64/aarch64-tune.md: Regenerate * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option. Thanks. with regards, Virendra Pathak On Wed, Jun 8, 2016 at 10:52 AM, Virendra Pathak wrote: > Hi gcc-patches group, > > ping, > > updated the ChangeLog with author's entry. > > ChangeLog: > Virendra Pathak (virendra.pat...@broadcom.com) > * config/aarch64/aarch64-cores.def (vulcan): New core > * config/aarch64/aarch64-tune.md: Regenerate > * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option. > > Thanks. > > with regards, > Virendra Pathak > > > On Sat, Jun 4, 2016 at 12:05 PM, Virendra Pathak > wrote: >> Hi gcc-patches group, >> >> Please find the basic patch for adding -mcpu=vulcan support in the gcc. >> Broadcom's vulcan is an armv8-a aarch64 based server processor. >> >> At present we are using schedule model of cortex-a57 but soon we will >> be submitting a schedule model for vulcan. >> >> Please review the patch (attached with this mail) and kindly merge it >> in the gcc-6-branch. >> >> Tested the patch with aarch64-linux-gnu cross build, >> aarch64-unknown-linux-gnu native build and make check. >> We have also obtained company wide agreement with FSF for contributing >> to gcc project. >> >> Thanks. >> >> >> ChangeLog: >> * config/aarch64/aarch64-cores.def (vulcan): New core >> * config/aarch64/aarch64-tune.md: Regenerate >> * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option. >> >> >> >> with regards, >> Virendra Pathak From 8d065016856606740a3928518ed6a3f9933fb130 Mon Sep 17 00:00:00 2001 From: Virendra Pathak Date: Wed, 1 Jun 2016 03:15:33 -0700 Subject: [PATCH] [AArch64] Add -mcpu vulcan support --- gcc/config/aarch64/aarch64-cores.def | 1 + gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/doc/invoke.texi | 4 ++-- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 251a3eb..b0acad9 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -46,6 +46,7 @@ AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AA AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08") AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, "0x53", "0x001") AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800") +AARCH64_CORE("vulcan", vulcan,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516") AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000") diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index cbc6f48..c758a5f 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53" + "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,vulcan,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 821f8fd..146042d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12955,8 +12955,8 @@ processors implementing the target architecture. Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57}, -@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx}, -@samp{xgene1}. +@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{vulcan}, +@samp{thunderx}, @samp{xgene1}. Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible values for this -- 2.1.4
Re: [PATCH/AARCH64] Add vulcan -mcpu support
Hi gcc-patches group, ping, updated the ChangeLog with author's entry. ChangeLog: Virendra Pathak (virendra.pat...@broadcom.com) * config/aarch64/aarch64-cores.def (vulcan): New core * config/aarch64/aarch64-tune.md: Regenerate * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option. Thanks. with regards, Virendra Pathak On Sat, Jun 4, 2016 at 12:05 PM, Virendra Pathak wrote: > Hi gcc-patches group, > > Please find the basic patch for adding -mcpu=vulcan support in the gcc. > Broadcom's vulcan is an armv8-a aarch64 based server processor. > > At present we are using schedule model of cortex-a57 but soon we will > be submitting a schedule model for vulcan. > > Please review the patch (attached with this mail) and kindly merge it > in the gcc-6-branch. > > Tested the patch with aarch64-linux-gnu cross build, > aarch64-unknown-linux-gnu native build and make check. > We have also obtained company wide agreement with FSF for contributing > to gcc project. > > Thanks. > > > ChangeLog: > * config/aarch64/aarch64-cores.def (vulcan): New core > * config/aarch64/aarch64-tune.md: Regenerate > * doc/invoke.texi (AARCH64/mtune): Document vulcan as an available option. > > > > with regards, > Virendra Pathak From 8d065016856606740a3928518ed6a3f9933fb130 Mon Sep 17 00:00:00 2001 From: Virendra Pathak Date: Wed, 1 Jun 2016 03:15:33 -0700 Subject: [PATCH] [AArch64] Add -mcpu vulcan support --- gcc/config/aarch64/aarch64-cores.def | 1 + gcc/config/aarch64/aarch64-tune.md | 2 +- gcc/doc/invoke.texi | 4 ++-- 3 files changed, 4 insertions(+), 3 deletions(-) diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def index 251a3eb..b0acad9 100644 --- a/gcc/config/aarch64/aarch64-cores.def +++ b/gcc/config/aarch64/aarch64-cores.def @@ -46,6 +46,7 @@ AARCH64_CORE("cortex-a57", cortexa57, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AA AARCH64_CORE("cortex-a72", cortexa72, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08") AARCH64_CORE("exynos-m1", exynosm1, exynosm1, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1, "0x53", "0x001") AARCH64_CORE("qdf24xx", qdf24xx, cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800") +AARCH64_CORE("vulcan", vulcan,cortexa57, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x42", "0x516") AARCH64_CORE("thunderx",thunderx, thunderx, 8A, AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx, "0x43", "0x0a1") AARCH64_CORE("xgene1", xgene1,xgene1,8A, AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000") diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md index cbc6f48..c758a5f 100644 --- a/gcc/config/aarch64/aarch64-tune.md +++ b/gcc/config/aarch64/aarch64-tune.md @@ -1,5 +1,5 @@ ;; -*- buffer-read-only: t -*- ;; Generated automatically by gentune.sh from aarch64-cores.def (define_attr "tune" - "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53" + "cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,vulcan,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53" (const (symbol_ref "((enum attr_tune) aarch64_tune)"))) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 821f8fd..146042d 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12955,8 +12955,8 @@ processors implementing the target architecture. Specify the name of the target processor for which GCC should tune the performance of the code. Permissible values for this option are: @samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57}, -@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{thunderx}, -@samp{xgene1}. +@samp{cortex-a72}, @samp{exynos-m1}, @samp{qdf24xx}, @samp{vulcan}, +@samp{thunderx}, @samp{xgene1}. Additionally, this option can specify that GCC should tune the performance of the code for a big.LITTLE system. Permissible values for this -- 2.1.4
Re: [PATCH] c/70883 - inconsistent error message for calls to __builtin_add_overflow with too few arguments
Ping: This is a trivial patch to make the text of an error message for an insufficient number of arguments to a built-in consistent across different built-ins. Rather that sometimes saying "not enough arguments" and others "too few arguments" the patch makes GCC print "too few" in all cases. https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00093.html On 06/01/2016 03:09 PM, Martin Sebor wrote: Jakub, As you requested in the discussion of my arithmetic overflow built-in patch, attached is the subset of the patch to make consistent with other such diagnostics the text of the error message issued for insufficient numbers of arguments in calls to built-in functions such as __builtin_add_overflow, __builtin_constant_p, and others. Thanks Martin
[PATCH 9/9] rs6000: Separate shrink-wrapping
This implements the hooks for separate shrink-wrapping for rs6000. It handles GPRs and LR. The GPRs get a concern number corresponding to their register number; LR gets concern number 0. This improves specint by 0.9%, specfp by 0.8%, some separate benchmarks much more (on POWER8). It improves the hot path in various interpreters, and e.g. in glibc's malloc. 2016-06-07 Segher Boessenkool * config/rs6000/rs6000.c (machine_function): Add new fields gpr_is_wrapped_separately and lr_is_wrapped_separately. (TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS, TARGET_SHRINK_WRAP_CONCERNS_FOR_BB, TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS, TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS, TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS, TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS): Define. (rs6000_get_separate_concerns): New function. (rs6000_concerns_for_bb): New function. (rs6000_disqualify_concerns): New function. (rs6000_emit_prologue_concerns): New function. (rs6000_emit_epilogue_concerns): New function. (rs6000_set_handled_concerns): New function. (rs6000_emit_prologue): Don't emit LR save if lr_is_wrapped_separately. Don't emit GPR saves if gpr_is_wrapped_separately for that register. (restore_saved_lr): Don't restore LR if lr_is_wrapped_separately. (rs6000_emit_epilogue): Don't emit GPR restores if gpr_is_wrapped_separately for that register. Don't make a REG_CFA_RESTORE note for registers we did not restore, either. --- gcc/config/rs6000/rs6000.c | 257 ++--- 1 file changed, 242 insertions(+), 15 deletions(-) diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index c6b2b6a..af56d8e 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -152,6 +152,10 @@ typedef struct GTY(()) machine_function bool split_stack_argp_used; /* Flag if r2 setup is needed with ELFv2 ABI. */ bool r2_setup_needed; + /* The concerns already handled by separate shrink-wrapping, which should + not be considered by the prologue and epilogue. */ + bool gpr_is_wrapped_separately[32]; + bool lr_is_wrapped_separately; } machine_function; /* Support targetm.vectorize.builtin_mask_for_load. */ @@ -1511,6 +1515,19 @@ static const struct attribute_spec rs6000_attribute_table[] = #undef TARGET_SET_UP_BY_PROLOGUE #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue +#undef TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS +#define TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS rs6000_get_separate_concerns +#undef TARGET_SHRINK_WRAP_CONCERNS_FOR_BB +#define TARGET_SHRINK_WRAP_CONCERNS_FOR_BB rs6000_concerns_for_bb +#undef TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS +#define TARGET_SHRINK_WRAP_DISQUALIFY_CONCERNS rs6000_disqualify_concerns +#undef TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS +#define TARGET_SHRINK_WRAP_EMIT_PROLOGUE_CONCERNS rs6000_emit_prologue_concerns +#undef TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS +#define TARGET_SHRINK_WRAP_EMIT_EPILOGUE_CONCERNS rs6000_emit_epilogue_concerns +#undef TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS +#define TARGET_SHRINK_WRAP_SET_HANDLED_CONCERNS rs6000_set_handled_concerns + #undef TARGET_EXTRA_LIVE_ON_ENTRY #define TARGET_EXTRA_LIVE_ON_ENTRY rs6000_live_on_entry @@ -26111,6 +26128,201 @@ rs6000_global_entry_point_needed_p (void) return cfun->machine->r2_setup_needed; } +/* Implement TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS. */ +static sbitmap +rs6000_get_separate_concerns (void) +{ + rs6000_stack_t *info = rs6000_stack_info (); + + if (!(info->savres_strategy & SAVE_INLINE_GPRS) + || !(info->savres_strategy & REST_INLINE_GPRS) + || WORLD_SAVE_P (info)) +return NULL; + + sbitmap concerns = sbitmap_alloc (32); + bitmap_clear (concerns); + + /* The GPRs we need saved to the frame. */ + int reg_size = TARGET_32BIT ? 4 : 8; + int offset = info->gp_save_offset; + if (info->push_p) +offset += info->total_size; + + for (unsigned regno = info->first_gp_reg_save; regno < 32; regno++) +{ + if (IN_RANGE (offset, -0x8000, 0x7fff) + && rs6000_reg_live_or_pic_offset_p (regno)) + bitmap_set_bit (concerns, regno); + + offset += reg_size; +} + + /* Don't mess with the hard frame pointer. */ + if (frame_pointer_needed) +bitmap_clear_bit (concerns, HARD_FRAME_POINTER_REGNUM); + + /* Don't mess with the fixed TOC register. */ + if ((TARGET_TOC && TARGET_MINIMAL_TOC) + || (flag_pic == 1 && DEFAULT_ABI == ABI_V4) + || (flag_pic && DEFAULT_ABI == ABI_DARWIN)) +bitmap_clear_bit (concerns, RS6000_PIC_OFFSET_TABLE_REGNUM); + + /* Optimize LR save and restore if we can. This is concern 0. */ + if (info->lr_save_p + && !(flag_pic && (DEFAULT_ABI == ABI_V4 || DEFAULT_ABI == ABI_DARWIN))) +{ + offset = info->lr_save_offset; + if (info->push_p) + offset += info->total_size; +
[PATCH 8/9] shrink-wrap: shrink-wrapping for separate concerns
This is the main substance of this patch series. The big comment before the new code says: /* Separate shrink-wrapping Instead of putting all of the prologue and epilogue in one spot, we can put parts of it in places that are executed less frequently. The following code does this, for concerns that can have more than one prologue and epilogue, and where those pro- and epilogues can be executed more than once. What exactly is a concern is target-dependent. The more usual ones are simple saves of callee-saved registers to the frame. This code treats it abstractly (as an sbitmap of concerns), letting the target handle all details. Prologue concerns are placed in such a way that they are executed as infrequently as possible. Epilogue concerns are put everywhere where there is an edge from a bb dominated by such a prologue concern to a bb not dominated by one. Prologues and epilogues are preferably placed into a block, either at the beginning or end of it, if it is needed for all predecessor resp. successor edges; or placed on the edge otherwise. If the placement of any prologue/epilogue leads to a situation we cannot handle (for example, an abnormal edge would need to be split, or some targets want to use some specific registers that may not be available where we want to put them), separate shrink-wrapping for that concern is aborted. */ 2016-06-07 Segher Boessenkool * function.c (thread_prologue_and_epilogue_insns): Recompute the live info. Call try_shrink_wrapping_separate. Compute the prologue_seq afterwards, if it has possibly changed. Compute the split_prologue_seq and epilogue_seq later, too. * shrink-wrap.c: #include cfgbuild.h. (dump_concern): New function. (struct sw): New struct. (SW): New function. (init_separate_shrink_wrap): New function. (fini_separate_shrink_wrap): New function. (place_prologue_for_one_concern): New function. (spread_concerns): New function. (disqualify_problematic_concerns): New function. (do_common_heads_for_concerns): New function. (do_common_tails_for_concerns): New function. (insert_prologue_epilogue_for_concerns): New function. (try_shrink_wrapping_separate): New function. * shrink-wrap.h: Declare try_shrink_wrapping_separate. --- gcc/function.c| 15 +- gcc/shrink-wrap.c | 647 ++ gcc/shrink-wrap.h | 1 + 3 files changed, 661 insertions(+), 2 deletions(-) diff --git a/gcc/function.c b/gcc/function.c index c15d47d..cc5fa6b 100644 --- a/gcc/function.c +++ b/gcc/function.c @@ -5909,6 +5909,12 @@ make_epilogue_seq (void) void thread_prologue_and_epilogue_insns (void) { + if (optimize > 1) +{ + df_live_add_problem (); + df_live_set_all_dirty (); +} + df_analyze (); /* Can't deal with multiple successors of the entry block at the @@ -5919,9 +5925,7 @@ thread_prologue_and_epilogue_insns (void) edge entry_edge = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)); edge orig_entry_edge = entry_edge; - rtx_insn *split_prologue_seq = make_split_prologue_seq (); rtx_insn *prologue_seq = make_prologue_seq (); - rtx_insn *epilogue_seq = make_epilogue_seq (); /* Try to perform a kind of shrink-wrapping, making sure the prologue/epilogue is emitted only around those parts of the @@ -5929,6 +5933,13 @@ thread_prologue_and_epilogue_insns (void) try_shrink_wrapping (&entry_edge, prologue_seq); + df_analyze (); + try_shrink_wrapping_separate (entry_edge->dest); + if (crtl->shrink_wrapped_separate) +prologue_seq = make_prologue_seq (); + + rtx_insn *split_prologue_seq = make_split_prologue_seq (); + rtx_insn *epilogue_seq = make_epilogue_seq (); rtl_profile_for_bb (EXIT_BLOCK_PTR_FOR_FN (cfun)); diff --git a/gcc/shrink-wrap.c b/gcc/shrink-wrap.c index b85b1c3..6e157d0 100644 --- a/gcc/shrink-wrap.c +++ b/gcc/shrink-wrap.c @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "output.h" #include "tree-pass.h" #include "cfgrtl.h" +#include "cfgbuild.h" #include "params.h" #include "bb-reorder.h" #include "shrink-wrap.h" @@ -1006,3 +1007,649 @@ try_shrink_wrapping (edge *entry_edge, rtx_insn *prologue_seq) BITMAP_FREE (bb_with); free_dominance_info (CDI_DOMINATORS); } + +/* Separate shrink-wrapping + + Instead of putting all of the prologue and epilogue in one spot, we + can put parts of it in places that are executed less frequently. The + following code does this, for concerns that can have more than one + prologue and epilogue, and where those pro- and epilogues can be + executed more than once. + + What exactly is a concern is target-dependent. The more usual ones + are simple saves of callee-saved registers to the frame. This code + treats it abstractly (as an sbitmap of concer
[PATCH 5/9] regrename: Don't run if function was separately shrink-wrapped
Unfortunately even after the previous patch there are still a few cases where regrename creates invalid code when used together with separate shrink-wrapping. I haven't found the cause of those. This patch disables regrename for functions that were separately shrink-wrapped. 2016-06-07 Segher Boessenkool * regrename.c (gate): Disable regrename if shrink_wrapped_separate is set in crtl. --- gcc/regrename.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/gcc/regrename.c b/gcc/regrename.c index 00a5d02..5740b1a 100644 --- a/gcc/regrename.c +++ b/gcc/regrename.c @@ -1956,7 +1956,11 @@ public: /* opt_pass methods: */ virtual bool gate (function *) { - return (optimize > 0 && (flag_rename_registers)); + /* regrename creates wrong code for exception handling, if used together + with separate shrink-wrapping. Disable for now, until we have +figured out what exactly is going on. */ + return (optimize > 0 && flag_rename_registers + && !crtl->shrink_wrapped_separate); } virtual unsigned int execute (function *) { return regrename_optimize (); } -- 1.9.3
[PATCH 6/9] sel-sched: Don't mess with register restores
If selective scheduling copies register restores it confuses dwarf2cfi. 2016-06-07 Segher Boessenkool * sel-sched-ir.c (init_global_and_expr_for_insn): Don't copy instructions with a REG_CFA_RESTORE note. --- gcc/sel-sched-ir.c | 1 + 1 file changed, 1 insertion(+) diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c index 83f813a..4a3984a 100644 --- a/gcc/sel-sched-ir.c +++ b/gcc/sel-sched-ir.c @@ -3015,6 +3015,7 @@ init_global_and_expr_for_insn (insn_t insn) /* TRAP_IF though have an INSN code is control_flow_insn_p (). */ || control_flow_insn_p (insn) || volatile_insn_p (PATTERN (insn)) + || find_reg_note (insn, REG_CFA_RESTORE, NULL) || (targetm.cannot_copy_insn_p && targetm.cannot_copy_insn_p (insn))) force_unique_p = true; -- 1.9.3
[PATCH 7/9] cprop: Leave RTX_FRAME_RELATED_P instructions alone
Doing cprop on frame-related instructions blows up spectacularly. So don't. 2016-06-07 Segher Boessenkool * regcprop.c (copyprop_hardreg_forward_1): Don't change RTX_FRAME_RELATED_P instructions. --- gcc/regcprop.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/regcprop.c b/gcc/regcprop.c index 933cc8a..0c01aab 100644 --- a/gcc/regcprop.c +++ b/gcc/regcprop.c @@ -829,6 +829,10 @@ copyprop_hardreg_forward_1 (basic_block bb, struct value_data *vd) } } + /* Don't change prologue instructions. */ + if (RTX_FRAME_RELATED_P (insn)) + set = NULL; + /* Special-case plain move instructions, since we may well be able to do the move from a different register class. */ if (set && REG_P (SET_SRC (set))) -- 1.9.3
[PATCH 4/9] regrename: Don't rename restores
A restore is supposed to restore some certain register. Restoring it into some other register will not work. Don't. 2016-06-07 Segher Boessenkool * regrename.c (build_def_use): Invalidate chains that have a REG_CFA_RESTORE on some instruction. --- gcc/regrename.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/gcc/regrename.c b/gcc/regrename.c index 54c7768..00a5d02 100644 --- a/gcc/regrename.c +++ b/gcc/regrename.c @@ -1867,6 +1867,12 @@ build_def_use (basic_block bb) scan_rtx (insn, &XEXP (note, 0), NO_REGS, terminate_dead, OP_IN); } + + /* Step 8: Kill the chains involving register restores. Those +should restore _that_ register. */ + for (note = REG_NOTES (insn); note; note = XEXP (note, 1)) + if (REG_NOTE_KIND (note) == REG_CFA_RESTORE) + scan_rtx (insn, &XEXP (note, 0), NO_REGS, mark_all_read, OP_IN); } else if (DEBUG_INSN_P (insn) && !VAR_LOC_UNKNOWN_P (INSN_VAR_LOCATION_LOC (insn))) -- 1.9.3
[PATCH 1/9] separate shrink-wrap: New command-line flag, status flag, hooks, and doc
This patch adds a new command-line flag "-fshrink-wrap-separate", a status flag "shrink_wrapped_separate", hooks for abstracting the target concerns, and documentation for all those. 2016-06-07 Segher Boessenkool * common.opt (-fshrink-wrap-separate): New flag. * doc/invoke.texi: Document it. * doc/tm.texi.in (Shrink-wrapping separate concerns): New subsection. * doc/tm.texi: Regenerate. * emit-rtl.h (struct rtl_data): New field shrink_wrapped_separate. * target.def (shrink_wrap): New hook vector. (get_separate_concerns, concerns_for_bb, disqualify_concerns, emit_prologue_concerns, emit_epilogue_concerns, set_handled_concerns): New hooks. --- gcc/common.opt | 4 gcc/doc/invoke.texi | 11 ++- gcc/doc/tm.texi | 53 ++ gcc/doc/tm.texi.in | 29 +++ gcc/emit-rtl.h | 4 gcc/target.def | 56 + 6 files changed, 156 insertions(+), 1 deletion(-) diff --git a/gcc/common.opt b/gcc/common.opt index 2bb576c..b00f538 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2153,6 +2153,10 @@ Common Report Var(flag_shrink_wrap) Optimization Emit function prologues only before parts of the function that need it, rather than at the top of the function. +fshrink-wrap-separate +Common Report Var(flag_shrink_wrap_separate) Init(1) Optimization +Shrink-wrap parts of the prologue and epilogue separately. + fsignaling-nans Common Report Var(flag_signaling_nans) Optimization SetByCombined Disable optimizations observable by IEEE signaling NaNs. diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e3a2399..21df5c5 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -394,7 +394,8 @@ Objective-C and Objective-C++ Dialects}. -fschedule-insns -fschedule-insns2 -fsection-anchors @gol -fselective-scheduling -fselective-scheduling2 @gol -fsel-sched-pipelining -fsel-sched-pipelining-outer-loops @gol --fsemantic-interposition -fshrink-wrap -fsignaling-nans @gol +-fsemantic-interposition -fshrink-wrap -fshrink-wrap-separate @gol +-fsignaling-nans @gol -fsingle-precision-constant -fsplit-ivs-in-unroller @gol -fsplit-paths @gol -fsplit-wide-types -fssa-backprop -fssa-phiopt @gol @@ -6263,6 +6264,7 @@ compilation time. -fmove-loop-invariants @gol -freorder-blocks @gol -fshrink-wrap @gol +-fshrink-wrap-separate @gol -fsplit-wide-types @gol -fssa-backprop @gol -fssa-phiopt @gol @@ -7180,6 +7182,13 @@ Emit function prologues only before parts of the function that need it, rather than at the top of the function. This flag is enabled by default at @option{-O} and higher. +@item -fshrink-wrap-separate +@opindex fshrink-wrap-separate +Shrink-wrap separate parts of the prologue and epilogue separately, so that +those parts are only executed when needed. +This option is on by default, but has no effect unless @option{-fshrink-wrap} +is also turned on. + @item -fcaller-saves @opindex fcaller-saves Enable allocation of values to registers that are clobbered by diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index b318615..c326599 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -2924,6 +2924,7 @@ This describes the stack layout and calling conventions. * Function Entry:: * Profiling:: * Tail Calls:: +* Shrink-wrapping separate concerns:: * Stack Smashing Protection:: * Miscellaneous Register Hooks:: @end menu @@ -4852,6 +4853,58 @@ This hook should add additional registers that are computed by the prologue to t True if a function's return statements should be checked for matching the function's return type. This includes checking for falling off the end of a non-void function. Return false if no such check should be made. @end deftypefn +@node Shrink-wrapping separate concerns +@subsection Shrink-wrapping separate concerns +@cindex shrink-wrapping separate concerns + +The prologue does a lot of separate things: save callee-saved registers, +do whatever needs to be done to be able to call things (save the return +address, align the stack, whatever; different for each target), set up a +stack frame, do whatever needs to be done for the static chain (if anything), +set up registers for PIC, etc. Using the following hooks those concerns +can be shrink-wrapped separately, so that the initialization (and possibly +teardown) for those concerns is not done on execution paths where it is +unnecessary. + +What exactly those concerns are is up to the target code; the generic +code treats them abstractly. + +@deftypefn {Target Hook} sbitmap TARGET_SHRINK_WRAP_GET_SEPARATE_CONCERNS (void) +This hook should return an @code{sbitmap} with the bits set for those +concerns that can be separately shrink-wrapped in the current function. +Return @code{NULL} if the current function should not get any separate +shrink-wrapping. +Don't define this hook if it would always retur
[PATCH 3/9] dce: Don't dead-code delete separately wrapped restores
Deleting restores (before a noreturn) that are dead confuses dwarf2cfi. 2016-06-07 Segher Boessenkool * dce.c (delete_unmarked_insns): Don't delete instructions with a REG_CFA_RESTORE note. --- gcc/dce.c | 9 + 1 file changed, 9 insertions(+) diff --git a/gcc/dce.c b/gcc/dce.c index ea3fb00..d510287 100644 --- a/gcc/dce.c +++ b/gcc/dce.c @@ -587,6 +587,15 @@ delete_unmarked_insns (void) if (!dbg_cnt (dce)) continue; + if (crtl->shrink_wrapped_separate + && find_reg_note (insn, REG_CFA_RESTORE, NULL)) + { + if (dump_file) + fprintf (dump_file, "DCE: NOT deleting insn %d, it's a " + "callee-save restore\n", INSN_UID (insn)); + continue; + } + if (dump_file) fprintf (dump_file, "DCE: Deleting insn %d\n", INSN_UID (insn)); -- 1.9.3
[PATCH 2/9] cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate
cfgcleanup would try to join noreturn paths with different concerns handled. This then fails in dwarf2cfi. 2016-06-07 Segher Boessenkool * cfgcleanup.c (outgoing_edges_match): Don't join noreturn calls if shrink_wrapped_separate. --- gcc/cfgcleanup.c | 5 + 1 file changed, 5 insertions(+) diff --git a/gcc/cfgcleanup.c b/gcc/cfgcleanup.c index 023b9d2..f08e4ee 100644 --- a/gcc/cfgcleanup.c +++ b/gcc/cfgcleanup.c @@ -1824,6 +1824,11 @@ outgoing_edges_match (int mode, basic_block bb1, basic_block bb2) || !find_reg_note (last1, REG_ARGS_SIZE, NULL))) return false; + /* If shrink-wrapping separate concerns, joining noreturn calls that + have different concerns set up will confuse dwarf2cfi. */ + if (!nonfakeedges && crtl->shrink_wrapped_separate) +return false; + /* fallthru edges must be forwarded to the same destination. */ if (fallthru1) { -- 1.9.3
[PATCH 0/9] separate shrink-wrapping
This patch series introduces separate shrink-wrapping. There are many things the prologue/epilogue of a function do, and most of those things can be done independently. For example, most of the time, for many targets, the save of callee-saved registers can be done later than the "main" prologue. Doing so helps quite a bit because the prologue is expensive for functions that do not need everything it does done for every path through the function; often, the hot paths do not need much at all, e.g. not those things the prologue needs to do for the function to call other functions. The first patch creates a command-line flag, some hooks, a status flag ("is this function wrapped separately", used by later passes), and documentation for these things. The next six patches are to prevent later passes from mishandling the epilogue instructions that now appear before the epilogue: mostly, you cannot do much to instructions with a REG_CFA_RESTORE note without confusing dwarf2cfi. The cprop one is for prologue instructions. Then, the main patch. And finally a patch for PowerPC that implements separate wrapping for GPRs and LR. Tested on powerpc64-linux (-m32/-m64, -mlra/-mno-lra), and on powerpc64le-linux. Previous versions of this series also tested on x86_64-linux. Is this okay for trunk? Segher Segher Boessenkool (9): separate shrink-wrap: New command-line flag, status flag, hooks, and doc cfgcleanup: Don't confuse CFI when -fshrink-wrap-separate dce: Don't dead-code delete separately wrapped restores regrename: Don't rename restores regrename: Don't run if function was separately shrink-wrapped sel-sched: Don't mess with register restores cprop: Leave RTX_FRAME_RELATED_P instructions alone shrink-wrap: shrink-wrapping for separate concerns rs6000: Separate shrink-wrapping gcc/cfgcleanup.c | 5 + gcc/common.opt | 4 + gcc/config/rs6000/rs6000.c | 257 -- gcc/dce.c | 9 + gcc/doc/invoke.texi| 11 +- gcc/doc/tm.texi| 53 gcc/doc/tm.texi.in | 29 ++ gcc/emit-rtl.h | 4 + gcc/function.c | 15 +- gcc/regcprop.c | 3 + gcc/regrename.c| 12 +- gcc/sel-sched-ir.c | 1 + gcc/shrink-wrap.c | 647 + gcc/shrink-wrap.h | 1 + gcc/target.def | 56 15 files changed, 1088 insertions(+), 19 deletions(-) -- 1.9.3
[PATCH] Add selftest for pretty-print.c (v2)
On Tue, 2016-06-07 at 12:02 +0200, Bernd Schmidt wrote: > On 06/06/2016 11:28 PM, David Malcolm wrote: > > + assert_pp_format ("0xcafebabe", "%wx", > > (HOST_WIDE_INT)0xcafebabe); > > More interesting tests would be to have multiple arguments to test > that > we really used the right size for the varargs. Maybe append a single > %d > arg with a unique bit pattern to all of them? > > Otherwise ok. Good idea. In the following I did it by adding 0x12345678 as a successor argument to each test. I chose that bit pattern on the grounds that each nybble is unique and non-zero. I printed them with %x to make it easier (I hope) to track down problems. Successfully bootstrapped®rtested on x86_64-pc-linux-gnu. OK for trunk? gcc/ChangeLog: * pretty-print.c: Include "selftest.h". (pp_format): Fix comment. (identifier_to_locale): Likewise. (selftest::test_basic_printing): New function. (selftest::assert_pp_format): New function. (selftest::test_pp_format): New function. (selftest::pretty_print_c_tests): New function. * selftest-run-tests.c (selftest::run_tests): Call selftest::pretty_print_c_tests. * selftest.h (pretty_print_c_tests): New declaration. --- gcc/pretty-print.c | 164 ++- gcc/selftest-run-tests.c | 1 + gcc/selftest.h | 1 + 3 files changed, 165 insertions(+), 1 deletion(-) diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c index cc2b8cc..d805da4 100644 --- a/gcc/pretty-print.c +++ b/gcc/pretty-print.c @@ -24,6 +24,7 @@ along with GCC; see the file COPYING3. If not see #include "intl.h" #include "pretty-print.h" #include "diagnostic-color.h" +#include "selftest.h" #if HAVE_ICONV #include @@ -304,7 +305,7 @@ pp_indent (pretty_printer *pp) /* Formatting phases 1 and 2: render TEXT->format_spec plus TEXT->args_ptr into a series of chunks in pp_buffer (PP)->args[]. - Phase 3 is in pp_format_text. */ + Phase 3 is in pp_output_formatted_text. */ void pp_format (pretty_printer *pp, text_info *text) @@ -1203,3 +1204,164 @@ identifier_to_locale (const char *ident) return ret; } } + +#if CHECKING_P + +namespace selftest { + +/* Smoketest for pretty_printer. */ + +static void +test_basic_printing () +{ + pretty_printer pp; + pp_string (&pp, "hello"); + pp_space (&pp); + pp_string (&pp, "world"); + + ASSERT_STREQ ("hello world", pp_formatted_text (&pp)); +} + +/* Helper function for testing pp_format. + Verify that pp_format (FMT, ...) followed by pp_output_formatted_text + prints EXPECTED, assuming that pp_show_color is SHOW_COLOR. */ + +static void +assert_pp_format_va (const char *expected, bool show_color, const char *fmt, +va_list *ap) +{ + pretty_printer pp; + text_info ti; + rich_location rich_loc (line_table, UNKNOWN_LOCATION); + + ti.format_spec = fmt; + ti.args_ptr = ap; + ti.err_no = 0; + ti.x_data = NULL; + ti.m_richloc = &rich_loc; + + pp_show_color (&pp) = show_color; + pp_format (&pp, &ti); + pp_output_formatted_text (&pp); + ASSERT_STREQ (expected, pp_formatted_text (&pp)); +} + +/* Verify that pp_format (FMT, ...) followed by pp_output_formatted_text + prints EXPECTED, with show_color disabled. */ + +static void +assert_pp_format (const char *expected, const char *fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + assert_pp_format_va (expected, false, fmt, &ap); + va_end (ap); +} + +/* As above, but with colorization enabled. */ + +static void +assert_pp_format_colored (const char *expected, const char *fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + assert_pp_format_va (expected, true, fmt, &ap); + va_end (ap); +} + +/* Verify that pp_format works, for various format codes. */ + +static void +test_pp_format () +{ + /* Avoid introducing locale-specific differences in the results + by hardcoding open_quote and close_quote. */ + const char *old_open_quote = open_quote; + const char *old_close_quote = close_quote; + open_quote = "`"; + close_quote = "'"; + + /* Verify that plain text is passed through unchanged. */ + assert_pp_format ("unformatted", "unformatted"); + + /* Verify various individual format codes, in the order listed in the + comment for pp_format above. For each code, we append a second + argument with a known bit pattern (0x12345678), to ensure that we + are consuming arguments correctly. */ + assert_pp_format ("-27 12345678", "%d %x", -27, 0x12345678); + assert_pp_format ("-5 12345678", "%i %x", -5, 0x12345678); + assert_pp_format ("10 12345678", "%u %x", 10, 0x12345678); + assert_pp_format ("17 12345678", "%o %x", 15, 0x12345678); + assert_pp_format ("cafebabe 12345678", "%x %x", 0xcafebabe, 0x12345678); + assert_pp_format ("-27 12345678", "%ld %x", (long)-27, 0x12345678); + assert_pp_format ("-5 12345678", "%li %x", (long)-5, 0x12345678); + assert_pp_format ("10 12345678", "%lu %x", (long)10, 0x123456
Re: [PATCH] Selftest framework (v7)
On Tue, Jun 07, 2016 at 10:18:32AM -0400, David Malcolm wrote: > On Mon, 2016-06-06 at 22:14 -0400, Trevor Saunders wrote: > > On Mon, Jun 06, 2016 at 11:57:49PM +0200, Jakub Jelinek wrote: > > > On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote: > > > > > > As far as I can > > > > > > tell this just involves moving the start of namespace > > > > > > selftest > > > > > > upwards a > > > > > > bit in the files where we have tests. > > > > > > > > > > Yes, and it does seem cleaner to have all of the selftest code > > > > > start > > > > > like this: > > > > > > > > > > #if CHECKING_P > > > > > > > > What are we gaining by ifdefing this? I would think on reasonable > > > > systems the compiler would optimize out the call to the selftests > > > > in > > > > release builds and then the linker would gc all the unused > > > > functions. > > > > Do we really care about code size in places that doesn't happen > > > > enough > > > > to go through this? > > > > > > Not everyone is building the compiler with LTO, and if you don't, > > > then > > > how would you optimize that away? > > > > -ffunction-sections -Wl,--gc-sections should be enough I think. I > > guess > > we don't use those at the moment though. > > > > > And yes, not having the self-tests, especially if they are going to > > > grow > > > further, in release compilers is desirable, especially if it would > > > be > > > intermixed with hot code. > > > > That's fair, though turning on --gc-sections where we can should > > further > > help with that, and that should be more effective with > > -ffunction-sections -fdata-sections, so its seems to me like the > > right > > thing to do is add configure tests to enable those? And then its > > more > > of a non issue? > > I appreciate that you'd done a lot of work on eliminating preprocessor > use in gcc, and that we'd prefer to minimize the amount of #if code we > have - though it's relatively easy to test the with/without #if > CHECKING_P case (compared to all of the various target-specific > macros). yeah, I certainly agree CHECKING_P is one of the more defensable macros. > Historically gcc testing has largely been "black box" testing: run the > built programs with specific inputs and look for specific outputs. My > hope with -fself-tests is that we can build up our "white box" test > coverage to complement the above, with unit tests. My favorite example > is the testing for gengtype that I posted as a followup, which gives us > some immediate test coverage that gengtype is working as expected, > which is hard to do using our traditional approach to testing. I'd probably say testing hash tables / vec is more useful, but whatever ;) > I hope that we can add a lot more tests to -fself-test, in particular, > I want us to have unit tests for hot code, including code that's hidden > deep inside the implementation and that might normally get inlined > away. To play Devil's Advocate: if we find we're able to do a release > build with the selftests enabled and that it isn't slowing down the > release build, does that imply we need more unit tests for our hottest > code? (the counterargument being that the checked build still needs to > run in a bearable amount of time). by enabled do you mean we build the code or actually run the tests on every compile? I would think basically any tests are two slow for the latter. For the former I would say building the self test code should have very little effect on performance since its all separate functions. There's some of course because of call sites effecting inlining etc, but we can probably minimize that with -ffunction-sections --gc-sections and attribute cold? > That said, I want -fself-test to always run quickly (e.g. less than a > second); let's not put anything slow in there. Also, any unit tests > involving analyzing several gimple or RTL statements at once seem to be > easier to do via the gimple and RTL frontends that Prasad and I are > working on respectively. (I think we can use -fself-test for unit > -testing implementation details of passes that involve one statement at > a time, but as soon as we start dealing with multiple statements and > control flow, that it's probably better to express it using a > gimple/RTL dump in DejaGnu form). That makes sense. Trev > > > Hope the above sounds sane > Dave
Re: [C++ Patch] Fix some simple location issues
Hi, On 07/06/2016 21:40, Jason Merrill wrote: For diagnostics about an initializer, I think it would be better to give the location of the initializer rather than the declaration. Maybe use location_t loc = EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl)); Now that you are telling me this, I think we already briefly discussed it in the past... Anyway, I investigated it a little bit a couple of days ago and noticed that, in fact, other front-ends differ about many of those details, even when normally they have accurate locations. In some cases, for example, clang outputs the primary caret under the declaration and only the secondary wiggly line under the initializer, but only in some cases (see, for example, g++.dg/init/brace6.C, both in c++98 and c++11). Anyway, in practice, for this first round of changes, I tried your suggestion above in 4 places, in maybe_deduce_size_from_array_init and in check_initializer, but in practice doesn't currently make a difference, at least for the testcases, for lack of usable locations. Certainly could in the future! The below still passes testing... Thanks, Paolo. / Index: cp/decl.c === --- cp/decl.c (revision 237171) +++ cp/decl.c (working copy) @@ -1393,7 +1393,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool { if (DECL_INITIAL (olddecl)) inform (DECL_SOURCE_LOCATION (olddecl), - "previous definition of %q+D was here", olddecl); + "previous definition of %qD was here", olddecl); else inform (DECL_SOURCE_LOCATION (olddecl), "previous declaration of %qD was here", olddecl); @@ -5266,13 +5266,16 @@ maybe_deduce_size_from_array_init (tree decl, tree do_default); if (failure == 1) { - error ("initializer fails to determine size of %qD", decl); + error_at (EXPR_LOC_OR_LOC (initializer, +DECL_SOURCE_LOCATION (decl)), + "initializer fails to determine size of %qD", decl); } else if (failure == 2) { if (do_default) { - error ("array size missing in %qD", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "array size missing in %qD", decl); } /* If a `static' var's size isn't known, make it extern as well as static, so it does not get allocated. If it's not @@ -5283,7 +5286,8 @@ maybe_deduce_size_from_array_init (tree decl, tree } else if (failure == 3) { - error ("zero-size array %qD", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "zero-size array %qD", decl); } } @@ -5322,7 +5326,8 @@ layout_var_decl (tree decl) /* An automatic variable with an incomplete type: that is an error. Don't talk about array types here, since we took care of that message in grokdeclarator. */ - error ("storage size of %qD isn%'t known", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "storage size of %qD isn%'t known", decl); TREE_TYPE (decl) = error_mark_node; } #if 0 @@ -5345,7 +5350,8 @@ layout_var_decl (tree decl) constant_expression_warning (DECL_SIZE (decl)); else { - error ("storage size of %qD isn%'t constant", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "storage size of %qD isn%'t constant", decl); TREE_TYPE (decl) = error_mark_node; } } @@ -5954,7 +5960,8 @@ check_array_initializer (tree decl, tree type, tre if (!COMPLETE_TYPE_P (complete_type (element_type))) { if (decl) - error ("elements of array %q#D have incomplete type", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "elements of array %q#D have incomplete type", decl); else error ("elements of array %q#T have incomplete type", type); return true; @@ -6018,7 +6025,8 @@ check_initializer (tree decl, tree init, int flags } else if (!COMPLETE_TYPE_P (type)) { - error ("%q#D has incomplete type", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "%q#D has incomplete type", decl); TREE_TYPE (decl) = error_mark_node; return NULL_TREE; } @@ -6038,8 +6046,9 @@ check_initializer (tree decl, tree init, int flags } else if (init_len != 1 && TREE_CODE (type) != COMPLEX_TYPE) { - error ("scalar object %qD requires one element in initializer", -decl); + error_at (EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl)), + "scalar object %qD requires one element in
Re: [PATCH][2/3] Vectorize inductions that are live after the loop
On Tue, 7 Jun 2016, Alan Hayward wrote: > testsuite/ > * gcc.dg/vect/vect-live-2.c : Likewise > * gcc.dg/vect/vect-live-slp-3.c : Update effective target > * lib/target-supports.exp : Add aarch64 to vect_long "Likewise" entries usually follow those they refer to (i.e., while ChangeLog files are "latest first", any individual entry is read "top-down"). And no space after the filename, and a full stop at the end of each entry starting with a "*". Gerald
Re: [PATCH] integer overflow checking builtins in constant expressions
+The built-in functions promote the first two operands into infinite precision signed type +and perform addition on those promoted operands. The result is then +cast to the type the third argument. The above is missing an "of" (it should read "type of the third argument".) Martin
Re: [PATCH] Document POWER's -mhtm and -mno-htm options.
On 6/7/16 12:49 PM, David Edelsohn wrote: On Tue, Jun 7, 2016 at 1:18 PM, Peter Bergner wrote: It seems we (ok, me) forgot to document the -mhtm option for POWER. This bootstrapped fine and the generated docs looked good. Is this ok for trunk and the open release branches? Peter * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and -mno-htm. Okay. Ok, committed everywhere. Thanks! Peter
Re: C PATCH for c/71418 (ICE with bogus array dimension and _Alignas)
On Tue, 7 Jun 2016, Marek Polacek wrote: > Another error-recovery bug. This time TYPE was error_mark_node but > min_align_of_type accessed it via TYPE_ALIGN. So we better check that we're > operating on a type. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: C PATCH for c/71426 (ICE with bogus parameter)
On Tue, 7 Jun 2016, Marek Polacek wrote: > cc1 crashed on the following invalid code, because it tripped this assert. > The reason why b->nested was false for the FUNCTION_DECL 'x' was that when > doing pushdecl, we found an incompatible duplicate, so pushdecl just bound > 'x' to FUNCTION_DECL 'x', replacing the old binding, without the TREE_PUBLIC > test that sets 'nested'. > > At first I thought about removing the assert but in the end I decided to add > || seen_error -- for invalid code we might do the same as for ERROR_MARK. > > Bootstrapped/regtested on x86_64-linux, ok for trunk? OK. -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH] integer overflow checking builtins in constant expressions
On Tue, Jun 07, 2016 at 02:30:06PM -0600, Martin Sebor wrote: > I've also since discovered that the handling of (at least) > BUILTIN_LINE is important in C++ 98 mode and if I had run > the c-c++-common tests instead of just the strict C++ subset > I would have seen the c-c++-common/builtin_location.c test > fail with the same error. > > I suspect I added the other built-ins to the function to get > the Clang-compatibility typed built-ins to work in C++ 98 > (with the default argument). When it was decided that > the Clang built-ins shouldn't change I removed the tests > but not this change. Here is the updated patch, with the extra testcase (the previous C++ only testcases were only c++11 and c++14 effective targets) and the BUILTIN_{ADD,SUB,MUL}_OVERFLOW_P kept in cp/tree.c. Tested on x86_64-linux, ok for trunk? 2016-06-07 Martin Sebor Jakub Jelinek PR c++/70507 PR c/68120 * builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P, BUILT_IN_MUL_OVERFLOW_P): New builtins. * builtins.c: Include gimple-fold.h. (fold_builtin_arith_overflow): Handle BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. (fold_builtin_3): Likewise. * doc/extend.texi (Integer Overflow Builtins): Document __builtin_{add,sub,mul}_overflow_p. gcc/c/ * c-typeck.c (convert_arguments): Don't promote last argument of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. gcc/cp/ * constexpr.c: Include gimple-fold.h. (cxx_eval_internal_function): New function. (cxx_eval_call_expression): Call it. (potential_constant_expression_1): Handle integer arithmetic overflow built-ins. * tree.c (builtin_valid_in_constant_expr_p): Handle BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. gcc/c-family/ * c-common.c (check_builtin_function_arguments): Handle BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. gcc/testsuite/ * c-c++-common/builtin-arith-overflow-1.c: Add test cases. * c-c++-common/builtin-arith-overflow-2.c: New test. * g++.dg/ext/builtin-arith-overflow-1.C: New test. * g++.dg/cpp0x/constexpr-arith-overflow.C: New test. * g++.dg/cpp1y/constexpr-arith-overflow.C: New test. --- gcc/builtins.def.jj 2016-06-06 14:40:35.619347198 +0200 +++ gcc/builtins.def2016-06-07 17:46:52.034206039 +0200 @@ -710,6 +710,9 @@ DEF_C94_BUILTIN(BUILT_IN_TOWUPPE DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW, "add_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW, "sub_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW, "mul_overflow", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) +DEF_GCC_BUILTIN(BUILT_IN_ADD_OVERFLOW_P, "add_overflow_p", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) +DEF_GCC_BUILTIN(BUILT_IN_SUB_OVERFLOW_P, "sub_overflow_p", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) +DEF_GCC_BUILTIN(BUILT_IN_MUL_OVERFLOW_P, "mul_overflow_p", BT_FN_BOOL_VAR, ATTR_NOTHROW_TYPEGENERIC_LEAF) /* Clang compatibility. */ DEF_GCC_BUILTIN(BUILT_IN_SADD_OVERFLOW, "sadd_overflow", BT_FN_BOOL_INT_INT_INTPTR, ATTR_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_SADDL_OVERFLOW, "saddl_overflow", BT_FN_BOOL_LONG_LONG_LONGPTR, ATTR_NOTHROW_LEAF_LIST) --- gcc/builtins.c.jj 2016-06-06 14:40:35.601347427 +0200 +++ gcc/builtins.c 2016-06-07 17:46:51.958207014 +0200 @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3. #include "rtl-chkp.h" #include "internal-fn.h" #include "case-cfn-macros.h" +#include "gimple-fold.h" struct target_builtins default_target_builtins; @@ -7943,18 +7944,28 @@ fold_builtin_unordered_cmp (location_t l /* Fold __builtin_{,s,u}{add,sub,mul}{,l,ll}_overflow, either into normal arithmetics if it can never overflow, or into internal functions that return both result of arithmetics and overflowed boolean flag in - a complex integer result, or some other check for overflow. */ + a complex integer result, or some other check for overflow. + Similarly fold __builtin_{add,sub,mul}_overflow_p to just the overflow + checking part of that. */ static tree fold_builtin_arith_overflow (location_t loc, enum built_in_function fcode, tree arg0, tree arg1, tree arg2) { enum internal_fn ifn = IFN_LAST; - tree type = TREE_TYPE (TREE_TYPE (arg2)); - tree mem_arg2 = build_fold_indirect_ref_loc (loc, arg2); + /* The code of the expression corresponding to the type-generic + built-in, or ERROR_MARK for the type-specific ones. */ + enum tree_code opcode = ERROR_MARK; + bool ovf_only = false; + switch (fcode) { +case BUILT_IN_ADD_OVERFLOW_P: + ovf_only = true; + /* FALLTHRU */ case BUILT_IN_ADD_OVERFLOW: + opcode = PLUS_EXPR; + /* FALLTHRU */ case BUILT_IN_SADD_OVERFLOW: case BUILT_IN_SADDL_OVERFLOW: case BU
Re: [PATCH] integer overflow checking builtins in constant expressions
On 06/07/2016 01:42 PM, Jakub Jelinek wrote: On Tue, Jun 07, 2016 at 03:35:28PM -0400, Jason Merrill wrote: Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in the testsuite, nor e.g. enum A { B = 1, C = 2, D = __builtin_add_overflow_p (B, C, C) }; int e[__builtin_add_overflow_p (B, C, C) + 1]; template int foo (int); void bar () { foo <__builtin_add_overflow_p (B, C, C) + 1> (0); } That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in various of them I see that without the change it actually sets *non_integral_constant_expression_p = true; or something similar, but I have no idea why it still works despite of that. Ah, that's the C++98 constant expression handling, which is a lot more restricted. C++11 and up don't care about that flag. Oops, actually, it seems even the cp/tree.c hunks are significant: I've only compiled the above testcase with the default -std=c++14, with -std=c++98 without the cp/tree.c bits I get: a.C:4:7: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a constant-expression D = __builtin_add_overflow_p (B, C, C) ^~~~ a.C:4:40: error: a function call cannot appear in a constant-expression D = __builtin_add_overflow_p (B, C, C) ^ a.C:6:45: error: array bound is not an integer constant before ‘]’ token int e[__builtin_add_overflow_p (B, C, C) + 1]; ^ a.C: In function ‘void bar()’: a.C:11:8: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a constant-expression foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^~~~ a.C:11:41: error: a function call cannot appear in a constant-expression foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ a.C:11:50: error: no matching function for call to ‘foo(int)’ foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ a.C:7:22: note: candidate: template int foo(int) template int foo (int); ^~~ a.C:7:22: note: template argument deduction/substitution failed: a.C:11:50: error: template argument 1 is invalid foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ Though, maybe it is only worth supporting the __builtin_*_overflow_p builtins for C++98 integer constant expression contexts and thus only handle there the 3 builtins instead of all the others. I guess I should add this testcase to the testsuite then. I've also since discovered that the handling of (at least) BUILTIN_LINE is important in C++ 98 mode and if I had run the c-c++-common tests instead of just the strict C++ subset I would have seen the c-c++-common/builtin_location.c test fail with the same error. I suspect I added the other built-ins to the function to get the Clang-compatibility typed built-ins to work in C++ 98 (with the default argument). When it was decided that the Clang built-ins shouldn't change I removed the tests but not this change. Martin
Re: tuple code simplification patch
Fully tested and committed to trunk. François On 06/06/2016 12:17, Jonathan Wakely wrote: On 02/06/16 23:00 +0200, François Dumont wrote: Hi I was trying to play with tuple implementation and was annoyed by repetition of _Head type when instantiating _Head_base so I thought about this patch. How do you like it ? I still need to run tests, will do before commit, ok ? Looks good, thanks. OK for trunk when testing passes.
Re: JIT patch: add gcc_jit_magic_int
On Tue, Jun 7, 2016 at 12:22 PM, Andrew Pinski wrote: > On Tue, Jun 7, 2016 at 12:19 PM, Andrew Pinski wrote: >> On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch >> wrote: >>> Hello All, >>> >>> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is >>> difficult (or tricky without using dirty tricks involving the GCC plugin >>> headers) to use GCCJIT to emit code equivalent to the following C file: >>> >>>extern int a; >>>int get_atomic_a (void) { >>> return __atomic_load_n (&a, __ATOMIC_SEQ_CST); >>>} >>> >>> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but >>> non-standard!) symbol which might not be available >>> (or might have a different value) in the C code for GCCJIT building such an >>> AST. >> >>> >>> So we need a function to retrieve some magic integral value from the GCCJIT >>> compiler. >> >> Huh?Why can't you just use the enum: >> enum memmodel >> { >> MEMMODEL_RELAXED = 0, >> MEMMODEL_CONSUME = 1, >> MEMMODEL_ACQUIRE = 2, >> MEMMODEL_RELEASE = 3, >> MEMMODEL_ACQ_REL = 4, >> MEMMODEL_SEQ_CST = 5, >> MEMMODEL_LAST = 6, >> MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC, >> MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC, >> MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC >> }; >> >> >> For the values? I don't understand why you need to have a direct >> relationship to preprocessed macros to these values here. >> Have your own lookup table inside your own JIT rather than a generic >> way of doing it. >> Having a generic way seems more incorrect way of doing it and even >> says that the JIT is supposed to JITting C code but it is not JITing C >> code. > > > That is move enum memmodel to be included in the JIT API too. Or rather have an API which the __atomic_* for you and all you need to supply is the the "JIT" memory model and the locations. Thanks, Andrew > >> >> Thanks, >> Andrew Pinski >> >>> >>> The attached patch (relative to trunk svn 236583) is a first attempt to >>> solve that issue >>> (and also give ability to query some other magic numbers). >>> >>> Proposed ChangeLog entry (in gcc/jit/) >>> >>> 2016-05-23 Basile Starynkevitch >>> * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro. >>> (gcc_jit_magic_int): New public function declaration. >>> >>> * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h" >>> (gcc_jit_magic_int): New function. >>> >>> * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6. >>> >>> Comments (or an ok to commit) are welcome. (I am not sure that >>> __SANITIZE_ADDRESS__ is correctly handled, >>> because I would believe that optimization flags are not globals in GCCJIT) >>> >>> Regards. >>> >>> -- >>> Basile STARYNKEVITCH http://starynkevitch.net/Basile/ >>> email: basilestarynkevitchnet mobile: +33 6 8501 2359 >>> 8, rue de la Faiencerie, 92340 Bourg La Reine, France >>> *** opinions {are only mine, sont seulement les miennes} *** >>>
Re: [PATCH] integer overflow checking builtins in constant expressions
On Tue, Jun 07, 2016 at 03:35:28PM -0400, Jason Merrill wrote: > >Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in > >the testsuite, nor e.g. > >enum A { > > B = 1, > > C = 2, > > D = __builtin_add_overflow_p (B, C, C) > >}; > >int e[__builtin_add_overflow_p (B, C, C) + 1]; > >template int foo (int); > >void > >bar () > >{ > > foo <__builtin_add_overflow_p (B, C, C) + 1> (0); > >} > >That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in > >various of them I see that without the change it actually sets > >*non_integral_constant_expression_p = true; > >or something similar, but I have no idea why it still works despite of that. > > Ah, that's the C++98 constant expression handling, which is a lot more > restricted. C++11 and up don't care about that flag. Oops, actually, it seems even the cp/tree.c hunks are significant: I've only compiled the above testcase with the default -std=c++14, with -std=c++98 without the cp/tree.c bits I get: a.C:4:7: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a constant-expression D = __builtin_add_overflow_p (B, C, C) ^~~~ a.C:4:40: error: a function call cannot appear in a constant-expression D = __builtin_add_overflow_p (B, C, C) ^ a.C:6:45: error: array bound is not an integer constant before ‘]’ token int e[__builtin_add_overflow_p (B, C, C) + 1]; ^ a.C: In function ‘void bar()’: a.C:11:8: error: ‘bool __builtin_add_overflow_p(...)’ cannot appear in a constant-expression foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^~~~ a.C:11:41: error: a function call cannot appear in a constant-expression foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ a.C:11:50: error: no matching function for call to ‘foo(int)’ foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ a.C:7:22: note: candidate: template int foo(int) template int foo (int); ^~~ a.C:7:22: note: template argument deduction/substitution failed: a.C:11:50: error: template argument 1 is invalid foo <__builtin_add_overflow_p (B, C, C) + 1> (0); ^ Though, maybe it is only worth supporting the __builtin_*_overflow_p builtins for C++98 integer constant expression contexts and thus only handle there the 3 builtins instead of all the others. I guess I should add this testcase to the testsuite then. Jakub
Re: [C++ Patch] Fix some simple location issues
For diagnostics about an initializer, I think it would be better to give the location of the initializer rather than the declaration. Maybe use location_t loc = EXPR_LOC_OR_LOC (init, DECL_SOURCE_LOCATION (decl)); ? Jason
Re: [PATCH] integer overflow checking builtins in constant expressions
On 06/07/2016 11:51 AM, Martin Sebor wrote: Unless I hear otherwise I'll submit a separate patch removing the BUILTIN_FILE, FUNCTION, and LINE labels (and adjust the comment to more closely reflect the function's purpose). Since Jakub pointed out that this function is used by the parser for C++98 constant-expressions, I think we can leave these labels alone. A comment tweak wouldn't hurt, though. Jason
Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
On 26 May 2016 at 11:53, Kyrill Tkachov wrote: > Hi all, > > In this PR we want to optimise: > int foo (int i) > { > return (i == 0) ? N : __builtin_clz (i); > } > > on targets where CLZ is defined at zero to the constant 'N'. > This is determined at the RTL level through the CLZ_DEFINED_VALUE_AT_ZERO > macro. > The obvious place to implement this would be in combine through simplify-rtx > where we'd > recognise an IF_THEN_ELSE of the form: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:SI r2) > (const_int 0 [0])) >(clz:SI (reg:SI r2)) >(const_int 32))) > > and if CLZ_DEFINED_VALUE_AT_ZERO is defined to 32 for SImode we'd simplify > it into > just (clz:SI (reg:SI r2)). > However, I found this doesn't quite happen for a couple of reasons: > 1) This depends on ifcvt or some other pass to have created a conditional > move of the > two branches that provide the IF_THEN_ELSE to propagate the const_int and > clz operation into. > > 2) Combine will refuse to propagate r2 from the above example into both the > condition and the > CLZ at the same time, so the most we see is: > (set (reg:SI r1) > (if_then_else:SI (ne (reg:CC cc) > (const_int 0)) >(clz:SI (reg:SI r2)) >(const_int 32))) > > which is not enough information to perform the simplification. > > This patch implements the optimisation in ce1 using the noce ifcvt > framework. > During ifcvt noce_process_if_block can see that we're trying to optimise > something > of the form (x == 0 ? const_int : CLZ (x)) and so it has visibility of all > the information > needed to perform the transformation. > > The transformation is performed by adding a new noce_try* function that > tries to put the > condition and the 'then' and 'else' arms into an IF_THEN_ELSE rtx and try to > simplify that > using the simplify-rtx machinery. That way, we can implement the > simplification logic in > simplify-rtx.c where it belongs. > > A similar transformation for CTZ is implemented as well. > So for code: > int foo (int i) > { > return (i == 0) ? 32 : __builtin_clz (i); > } > > On aarch64 we now emit: > foo: > clz w0, w0 > ret > > instead of: > foo: > mov w1, 32 > clz w2, w0 > cmp w0, 0 > cselw0, w2, w1, ne > ret > > and for arm similarly we generate: > foo: > clz r0, r0 > bx lr > > instead of: > foo: > cmp r0, #0 > clzne r0, r0 > moveq r0, #32 > bx lr > > > and for x86_64 with -O2 -mlzcnt we generate: > foo: > xorl%eax, %eax > lzcntl %edi, %eax > ret > > instead of: > foo: > xorl%eax, %eax > movl$32, %edx > lzcntl %edi, %eax > testl %edi, %edi > cmove %edx, %eax > ret > > > I tried getting this to work on other targets as well, but encountered > difficulties. > For example on powerpc the two arms of the condition seen during ifcvt are: > > (insn 4 22 11 4 (set (reg:DI 156 [ ]) > (const_int 32 [0x20])) clz.c:3 434 {*movdi_internal64} > (nil)) > and > (insn 10 9 23 3 (set (subreg/s/u:SI (reg:DI 156 [ ]) 0) > (clz:SI (subreg/u:SI (reg/v:DI 157 [ i ]) 0))) clz.c:3 132 {clzsi2} > (expr_list:REG_DEAD (reg/v:DI 157 [ i ]) > (nil))) > > So the setup code in noce_process_if_block sees that the set destination is > not the same > ((reg:DI 156 [ ]) and (subreg/s/u:SI (reg:DI 156 [ ]) 0)) > so it bails out on the rtx_interchangeable_p (x, SET_DEST (set_b)) check. > I suppose that's a consequence of how SImode operations are represented in > early RTL > on powerpc, I don't know what to do there. Perhaps that part of ivcvt can be > taught to handle > destinations that are subregs of one another, but that would be a separate > patch. > > Anyway, is this patch ok for trunk? > > Bootstrapped and tested on arm-none-linux-gnueabihf, aarch64-none-linux-gnu, > x86_64-pc-linux-gnu. > > Thanks, > Kyrill > > 2016-05-26 Kyrylo Tkachov > > PR middle-end/37780 > * ifcvt.c (noce_try_ifelse_collapse): New function. > Declare prototype. > (noce_process_if_block): Call noce_try_ifelse_collapse. > * simplify-rtx.c (simplify_cond_clz_ctz): New function. > (simplify_ternary_operation): Use the above to simplify > conditional CLZ/CTZ expressions. > > 2016-05-26 Kyrylo Tkachov > > PR middle-end/37780 > * gcc.c-torture/execute/pr37780.c: New test. > * gcc.target/aarch64/pr37780_1.c: Likewise. > * gcc.target/arm/pr37780_1.c: Likewise. Hi Kyrylo, I've noticed that gcc.target/arm/pr37780_1.c fails on arm if arch < v6. I first tried to fix the effective-target guard (IIRC, the doc says clz is available starting with v5t), but that isn't sufficient. When compiling for armv5-t, the scan-assembler directives fail. It seems to work with v6t2, so I am wondering whether it's just a matter of increasing the effective-target ar
Re: [PATCH] integer overflow checking builtins in constant expressions
On 06/07/2016 12:34 PM, Jakub Jelinek wrote: On Tue, Jun 07, 2016 at 09:51:05AM -0600, Martin Sebor wrote: On 06/07/2016 08:32 AM, Jason Merrill wrote: On 06/06/2016 08:36 AM, Jakub Jelinek wrote: @@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_ case BUILT_IN_FUNCTION: case BUILT_IN_LINE: + /* The following built-ins are valid in constant expressions + when their arguments are. */ +case BUILT_IN_ADD_OVERFLOW: +case BUILT_IN_ADD_OVERFLOW_P: Why is this necessary? builtin_valid_in_constant_expr_p is only needed for builtins that can have constant values even if their operands are non-constant, which doesn't apply here. For that matter, I don't see why we needed to add _FUNCTION et al, either. I don't remember what prompted me to change the function in either patch. Perhaps it was the comment above the function that says this: /* Test whether DECL is a builtin that may appear in a constant-expression. */ But removing the change doesn't seem to affect the C++ test results for the two features so it looks like the change may not be needed. Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in the testsuite, nor e.g. enum A { B = 1, C = 2, D = __builtin_add_overflow_p (B, C, C) }; int e[__builtin_add_overflow_p (B, C, C) + 1]; template int foo (int); void bar () { foo <__builtin_add_overflow_p (B, C, C) + 1> (0); } That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in various of them I see that without the change it actually sets *non_integral_constant_expression_p = true; or something similar, but I have no idea why it still works despite of that. Ah, that's the C++98 constant expression handling, which is a lot more restricted. C++11 and up don't care about that flag. Is the patch ok for trunk without the cp/tree.c hunk? Yes. Jason
Re: [PATCH, rs6000] Add support for char, short, and int versions of vec_mul
On Tue, Jun 07, 2016 at 12:47:52PM -0500, Bill Seurer wrote: > Bootstrapped and tested on powerpc64le-unknown-linux-gnu and > powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk? Okay, thanks! Segher
Update probabilities in predict.def to match reality
Hello, Maritn Liska measured branch predictor hitrates on current tree and SPEC2006. CPU2006 HEURISTICS BRANCHES (REL) HITRATE COVERAGE COVERAGE (REL) loop iv compare33 0.1% 20.27% / 86.24% 30630826 30.63M 0.0% no prediction 10406 19.5% 33.41% / 84.76% 139755242456 139.76G 14.1% early return (on trees) 6328 11.9% 54.20% / 86.48% 33569991740 33.57G 3.4% guessed loop iterations 112 0.2% 62.06% / 64.49% 958458522 958.46M 0.1% fail alloc595 1.1% 62.18% / 100.00% 595 595.00 0.0% opcode values positive (on trees)4266 8.0% 64.30% / 91.28% 16931889792 16.93G 1.7% opcode values nonequal (on trees)6600 12.4% 66.23% / 80.60% 71483051282 71.48G 7.2% continue 507 0.9% 66.66% / 82.85% 10086808016 10.09G 1.0% call11351 21.3% 67.16% / 92.24% 34680666103 34.68G 3.5% loop iterations 2689 5.0% 67.99% / 67.99% 408309517405 408.31G 41.3% DS theory 26385 49.4% 68.62% / 85.44% 146974369890 146.97G 14.9% const return 271 0.5% 69.39% / 87.09% 301566712 301.57M 0.0% pointer (on trees) 6230 11.7% 69.59% / 87.18% 16667735314 16.67G 1.7% combined53398 100.0% 70.31% / 80.36% 989164856862 989.16G 100.0% goto 78 0.1% 70.36% / 96.96% 951041538 951.04M 0.1% first match 16607 31.1% 78.00% / 78.42% 702435244516 702.44G 71.0% extra loop exit 141 0.3% 82.80% / 88.17% 16969469421.70G 0.2% null return 393 0.7% 91.47% / 93.08% 32686781973.27G 0.3% loop exit9909 18.6% 91.80% / 92.81% 282927773783 282.93G 28.6% guess loop iv compare 178 0.3% 97.81% / 97.85% 43750864534.38G 0.4% negative return 277 0.5% 97.94% / 99.23% 10621190281.06G 0.1% noreturn call2372 4.4% 100.00% / 100.00% 83565623238.36G 0.8% overflow 1282 2.4% 100.00% / 100.00% 175074177 175.07M 0.0% zero-sized array 677 1.3% 100.00% / 100.00% 112723803 112.72M 0.0% unconditional jump103 0.2% 100.00% / 100.00% 491001 491.00K 0.0% We used to track SPEC2000 until 2008 but then the infrastructure broke. The numbers show some differences to 2008 results: HEURISTICS BRANCHES (REL) HITRATE COVERAGE (REL) DS theory 42611 57.1% 74.54% / 89.71% 9237799352 28.7% combined 74578 100.0% 72.88% / 90.59% 32201983315 100.0% opcode values nonequal (on trees)14544 19.5% 72.03% / 88.64% 3387233627 10.5% early return (on trees) 11078 14.9% 61.23% / 89.25% 2349499033 7.3% first match 13249 17.8% 89.11% / 93.08% 15876522911 49.3% guessed loop iterations2722 3.6% 86.50% / 90.76% 7308035517 22.7% no prediction 18718 25.1% 34.36% / 86.14% 7087661052 22.0% call 23937 32.1% 71.38% / 93.08% 3829002205 11.9% opcode values positive (on trees) 2515 3.4% 72.77% / 86.49% 927995806 2.9% loop branch 378 0.5% 87.61% / 95.54% 1491510452 4.6% loop exit 8833 11.8% 91.43% / 94.52% 6538486043 20.3% loop iterations 912 1.2% 99.11% / 99.11%396451321 1.2% noreturn call 890 1.2% 99.99% / 99.99%205957905 0.6% pointer (on trees) 8394 11.3% 85.09% / 94.80% 1315262058 4.1% negative return 272 0.4% 96.47% / 99.74% 49156319 0.1% const return551 0.7% 67.92% / 68.97% 96082001 0.3% __builtin_expect 20 0.0% 0% / 0%0 0.0% null return 566 0.8% 96.58% / 98.77% 87555632 0.3% There is some degradation in the combined heuristicshitrate (72.8->70) which may be caused simply by fact that new spec is harder to guess. Main decrease seems to be in opcode_positive/nonequal which may be also attributed to the fact that early opts now optimize out more code before we do the statistics. There are bugs in few predictors - goto predictor is dead because the FE code was dropped, return predictor is bit random because CFG is optimized (it should probably be done in FE), loop iv compare seems bogus and fortran fail alloc
Re: JIT patch: add gcc_jit_magic_int
On Tue, Jun 7, 2016 at 12:19 PM, Andrew Pinski wrote: > On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch > wrote: >> Hello All, >> >> As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is >> difficult (or tricky without using dirty tricks involving the GCC plugin >> headers) to use GCCJIT to emit code equivalent to the following C file: >> >>extern int a; >>int get_atomic_a (void) { >> return __atomic_load_n (&a, __ATOMIC_SEQ_CST); >>} >> >> The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but >> non-standard!) symbol which might not be available >> (or might have a different value) in the C code for GCCJIT building such an >> AST. > >> >> So we need a function to retrieve some magic integral value from the GCCJIT >> compiler. > > Huh?Why can't you just use the enum: > enum memmodel > { > MEMMODEL_RELAXED = 0, > MEMMODEL_CONSUME = 1, > MEMMODEL_ACQUIRE = 2, > MEMMODEL_RELEASE = 3, > MEMMODEL_ACQ_REL = 4, > MEMMODEL_SEQ_CST = 5, > MEMMODEL_LAST = 6, > MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC, > MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC, > MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC > }; > > > For the values? I don't understand why you need to have a direct > relationship to preprocessed macros to these values here. > Have your own lookup table inside your own JIT rather than a generic > way of doing it. > Having a generic way seems more incorrect way of doing it and even > says that the JIT is supposed to JITting C code but it is not JITing C > code. That is move enum memmodel to be included in the JIT API too. > > Thanks, > Andrew Pinski > >> >> The attached patch (relative to trunk svn 236583) is a first attempt to >> solve that issue >> (and also give ability to query some other magic numbers). >> >> Proposed ChangeLog entry (in gcc/jit/) >> >> 2016-05-23 Basile Starynkevitch >> * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro. >> (gcc_jit_magic_int): New public function declaration. >> >> * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h" >> (gcc_jit_magic_int): New function. >> >> * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6. >> >> Comments (or an ok to commit) are welcome. (I am not sure that >> __SANITIZE_ADDRESS__ is correctly handled, >> because I would believe that optimization flags are not globals in GCCJIT) >> >> Regards. >> >> -- >> Basile STARYNKEVITCH http://starynkevitch.net/Basile/ >> email: basilestarynkevitchnet mobile: +33 6 8501 2359 >> 8, rue de la Faiencerie, 92340 Bourg La Reine, France >> *** opinions {are only mine, sont seulement les miennes} *** >>
Re: JIT patch: add gcc_jit_magic_int
On Mon, May 23, 2016 at 5:26 AM, Basile Starynkevitch wrote: > Hello All, > > As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it is > difficult (or tricky without using dirty tricks involving the GCC plugin > headers) to use GCCJIT to emit code equivalent to the following C file: > >extern int a; >int get_atomic_a (void) { > return __atomic_load_n (&a, __ATOMIC_SEQ_CST); >} > > The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but > non-standard!) symbol which might not be available > (or might have a different value) in the C code for GCCJIT building such an > AST. > > So we need a function to retrieve some magic integral value from the GCCJIT > compiler. Huh?Why can't you just use the enum: enum memmodel { MEMMODEL_RELAXED = 0, MEMMODEL_CONSUME = 1, MEMMODEL_ACQUIRE = 2, MEMMODEL_RELEASE = 3, MEMMODEL_ACQ_REL = 4, MEMMODEL_SEQ_CST = 5, MEMMODEL_LAST = 6, MEMMODEL_SYNC_ACQUIRE = MEMMODEL_ACQUIRE | MEMMODEL_SYNC, MEMMODEL_SYNC_RELEASE = MEMMODEL_RELEASE | MEMMODEL_SYNC, MEMMODEL_SYNC_SEQ_CST = MEMMODEL_SEQ_CST | MEMMODEL_SYNC }; For the values? I don't understand why you need to have a direct relationship to preprocessed macros to these values here. Have your own lookup table inside your own JIT rather than a generic way of doing it. Having a generic way seems more incorrect way of doing it and even says that the JIT is supposed to JITting C code but it is not JITing C code. Thanks, Andrew Pinski > > The attached patch (relative to trunk svn 236583) is a first attempt to > solve that issue > (and also give ability to query some other magic numbers). > > Proposed ChangeLog entry (in gcc/jit/) > > 2016-05-23 Basile Starynkevitch > * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro. > (gcc_jit_magic_int): New public function declaration. > > * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag-types.h" > (gcc_jit_magic_int): New function. > > * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6. > > Comments (or an ok to commit) are welcome. (I am not sure that > __SANITIZE_ADDRESS__ is correctly handled, > because I would believe that optimization flags are not globals in GCCJIT) > > Regards. > > -- > Basile STARYNKEVITCH http://starynkevitch.net/Basile/ > email: basilestarynkevitchnet mobile: +33 6 8501 2359 > 8, rue de la Faiencerie, 92340 Bourg La Reine, France > *** opinions {are only mine, sont seulement les miennes} *** >
Re: JIT patch: add gcc_jit_magic_int
On Mon, 2016-05-23 at 14:26 +0200, Basile Starynkevitch wrote: > Hello All, > > As I explained in https://gcc.gnu.org/ml/jit/2016-q2/msg00042.html it > is > difficult (or tricky without using dirty tricks involving the GCC > plugin > headers) to use GCCJIT to emit code equivalent to the following C > file: > > extern int a; > int get_atomic_a (void) { > return __atomic_load_n (&a, __ATOMIC_SEQ_CST); > } > > The issue is that __ATOMIC_SEQ_CST is a magic preprocessor (but non > -standard!) symbol which might not be available > (or might have a different value) in the C code for GCCJIT building > such an AST. > > So we need a function to retrieve some magic integral value from the > GCCJIT compiler. > > The attached patch (relative to trunk svn 236583) is a first attempt > to solve that issue > (and also give ability to query some other magic numbers). > > Proposed ChangeLog entry (in gcc/jit/) > > 2016-05-23 Basile Starynkevitch > * libgccjit.h (LIBGCCJIT_HAVE_gcc_jit_magic_int): New macro. > (gcc_jit_magic_int): New public function declaration. > > * libgccjit.c: Include "cppbuiltin.h", "options.h", "flag > -types.h" > (gcc_jit_magic_int): New function. > > * libgccjit.map: Add gcc_jit_magic_int to LIBGCCJIT_ABI_6. > > Comments (or an ok to commit) are welcome. (I am not sure that > __SANITIZE_ADDRESS__ is correctly handled, > because I would believe that optimization flags are not globals in > GCCJIT) Sorry about the delay in responding to this. > Index: gcc/jit/libgccjit.h > === > --- gcc/jit/libgccjit.h (revision 236583) > +++ gcc/jit/libgccjit.h (working copy) > @@ -1387,6 +1387,27 @@ > gcc_jit_rvalue_set_bool_require_tail_call (gcc_jit_rvalue *call, > int require_tail_call); > > + > + /* Magical integer values useful in the compiler; similar to > + predefined C macros like __GNUC__, __GNUC_MINOR__, > + __GNUC_PATCHLEVEL__, __ATOMIC_RELAXED, __ATOMIC_SEQ_CST, > + __ATOMIC_ACQUIRE, __ATOMIC_RELEASE, __ATOMIC_ACQ_REL, > + __ATOMIC_CONSUME, __PIC__, __PIE__, etc. Typical usage would be: > + > +bool err=false; > +int mypic = gcc_jit_magic_int("__PIC__", &err); > +if (err) somethinggotwrong(); > + > +This function is expected to be rarely called, typically once at > +initialization time. > + > + This API entrypoint was added in LIBGCCJIT_ABI_6; you can test for its > + presence using > + #ifdef LIBGCCJIT_HAVE_gcc_jit_magic_int > + */ > +#define LIBGCCJIT_HAVE_gcc_jit_magic_int > +extern int gcc_jit_magic_int(const char*name, bool*errp); Does the client code need to be able to check to see if a name is defined, or can they assume it always is? If the latter, how about: extern int gcc_jit_context_get_predefined_int (gcc_jit_context *ctxt, const char *name); and have it set an error on the context if the name isn't recognized? If they need to be able to check it, I prefer: extern int gcc_jit_context_get_predefined_int (gcc_jit_context *ctxt, const char *name, int *out); i.e. have the return value contain the "was name recognized" flag and *out be written to (I'd use "bool", but we don't require its availability). I'm not fond of "magic int", but I'm not fond of my proposed name either. > #ifdef __cplusplus > } > #endif /* __cplusplus */ > Index: gcc/jit/libgccjit.c > === > --- gcc/jit/libgccjit.c (revision 236583) > +++ gcc/jit/libgccjit.c (working copy) > @@ -23,6 +23,9 @@ > #include "coretypes.h" > #include "timevar.h" > #include "typed-splay-tree.h" > +#include "cppbuiltin.h" > +#include "options.h" > +#include "flag-types.h" > > #include "libgccjit.h" > #include "jit-recording.h" > @@ -2970,3 +2973,44 @@ > >call->set_require_tail_call (require_tail_call); > } > + > + > +/* Public entrypoint. See description in libgccjit.h. */ > + > +int gcc_jit_magic_int(const char*name, bool*errp) > +{ > + static int major, minor, patchlevel; > + if (!major) /* call once: */ > +parse_basever (&major, &minor, &patchlevel); These vars are global state, and they've not guarded by a mutex. Can they be moved to the gcc_jit_context instead? (actually, to gcc::jit::recording::context). That way they'd be covered by the general policies on thread-safety here: https://gcc.gnu.org/onlinedocs/jit/topics/contexts.html#thread-safety That said, see the question below about the "__GNUC_*" vars. > + > + RETURN_VAL_IF_FAIL (name, > + errp?((*errp=true),0):0, The above line is too terse for my taste. > + NULL, NULL, > + "NULL name"); > + RETURN_VAL_IF_FAIL (name[0] == '_' && name[1] == '_', > + errp?(
Re: [PING] [PATCH] Fix asm X constraint (PR inline-asm/59155)
On 06/06/16 20:08, Jakub Jelinek wrote: > On Mon, Jun 06, 2016 at 12:04:04PM -0600, Jeff Law wrote: >> On 06/06/2016 12:01 PM, Jakub Jelinek wrote: >>> On Mon, Jun 06, 2016 at 11:54:04AM -0600, Jeff Law wrote: > As for recog.c, I can not approve this as I am not a maintainer of it. > I only can say that the code looks questionable to me. I think the question on the recog part is a matter of how we choose to interpret what the "X" constraint means. Does it literally mean accept anything, or accept some subset expressions? I tend to think the former, which means that things like reg_overlap_mentioned_p or its callers have to be bullet-proofed. AFACT this is not the only place where overly complex RTL trees can cause an ICE. >>> >>> I think it is a bad idea to accept really anything, even for debug insns, >>> which initially accepted arbitrarily large RTL expressions (and still accept >>> stuff like subregs otherwise considered invalid etc.) we found it is highly >>> undesirable, as it is not very good idea for the compile time complexity >>> etc., so now we are trying to limit the complexity of the expressions there >>> by splitting up more complex expressions into smaller ones using >>> temporaries. >>> So, even accept anything should always be accept anything reasonable, >>> because most of the RTL passes don't really expect arbitrarily deep >>> expressions, or expressions where the same reg can appear thousands of times >>> etc. >> The problem is how do you define this subset of expressions you're going to >> accept and those which you are going to reject. >> >> I first pondered accepting RTL leaf nodes (reg, subreg, constants) and >> rejecting everything else. But I couldn't convince myself that some port >> might reasonably expect (plus (x) (y)) to match the "X" constraint. > > It is always going to be arbitrary. > Perhaps RTL leaf nodes (if including MEM, then perhaps with valid address > only), and unary/binary/ternary RTL expressions with RTL leaf node operands? > Or union of what is accepted by any other constraint? > Or "g" plus any constants? Yes. That is what I think too, "X" is probably not used often in practice, but if it is allowed, it should at least not result in an ICE. "X" should allow to feed "whatsoever" valid C expressions to the asm input, and using the %-expression in the assembler string should not cause an ICE. And "X" really means different "whatsoever" things in asm statements and in .md files, even md.texi has different text for internally used "X" constraint and assembler "X" constraint, so that should be OK. According to md.texi "g" should already mean: @item @samp{g} Any register, memory or immediate integer operand is allowed, except for registers that are not general registers. Which is the same as general_operand(op, VOIDmode). And in other words, that should be everything that is in "r", "m" and "i" constraints. However if I compare commond.md ... (define_constraint "i" "Matches a general integer constant." (and (match_test "CONSTANT_P (op)") (match_test "!flag_pic || LEGITIMATE_PIC_OPERAND_P (op)"))) ... against recog.c, general_operand seems to allow less i.e. checks targetm.legitimate_constant_p additionally. So that is something I don't really understand, I mean if a constant is not a "legitimate" constant by the target's definition, why should it be safe to use in later in targetm.asm_out.print_operand? Bernd.
Re: [PATCH] Document POWER's -mhtm and -mno-htm options.
On Tue, Jun 7, 2016 at 1:18 PM, Peter Bergner wrote: > It seems we (ok, me) forgot to document the -mhtm option for POWER. > This bootstrapped fine and the generated docs looked good. > Is this ok for trunk and the open release branches? > > Peter > > > * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and > -mno-htm. Okay. Thanks, David
[PATCH, rs6000] Add support for char, short, and int versions of vec_mul
This patch adds support for the missing versions of the vec_mul altivec builtins from the Power Architecture 64-Bit ELF V2 ABI OpenPOWER ABI for Linux Supplement (16 July 2015 Version 1.1). There are many of the builtins that are missing and this is part of a series of patches to add them. There aren't instructions for the {un}signed char, {un}signed short, and {un}signed int versions of vec_mul so the output code is built from other built-ins and operations that do have instructions. The new test case is an executable test which verifies that the generated code produces expected values. C macros were used so that the same test case could be used for all the various supported types. Bootstrapped and tested on powerpc64le-unknown-linux-gnu and powerpc64-unknown-linux-gnu with no regressions. Is this ok for trunk? [gcc] 2016-06-07 Bill Seurer * config/rs6000/altivec.h: Add __builtin_vec_mul. * config/rs6000/rs6000-builtin.def (vec_mul): Change vec_mul to a special case Altivec builtin. * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove VSX_BUILTIN_VEC_MUL (replaced with special case code). * config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add code for ALTIVEC_BUILTIN_VEC_MUL. * config/rs6000/rs6000.c (altivec_init_builtins): Add definition for __builtin_vec_mul. [gcc/testsuite] 2016-06-07 Bill Seurer * gcc.target/powerpc/vec-mul.c: New test. Index: gcc/config/rs6000/altivec.h === --- gcc/config/rs6000/altivec.h (revision 237175) +++ gcc/config/rs6000/altivec.h (working copy) @@ -229,6 +229,7 @@ #define vec_mladd __builtin_vec_mladd #define vec_msum __builtin_vec_msum #define vec_msums __builtin_vec_msums +#define vec_mul __builtin_vec_mul #define vec_mule __builtin_vec_mule #define vec_mulo __builtin_vec_mulo #define vec_nor __builtin_vec_nor Index: gcc/config/rs6000/rs6000-builtin.def === --- gcc/config/rs6000/rs6000-builtin.def(revision 237175) +++ gcc/config/rs6000/rs6000-builtin.def(working copy) @@ -1300,6 +1300,7 @@ BU_ALTIVEC_OVERLOAD_X (LVRX, "lvrx") BU_ALTIVEC_OVERLOAD_X (LVRXL, "lvrxl") BU_ALTIVEC_OVERLOAD_X (LVSL, "lvsl") BU_ALTIVEC_OVERLOAD_X (LVSR, "lvsr") +BU_ALTIVEC_OVERLOAD_X (MUL, "mul") BU_ALTIVEC_OVERLOAD_X (PROMOTE, "promote") BU_ALTIVEC_OVERLOAD_X (SLD, "sld") BU_ALTIVEC_OVERLOAD_X (SPLAT, "splat") @@ -1600,7 +1601,6 @@ BU_VSX_OVERLOAD_3V (XXPERMDI, "xxpermdi") BU_VSX_OVERLOAD_3V (XXSLDWI, "xxsldwi") /* 2 argument VSX overloaded builtin functions. */ -BU_VSX_OVERLOAD_2 (MUL, "mul") BU_VSX_OVERLOAD_2 (DIV, "div") BU_VSX_OVERLOAD_2 (XXMRGHW, "xxmrghw") BU_VSX_OVERLOAD_2 (XXMRGLW, "xxmrglw") Index: gcc/config/rs6000/rs6000-c.c === --- gcc/config/rs6000/rs6000-c.c(revision 237175) +++ gcc/config/rs6000/rs6000-c.c(working copy) @@ -1941,14 +1941,6 @@ const struct altivec_builtin_types altivec_overloa RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_VMINUB, ALTIVEC_BUILTIN_VMINUB, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_bool_V16QI, 0 }, - { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_XVMULSP, -RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF, 0 }, - { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_XVMULDP, -RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF, 0 }, - { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI, -RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI, 0 }, - { VSX_BUILTIN_VEC_MUL, VSX_BUILTIN_MUL_V2DI, -RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI, 0 }, { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULEUB, RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI, 0 }, { ALTIVEC_BUILTIN_VEC_MULE, ALTIVEC_BUILTIN_VMULESB, @@ -4683,7 +4675,58 @@ assignment for unaligned loads and stores"); warning (OPT_Wdeprecated, "vec_lvsr is deprecated for little endian; use \ assignment for unaligned loads and stores"); + if (fcode == ALTIVEC_BUILTIN_VEC_MUL) +{ + /* vec_mul needs to be special cased because there are no instructions +for it for the {un}signed char, {un}signed short, and {un}signed int +types. */ + if (nargs != 2) + { + error ("vec_mul only accepts 2 arguments"); + return error_mark_node; + } + tree arg0 = (*arglist)[0]; + tree arg0_type = TREE_TYPE (arg0); + tree arg1 = (*arglist)[1]; + tree arg1_type = TREE_TYPE (arg1); + + /* Both arguments must be vectors and the types must match. */ + if (arg0_type != arg1_type) + goto bad; + if (TREE_CODE (arg0_type) != VECTOR_TYPE) + goto
[PATCH] Document POWER's -mhtm and -mno-htm options.
It seems we (ok, me) forgot to document the -mhtm option for POWER. This bootstrapped fine and the generated docs looked good. Is this ok for trunk and the open release branches? Peter * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mhtm and -mno-htm. Index: gcc/doc/invoke.texi === --- gcc/doc/invoke.texi (revision 237175) +++ gcc/doc/invoke.texi (working copy) @@ -1000,7 +1000,7 @@ See RS/6000 and PowerPC Options. -mpointers-to-nested-functions -mno-pointers-to-nested-functions @gol -msave-toc-indirect -mno-save-toc-indirect @gol -mpower8-fusion -mno-mpower8-fusion -mpower8-vector -mno-power8-vector @gol --mcrypto -mno-crypto -mdirect-move -mno-direct-move @gol +-mcrypto -mno-crypto -mhtm -mno-htm -mdirect-move -mno-direct-move @gol -mquad-memory -mno-quad-memory @gol -mquad-memory-atomic -mno-quad-memory-atomic @gol -mcompat-align-parm -mno-compat-align-parm @gol @@ -19991,7 +19991,7 @@ following options: -mpopcntb -mpopcntd -mpowerpc64 @gol -mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol -msimple-fpu -mstring -mmulhw -mdlmzb -mmfpgpr -mvsx @gol --mcrypto -mdirect-move -mpower8-fusion -mpower8-vector @gol +-mcrypto -mdirect-move -mhtm -mpower8-fusion -mpower8-vector @gol -mquad-memory -mquad-memory-atomic -mmodulo -mfloat128 -mfloat128-hardware @gol -mpower9-fusion -mpower9-vector -mpower9-dform} @@ -20165,6 +20165,14 @@ Generate code that uses (does not use) t between the general purpose registers and the vector/scalar (VSX) registers that were added in version 2.07 of the PowerPC ISA. +@item -mhtm +@itemx -mno-htm +@opindex mhtm +@opindex mno-htm +Enable (disable) the use of the built-in functions that allow direct +access to the Hardware Transactional Memory (HTM) instructions that +were added in version 2.07 of the PowerPC ISA. + @item -mpower8-fusion @itemx -mno-power8-fusion @opindex mpower8-fusion
[PATCH][AArch64][2/2] (Re)Implement vcopy_lane intrinsics
Hi all, This is the second part of James's patch from: https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html separated out. It reimplements the vcopyq_lane* intrinsics in C and adds implementations of the other missing vcopy_lane_ intrinsics. The differences from that patch are in the use of __aarch64_vset_lane_any and __aarch64_vget_lane_any rather than the typed variants of these that were used back in 2013 (and don't exist anymore). The testcase is also adjusted for the ABI change in GCC 5 where integer x1 vectors are now passed and returned in SIMD registers. The vcopy_laneq_f64 test in the testcase is currently XFAILed because it currently doesn't generate the optimal DUP instruction but instead emits a UMOV to a scalar register and then an FMOV. This is a GCC 7 regression tracked by PR 71307 and I think unrelated to this patch. Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf. Ok for trunk? Thanks, Kyrill 2016-06-07 Kyrylo Tkachov James Greenhalgh * config/aarch64/arm_neon.h (vcopyq_lane_f32, vcopyq_lane_f64, vcopyq_lane_p8, vcopyq_lane_p16, vcopyq_lane_s8, vcopyq_lane_s16, vcopyq_lane_s32, vcopyq_lane_s64, vcopyq_lane_u8, vcopyq_lane_u16, vcopyq_lane_u32, vcopyq_lane_u64): Reimplement in C. (vcopy_lane_f32, vcopy_lane_f64, vcopy_lane_p8, vcopy_lane_p16, vcopy_lane_s8, vcopy_lane_s16, vcopy_lane_s32, vcopy_lane_s64, vcopy_lane_u8, vcopy_lane_u16, vcopy_lane_u32, vcopy_lane_u64, vcopy_laneq_f32, vcopy_laneq_f64, vcopy_laneq_p8, vcopy_laneq_p16, vcopy_laneq_s8, vcopy_laneq_s16, vcopy_laneq_s32, vcopy_laneq_s64, vcopy_laneq_u8, vcopy_laneq_u16, vcopy_laneq_u32, vcopy_laneq_u64, vcopyq_laneq_f32, vcopyq_laneq_f64, vcopyq_laneq_p8, vcopyq_laneq_p16, vcopyq_laneq_s8, vcopyq_laneq_s16, vcopyq_laneq_s32, vcopyq_laneq_s64, vcopyq_laneq_u8, vcopyq_laneq_u16, vcopyq_laneq_u32, vcopyq_laneq_u64): New intrinsics. 2016-06-07 Kyrylo Tkachov James Greenhalgh * gcc.target/aarch64/vect_copy_lane_1.c: New test. diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h index d20caf0919356eb7a87e7c7a9cd336d8408db35b..e33ecdbb72c8e8a483f82adc6f0e28edb3e5f4e6 100644 --- a/gcc/config/aarch64/arm_neon.h +++ b/gcc/config/aarch64/arm_neon.h @@ -5869,162 +5869,6 @@ vaddlvq_u32 (uint32x4_t a) return result; } -#define vcopyq_lane_f32(a, b, c, d) \ - __extension__ \ -({ \ - float32x4_t c_ = (c);\ - float32x4_t a_ = (a);\ - float32x4_t result; \ - __asm__ ("ins %0.s[%2], %3.s[%4]"\ -: "=w"(result) \ -: "0"(a_), "i"(b), "w"(c_), "i"(d) \ -: /* No clobbers */); \ - result; \ - }) - -#define vcopyq_lane_f64(a, b, c, d) \ - __extension__ \ -({ \ - float64x2_t c_ = (c);\ - float64x2_t a_ = (a);\ - float64x2_t result; \ - __asm__ ("ins %0.d[%2], %3.d[%4]"\ -: "=w"(result) \ -: "0"(a_), "i"(b), "w"(c_), "i"(d) \ -: /* No clobbers */); \ - result; \ - }) - -#define vcopyq_lane_p8(a, b, c, d) \ - __extension__ \ -({ \ - poly8x16_t c_ = (c); \ - poly8x16_t a_ = (a); \ - poly8x16_t result; \ - __asm__ ("ins %0.b[%2], %3.b[%4]"\ -: "=w"(result) \ -: "0"(a_), "i"(b), "w"(c_), "i"(d) \ -: /* No clobbers */); \ - result; \ - }) - -#define vcopyq_lane_p16(a, b, c, d) \ - __extension__
[PATCH][AArch64][1/2] Add support INS (element) instruction to copy lanes between vectors
Hi all, This patch addresses an deficiency we have in handling vector lane-to-lane moves in the AArch64 backend. Generally we can use the INS (element) instruction but, as a user complains in https://gcc.gnu.org/ml/gcc-help/2016-05/msg00069.html we don't. James had a patch adding an appropriate combine pattern some time ago (https://gcc.gnu.org/ml/gcc-patches/2013-09/msg01068.html) but it never got applied. This patch is a rebase of that patch that adds necessary vec_merge+vec_duplicate+vec_select combine pattern. I chose to use a define_insn rather than the define_insn_and_split in that patch that just deletes the instruction when the source and destination registers are the same, as I think that's not he combine patterns job to delete the redundant instruction but rather some other passes job. Also, I was not able to create a testcase where it would make a difference. Also, this patch doesn't reimplement that vcopy*lane* intrinsics from inline assembly to a vget_lane+vset_lane combo. This can be done as a separate patch on top of this one. Bootstrapped and tested on aarch64-none-linux-gnu. Also tested on aarch64_be-none-elf. Ok for trunk? Thanks, Kyrill 2016-06-07 James Greenhalgh Kyrylo Tkachov * config/aarch64/aarch64-simd.md (*aarch64_simd_vec_copy_lane): New define_insn. (*aarch64_simd_vec_copy_lane_): Likewise. 2016-06-07 James Greenhalgh Kyrylo Tkachov * gcc.target/aarch64/vget_set_lane_1.c: New test. diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 6ea35bf487eaa47dd78742e3eae7507b6875ba1a..5600e5bd0a94fd7efd704a4b13d95d993fd5b62f 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -555,6 +555,49 @@ (define_insn "aarch64_simd_vec_set" [(set_attr "type" "neon_from_gp, neon_ins, neon_load1_1reg")] ) +(define_insn "*aarch64_simd_vec_copy_lane" + [(set (match_operand:VALL 0 "register_operand" "=w") + (vec_merge:VALL + (vec_duplicate:VALL + (vec_select: + (match_operand:VALL 3 "register_operand" "w") + (parallel + [(match_operand:SI 4 "immediate_operand" "i")]))) + (match_operand:VALL 1 "register_operand" "0") + (match_operand:SI 2 "immediate_operand" "i")))] + "TARGET_SIMD" + { +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, INTVAL (operands[4]))); + +return "ins\t%0.[%p2], %3.[%4]"; + } + [(set_attr "type" "neon_ins")] +) + +(define_insn "*aarch64_simd_vec_copy_lane_" + [(set (match_operand:VALL 0 "register_operand" "=w") + (vec_merge:VALL + (vec_duplicate:VALL + (vec_select: + (match_operand: 3 "register_operand" "w") + (parallel + [(match_operand:SI 4 "immediate_operand" "i")]))) + (match_operand:VALL 1 "register_operand" "0") + (match_operand:SI 2 "immediate_operand" "i")))] + "TARGET_SIMD" + { +int elt = ENDIAN_LANE_N (mode, exact_log2 (INTVAL (operands[2]))); +operands[2] = GEN_INT (HOST_WIDE_INT_1 << elt); +operands[4] = GEN_INT (ENDIAN_LANE_N (mode, + INTVAL (operands[4]))); + +return "ins\t%0.[%p2], %3.[%4]"; + } + [(set_attr "type" "neon_ins")] +) + (define_insn "aarch64_simd_lshr" [(set (match_operand:VDQ_I 0 "register_operand" "=w") (lshiftrt:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") diff --git a/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c new file mode 100644 index ..07a77de319206c5c6dad1c0d2d9bcc998583f9c1 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/vget_set_lane_1.c @@ -0,0 +1,72 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +#include "arm_neon.h" + +#define BUILD_TEST(TYPE1, TYPE2, Q1, Q2, SUFFIX, INDEX1, INDEX2) \ +TYPE1 __attribute__((noinline,noclone))\ +test_copy##Q1##_lane##Q2##_##SUFFIX (TYPE1 a, TYPE2 b) \ +{ \ + return vset##Q1##_lane_##SUFFIX (vget##Q2##_lane_##SUFFIX (b, INDEX2),\ +a, INDEX1);\ +} + +BUILD_TEST (poly8x8_t, poly8x8_t, , , p8, 7, 6) +BUILD_TEST (int8x8_t, int8x8_t, , , s8, 7, 6) +BUILD_TEST (uint8x8_t, uint8x8_t, , , u8, 7, 6) +/* { dg-final { scan-assembler-times "ins\\tv0.b\\\[7\\\], v1.b\\\[6\\\]" 3 } } */ +BUILD_TEST (poly16x4_t, poly16x4_t, , , p16, 3, 2) +BUILD_TEST (int16x4_t, int16x4_t, , , s16, 3, 2) +BUILD_TEST (uint16x4_t, uint16x4_t, , , u16, 3, 2) +/* { dg-final { scan-assembler-times "ins\\tv0.h\\\[3\\\], v1.h\\\[2\\\]" 3 } } */ +BUILD_TEST (float32x2_t, float32x2_t, , , f32, 1, 0) +BUILD_TEST (int32x2_t, int32x2_t, , , s32, 1, 0) +BUILD_TEST (uint32x2_t, uint32x2_t, , , u32, 1, 0) +/* { dg-final { scan-assembler-times "ins\\tv0.s\\\[1\\\], v1.s\\\[0\\\]" 3 } } */ + +BUILD_TEST (poly8x8_t, poly8x16_t, , q, p8, 7, 15) +BUILD_TEST (int8x8_t, int8x16_t, , q, s8, 7, 15) +BUILD_TEST (uint8x8_t, uint8x16_t, , q, u8, 7, 15
Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
On 07/06/16 17:47, Kyrill Tkachov wrote: On 07/06/16 08:37, Andreas Schwab wrote: Kyrill Tkachov writes: +static rtx +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val) +{ + if (cmp_code != EQ && cmp_code != NE) +return NULL_RTX; + + /* Result on X == 0 and X !=0 respectively. */ + rtx on_zero, on_nonzero; + if (cmp_code == EQ) +{ + on_zero = true_val; + on_nonzero = false_val; +} + else +{ + on_zero = false_val; + on_nonzero = true_val; +} + + rtx_code op_code = GET_CODE (on_nonzero); + if ((op_code != CLZ && op_code != CTZ) + || !rtx_equal_p (XEXP (on_nonzero, 0), x) + || !CONST_INT_P (on_zero)) +return NULL_RTX; + + HOST_WIDE_INT op_val; + machine_mode mode = GET_MODE (on_nonzero); + if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val)) +|| (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val))) + && op_val == INTVAL (on_zero)) +return on_nonzero; + + return NULL_RTX; +} ../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)': ../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable] machine_mode mode = GET_MODE (on_nonzero); ^~~~ Andreas. That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or CTZ_DEFINED_VALUE_AT_ZERO. defaults.h then defines them to just "0". I think GCC shouldn't warn in these cases as 'mode' is used in the function it's defined, it's just the macro where it's used as an argument discards it. Anyway, this patch removes that variable and fixes the warning. Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO to reproduce the warning and checking that it's fixed. Committing as obvious in the interest of fixing bootstrap on the relevant targets. With the ChangeLog entry: 2016-06-07 Kyrylo Tkachov * simplify-rtx.c (simplify_cond_clz_ctz): Delete 'mode' local variable. Thanks, Kyrill
Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
On 07/06/16 08:37, Andreas Schwab wrote: Kyrill Tkachov writes: +static rtx +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val) +{ + if (cmp_code != EQ && cmp_code != NE) +return NULL_RTX; + + /* Result on X == 0 and X !=0 respectively. */ + rtx on_zero, on_nonzero; + if (cmp_code == EQ) +{ + on_zero = true_val; + on_nonzero = false_val; +} + else +{ + on_zero = false_val; + on_nonzero = true_val; +} + + rtx_code op_code = GET_CODE (on_nonzero); + if ((op_code != CLZ && op_code != CTZ) + || !rtx_equal_p (XEXP (on_nonzero, 0), x) + || !CONST_INT_P (on_zero)) +return NULL_RTX; + + HOST_WIDE_INT op_val; + machine_mode mode = GET_MODE (on_nonzero); + if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val)) + || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val))) + && op_val == INTVAL (on_zero)) +return on_nonzero; + + return NULL_RTX; +} ../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)': ../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable] machine_mode mode = GET_MODE (on_nonzero); ^~~~ Andreas. That must be on targets that don't define CLZ_DEFINED_VALUE_AT_ZERO or CTZ_DEFINED_VALUE_AT_ZERO. defaults.h then defines them to just "0". I think GCC shouldn't warn in these cases as 'mode' is used in the function it's defined, it's just the macro where it's used as an argument discards it. Anyway, this patch removes that variable and fixes the warning. Tested on arm-none-eabi after manually undefining CLZ_DEFINED_VALUE_AT_ZERO and CTZ_DEFINED_VALUE_AT_ZERO to reproduce the warning and checking that it's fixed. Committing as obvious in the interest of fixing bootstrap on the relevant targets. Thanks, Kyrill diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index 0f750bb08e817ba630d92566c6a991f81e095304..21adfcb531f3f3379653c67145ab0023c978fd82 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -5314,9 +5314,10 @@ simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val) return NULL_RTX; HOST_WIDE_INT op_val; - machine_mode mode = GET_MODE (on_nonzero); - if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val)) - || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val))) + if (((op_code == CLZ + && CLZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val)) + || (op_code == CTZ + && CTZ_DEFINED_VALUE_AT_ZERO (GET_MODE (on_nonzero), op_val))) && op_val == INTVAL (on_zero)) return on_nonzero;
Re: [PATCH] integer overflow checking builtins in constant expressions
On Tue, Jun 07, 2016 at 09:51:05AM -0600, Martin Sebor wrote: > On 06/07/2016 08:32 AM, Jason Merrill wrote: > >On 06/06/2016 08:36 AM, Jakub Jelinek wrote: > >>@@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_ > >> case BUILT_IN_FUNCTION: > >> case BUILT_IN_LINE: > >> > >>+ /* The following built-ins are valid in constant expressions > >>+ when their arguments are. */ > >>+case BUILT_IN_ADD_OVERFLOW: > >>+case BUILT_IN_ADD_OVERFLOW_P: > > > > > >Why is this necessary? builtin_valid_in_constant_expr_p is only needed > >for builtins that can have constant values even if their operands are > >non-constant, which doesn't apply here. > > > >For that matter, I don't see why we needed to add _FUNCTION et al, either. > > I don't remember what prompted me to change the function in either > patch. Perhaps it was the comment above the function that says > this: > > /* Test whether DECL is a builtin that may appear in a >constant-expression. */ > > But removing the change doesn't seem to affect the C++ test results > for the two features so it looks like the change may not be needed. Indeed, confirming removal of the cp/tree.c hunk doesn't affect anything in the testsuite, nor e.g. enum A { B = 1, C = 2, D = __builtin_add_overflow_p (B, C, C) }; int e[__builtin_add_overflow_p (B, C, C) + 1]; template int foo (int); void bar () { foo <__builtin_add_overflow_p (B, C, C) + 1> (0); } That said, builtin_valid_in_constant_expr_p is used in lots of spots, and in various of them I see that without the change it actually sets *non_integral_constant_expression_p = true; or something similar, but I have no idea why it still works despite of that. Is the patch ok for trunk without the cp/tree.c hunk? Jakub
PING^2: [PATCH] Load external function address via GOT slot
On Sun, May 29, 2016 at 3:13 PM, H.J. Lu wrote: > On Mon, May 9, 2016 at 10:37 AM, H.J. Lu wrote: >> On Fri, Apr 22, 2016 at 6:03 AM, Uros Bizjak wrote: >>> On Fri, Apr 22, 2016 at 2:54 PM, H.J. Lu wrote: For -fno-plt, we load the external function address via the GOT slot so that linker won't create an PLT entry for extern function address. Tested on x86-64. I also built GCC with -fno-plt. It removes 99% PLT entries. OK for trunk? H.J. -- gcc/ PR target/pr67400 * config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New. * config/i386/i386.c (ix86_force_load_from_GOT_p): New function. (ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL for ix86_force_load_from_GOT_p returns true. (ix86_print_operand_address): Support UNSPEC_GOTPCREL if ix86_force_load_from_GOT_p returns true. (ix86_expand_move): Load the external function address via the GOT slot if ix86_force_load_from_GOT_p returns true. * config/i386/predicates.md (x86_64_immediate_operand): Return false if ix86_force_load_from_GOT_p returns true. gcc/testsuite/ PR target/pr67400 * gcc.target/i386/pr67400-1.c: New test. * gcc.target/i386/pr67400-2.c: Likewise. * gcc.target/i386/pr67400-3.c: Likewise. * gcc.target/i386/pr67400-4.c: Likewise. >>> >>> Please get someone that knows this linker magic to review the >>> functionality first. Maybe Jakub can help? >>> >> >> Hi Jakub, >> >> Can you review this patch? >> >> Thanks. > > PING. > PING. -- H.J.
[C++ Patch] Fix some simple location issues
Hi, in parallel with other work, I'm auditing the front-end for simple location issues which can be fixed immediately. These I noticed in decl.c over the last couple of days. Slightly more interesting, for some error messages we didn't seem to have testcases, certainly not in g++.dg, thus I added some elementary ones covering the locations too. Tested x86_64-linux. Thanks, Paolo. /// /cp 2016-06-07 Paolo Carlini * decl.c (maybe_deduce_size_from_array_init): Use DECL_SOURCE_LOCATION in error_at. (layout_var_decl): Likewise. (check_array_initializer): Likewise. (check_initializer): Likewise. (duplicate_decls, check_elaborated_type_specifier): Tidy. /testsuite 2016-06-07 Paolo Carlini * g++.dg/init/array42.C: New. * g++.dg/init/array43.C: Likewise. * g++.dg/init/array44.C: Likewise. * g++.dg/init/array45.C: Likewise. * g++.dg/cpp0x/constexpr-ice10.C: Test column number too. * g++.dg/cpp0x/constexpr-incomplete1.C: Likewise. * g++.dg/cpp1y/auto-fn27.C: Likewise. * g++.dg/gomp/pr35751.C: Likewise. * g++.dg/init/array23.C: Likewise. * g++.dg/init/brace2.C: Likewise. * g++.dg/init/brace6.C: Likewise. Index: cp/decl.c === --- cp/decl.c (revision 237171) +++ cp/decl.c (working copy) @@ -1393,7 +1393,7 @@ duplicate_decls (tree newdecl, tree olddecl, bool { if (DECL_INITIAL (olddecl)) inform (DECL_SOURCE_LOCATION (olddecl), - "previous definition of %q+D was here", olddecl); + "previous definition of %qD was here", olddecl); else inform (DECL_SOURCE_LOCATION (olddecl), "previous declaration of %qD was here", olddecl); @@ -5266,13 +5266,15 @@ maybe_deduce_size_from_array_init (tree decl, tree do_default); if (failure == 1) { - error ("initializer fails to determine size of %qD", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "initializer fails to determine size of %qD", decl); } else if (failure == 2) { if (do_default) { - error ("array size missing in %qD", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "array size missing in %qD", decl); } /* If a `static' var's size isn't known, make it extern as well as static, so it does not get allocated. If it's not @@ -5283,7 +5285,8 @@ maybe_deduce_size_from_array_init (tree decl, tree } else if (failure == 3) { - error ("zero-size array %qD", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "zero-size array %qD", decl); } } @@ -5322,7 +5325,8 @@ layout_var_decl (tree decl) /* An automatic variable with an incomplete type: that is an error. Don't talk about array types here, since we took care of that message in grokdeclarator. */ - error ("storage size of %qD isn%'t known", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "storage size of %qD isn%'t known", decl); TREE_TYPE (decl) = error_mark_node; } #if 0 @@ -5345,7 +5349,8 @@ layout_var_decl (tree decl) constant_expression_warning (DECL_SIZE (decl)); else { - error ("storage size of %qD isn%'t constant", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "storage size of %qD isn%'t constant", decl); TREE_TYPE (decl) = error_mark_node; } } @@ -5954,7 +5959,8 @@ check_array_initializer (tree decl, tree type, tre if (!COMPLETE_TYPE_P (complete_type (element_type))) { if (decl) - error ("elements of array %q#D have incomplete type", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "elements of array %q#D have incomplete type", decl); else error ("elements of array %q#T have incomplete type", type); return true; @@ -6018,7 +6024,8 @@ check_initializer (tree decl, tree init, int flags } else if (!COMPLETE_TYPE_P (type)) { - error ("%q#D has incomplete type", decl); + error_at (DECL_SOURCE_LOCATION (decl), + "%q#D has incomplete type", decl); TREE_TYPE (decl) = error_mark_node; return NULL_TREE; } @@ -6038,8 +6045,9 @@ check_initializer (tree decl, tree init, int flags } else if (init_len != 1 && TREE_CODE (type) != COMPLEX_TYPE) { - error ("scalar object %qD requires one element in initializer", -decl); + error_at (DECL_SOURCE_LOCATION (decl), + "scalar object %qD req
PING^5: [PATCH] PR target/70454: Build x86 libgomp with -march=i486 or better
On Sun, May 29, 2016 at 3:14 PM, H.J. Lu wrote: > On Fri, May 20, 2016 at 8:04 AM, H.J. Lu wrote: >> On Mon, May 9, 2016 at 5:52 AM, H.J. Lu wrote: >>> On Mon, May 2, 2016 at 6:46 AM, H.J. Lu wrote: On Mon, Apr 25, 2016 at 1:36 PM, H.J. Lu wrote: > If x86 libgomp isn't compiled with -march=i486 or better, append > -march=i486 XCFLAGS for x86 libgomp build. > > Tested on i686 with and without --with-arch=i386. Tested on > x86-64 with and without --with-arch_32=i386. OK for trunk? > > > H.J. > --- > PR target/70454 > * configure.tgt (XCFLAGS): Append -march=i486 to compile x86 > libgomp if needed. > --- > libgomp/configure.tgt | 36 > 1 file changed, 16 insertions(+), 20 deletions(-) > > diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt > index 77e73f0..c876e80 100644 > --- a/libgomp/configure.tgt > +++ b/libgomp/configure.tgt > @@ -67,28 +67,24 @@ if test x$enable_linux_futex = xyes; then > ;; > > # Note that bare i386 is not included here. We need cmpxchg. > -i[456]86-*-linux*) > +i[456]86-*-linux* | x86_64-*-linux*) > config_path="linux/x86 linux posix" > - case " ${CC} ${CFLAGS} " in > - *" -m64 "*|*" -mx32 "*) > - ;; > - *) > - if test -z "$with_arch"; then > - XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}" > + # Need i486 or better. > + cat > conftestx.c < +#if defined __x86_64__ || defined __i486__ || defined __pentium__ \ > + || defined __pentiumpro__ || defined __pentium4__ \ > + || defined __geode__ || defined __SSE__ > +# error Need i486 or better > +#endif > +EOF > + if ${CC} ${CFLAGS} -c -o conftestx.o conftestx.c > /dev/null > 2>&1; then > + if test "${target_cpu}" = x86_64; then > + XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic" > + else > + XCFLAGS="${XCFLAGS} -march=i486 -mtune=${target_cpu}" > fi > - esac > - ;; > - > -# Similar jiggery-pokery for x86_64 multilibs, except here we > -# can't rely on the --with-arch configure option, since that > -# applies to the 64-bit side. > -x86_64-*-linux*) > - config_path="linux/x86 linux posix" > - case " ${CC} ${CFLAGS} " in > - *" -m32 "*) > - XCFLAGS="${XCFLAGS} -march=i486 -mtune=generic" > - ;; > - esac > + fi > + rm -f conftestx.c conftestx.o > ;; > > # Note that sparcv7 and sparcv8 is not included here. We need cas. > -- > 2.5.5 > PING. >>> >>> PING. >>> >> >> PING. >> > > PING. PING. -- H.J.
Re: [PATCH] integer overflow checking builtins in constant expressions
On 06/07/2016 08:32 AM, Jason Merrill wrote: On 06/06/2016 08:36 AM, Jakub Jelinek wrote: @@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_ case BUILT_IN_FUNCTION: case BUILT_IN_LINE: + /* The following built-ins are valid in constant expressions + when their arguments are. */ +case BUILT_IN_ADD_OVERFLOW: +case BUILT_IN_ADD_OVERFLOW_P: Why is this necessary? builtin_valid_in_constant_expr_p is only needed for builtins that can have constant values even if their operands are non-constant, which doesn't apply here. For that matter, I don't see why we needed to add _FUNCTION et al, either. I don't remember what prompted me to change the function in either patch. Perhaps it was the comment above the function that says this: /* Test whether DECL is a builtin that may appear in a constant-expression. */ But removing the change doesn't seem to affect the C++ test results for the two features so it looks like the change may not be needed. Unless I hear otherwise I'll submit a separate patch removing the BUILTIN_FILE, FUNCTION, and LINE labels (and adjust the comment to more closely reflect the function's purpose). I'll leave it to Jakub to tweak this patch. Martin
[C++ PATCH] Fix -Wunused-* regression (PR c++/71442)
Hi! Marek has recently added code to set TREE_USED bits on the elements of TREE_VEC referenced in SIZEOF_EXPR. But, as the testcase shows, it can be used on various parameter/argument packs, some of them have types as elements, others decls. And IMHO we want to set TREE_USED only on the decls listed in those, for types TREE_USED should be a property of the type regardless of whether the type is mentioned in sizeof... or not, otherwise we suddenly stop diagnosing any unused vars with those types. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2/5.5? 2016-06-07 Jakub Jelinek PR c++/71442 * pt.c (tsubst_copy): Only set TREE_USED on DECLs. * g++.dg/cpp0x/Wunused-variable-1.C: New test. --- gcc/cp/pt.c.jj 2016-06-01 14:17:12.0 +0200 +++ gcc/cp/pt.c 2016-06-07 14:29:16.608041125 +0200 @@ -14160,7 +14160,8 @@ tsubst_copy (tree t, tree args, tsubst_f len = TREE_VEC_LENGTH (expanded); /* Set TREE_USED for the benefit of -Wunused. */ for (int i = 0; i < len; i++) - TREE_USED (TREE_VEC_ELT (expanded, i)) = true; + if (DECL_P (TREE_VEC_ELT (expanded, i))) + TREE_USED (TREE_VEC_ELT (expanded, i)) = true; } if (expanded == error_mark_node) --- gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C.jj 2016-06-07 14:31:15.514486508 +0200 +++ gcc/testsuite/g++.dg/cpp0x/Wunused-variable-1.C 2016-06-07 14:32:13.526730026 +0200 @@ -0,0 +1,25 @@ +// PR c++/71442 +// { dg-do compile { target c++11 } } +// { dg-options "-Wunused-variable" } + +struct C +{ + template + int operator()(Ts &&...) + { +return sizeof...(Ts); + } +}; + +int +main () +{ + C {} (1, 1L, 1LL, 1.0); + char a; // { dg-warning "unused variable" } + short b; // { dg-warning "unused variable" } + int c; // { dg-warning "unused variable" } + long d; // { dg-warning "unused variable" } + long long e; // { dg-warning "unused variable" } + float f; // { dg-warning "unused variable" } + double g;// { dg-warning "unused variable" } +} Jakub
Ignore debug insns in memcmp optimization
This fixes a few PRs from the last few days. Fully tested on x86_64-linux. Ok? Bernd PR debug/71432 PR ada/71413 * tree-ssa-strlen.c (handle_builtin_memcmp): Ignore debug insns. PR debug/71432 PR ada/71413 * g++.dg/debug/pr71432.C: New test. Index: gcc/testsuite/g++.dg/debug/pr71432.C === --- gcc/testsuite/g++.dg/debug/pr71432.C (revision 0) +++ gcc/testsuite/g++.dg/debug/pr71432.C (working copy) @@ -0,0 +1,140 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -fcompare-debug" } */ + +namespace std +{ + typedef long unsigned int size_t; + inline namespace __cxx11 + { + } } + +extern "C++" +{ + namespace std + { +template < typename _Tp > struct __is_char +{ +}; +template <> struct __is_char +{ + enum + { __value = 1 }; +}; + } namespace __gnu_cxx + { +template < bool, typename > struct __enable_if +{ +}; +template < typename _Tp > struct __enable_if +{ + typedef _Tp __type; +}; + } +} + +namespace __gnu_cxx +{ + template < typename _Tp > class new_allocator + { + }; +} + +namespace std +{ + template < typename _Tp > using __allocator_base = +__gnu_cxx::new_allocator < _Tp >; +template < typename _Tp > class allocator:public __allocator_base < _Tp > + { + }; + template < typename _Alloc > struct allocator_traits + { + }; + template < typename _Tp > struct allocator_traits > + { +using size_type = std::size_t; +template < typename _Up > using rebind_alloc = allocator < _Up >; + }; +} + +namespace __gnu_cxx +{ + template < typename _Alloc > struct __alloc_traits:std::allocator_traits <_Alloc > + { +typedef std::allocator_traits < _Alloc > _Base_type; + template < typename _Tp > struct rebind +{ + typedef typename _Base_type::template rebind_alloc < _Tp > other; + }; + }; +} + +namespace std +{ + template < class _CharT > struct char_traits; + namespace __cxx11 + { +template < typename _CharT, typename _Traits = + char_traits < _CharT >, typename _Alloc = + allocator < _CharT > >class basic_string; +typedef basic_string < char >string; + } +} + +namespace std +{ + template <> struct char_traits + { +typedef char char_type; + static int compare (const char_type * __s1, const char_type * __s2, + size_t __n) +{ + return __builtin_memcmp (__s1, __s2, __n); +} + }; + + namespace __cxx11 + { +template < typename _CharT, typename _Traits, typename _Alloc > +class basic_string +{ + typedef typename __gnu_cxx::__alloc_traits <_Alloc >::template rebind < _CharT >::other _Char_alloc_type; + typedef __gnu_cxx::__alloc_traits < _Char_alloc_type > _Alloc_traits; + typedef typename _Alloc_traits::size_type size_type; + +public: + size_type size ()const noexcept + { + } + const _CharT *data () const noexcept + { + } +}; + } + + template < typename _CharT > inline typename __gnu_cxx::__enable_if < +__is_char < _CharT >::__value, +bool >::__type operator== (const basic_string < _CharT > &__lhs, + const basic_string < _CharT > &__rhs) noexcept + { +return !std::char_traits < _CharT >::compare (__lhs.data (), + __rhs.data (), + __lhs.size ()); + } +}; + +class CLIParameterType +{ + const std::string & getSwitchOption (unsigned int i) const + { + } unsigned int getSwitchOptionCount () const + { + } int checkSwitched (const std::string & value) const; +}; + +int +CLIParameterType::checkSwitched (const std::string & value) const const +{ + int contains = false; + for (unsigned int i = 0; !contains && i < getSwitchOptionCount () ;) +contains = getSwitchOption (i) == value; +} Index: gcc/tree-ssa-strlen.c === --- gcc/tree-ssa-strlen.c (revision 237069) +++ gcc/tree-ssa-strlen.c (working copy) @@ -1867,6 +1867,8 @@ handle_builtin_memcmp (gimple_stmt_itera { gimple *ustmt = USE_STMT (use_p); + if (is_gimple_debug (ustmt)) + continue; if (gimple_code (ustmt) == GIMPLE_ASSIGN) { gassign *asgn = as_a (ustmt);
C PATCH for c/71418 (ICE with bogus array dimension and _Alignas)
Another error-recovery bug. This time TYPE was error_mark_node but min_align_of_type accessed it via TYPE_ALIGN. So we better check that we're operating on a type. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2016-06-07 Marek Polacek PR c/71418 * c-decl.c (grokdeclarator): Check TYPE_P. * gcc.dg/noncompile/pr71418.c: New test. diff --git gcc/c/c-decl.c gcc/c/c-decl.c index d79802e..ac83e2f 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -6313,7 +6313,7 @@ grokdeclarator (const struct c_declarator *declarator, } else if (TREE_CODE (type) == FUNCTION_TYPE) error_at (loc, "alignment specified for function %qE", name); - else if (declspecs->align_log != -1) + else if (declspecs->align_log != -1 && TYPE_P (type)) { alignas_align = 1U << declspecs->align_log; if (alignas_align < min_align_of_type (type)) diff --git gcc/testsuite/gcc.dg/noncompile/pr71418.c gcc/testsuite/gcc.dg/noncompile/pr71418.c index e69de29..e3243f3 100644 --- gcc/testsuite/gcc.dg/noncompile/pr71418.c +++ gcc/testsuite/gcc.dg/noncompile/pr71418.c @@ -0,0 +1,4 @@ +/* PR c/71418 */ +/* { dg-do compile } */ + +_Alignas (int) int a[7++]; /* { dg-error "lvalue required" } */ Marek
Re: OpenACC wait clause
On Tue, Jun 07, 2016 at 08:01:10AM -0700, Cesar Philippidis wrote: > On 06/07/2016 04:13 AM, Jakub Jelinek wrote: > > > I've noticed > > if ((mask & OMP_CLAUSE_WAIT) > > && !c->wait > > && gfc_match ("wait") == MATCH_YES) > > { > > c->wait = true; > > match_oacc_expr_list (" (", &c->wait_list, false); > > continue; > > } > > which looks just weird and confusing. Why isn't this instead: > > if ((mask & OMP_CLAUSE_WAIT) > > && !c->wait > > && (match_oacc_expr_list ("wait (", &c->wait_list, false) > > == MATCH_YES)) > > { > > c->wait = true; > > continue; > > } > > ? Otherwise you happily accept wait without following (, perhaps even > > combined with another clause without any space in between etc. > > Both acc wait and async accept optional parenthesis arguments. E.g., > > #pragma acc wait > > blocks for all of the async streams to complete before proceeding, whereas > > #pragma acc wait (1, 5) > > only blocks for async streams 1 and 5. But then you need to set need_space = true; if it doesn't have the ( after it. Jakub
Re: OpenACC wait clause
On 06/07/2016 04:13 AM, Jakub Jelinek wrote: > I've noticed > if ((mask & OMP_CLAUSE_WAIT) > && !c->wait > && gfc_match ("wait") == MATCH_YES) > { > c->wait = true; > match_oacc_expr_list (" (", &c->wait_list, false); > continue; > } > which looks just weird and confusing. Why isn't this instead: > if ((mask & OMP_CLAUSE_WAIT) > && !c->wait > && (match_oacc_expr_list ("wait (", &c->wait_list, false) > == MATCH_YES)) > { > c->wait = true; > continue; > } > ? Otherwise you happily accept wait without following (, perhaps even > combined with another clause without any space in between etc. Both acc wait and async accept optional parenthesis arguments. E.g., #pragma acc wait blocks for all of the async streams to complete before proceeding, whereas #pragma acc wait (1, 5) only blocks for async streams 1 and 5. Cesar
Re: [PATCH][2/3] Vectorize inductions that are live after the loop
On 07/06/2016 10:28, "Rainer Orth" wrote: >Alan Hayward writes: > >> On 05/06/2016 12:00, "Andreas Schwab" wrote: >> >>>Alan Hayward writes: >>> * gcc.dg/vect/vect-live-2.c: New test. >>> >>>This test fails on powerpc64 (with -m64, but not with -m32): >>> >>>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect >>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 >>>loops >>>in function. >>>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 >>>loops >>>in function. >>> >>> >> >> "note: not vectorized: relevant stmt not supported: _1 = (long unsigned >> int) j_24;" >> >> >> This is failing because power does not support vectorising a cast from >>int >> to long. >> (It works on power 32bit because longs are 32bit and therefore no need >>to >> cast). >> >> Can someone please suggest a target-supports define (or another method) >>I >> can use to >> disable this test for power 64bit (but not 32bit) ? >> I tried using vect_multiple_sizes, but that will also disable the test >>on >> x86 without >> avx. > >I'm also seeing new FAILs on Solaris/SPARC: > >+FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects >scan-tree-dump-times v >ect "vectorized 1 loops" 1 >+FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vectorized 1 >loops" >1 > >32- and 64-bit: > >vect-live-2.c:16:3: note: not vectorized: relevant stmt not supported: _2 >= j.0_1 * 4; >vect-live-2.c:48:7: note: not vectorized: control flow in loop. >vect-live-2.c:35:3: note: not vectorized: loop contains function calls or >data references that cannot be analyzed > >and > >+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects >scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not >relevant" 4 >+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects >scan-tree-dump-times vect "vectorized 1 loops" 4 >+FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects >scan-tree-dump-times vect "vectorizing stmts using SLP" 4 >+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect >"vec_stmt_relevant_p: stmt live but not relevant" 4 >+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect >"vectorized 1 loops" 4 >+FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect >"vectorizing stmts using SLP" 4 > >vect-live-slp-3.c:29:1: note: not vectorized: no vectype for stmt: n0_29 >= *_4; >vect-live-slp-3.c:30:1: note: not vectorized: no vectype for stmt: n0_29 >= *_4; >vect-live-slp-3.c:31:1: note: not vectorized: no vectype for stmt: n0_29 >= *_4; >vect-live-slp-3.c:32:1: note: not vectorized: no vectype for stmt: n0_29 >= *_4; >vect-live-slp-3.c:62:4: note: not vectorized: control flow in loop. >vect-live-slp-3.c:45:3: note: not vectorized: loop contains function >calls or data references that cannot be analyzed > I’ve been trying both these tests on x86,aarch64,power,sparc vect-live-slp-3.c Fails on power 64 (altivec & vsx), sparc 64 (vis 2 & 3) - due to long int unsupported Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 2 & 3) vect-live-2.c Fails on power 64 (altivec & vsx), sparc 64 (vis 2 & 3) - due to long int unsupported Fails on sparc 32 (vis 2) - due to multiply/shift not supported Pass on x86, aarch64, power 32 (altivec & vsx), sparc 32 (vis 3) Therefore I think both tests should be gated on “vect_long”. In addition, vect-live-2.c should also be gated on “vect_shift” “vect_long” is not currently enabled for aarch64, but should be. Also “vect_shift” is not currently enabled for sparc 32 (vis 3), but probably should be. I leave this for a task for a sparc maintainer to add (as I’m unable to test). This patch fixes the targets for vect-live-slp-3.c and vect-live-2.c. It also adds aarch64 to vect_long. As a side consequence, the following vector tests are now enabled for aarch64: pr18425.c, pr30843.c, pr36493.c, pr42193.c and pr60656.c Tested on aarch64 and x86. Tested by inspection on power and sparc Ok to commit? testsuite/ * gcc.dg/vect/vect-live-2.c : Likewise * gcc.dg/vect/vect-live-slp-3.c : Update effective target * lib/target-supports.exp : Add aarch64 to vect_long diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-2.c b/gcc/testsuite/gcc.dg/vect/vect-live-2.c index 53adc3fee006e0577a4cf2f9ba8fe091d2a09353..9460624a515945bdd72f98a0b1a6751fd c7a75de 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-live-2.c +++ b/gcc/testsuite/gcc.dg/vect/vect-live-2.c @@ -1,4 +1,5 @@ -/* { dg-require-effective-target vect_int } */ +/* { dg-require-effective-target vect_long } */ +/* { dg-require-effective-target vect_shift } */ /* { dg-additional-options "-fno-tree-scev-cprop" } */ #include "tree-vect.h" diff --git a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c index aacf5cb98071f6fec1f4b522eeefeb6696787334..70947afcc45d30987fc6df2db634d002d 33b360f 100644 --- a/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c +++ b/gcc/testsuite/gcc.dg/vect/vect-live-slp-3.c @@ -1,4 +1,
Re: [PATCH] integer overflow checking builtins in constant expressions
On 06/06/2016 08:36 AM, Jakub Jelinek wrote: @@ -352,6 +352,35 @@ builtin_valid_in_constant_expr_p (const_ case BUILT_IN_FUNCTION: case BUILT_IN_LINE: + /* The following built-ins are valid in constant expressions +when their arguments are. */ +case BUILT_IN_ADD_OVERFLOW: +case BUILT_IN_ADD_OVERFLOW_P: Why is this necessary? builtin_valid_in_constant_expr_p is only needed for builtins that can have constant values even if their operands are non-constant, which doesn't apply here. For that matter, I don't see why we needed to add _FUNCTION et al, either. Jason
Re: [PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions
Hi Kelvin, On Mon, Jun 06, 2016 at 05:58:00PM -0600, Kelvin Nilsen wrote: > Index: gcc/config/rs6000/altivec.h > === > --- gcc/config/rs6000/altivec.h (revision 237045) > +++ gcc/config/rs6000/altivec.h (working copy) > @@ -401,6 +401,11 @@ > #define vec_vprtybq __builtin_vec_vprtybq > #endif > > +#define vec_adu __builtin_vec_vadu > +#define vec_adub __builtin_vec_vadub > +#define vec_aduh __builtin_vec_vaduh > +#define vec_aduw __builtin_vec_vaduw I think the API says vec_absd? Please double-check. > +;; Vector absolute difference unsigned > +(define_expand "vadu3" > + [(set (match_operand:VI 0 "register_operand" "") > +(unspec:VI [(match_operand:VI 1 "register_operand" "") > + (match_operand:VI 2 "register_operand" "")] > + UNSPEC_VADU))] > + "TARGET_P9_VECTOR") You can leave off those constraints "", they are default. > +;; Vector absolute difference unsigned > +(define_insn "*p9_vadu3" > + [(set (match_operand:VI 0 "register_operand" "=v") > +(unspec:VI [(match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v")] > + UNSPEC_VADU))] > + "TARGET_P9_VECTOR" > + "vabsdu %0, %1, %2" Don't use spaces after comma in asm. > + [(set_attr "type" "add") That's the wrong attr type; it should be something vector. "add" is for GPRs. > + (set_attr "length" "4")]) You don't need to say the default length. Segher
Re: [PATCH] Selftest framework (v7)
On Mon, 2016-06-06 at 22:14 -0400, Trevor Saunders wrote: > On Mon, Jun 06, 2016 at 11:57:49PM +0200, Jakub Jelinek wrote: > > On Mon, Jun 06, 2016 at 05:53:50PM -0400, Trevor Saunders wrote: > > > > > As far as I can > > > > > tell this just involves moving the start of namespace > > > > > selftest > > > > > upwards a > > > > > bit in the files where we have tests. > > > > > > > > Yes, and it does seem cleaner to have all of the selftest code > > > > start > > > > like this: > > > > > > > > #if CHECKING_P > > > > > > What are we gaining by ifdefing this? I would think on reasonable > > > systems the compiler would optimize out the call to the selftests > > > in > > > release builds and then the linker would gc all the unused > > > functions. > > > Do we really care about code size in places that doesn't happen > > > enough > > > to go through this? > > > > Not everyone is building the compiler with LTO, and if you don't, > > then > > how would you optimize that away? > > -ffunction-sections -Wl,--gc-sections should be enough I think. I > guess > we don't use those at the moment though. > > > And yes, not having the self-tests, especially if they are going to > > grow > > further, in release compilers is desirable, especially if it would > > be > > intermixed with hot code. > > That's fair, though turning on --gc-sections where we can should > further > help with that, and that should be more effective with > -ffunction-sections -fdata-sections, so its seems to me like the > right > thing to do is add configure tests to enable those? And then its > more > of a non issue? I appreciate that you'd done a lot of work on eliminating preprocessor use in gcc, and that we'd prefer to minimize the amount of #if code we have - though it's relatively easy to test the with/without #if CHECKING_P case (compared to all of the various target-specific macros). Historically gcc testing has largely been "black box" testing: run the built programs with specific inputs and look for specific outputs. My hope with -fself-tests is that we can build up our "white box" test coverage to complement the above, with unit tests. My favorite example is the testing for gengtype that I posted as a followup, which gives us some immediate test coverage that gengtype is working as expected, which is hard to do using our traditional approach to testing. I hope that we can add a lot more tests to -fself-test, in particular, I want us to have unit tests for hot code, including code that's hidden deep inside the implementation and that might normally get inlined away. To play Devil's Advocate: if we find we're able to do a release build with the selftests enabled and that it isn't slowing down the release build, does that imply we need more unit tests for our hottest code? (the counterargument being that the checked build still needs to run in a bearable amount of time). That said, I want -fself-test to always run quickly (e.g. less than a second); let's not put anything slow in there. Also, any unit tests involving analyzing several gimple or RTL statements at once seem to be easier to do via the gimple and RTL frontends that Prasad and I are working on respectively. (I think we can use -fself-test for unit -testing implementation details of passes that involve one statement at a time, but as soon as we start dealing with multiple statements and control flow, that it's probably better to express it using a gimple/RTL dump in DejaGnu form). Hope the above sounds sane Dave
[PATCH] Use __USER_LABEL_PREFIX__ in asm statement
A target may have a prefix in function symbol. Update interrrupt tests to use __USER_LABEL_PREFIX__ for function symbol in asm statement. Tested on Linux and Darwin. OK for trunk? Thanks. H.J. * gcc.dg/guality/pr68037-1.c (ASMNAME): New. (ASMNAME2): Likewise. (main): Replace fn in asm statement with ASMNAME ("fn"). * gcc.dg/guality/pr68037-2.c: Likewise. * gcc.dg/guality/pr68037-3.c: Likewise. * gcc.dg/torture/pr68037-1.c: Likewise. * gcc.dg/torture/pr68037-2.c: Likewise. * gcc.dg/torture/pr68037-3.c: Likewise. diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-1.c b/gcc/testsuite/gcc.dg/guality/pr68037-1.c index c3ea645..74f61ec 100644 --- a/gcc/testsuite/gcc.dg/guality/pr68037-1.c +++ b/gcc/testsuite/gcc.dg/guality/pr68037-1.c @@ -14,6 +14,8 @@ typedef unsigned int uword_t __attribute__ ((mode (__word__))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -53,7 +55,7 @@ main () push$" STRING (CS) "; \ push$" STRING (IP) "; \ push$" STRING (ERROR) ";\ - jmp fn"); + jmp " ASMNAME ("fn")); return 0; } diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-2.c b/gcc/testsuite/gcc.dg/guality/pr68037-2.c index 6f7e920..c3cd73d 100644 --- a/gcc/testsuite/gcc.dg/guality/pr68037-2.c +++ b/gcc/testsuite/gcc.dg/guality/pr68037-2.c @@ -13,6 +13,8 @@ typedef unsigned int uword_t __attribute__ ((mode (__word__))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -49,7 +51,7 @@ main () push$" STRING (FLAGS) ";\ push$" STRING (CS) "; \ push$" STRING (IP) "; \ - jmp fn"); + jmp " ASMNAME ("fn")); return 0; } diff --git a/gcc/testsuite/gcc.dg/guality/pr68037-3.c b/gcc/testsuite/gcc.dg/guality/pr68037-3.c index 504a931..6e05472 100644 --- a/gcc/testsuite/gcc.dg/guality/pr68037-3.c +++ b/gcc/testsuite/gcc.dg/guality/pr68037-3.c @@ -16,6 +16,8 @@ typedef int aligned __attribute__((aligned(64))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -65,7 +67,7 @@ main () push$" STRING (FLAGS) ";\ push$" STRING (CS) "; \ push$" STRING (IP) "; \ - jmp fn"); + jmp " ASMNAME ("fn")); return 0; } diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-1.c b/gcc/testsuite/gcc.dg/torture/pr68037-1.c index 15fe08c..23d7c6f 100644 --- a/gcc/testsuite/gcc.dg/torture/pr68037-1.c +++ b/gcc/testsuite/gcc.dg/torture/pr68037-1.c @@ -14,6 +14,8 @@ typedef unsigned int uword_t __attribute__ ((mode (__word__))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -53,6 +55,6 @@ main () push$" STRING (CS) "; \ push$" STRING (IP) "; \ push$" STRING (ERROR) ";\ - jmp fn"); + jmp " ASMNAME ("fn")); return 0; } diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-2.c b/gcc/testsuite/gcc.dg/torture/pr68037-2.c index 00ba7d4..18f9844 100644 --- a/gcc/testsuite/gcc.dg/torture/pr68037-2.c +++ b/gcc/testsuite/gcc.dg/torture/pr68037-2.c @@ -13,6 +13,8 @@ typedef unsigned int uword_t __attribute__ ((mode (__word__))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -49,6 +51,6 @@ main () push$" STRING (FLAGS) ";\ push$" STRING (CS) "; \ push$" STRING (IP) "; \ - jmp fn"); + jmp " ASMNAME ("fn")); return 0; } diff --git a/gcc/testsuite/gcc.dg/torture/pr68037-3.c b/gcc/testsuite/gcc.dg/torture/pr68037-3.c index abf8adb..86324f1 100644 --- a/gcc/testsuite/gcc.dg/torture/pr68037-3.c +++ b/gcc/testsuite/gcc.dg/torture/pr68037-3.c @@ -16,6 +16,8 @@ typedef int aligned __attribute__((aligned(64))); #define STRING(x) XSTRING(x) #define XSTRING(x) #x +#define ASMNAME(cname) ASMNAME2 (__USER_LABEL_PREFIX__, cname) +#define ASMNAME2(prefix, cname) XSTRING (prefix) cname struct interrupt_frame { @@ -65,6 +67,6 @@ main () push$" STRING (FLAGS) ";
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Tue, Jun 07, 2016 at 02:43:37PM +0200, Christophe Lyon wrote: > > No, why? This test doesn't test whether the function has been vectorized. > > It only tests whether it works. > > And the check_vect () is supposed to exit early if some extra flags were > > passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't > > support those. > > > But this makes the tests fails, rather than be unsupported, right? check_vect is supposed to exit (0) if the HW doesn't support vectorization, so the test would PASS. Jakub
C PATCH for c/71426 (ICE with bogus parameter)
cc1 crashed on the following invalid code, because it tripped this assert. The reason why b->nested was false for the FUNCTION_DECL 'x' was that when doing pushdecl, we found an incompatible duplicate, so pushdecl just bound 'x' to FUNCTION_DECL 'x', replacing the old binding, without the TREE_PUBLIC test that sets 'nested'. At first I thought about removing the assert but in the end I decided to add || seen_error -- for invalid code we might do the same as for ERROR_MARK. Bootstrapped/regtested on x86_64-linux, ok for trunk? 2016-06-07 Marek Polacek PR c/71426 * c-decl.c (get_parm_info): Don't crash on an assert on invalid code. * gcc.dg/noncompile/pr71426.c: New test. diff --git gcc/c/c-decl.c gcc/c/c-decl.c index d79802e..8ceb8ba 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -7054,9 +7054,9 @@ get_parm_info (bool ellipsis, tree expr) break; case FUNCTION_DECL: - /* FUNCTION_DECLs appear when there is an implicit function - declaration in the parameter list. */ - gcc_assert (b->nested); + /* FUNCTION_DECLs appear when there is an implicit function +declaration in the parameter list. */ + gcc_assert (b->nested || seen_error ()); goto set_shadowed; case CONST_DECL: diff --git gcc/testsuite/gcc.dg/noncompile/pr71426.c gcc/testsuite/gcc.dg/noncompile/pr71426.c index e69de29..874e189 100644 --- gcc/testsuite/gcc.dg/noncompile/pr71426.c +++ gcc/testsuite/gcc.dg/noncompile/pr71426.c @@ -0,0 +1,5 @@ +/* PR c/71426 */ +/* { dg-do compile } */ +/* { dg-options "-w" } */ + +int f (int x[x - x ()]); /* { dg-error "undeclared" } */ Marek
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On 7 June 2016 at 11:42, Jakub Jelinek wrote: > On Tue, Jun 07, 2016 at 10:36:25AM +0100, Ramana Radhakrishnan wrote: >> On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek wrote: >> > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote: >> >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj 2016-06-03 >> >> > 17:05:37.693475438 +0200 >> >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 >> >> > +0200 >> >> > @@ -0,0 +1,28 @@ >> >> > +/* PR tree-optimization/71259 */ >> >> > +/* { dg-do run } */ >> >> > +/* { dg-options "-O3" } */ >> > >> > Would changing this from dg-options to dg-additional-options help for the >> > ARM issues? >> > check_vect () is the standard way for testing for HW vectorization support >> > and hundreds of tests use it. >> >> >> all tests in gcc.dg/vect have some form of dg-require-effective-target > > No, at least 170+ tests don't. > >> - so I think this test should just have dg-require-effective-target >> "vect_int" . > > No, why? This test doesn't test whether the function has been vectorized. > It only tests whether it works. > And the check_vect () is supposed to exit early if some extra flags were > passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't > support those. > But this makes the tests fails, rather than be unsupported, right? > Jakub
Re: [PATCH][vectorizer] Remove blank debug lines after dump_gimple_stmt
On 06/06/2016 18:56, "Jeff Law" wrote: >On 06/06/2016 06:46 AM, Alan Hayward wrote: >> Lots of code calls dump_gimple_stmt then print a newline, however >> dump_gimple_stmt will print a newline itself. This makes the vectorizer >> debug >> file messy. I think the confusion is because dump_generic_expr does NOT >> print a >> newline. This patch removes all prints of a newline direcly after a >> dump_gimple_stmt. >> >> Tested by examining a selection of vect dump files. >> >> gcc/ >> \* tree-vect-data-refs.c (vect_analyze_data_refs): Remove debug >>newline. >> \* tree-vect-loop-manip.c (slpeel_make_loop_iterate_ntimes): likewise. >> (vect_can_advance_ivs_p): likewise. >> (vect_update_ivs_after_vectorizer): likewise. >> \* tree-vect-loop.c (vect_determine_vectorization_factor): likewise. >> (vect_analyze_scalar_cycles_1): likewise. >> (vect_analyze_loop_operations): likewise. >> (report_vect_op): likewise. >> (vect_is_slp_reduction): likewise. >> (vect_is_simple_reduction): likewise. >> (get_initial_def_for_induction): likewise. >> (vect_transform_loop): likewise. >> \* tree-vect-patterns.c (vect_recog_dot_prod_pattern): likewise. >> (vect_recog_sad_pattern): likewise. >> (vect_recog_widen_sum_pattern): likewise. >> (vect_recog_widening_pattern): likewise. >> (vect_recog_divmod_pattern): likewise. >> \* tree-vect-slp.c (vect-build-slp_tree_1): likewise. >> (vect_analyze_slp_instance): likewise. >> (vect_transform_slp_perm_load): likewise. >> (vect_schedule_slp_instance): likewise. >Wouldn't you also need to verify that the testsuite passes since it >could potentially have tests which assume the extra newlines in the >test's regexps? > >So I'm fine with this patch once the testsuite has been verified to run >without regressions on a target which exercises the vectorizer. > Sorry, I should have said that I have done a check run on x86 and aarch64, and there were no regressions. Alan.
Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option
On Tue, 7 Jun 2016, Jakub Jelinek wrote: > On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote: > > > > This fixes PR61564 by diagnosing (and ignoring) options not marked with > > 'Optimization' being applied to #pragma GCC optimize or via the > > optimize attribute. > > > > The reason is that while we save/restore option state for 'Optimize' > > marked options we don't do that for other options. Thus while such > > options do not end up in the per-function optimize state applying them > > still clobbers the global state. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Ok for trunk? > > Ok (though it surprises me we haven't done that from the beginning). It exposed that gcc.dg/ipa/pr70646.c uses a non-Optimization option #pragma (fixed) and that -ffast-math is not marked Optimization (but its "components" are, likewise -funsafe-math-optimizations is). Fixed both, bootstrapped / tested on x86_64-unknown-linux-gnu and applied. Richard. 2016-06-07 Richard Biener PR c/61564 * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION options and warn about others. * common.opt (ffast-math): Make Optimization. * gcc.dg/Wpragmas-1.c: New testcase. * gcc.dg/Wattributes-4.c: Likewise. * gcc.dg/ipa/pr70646.c: Drop optimize pragma in favor of dg-option entry. Index: gcc/c-family/c-common.c === *** gcc/c-family/c-common.c (revision 237167) --- gcc/c-family/c-common.c (working copy) *** parse_optimize_options (tree args, bool *** 9580,9585 --- 9580,9608 decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv, &decoded_options, &decoded_options_count); + /* Drop non-Optimization options. */ + unsigned j = 1; + for (i = 1; i < decoded_options_count; ++i) + { + if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION)) + { + ret = false; + if (attr_p) + warning (OPT_Wattributes, +"bad option %s to optimize attribute", +decoded_options[i].orig_option_with_args_text); + else + warning (OPT_Wpragmas, +"bad option %s to pragma attribute", +decoded_options[i].orig_option_with_args_text); + continue; + } + if (i != j) + decoded_options[j] = decoded_options[i]; + j++; + } + decoded_options_count = j; + /* And apply them. */ decode_options (&global_options, &global_options_set, decoded_options, decoded_options_count, input_location, global_dc); Index: gcc/testsuite/gcc.dg/Wpragmas-1.c === *** gcc/testsuite/gcc.dg/Wpragmas-1.c (revision 0) --- gcc/testsuite/gcc.dg/Wpragmas-1.c (working copy) *** *** 0 --- 1,11 + /* { dg-do compile } */ + + #pragma GCC push_options + #pragma GCC optimize ("-fno-lto") /* { dg-warning "bad option" } */ + int main(void){return 0;} + #pragma GCC pop_options + + /* ??? The FEs still apply the pragma string as optimize attribute to +all functions thus the diagnostic will be repeated for each function +affected. */ + /* { dg-message "bad option" "" { target *-*-* } 0 } */ Index: gcc/testsuite/gcc.dg/Wattributes-4.c === *** gcc/testsuite/gcc.dg/Wattributes-4.c(revision 0) --- gcc/testsuite/gcc.dg/Wattributes-4.c(working copy) *** *** 0 --- 1,3 + /* { dg-do compile } */ + + int __attribute__((optimize("no-lto"))) main(void){return 0;} /* { dg-warning "bad option" } */ Index: gcc/testsuite/gcc.dg/ipa/pr70646.c === *** gcc/testsuite/gcc.dg/ipa/pr70646.c (revision 237165) --- gcc/testsuite/gcc.dg/ipa/pr70646.c (working copy) *** *** 1,7 /* { dg-do run } */ ! /* { dg-options "-O2" } */ ! ! #pragma GCC optimize("no-unit-at-a-time") typedef unsigned char u8; typedef unsigned long long u64; --- 1,5 /* { dg-do run } */ ! /* { dg-options "-O2 -fno-unit-at-a-time" } */ typedef unsigned char u8; typedef unsigned long long u64; Index: gcc/common.opt === *** gcc/common.opt (revision 237165) --- gcc/common.opt (working copy) *** EnumValue *** 1287,1293 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD) ffast-math ! Common ffat-lto-objects Common Var(flag_fat_lto_objects) --- 1287,1293 Enum(excess_precision) String(standard) Value(EXCESS_PRECISION_STANDARD) ffast-math ! Common Optimization ffat-lto-objects Comm
Re: [PATCH, Fortran, OpenACC] Fix PR70598, Fortran host_data ICE
Ping. On 2016/5/11 02:57 AM, Bernhard Reutner-Fischer wrote: > On May 9, 2016 4:26:50 PM GMT+02:00, Chung-Lin Tang > wrote: >> Hi, this patch resolves an ICE for Fortran when using the OpenACC >> host_data directive. Actually, rather than say resolve, it's more like >> adjusting the front-end to same middle-end restrictions as C/C++, >> namely that we only support pointers or arrays for host_data right now. >> >> This patch contains a little bit of adjustments in >> fortran/openmp.c:resolve_omp_clauses(), >> and some testcase adjustments. This has been tested without regressions >> for Fortran. >> >> Is this okay for trunk? >> >> Thanks, >> Chung-Lin >> >> 2015-05-09 Chung-Lin Tang >> >> gcc/ >> * fortran/openmp.c (resolve_omp_clauses): Adjust use_device clause >> handling to only allow pointers and arrays. > > Fortran has it's own ChangeLog. The patch itself looks somewhat plausible to > me, fwiw, but Jakub or a FE maintainer has the say. >
Re: [PATCH] Improve avx_vec_concat
Hi Jakub, On 23 May 19:26, Jakub Jelinek wrote: > Hi! > > Not sure how to easily test these. > In any case, for the vinsert* case, we don't have vinserti128 nor > vinsertf128 in evex, so need to use vinsert[if]{64x4,32x4} or > for DQ {64x2,32x8}. For the case with zero in the other half, > we need AVX512VL and it isn't guaranteed for the output operand, > because it can be 512-bit mode too. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? > > 2016-05-23 Jakub Jelinek > > * config/i386/sse.md (avx_vec_concat): Add v=v,vm and > Yv=Yv,C alternatives. Patch is OK for trunk. -- Thanks, K > > --- gcc/config/i386/sse.md.jj 2016-05-23 15:42:49.0 +0200 > +++ gcc/config/i386/sse.md2016-05-23 16:25:58.434925572 +0200 > @@ -18178,10 +18178,10 @@ (define_insn "_ > (set_attr "mode" "")]) > > (define_insn "avx_vec_concat" > - [(set (match_operand:V_256_512 0 "register_operand" "=x,x") > + [(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv") > (vec_concat:V_256_512 > - (match_operand: 1 "register_operand" "x,x") > - (match_operand: 2 "vector_move_operand" "xm,C")))] > + (match_operand: 1 "register_operand" "x,v,x,v") > + (match_operand: 2 "vector_move_operand" > "xm,vm,C,C")))] >"TARGET_AVX" > { >switch (which_alternative) > @@ -18189,6 +18189,22 @@ (define_insn "avx_vec_concat" > case 0: >return "vinsert\t{$0x1, %2, %1, %0|%0, > %1, %2, 0x1}"; > case 1: > + if ( == 64) > + { > + if (TARGET_AVX512DQ && GET_MODE_SIZE (mode) == 4) > + return "vinsert32x8\t{$0x1, %2, %1, > %0|%0, %1, %2, 0x1}"; > + else > + return "vinsert64x4\t{$0x1, %2, %1, > %0|%0, %1, %2, 0x1}"; > + } > + else > + { > + if (TARGET_AVX512DQ && GET_MODE_SIZE (mode) == 8) > + return "vinsert64x2\t{$0x1, %2, %1, > %0|%0, %1, %2, 0x1}"; > + else > + return "vinsert32x4\t{$0x1, %2, %1, > %0|%0, %1, %2, 0x1}"; > + } > +case 2: > +case 3: >switch (get_attr_mode (insn)) > { > case MODE_V16SF: > @@ -18200,9 +18216,19 @@ (define_insn "avx_vec_concat" > case MODE_V4DF: > return "vmovapd\t{%1, %x0|%x0, %1}"; > case MODE_XI: > - return "vmovdqa\t{%1, %t0|%t0, %1}"; > + if (which_alternative == 2) > + return "vmovdqa\t{%1, %t0|%t0, %1}"; > + else if (GET_MODE_SIZE (mode) == 8) > + return "vmovdqa64\t{%1, %t0|%t0, %1}"; > + else > + return "vmovdqa32\t{%1, %t0|%t0, %1}"; > case MODE_OI: > - return "vmovdqa\t{%1, %x0|%x0, %1}"; > + if (which_alternative == 2) > + return "vmovdqa\t{%1, %x0|%x0, %1}"; > + else if (GET_MODE_SIZE (mode) == 8) > + return "vmovdqa64\t{%1, %x0|%x0, %1}"; > + else > + return "vmovdqa32\t{%1, %x0|%x0, %1}"; > default: > gcc_unreachable (); > } > @@ -18210,9 +18236,9 @@ (define_insn "avx_vec_concat" >gcc_unreachable (); > } > } > - [(set_attr "type" "sselog,ssemov") > - (set_attr "prefix_extra" "1,*") > - (set_attr "length_immediate" "1,*") > + [(set_attr "type" "sselog,sselog,ssemov,ssemov") > + (set_attr "prefix_extra" "1,1,*,*") > + (set_attr "length_immediate" "1,1,*,*") > (set_attr "prefix" "maybe_evex") > (set_attr "mode" "")]) > > > Jakub
RE: [PATCH][MIPS] P5600 scheduler fix
Hi, > > gcc/ > > * config/mips/p5600.md (p5600_fpu_fadd): Remove checking for > > `fabs' and `fneg' type attributes. > > (p5600_fpu_fabs): Add `fmove' to the comment. > > OK. > > Thanks, > Matthew Committed as r237173. Regards, Robert
Re: [Patch, avr] Fix PR 71151
Georg-Johann Lay writes: > Senthil Kumar Selvaraj schrieb: >> Hi, >> >> This patch fixes PR 71151 by eliminating the >> TARGET_ASM_FUNCTION_RODATA_SECTION hook and setting >> JUMP_TABLES_IN_TEXT_SECTION to 1. >> >> As described in the bugzilla entry, this hook assumed it will get >> called only for jumptable rodata for functions. This was true until >> 6.1, when a commit in varasm.c started calling the hook for mergeable >> string/constant data as well. >> >> This resulted in string constants ending up in a section intended for >> jumptables (flash), and broke code using those constants, which >> expects them to be present in rodata (SRAM). >> >> Given that the original reason for placing jumptables in a section was >> fixed by Johann in PR 63323, this patch restores the original >> behavior. Reg testing on both gcc-6-branch and trunk showed no regressions. > > Just for the record: > > The intention for jump-tables in function-rodata-section was to get > fine-grained section for the tables so that --gc-sections and > -ffunction-sections not only gc unused functions but also unused > jump-tables. As these tables had to reside in the lowest 64KiB of flash > (.progmem section) neither .rodata nor .text was a correct placement, > hence the hacking in TARGET_ASM_FUNCTION_RODATA_SECTION. > > Before using TARGET_ASM_FUNCTION_RODATA_SECTION, all jump tables were > put into .progmem.gcc_sw_table by ASM_OUTPUT_BEFORE_CASE_LABEL switching > to that section. > > We actually never had fump-tables in .text before... JUMP_TABLES_IN_TEXT_SECTION was 1 before r37465 - that was when the progmem.gcc_sw_table section was introduced. But yes, I understand that the target hook for FUNCTION_RODATA_SECTION was done to get them gc'ed along with the code. > > The purpose of PR63323 was to have more generic jump-table > implementation that also works if the table does NOT reside in the lower > 64KiB. This happens when moving whole whole TEXT section around like > for a bootloader. Understood. > >> As pointed out by Johann, this may end up increasing code >> size if there are lots of branches that cross the jump tables. I >> intend to propose a separate patch that gives additional information >> to the target hook (SECCAT_RODATA_{STRING,JUMPTABLE}) so it can know >> what type of function rodata is coming on. Johann also suggested >> handling jump table generation ourselves - I'll experiment with that >> some more. >> >> If ok, could someone commit please? Could you also backport to >> gcc-6-branch? >> >> Regards >> Senthil >> >> gcc/ChangeLog >> >> 2016-06-03 Senthil Kumar Selvaraj >> >> * config/avr/avr.c (avr_asm_function_rodata_section): Remove. >> * config/avr/avr.c (TARGET_ASM_FUNCTION_RODATA_SECTION): Remove. >> >> gcc/testsuite/ChangeLog >> >> 2016-06-03 Senthil Kumar Selvaraj >> >> * gcc/testsuite/gcc.target/avr/pr71151-1.c: New. >> * gcc/testsuite/gcc.target/avr/pr71151-2.c: New. >> >> diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c >> index ba5cd91..3cb8cb7 100644 >> --- gcc/config/avr/avr.c >> +++ gcc/config/avr/avr.c >> @@ -9488,65 +9488,6 @@ avr_asm_init_sections (void) >> } >> >> >> -/* Implement `TARGET_ASM_FUNCTION_RODATA_SECTION'. */ >> - >> -static section* >> -avr_asm_function_rodata_section (tree decl) >> -{ >> - /* If a function is unused and optimized out by -ffunction-sections >> - and --gc-sections, ensure that the same will happen for its jump >> - tables by putting them into individual sections. */ >> - >> - unsigned int flags; >> - section * frodata; >> - >> - /* Get the frodata section from the default function in varasm.c >> - but treat function-associated data-like jump tables as code >> - rather than as user defined data. AVR has no constant pools. */ >> - { >> -int fdata = flag_data_sections; >> - >> -flag_data_sections = flag_function_sections; >> -frodata = default_function_rodata_section (decl); >> -flag_data_sections = fdata; >> -flags = frodata->common.flags; >> - } >> - >> - if (frodata != readonly_data_section >> - && flags & SECTION_NAMED) >> -{ >> - /* Adjust section flags and replace section name prefix. */ >> - >> - unsigned int i; >> - >> - static const char* const prefix[] = >> -{ >> - ".rodata", ".progmem.gcc_sw_table", >> - ".gnu.linkonce.r.", ".gnu.linkonce.t." >> -}; >> - >> - for (i = 0; i < sizeof (prefix) / sizeof (*prefix); i += 2) >> -{ >> - const char * old_prefix = prefix[i]; >> - const char * new_prefix = prefix[i+1]; >> - const char * name = frodata->named.name; >> - >> - if (STR_PREFIX_P (name, old_prefix)) >> -{ >> - const char *rname = ACONCAT ((new_prefix, >> -name + strlen (old_prefix), >> NULL)); >> - flags &= ~SECTION_C
Re: [PATCH, i386, AVX-512] Add vectorizer support builtins
On 02 Jun 17:17, Ilya Verbin wrote: > On Mon, May 23, 2016 at 19:11:53 +0300, Ilya Verbin wrote: > > This patch adds missed 512-bit rounding builtins for vectorization. > > Regtested on x86_64-linux and i686-linux. OK for trunk? > > > > gcc/ > > * config/i386/i386-builtin-types.def: Add V16SI_FTYPE_V16SF, > > V8DF_FTYPE_V8DF_ROUND, V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND. > > * config/i386/i386.c (enum ix86_builtins): Add > > IX86_BUILTIN_CVTPS2DQ512_MASK, IX86_BUILTIN_FLOORPS512, > > IX86_BUILTIN_FLOORPD512, IX86_BUILTIN_CEILPS512, IX86_BUILTIN_CEILPD512, > > IX86_BUILTIN_TRUNCPS512, IX86_BUILTIN_TRUNCPD512, > > IX86_BUILTIN_CVTPS2DQ512, IX86_BUILTIN_VEC_PACK_SFIX512, > > IX86_BUILTIN_FLOORPS_SFIX512, IX86_BUILTIN_CEILPS_SFIX512, > > IX86_BUILTIN_ROUNDPS_AZ_SFIX512. > > (builtin_description bdesc_args): Add __builtin_ia32_floorps512, > > __builtin_ia32_ceilps512, __builtin_ia32_truncps512, > > __builtin_ia32_floorpd512, __builtin_ia32_ceilpd512, > > __builtin_ia32_truncpd512, __builtin_ia32_cvtps2dq512, > > __builtin_ia32_vec_pack_sfix512, __builtin_ia32_roundps_az_sfix512, > > __builtin_ia32_floorps_sfix512, __builtin_ia32_ceilps_sfix512. > > Change IX86_BUILTIN_CVTPS2DQ512 to IX86_BUILTIN_CVTPS2DQ512_MASK for > > __builtin_ia32_cvtps2dq512_mask. > > (ix86_expand_args_builtin): Handle V8DF_FTYPE_V8DF_ROUND, > > V16SF_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF_ROUND, V16SI_FTYPE_V16SF. > > (ix86_builtin_vectorized_function): Handle builtins mentioned above. > > * config/i386/sse.md > > (avx512f_fix_notruncv16sfv16si): > > Rename to ... > > (avx512f_fix_notruncv16sfv16si): ... this. > > (avx512f_cvtpd2dq512): Rename > > to ... > > (avx512f_cvtpd2dq512): ... this. > > (avx512f_vec_pack_sfix_v8df): New define_expand. > > (avx512f_roundpd512): Rename to ... > > (avx512f_round512): ... this. Change iterator. > > (avx512f_roundps512_sfix): New define_expand. > > (round2_sfix): Change iterator. > > gcc/testsuite/ > > * gcc.target/i386/avx512f-ceil-vec-1.c: New test. > > * gcc.target/i386/avx512f-ceil-vec-2.c: New test. > > * gcc.target/i386/avx512f-ceilf-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-ceilf-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-ceilf-vec-1.c: New test. > > * gcc.target/i386/avx512f-ceilf-vec-2.c: New test. > > * gcc.target/i386/avx512f-floor-vec-1.c: New test. > > * gcc.target/i386/avx512f-floor-vec-2.c: New test. > > * gcc.target/i386/avx512f-floorf-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-floorf-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-floorf-vec-1.c: New test. > > * gcc.target/i386/avx512f-floorf-vec-2.c: New test. > > * gcc.target/i386/avx512f-rint-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-rint-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-rintf-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-rintf-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-round-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-round-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-roundf-sfix-vec-1.c: New test. > > * gcc.target/i386/avx512f-roundf-sfix-vec-2.c: New test. > > * gcc.target/i386/avx512f-trunc-vec-1.c: New test. > > * gcc.target/i386/avx512f-trunc-vec-2.c: New test. > > * gcc.target/i386/avx512f-truncf-vec-1.c: New test. > > * gcc.target/i386/avx512f-truncf-vec-2.c: New test. > > Is it OK for gcc-6-branch? OK. > > -- Ilya -- Thanks, K
OpenACC wait clause
Hi! I've noticed if ((mask & OMP_CLAUSE_WAIT) && !c->wait && gfc_match ("wait") == MATCH_YES) { c->wait = true; match_oacc_expr_list (" (", &c->wait_list, false); continue; } which looks just weird and confusing. Why isn't this instead: if ((mask & OMP_CLAUSE_WAIT) && !c->wait && (match_oacc_expr_list ("wait (", &c->wait_list, false) == MATCH_YES)) { c->wait = true; continue; } ? Otherwise you happily accept wait without following (, perhaps even combined with another clause without any space in between etc. Jakub
Re: [PATCH 2/2][AArch64] Tests of AAPCS64 updates for alignment attribute
On Fri, Jan 15, 2016 at 02:17:43PM +, Alan Lawrence wrote: > Here I've added both tests using the abitest.h framework(which verifies values > are passed in the correct registers as specified by the AAPCS64), and separate > tests which verify that called functions read arguments from the same > locations > as they are passed. Hence, each test_align-N.c corresponds to rec_align-N.c. > > I've tried to stay consistent with the existing naming of e.g. test_10.c, > test_align-1.c, va_arg-10.c, but would be happy to change to another scheme > if preferred! (e.g. func-ret-1.c ?) > > Cheers, Alan These tests are OK. I'll commit them alongside patch 1/2 from this series tomorrow. Thanks, James > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/aapcs64/aapcs64.exp: Also execute rec_*.c > * gcc.target//aarch64/aapcs64/rec_align-5.c: New. > * gcc.target//aarch64/aapcs64/rec_align-6.c: New. > * gcc.target//aarch64/aapcs64/rec_align-7.c: New. > * gcc.target//aarch64/aapcs64/rec_align-8.c: New. > * gcc.target//aarch64/aapcs64/rec_align-9.c: New. > * gcc.target//aarch64/aapcs64/test_align-5.c: New. > * gcc.target//aarch64/aapcs64/test_align-6.c: New. > * gcc.target//aarch64/aapcs64/test_align-7.c: New. > * gcc.target//aarch64/aapcs64/test_align-8.c: New. > * gcc.target//aarch64/aapcs64/test_align-9.c: New. > * gcc.target//aarch64/aapcs64/rec_vaarg-1.c: New. > * gcc.target//aarch64/aapcs64/rec_vaarg-2.c: New.
Re: [PATCH 1/2][AArch64] Implement AAPCS64 updates for alignment attribute
On Fri, Jan 22, 2016 at 05:16:00PM +, Alan Lawrence wrote: > > On 21/01/16 17:23, Alan Lawrence wrote: > > On 18/01/16 17:10, Eric Botcazou wrote: > >> > >> Could you post the list of files that differ? How do they differ exactly? > > > > Hmmm. Well, I definitely had this failing to bootstrap once. I repeated > > that, to > > try to identify exactly what the differences wereand it succeeded even > > with > > my pre-AAPCS64-update host compiler. So, this is probably a false alarm; I'm > > bootstrapping again, after a rebase, to make sure... > > > > --Alan > > Ok, rebased onto a more recent build, and bootstrapping with Ada posed no > problems. Sorry for the noise. > > However, I had to drop the assert that TYPE_FIELDS was non-null because of > some > C++ testcases. > > Is this version OK for trunk? Now that we're in GCC7, this version of the patch is OK for trunk. From my reading of Richard's AAPCS update, this patch implements the rules as required. I'll give this a day for any last minute comments from Richard/Marcus, then commit this on your behalf tomorrow. Thanks, James > gcc/ChangeLog: > > * gcc/config/aarch64/aarch64.c (aarch64_function_arg_alignment): > Rewrite, looking one level down for records and arrays. > --- > gcc/config/aarch64/aarch64.c | 31 --- > 1 file changed, 16 insertions(+), 15 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 9142ac0..b084f83 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -1925,22 +1925,23 @@ aarch64_vfp_is_call_candidate (cumulative_args_t > pcum_v, machine_mode mode, > static unsigned int > aarch64_function_arg_alignment (machine_mode mode, const_tree type) > { > - unsigned int alignment; > + if (!type) > +return GET_MODE_ALIGNMENT (mode); > + if (integer_zerop (TYPE_SIZE (type))) > +return 0; > > - if (type) > -{ > - if (!integer_zerop (TYPE_SIZE (type))) > - { > - if (TYPE_MODE (type) == mode) > - alignment = TYPE_ALIGN (type); > - else > - alignment = GET_MODE_ALIGNMENT (mode); > - } > - else > - alignment = 0; > -} > - else > -alignment = GET_MODE_ALIGNMENT (mode); > + gcc_assert (TYPE_MODE (type) == mode); > + > + if (!AGGREGATE_TYPE_P (type)) > +return TYPE_ALIGN (TYPE_MAIN_VARIANT (type)); > + > + if (TREE_CODE (type) == ARRAY_TYPE) > +return TYPE_ALIGN (TREE_TYPE (type)); > + > + unsigned int alignment = 0; > + > + for (tree field = TYPE_FIELDS (type); field; field = DECL_CHAIN (field)) > +alignment = std::max (alignment, DECL_ALIGN (field)); > >return alignment; > } > -- > 1.9.1 >
Re: [PATCH, aarch64] Fix 70048
Hi Richard, On 16/03/16 21:25, Richard Henderson wrote: This fixes only the regression described in the PR. There was quite a bit of follow-up that points to new work that ought to be done during the gcc7 cycle, but isn't really appropriate now. Tested on aarch64-linux; committed as reviewed in the PR. r~ @@ -4953,74 +4963,43 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) { - HOST_WIDE_INT offset = INTVAL (XEXP (x, 1)); - HOST_WIDE_INT base_offset; + rtx base = XEXP (x, 0); + rtx offset_rtx XEXP (x, 1); I recently read the aarch64_legitimize_address function, and find a suspicious line of code in the above change. >> + rtx offset_rtx XEXP (x, 1); It's committed by you. It looks like a typo, and an assignment seems missing? James suggests me this is c++ initialization. Ah, yes it is! But I believe this is an coincident? As you have different initialization code above. I made an obvious patch to make it looks more intuitive, is it Okay? Regards, Renlin Li gcc/changelog: 2016-06-06 renlin li * config/aarch64/aarch64.c (aarch64_legitimize_address): Add assignment. commit 1fd77baf4ca918ed25dbce4678d7be7b7cd51be2 Author: Renlin Li Date: Mon Jun 6 11:24:39 2016 +0100 fix type diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index ad07fe1..54e6813 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4949,7 +4949,7 @@ aarch64_legitimize_address (rtx x, rtx /* orig_x */, machine_mode mode) if (GET_CODE (x) == PLUS && CONST_INT_P (XEXP (x, 1))) { rtx base = XEXP (x, 0); - rtx offset_rtx XEXP (x, 1); + rtx offset_rtx = XEXP (x, 1); HOST_WIDE_INT offset = INTVAL (offset_rtx); if (GET_CODE (base) == PLUS)
Re: [PATCH] spellcheck.c: add test_find_closest_string
On 06/06/2016 11:03 PM, David Malcolm wrote: This adds another test case to -fself-test. Ok. Bernd
Re: [PATCH] Add selftest for pretty-print.c
On 06/06/2016 11:28 PM, David Malcolm wrote: + assert_pp_format ("0xcafebabe", "%wx", (HOST_WIDE_INT)0xcafebabe); More interesting tests would be to have multiple arguments to test that we really used the right size for the varargs. Maybe append a single %d arg with a unique bit pattern to all of them? Otherwise ok. Bernd
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Tue, Jun 07, 2016 at 10:36:25AM +0100, Ramana Radhakrishnan wrote: > On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek wrote: > > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote: > >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj 2016-06-03 > >> > 17:05:37.693475438 +0200 > >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 > >> > +0200 > >> > @@ -0,0 +1,28 @@ > >> > +/* PR tree-optimization/71259 */ > >> > +/* { dg-do run } */ > >> > +/* { dg-options "-O3" } */ > > > > Would changing this from dg-options to dg-additional-options help for the > > ARM issues? > > check_vect () is the standard way for testing for HW vectorization support > > and hundreds of tests use it. > > > all tests in gcc.dg/vect have some form of dg-require-effective-target No, at least 170+ tests don't. > - so I think this test should just have dg-require-effective-target > "vect_int" . No, why? This test doesn't test whether the function has been vectorized. It only tests whether it works. And the check_vect () is supposed to exit early if some extra flags were passed by vect.exp (like e.g. on i?86-linux -msse2) and the HW doesn't support those. Jakub
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Tue, Jun 7, 2016 at 10:28 AM, Jakub Jelinek wrote: > On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote: >> > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj 2016-06-03 >> > 17:05:37.693475438 +0200 >> > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200 >> > @@ -0,0 +1,28 @@ >> > +/* PR tree-optimization/71259 */ >> > +/* { dg-do run } */ >> > +/* { dg-options "-O3" } */ > > Would changing this from dg-options to dg-additional-options help for the > ARM issues? > check_vect () is the standard way for testing for HW vectorization support > and hundreds of tests use it. all tests in gcc.dg/vect have some form of dg-require-effective-target - so I think this test should just have dg-require-effective-target "vect_int" . Ramana > >> > +/* { dg-additional-options "-mavx" { target avx_runtime } } */ >> > + >> > +#include "tree-vect.h" >> > + >> > +long a, b[1][44][2]; >> > +long long c[44][17][2]; >> > + >> > +int >> > +main () >> > +{ >> > + int i, j, k; >> > + check_vect (); >> > + asm volatile ("" : : : "memory"); >> > + for (i = 0; i < 44; i++) >> > +for (j = 0; j < 17; j++) >> > + for (k = 0; k < 2; k++) >> > + c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - >> > 5105075050047261684; >> > + asm volatile ("" : : : "memory"); >> > + for (i = 0; i < 44; i++) >> > +for (j = 0; j < 17; j++) >> > + for (k = 0; k < 2; k++) >> > + if (c[i][j][k] != -5105075050047261684) >> > + __builtin_abort (); >> > + return 0; >> > +} >> > >> >> This new test fails on ARM targets where the default FPU is not Neon like. >> The error message I'm seeing is: >> In file included from >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0: >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h: >> In function 'check_vect': >> /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5: >> error: inconsistent operand constraints in an 'asm' >> >> Well, the same error message actually appears with other tests, I did >> notice this one because >> it is a new one. >> >> The arm code is: >> /* On some processors without NEON support, this instruction may >>be a no-op, on others it may trap, so check that it executes >>correctly. */ >> long long a = 0, b = 1; >> asm ("vorr %P0, %P1, %P2" >> : "=w" (a) >> : "0" (a), "w" (b)); >> >> ... which has been here since 2007 :( >> >> IIUC, its purpose is to check Neon availability, but this makes the >> tests fail instead of >> being unsupported. >> >> Why not use an effective-target check instead? > > Jakub
Re: [PATCH][2/3] Vectorize inductions that are live after the loop
Alan Hayward writes: > On 05/06/2016 12:00, "Andreas Schwab" wrote: > >>Alan Hayward writes: >> >>> * gcc.dg/vect/vect-live-2.c: New test. >> >>This test fails on powerpc64 (with -m64, but not with -m32): >> >>$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect >>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 loops >>in function. >>../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 loops >>in function. >> >> > > "note: not vectorized: relevant stmt not supported: _1 = (long unsigned > int) j_24;" > > > This is failing because power does not support vectorising a cast from int > to long. > (It works on power 32bit because longs are 32bit and therefore no need to > cast). > > Can someone please suggest a target-supports define (or another method) I > can use to > disable this test for power 64bit (but not 32bit) ? > I tried using vect_multiple_sizes, but that will also disable the test on > x86 without > avx. I'm also seeing new FAILs on Solaris/SPARC: +FAIL: gcc.dg/vect/vect-live-2.c -flto -ffat-lto-objects scan-tree-dump-times v ect "vectorized 1 loops" 1 +FAIL: gcc.dg/vect/vect-live-2.c scan-tree-dump-times vect "vectorized 1 loops" 1 32- and 64-bit: vect-live-2.c:16:3: note: not vectorized: relevant stmt not supported: _2 = j.0_1 * 4; vect-live-2.c:48:7: note: not vectorized: control flow in loop. vect-live-2.c:35:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed and +FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 4 +FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorized 1 loops" 4 +FAIL: gcc.dg/vect/vect-live-slp-3.c -flto -ffat-lto-objects scan-tree-dump-times vect "vectorizing stmts using SLP" 4 +FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect "vec_stmt_relevant_p: stmt live but not relevant" 4 +FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect "vectorized 1 loops" 4 +FAIL: gcc.dg/vect/vect-live-slp-3.c scan-tree-dump-times vect "vectorizing stmts using SLP" 4 vect-live-slp-3.c:29:1: note: not vectorized: no vectype for stmt: n0_29 = *_4; vect-live-slp-3.c:30:1: note: not vectorized: no vectype for stmt: n0_29 = *_4; vect-live-slp-3.c:31:1: note: not vectorized: no vectype for stmt: n0_29 = *_4; vect-live-slp-3.c:32:1: note: not vectorized: no vectype for stmt: n0_29 = *_4; vect-live-slp-3.c:62:4: note: not vectorized: control flow in loop. vect-live-slp-3.c:45:3: note: not vectorized: loop contains function calls or data references that cannot be analyzed Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Tue, Jun 07, 2016 at 11:23:01AM +0200, Christophe Lyon wrote: > > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj 2016-06-03 > > 17:05:37.693475438 +0200 > > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200 > > @@ -0,0 +1,28 @@ > > +/* PR tree-optimization/71259 */ > > +/* { dg-do run } */ > > +/* { dg-options "-O3" } */ Would changing this from dg-options to dg-additional-options help for the ARM issues? check_vect () is the standard way for testing for HW vectorization support and hundreds of tests use it. > > +/* { dg-additional-options "-mavx" { target avx_runtime } } */ > > + > > +#include "tree-vect.h" > > + > > +long a, b[1][44][2]; > > +long long c[44][17][2]; > > + > > +int > > +main () > > +{ > > + int i, j, k; > > + check_vect (); > > + asm volatile ("" : : : "memory"); > > + for (i = 0; i < 44; i++) > > +for (j = 0; j < 17; j++) > > + for (k = 0; k < 2; k++) > > + c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - > > 5105075050047261684; > > + asm volatile ("" : : : "memory"); > > + for (i = 0; i < 44; i++) > > +for (j = 0; j < 17; j++) > > + for (k = 0; k < 2; k++) > > + if (c[i][j][k] != -5105075050047261684) > > + __builtin_abort (); > > + return 0; > > +} > > > > This new test fails on ARM targets where the default FPU is not Neon like. > The error message I'm seeing is: > In file included from > /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0: > /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h: > In function 'check_vect': > /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5: > error: inconsistent operand constraints in an 'asm' > > Well, the same error message actually appears with other tests, I did > notice this one because > it is a new one. > > The arm code is: > /* On some processors without NEON support, this instruction may >be a no-op, on others it may trap, so check that it executes >correctly. */ > long long a = 0, b = 1; > asm ("vorr %P0, %P1, %P2" > : "=w" (a) > : "0" (a), "w" (b)); > > ... which has been here since 2007 :( > > IIUC, its purpose is to check Neon availability, but this makes the > tests fail instead of > being unsupported. > > Why not use an effective-target check instead? Jakub
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
Hi Jakub, On 3 June 2016 at 19:33, Jakub Jelinek wrote: > On Tue, Jan 12, 2016 at 05:21:37PM +0300, Ilya Enkovich wrote: >> > --- gcc/tree-vect-slp.c.jj 2016-01-08 21:45:57.0 +0100 >> > +++ gcc/tree-vect-slp.c 2016-01-11 12:07:19.633366712 +0100 >> > @@ -2999,12 +2999,9 @@ vect_get_constant_vectors (tree op, slp_ >> > gimple *init_stmt; >> > if (VECTOR_BOOLEAN_TYPE_P (vector_type)) >> > { >> > - gcc_assert (fold_convertible_p (TREE_TYPE >> > (vector_type), >> > - op)); >> > + gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op))); >> > init_stmt = gimple_build_assign (new_temp, NOP_EXPR, >> > op); >> >> In vect_init_vector we had to introduce COND_EXPR to choose between 0 and -1 >> for >> boolean vectors. Shouldn't we do similar in SLP? > > Apparently the answer to this is YES. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk/6.2? > > 2016-06-03 Jakub Jelinek > > PR tree-optimization/71259 > * tree-vect-slp.c (vect_get_constant_vectors): For > VECTOR_BOOLEAN_TYPE_P, return all ones constant instead of > one for constant op, and use COND_EXPR for non-constant. > > * gcc.dg/vect/pr71259.c: New test. > > --- gcc/tree-vect-slp.c.jj 2016-05-24 10:56:02.0 +0200 > +++ gcc/tree-vect-slp.c 2016-06-03 17:01:12.740955935 +0200 > @@ -3056,7 +3056,7 @@ vect_get_constant_vectors (tree op, slp_ > if (integer_zerop (op)) > op = build_int_cst (TREE_TYPE (vector_type), 0); > else if (integer_onep (op)) > - op = build_int_cst (TREE_TYPE (vector_type), 1); > + op = build_all_ones_cst (TREE_TYPE (vector_type)); > else > gcc_unreachable (); > } > @@ -3071,8 +3071,14 @@ vect_get_constant_vectors (tree op, slp_ > gimple *init_stmt; > if (VECTOR_BOOLEAN_TYPE_P (vector_type)) > { > + tree true_val > + = build_all_ones_cst (TREE_TYPE (vector_type)); > + tree false_val > + = build_zero_cst (TREE_TYPE (vector_type)); > gcc_assert (INTEGRAL_TYPE_P (TREE_TYPE (op))); > - init_stmt = gimple_build_assign (new_temp, NOP_EXPR, > op); > + init_stmt = gimple_build_assign (new_temp, COND_EXPR, > + op, true_val, > + false_val); > } > else > { > --- gcc/testsuite/gcc.dg/vect/pr71259.c.jj 2016-06-03 17:05:37.693475438 > +0200 > +++ gcc/testsuite/gcc.dg/vect/pr71259.c 2016-06-03 17:05:32.418544731 +0200 > @@ -0,0 +1,28 @@ > +/* PR tree-optimization/71259 */ > +/* { dg-do run } */ > +/* { dg-options "-O3" } */ > +/* { dg-additional-options "-mavx" { target avx_runtime } } */ > + > +#include "tree-vect.h" > + > +long a, b[1][44][2]; > +long long c[44][17][2]; > + > +int > +main () > +{ > + int i, j, k; > + check_vect (); > + asm volatile ("" : : : "memory"); > + for (i = 0; i < 44; i++) > +for (j = 0; j < 17; j++) > + for (k = 0; k < 2; k++) > + c[i][j][k] = (30995740 >= *(k + *(j + *b)) != (a != 8)) - > 5105075050047261684; > + asm volatile ("" : : : "memory"); > + for (i = 0; i < 44; i++) > +for (j = 0; j < 17; j++) > + for (k = 0; k < 2; k++) > + if (c[i][j][k] != -5105075050047261684) > + __builtin_abort (); > + return 0; > +} > This new test fails on ARM targets where the default FPU is not Neon like. The error message I'm seeing is: In file included from /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/pr71259.c:6:0: /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h: In function 'check_vect': /aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/testsuite/gcc.dg/vect/tree-vect.h:65:5: error: inconsistent operand constraints in an 'asm' Well, the same error message actually appears with other tests, I did notice this one because it is a new one. The arm code is: /* On some processors without NEON support, this instruction may be a no-op, on others it may trap, so check that it executes correctly. */ long long a = 0, b = 1; asm ("vorr %P0, %P1, %P2" : "=w" (a) : "0" (a), "w" (b)); ... which has been here since 2007 :( IIUC, its purpose is to check Neon availability, but this makes the tests fail instead of being unsupported. Why not use an effective-target check instead? Christophe. > > Jakub
Re: Unreviewed patches
Hi Gerald, > On Mon, 6 Jun 2016, Rainer Orth wrote: >> The following patches have remained unreviewed for a week: >> >> [gotools, libcc1] Update copyright dates >> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html >> >> Richard already approved the update-copyright.py changes, but the actual >> effects on gotools and libcc1 require either maintainer or release >> manager approval, I believe. > > I think applying those updates is a direct consequence of updating > the update-copyright.py script and you can just go ahead. done now. I'd just hoped to get word from the RMs what to do about default_dirs in the script so the same doesn't happen again next year. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] integer overflow checking builtins in constant expressions
On Mon, Jun 06, 2016 at 09:44:08PM +0200, Jakub Jelinek wrote: > Hi! > > On Mon, Jun 06, 2016 at 02:36:17PM +0200, Jakub Jelinek wrote: > > 2016-06-06 Martin Sebor > > Jakub Jelinek > > > > PR c++/70507 > > PR c/68120 > > * builtins.def (BUILT_IN_ADD_OVERFLOW_P, BUILT_IN_SUB_OVERFLOW_P, > > BUILT_IN_MUL_OVERFLOW_P): New builtins. > > * builtins.c: Include gimple-fold.h. > > (fold_builtin_arith_overflow): Handle > > BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. > > (fold_builtin_3): Likewise. > > * doc/extend.texi (Integer Overflow Builtins): Document > > __builtin_{add,sub,mul}_overflow_p. > > gcc/c/ > > * c-typeck.c (convert_arguments): Don't promote last argument > > of BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. > > gcc/cp/ > > * constexpr.c: Include gimple-fold.h. > > (cxx_eval_internal_function): New function. > > (cxx_eval_call_expression): Call it. > > (potential_constant_expression_1): Handle integer arithmetic > > overflow built-ins. > > * tree.c (builtin_valid_in_constant_expr_p): Likewise. > > gcc/c-family/ > > * c-common.c (check_builtin_function_arguments): Handle > > BUILT_IN_{ADD,SUB,MUL}_OVERFLOW_P. > > gcc/testsuite/ > > * c-c++-common/builtin-arith-overflow-1.c: Add test cases. > > * c-c++-common/builtin-arith-overflow-2.c: New test. > > * g++.dg/cpp0x/constexpr-arith-overflow.C: New test. > > * g++.dg/cpp1y/constexpr-arith-overflow.C: New test. > > Now successfully bootstrapped/regtested on x86_64-linux and i686-linux, ok > for trunk? I just played with this a bit -- nice. The c/ and c-family/ parts are OK. Marek
Re: [PATCH][2/3] Vectorize inductions that are live after the loop
On 05/06/2016 12:00, "Andreas Schwab" wrote: >Alan Hayward writes: > >> * gcc.dg/vect/vect-live-2.c: New test. > >This test fails on powerpc64 (with -m64, but not with -m32): > >$ grep 'vectorized.*loops' ./vect-live-2.c.149t.vect >../gcc/testsuite/gcc.dg/vect/vect-live-2.c:10:1: note: vectorized 0 loops >in function. >../gcc/testsuite/gcc.dg/vect/vect-live-2.c:29:1: note: vectorized 0 loops >in function. > > "note: not vectorized: relevant stmt not supported: _1 = (long unsigned int) j_24;" This is failing because power does not support vectorising a cast from int to long. (It works on power 32bit because longs are 32bit and therefore no need to cast). Can someone please suggest a target-supports define (or another method) I can use to disable this test for power 64bit (but not 32bit) ? I tried using vect_multiple_sizes, but that will also disable the test on x86 without avx. Thanks, Alan.
[AArch64] ARMv8.2 command line and feature macros support
ARMv8.2-A extends the ARMv8.1-A architecture, adding features that include an optional extension to support FP16 in floating-point and Adv.SIMD instructions. This patch adds the command line and feature flags for ARMv8.2-A and the FP16 support. The new command option -march=armv8.2-a will enable the new ARMv8.2-A feature macro, which is a superset of the ARMv8.1 features. The new feature "fp16" can be enabled by -march=armv8.2-a+fp16 which will enable ARMv8.2 FP16 extension support. The feature macro __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is defined to be 1 if the scalar FP16 instructions are available, it is otherwise undefined. The feature macro __ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined to be 1 if the vector FP16 instructions are available, it is otherwise undefined. OK for trunk? Thanks. 2016-06-07 Matthew Wahab Jiong Wang * config/aarch64/aarch64-arches.def: Add "armv8.2-a". * config/aarch64/aarch64.h (AARCH64_FL_V8_2): New. (AARCH64_FL_F16): New. (AARCH64_FL_FOR_ARCH8_2): New. (AARCH64_ISA_8_2): New. (AARCH64_ISA_F16): New. (TARGET_FP_F16INST): New. (TARGET_SIMD_F16INST): New. * config/aarch64/aarch64-option-extensions.def: New entry for "fp16". * config/aarch64/aarch64-c.c (arch64_update_cpp_builtins): Conditionally define __ARM_FEATURE_FP16_SCALAR_ARITHMETIC and __ARM_FEATURE_FP16_VECTOR_ARITHMETIC. * doc/invoke.texi (AArch64 Options): Document "armv8.2-a" and "fp16". diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def index 1e9d90b1b66a21ebbe4c238ce844c6fd3a192201..7dcf140411f6eb95504d9b92df9dadce50529a28 100644 --- a/gcc/config/aarch64/aarch64-arches.def +++ b/gcc/config/aarch64/aarch64-arches.def @@ -32,4 +32,5 @@ AARCH64_ARCH("armv8-a", generic, 8A, 8, AARCH64_FL_FOR_ARCH8) AARCH64_ARCH("armv8.1-a", generic, 8_1A, 8, AARCH64_FL_FOR_ARCH8_1) +AARCH64_ARCH("armv8.2-a", generic, 8_2A, 8, AARCH64_FL_FOR_ARCH8_2) diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c index e64dc7676ccae10e87ade3946904408fe425730d..3380ed6f2cd0ae35fd6a4e53177604256875e6de 100644 --- a/gcc/config/aarch64/aarch64-c.c +++ b/gcc/config/aarch64/aarch64-c.c @@ -95,6 +95,11 @@ aarch64_update_cpp_builtins (cpp_reader *pfile) else cpp_undef (pfile, "__ARM_FP"); + aarch64_def_or_undef (TARGET_FP_F16INST, + "__ARM_FEATURE_FP16_SCALAR_ARITHMETIC", pfile); + aarch64_def_or_undef (TARGET_SIMD_F16INST, + "__ARM_FEATURE_FP16_VECTOR_ARITHMETIC", pfile); + aarch64_def_or_undef (TARGET_SIMD, "__ARM_FEATURE_NUMERIC_MAXMIN", pfile); aarch64_def_or_undef (TARGET_SIMD, "__ARM_NEON", pfile); diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def index e8706d1c2e798872b8028cce1d9d193df8fef0be..ff59dceabffe780adc84d953823800ba2abca5f3 100644 --- a/gcc/config/aarch64/aarch64-option-extensions.def +++ b/gcc/config/aarch64/aarch64-option-extensions.def @@ -55,3 +55,6 @@ AARCH64_OPT_EXTENSION("crc", AARCH64_FL_CRC, 0, 0, "crc32") /* Enabling or disabling "lse" only changes "lse". */ AARCH64_OPT_EXTENSION("lse", AARCH64_FL_LSE, 0, 0, "atomics") + +/* Enabling or disabling "fp16" only changes "fp16". */ +AARCH64_OPT_EXTENSION("fp16", AARCH64_FL_F16, 0, 0, "fp16") diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 15d7e4019adf207d5127ebba31af35a8b3437c5b..a71420382069b66e2faef15c238f07374683b4fc 100644 --- a/gcc/config/aarch64/aarch64.h +++ b/gcc/config/aarch64/aarch64.h @@ -135,6 +135,8 @@ extern unsigned aarch64_architecture_version; /* ARMv8.1 architecture extensions. */ #define AARCH64_FL_LSE (1 << 4) /* Has Large System Extensions. */ #define AARCH64_FL_V8_1 (1 << 5) /* Has ARMv8.1 extensions. */ +#define AARCH64_FL_V8_2 (1 << 8) /* Has ARMv8.2 features. */ +#define AARCH64_FL_F16 (1 << 9) /* Has ARMv8.2 FP16 extensions. */ /* Has FP and SIMD. */ #define AARCH64_FL_FPSIMD (AARCH64_FL_FP | AARCH64_FL_SIMD) @@ -146,6 +148,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_FL_FOR_ARCH8 (AARCH64_FL_FPSIMD) #define AARCH64_FL_FOR_ARCH8_1 \ (AARCH64_FL_FOR_ARCH8 | AARCH64_FL_LSE | AARCH64_FL_CRC | AARCH64_FL_V8_1) +#define AARCH64_FL_FOR_ARCH8_2 \ + (AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_V8_2) /* Macros to test ISA flags. */ @@ -155,6 +159,8 @@ extern unsigned aarch64_architecture_version; #define AARCH64_ISA_SIMD (aarch64_isa_flags & AARCH64_FL_SIMD) #define AARCH64_ISA_LSE (aarch64_isa_flags & AARCH64_FL_LSE) #define AARCH64_ISA_RDMA (aarch64_isa_flags & AARCH64_FL_V8_1) +#define AARCH64_ISA_V8_2 (aarch64_isa_flags & AARCH64_FL_V8_2) +#define AARCH64_ISA_F16 (aarch64_isa_flags & AARCH64_FL_F16) /* Crypto is an optional extension to AdvSIMD. */ #define TARGET_CRYPTO (TARGET_SIMD && AARCH64_ISA_CRYPT
Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
>> Please find the updated patch attached. >> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and >> arm-none-eabi. >> However the test-case added in the patch (neon-vect-div-1.c) fails to >> get vectorized at -O2 >> for armeb-none-linux-gnueabihf. >> Charles suggested me to try with -O3, which worked. >> It appears the test-case fails to get vectorized with >> -fvect-cost-model=cheap (which is default enabled at -O2) >> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic >> >> I can't figure out why it fails -fvect-cost-model=cheap. >> From the vect dump (attached): >> neon-vect-div-1.c:12:3: note: Setting misalignment to -1. >> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9 > Hi, > I think I have some idea why the test-case fails attached with patch > fail to get vectorized on armeb with -O2. > > Issue with big endian vectorizer: > The patch does not cause regressions on big endian vectorizer but > fails to vectorize the test-cases attached with the patch, while they > get vectorized on > litttle-endian. > Fails with armeb with the following message in dump: > note: not vectorized: unsupported unaligned load.*_9 > > The behavior of big and little endian vectorizer seems to be different > in arm_builtin_support_vector_misalignment() which overrides the hook > targetm.vectorize.support_vector_misalignment(). > > targetm.vectorize.support_vector_misalignment is called by > vect_supportable_dr_alignment () which in turn is called > by verify_data_refs_alignment (). > > Execution upto following condition is common between arm and armeb > in vect_supportable_dr_alignment(): > > if ((TYPE_USER_ALIGN (type) && !is_packed) > || targetm.vectorize.support_vector_misalignment (mode, type, > DR_MISALIGNMENT (dr), is_packed)) > /* Can't software pipeline the loads, but can at least do them. */ > return dr_unaligned_supported; > > For little endian case: > arm_builtin_support_vector_misalignment() is called with > V2SF mode and misalignment == -1, and the following condition > becomes true: > /* If the misalignment is unknown, we should be able to handle the access > so long as it is not to a member of a packed data structure. */ > if (misalignment == -1) > return true; > > Since the hook returned true we enter the condition above in > vect_supportable_dr_alignment() and return dr_unaligned_supported; > > For big-endian: > arm_builtin_support_vector_misalignment() is called with V2SF mode. > The following condition that gates the entire function body fails: > if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access) > and the default hook gets called with V2SF mode and the default hook > returns false because > movmisalign_optab does not exist for V2SF mode. > > So the condition above in vect_supportable_dr_alignment() fails > and we come here: > /* Unsupported. */ > return dr_unaligned_unsupported; > > And hence we get the unaligned load not supported message in the dump > for armeb in verify_data_ref_alignment (): > > static bool > verify_data_ref_alignment (data_reference_p dr) > { > enum dr_alignment_support supportable_dr_alignment > = vect_supportable_dr_alignment (dr, false); > if (!supportable_dr_alignment) > { > if (dump_enabled_p ()) > { > if (DR_IS_READ (dr)) > dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, > "not vectorized: unsupported unaligned load."); > > With -O3, the test-cases vectorize for armeb, because loop peeling for > alignment > is turned on. > The above behavior is also reproducible with test-case which is > irrelevant to the patch. > for instance, we get the same unsupported unaligned load for following > test-case (replaced / with +) > > void > foo (int len, float * __restrict p, float *__restrict x) > { > len = len & ~31; > for (int i = 0; i < len; i++) > p[i] = p[i] + x[i]; > } > Is the patch OK to commit after bootstrap+test ? Thanks for the analysis - all the test needs is an additional marker to skip it on armeb (is there a helper for misaligned loads from the vectorizer ? ) - Ah probably vect_hw_misalign is sufficient for your usecase and you want to appropriately fix it for little endian arm with neon support enabled. From the patch. >>+ && flag_unsafe_math_optimizations && flag_reciprocal_math" Why do we need flag_unsafe_math_optimizations && flag_reciprocal_math ? flag_unsafe_math_optimizations should be sufficient since it enables flag_reciprocal_math - the reason for flag_unsafe_math_optimizations is to prevent loss of precision and the fact that on neon denormalized numbers are flushed to zero. Ok with that change and a quick test with vect_hw_misalign added to your testcase. Sorry about the delay in reviewing. Ramana > > Thanks, > Prathamesh >> >> Thanks, >> Prathamesh >>> >>> Thanks, >>> Ramana >>> Thanks,
Re: move increase_alignment from simple to regular ipa pass
On 3 June 2016 at 13:35, Jan Hubicka wrote: >> > fsection-anchors >> > Common Report Var(flag_section_anchors) >> > Access data in the same section from shared anchor points. >> >> Funny. I see the following on trunk: >> >> fsection-anchors >> Common Report Var(flag_section_anchors) Optimization >> Access data in the same section from shared anchor points. > > Aha, my local change from last year still inmy tree. Sorry. > Yep, having it as Optimization makes sense, but we need to be sure it works > as intended. >> >> > flag_section_anchors is not declared as Optimiation, so it can't be >> > function >> > specific right now. It probably should because it is an optimization. This >> > makes me wonder what happens when one function have anchors enabled and >> > other >> > doesn't? Probably anchroing or not anchoring the var will then depend on >> > what >> > function comes first in the compilation order and then we will need to make >> > backend grok the case where static var is anchored but >> > flag_section_anchors is >> > off. >> >> This is because we represent the anchor with DECL_RTL, right? Maybe >> DECL_RTL of globals needs to be re-computed for each function... > > I would rather anchor variable if it is used by at least one function that is > compiled > with anchors. Accessing anchors is IMO no slower than accessing symbols. But > I am not > that familiar witht his code... >> >> > I dunno what is the desired behaviour for LTOint together different code >> > models. >> >> Good question. There's always the choice to remove 'Optimization' and >> enforce same setting for all TUs we LTO in lto-wrapper. > > Yep. Not sure what is better - I did not really think of targets that use both > models. Um I am not really sure what to do next to convert increase_alignment to regular pass, I would be grateful for suggestions. Thanks, Prathamesh > > Honza >> >> Richard.
Re: [PATCH][AArch64] Model CSEL instruction in Cortex-A57 scheduling model
On Tue, Jun 07, 2016 at 09:22:05AM +0100, Ramana Radhakrishnan wrote: > > > On 06/06/16 17:10, Kyrill Tkachov wrote: > > Hi all, > > > > This small patch adds handling of the CSEL-type instructions to the > > Cortex-A57 scheduling model. > > It is treated the same as simple ALU instructions. > > > > With this patch I didn't see any overall differences in SPEC2006. > > > > Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-linux-gnu. > > > > Ok for trunk? > > > > The patch is very simple and the csel value isn't used in any arm > > instructions so I think > > just an aarch64 approval for this should be enough. > > > > Thanks, > > Kyrill > > > > This is ok by me - in these sorts of situations I think we should let the > aarch64 approvals be sufficient. I agree. I don't think the folder in which the file resides is the most important thing, it is the intent and risk of the patch that matters. This patch is OK for AArch64. Thanks, James > > 2016-06-06 Kyrylo Tkachov > > > > * config/arm/cortex-a57.md (cortex_a57_alu): > > Handle csel type. >
Re: [RFC] [2/2] divmod transform: override expand_divmod_libfunc for ARM and add test-cases
ping https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02008.html Thanks, Prathamesh On 25 May 2016 at 18:19, Prathamesh Kulkarni wrote: > On 23 May 2016 at 14:28, Prathamesh Kulkarni > wrote: >> Hi, >> This patch overrides expand_divmod_libfunc for ARM port and adds test-cases. >> I separated the SImode tests into separate file from DImode tests >> because certain arm configs (cortex-15) have hardware div insn for >> SImode but not for DImode, >> and for that config we want SImode tests to be disabled but not DImode tests. >> The patch therefore has two target-effective checks: divmod and >> divmod_simode. >> Cross-tested on arm*-*-*. >> Bootstrap+test on arm-linux-gnueabihf in progress. >> Does this patch look OK ? > Hi, > This version adds couple of more test-cases and fixes typo in > divmod-3-simode.c, divmod-4-simode.c > > Thanks, > Prathamesh >> >> Thanks, >> Prathamesh
Re: [ARM] implement division using vrecpe/vrecps with -funsafe-math-optimizations
On 30 May 2016 at 13:52, Prathamesh Kulkarni wrote: > On 23 May 2016 at 14:59, Prathamesh Kulkarni > wrote: >> On 5 February 2016 at 18:40, Prathamesh Kulkarni >> wrote: >>> On 4 February 2016 at 16:31, Ramana Radhakrishnan >>> wrote: On Sun, Jan 17, 2016 at 9:06 AM, Prathamesh Kulkarni wrote: > On 31 July 2015 at 15:04, Ramana Radhakrishnan > wrote: >> >> >> On 29/07/15 11:09, Prathamesh Kulkarni wrote: >>> Hi, >>> This patch tries to implement division with multiplication by >>> reciprocal using vrecpe/vrecps >>> with -funsafe-math-optimizations and -freciprocal-math enabled. >>> Tested on arm-none-linux-gnueabihf using qemu. >>> OK for trunk ? >>> >>> Thank you, >>> Prathamesh >>> >> >> I've tried this in the past and never been convinced that 2 iterations >> are enough to get to stability with this given that the results are only >> precise for 8 bits / iteration. Thus I've always believed you need 3 >> iterations rather than 2 at which point I've never been sure that it's >> worth it. So the testing that you've done with this currently is not >> enough for this to go into the tree. >> >> I'd like this to be tested on a couple of different AArch32 >> implementations with a wider range of inputs to verify that the results >> are acceptable as well as running something like SPEC2k(6) with atleast >> one iteration to ensure correctness. > Hi, > I got results of SPEC2k6 fp benchmarks: > a15: +0.64% overall, 481.wrf: +6.46% > a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76% > a57: +0.35% overall, 481.wrf: +3.84% > The other benchmarks had (almost) identical results. Thanks for the benchmarking results - Please repost the patch with the changes that I had requested in my previous review - given it is now stage4 , I would rather queue changes like this for stage1 now. >>> Hi, >>> Please find the updated patch attached. >>> It passes testsuite for arm-none-linux-gnueabi, arm-none-linux-gnueabihf and >>> arm-none-eabi. >>> However the test-case added in the patch (neon-vect-div-1.c) fails to >>> get vectorized at -O2 >>> for armeb-none-linux-gnueabihf. >>> Charles suggested me to try with -O3, which worked. >>> It appears the test-case fails to get vectorized with >>> -fvect-cost-model=cheap (which is default enabled at -O2) >>> and passes for -fno-vect-cost-model / -fvect-cost-model=dynamic >>> >>> I can't figure out why it fails -fvect-cost-model=cheap. >>> From the vect dump (attached): >>> neon-vect-div-1.c:12:3: note: Setting misalignment to -1. >>> neon-vect-div-1.c:12:3: note: not vectorized: unsupported unaligned load.*_9 >> Hi, >> I think I have some idea why the test-case fails attached with patch >> fail to get vectorized on armeb with -O2. >> >> Issue with big endian vectorizer: >> The patch does not cause regressions on big endian vectorizer but >> fails to vectorize the test-cases attached with the patch, while they >> get vectorized on >> litttle-endian. >> Fails with armeb with the following message in dump: >> note: not vectorized: unsupported unaligned load.*_9 >> >> The behavior of big and little endian vectorizer seems to be different >> in arm_builtin_support_vector_misalignment() which overrides the hook >> targetm.vectorize.support_vector_misalignment(). >> >> targetm.vectorize.support_vector_misalignment is called by >> vect_supportable_dr_alignment () which in turn is called >> by verify_data_refs_alignment (). >> >> Execution upto following condition is common between arm and armeb >> in vect_supportable_dr_alignment(): >> >> if ((TYPE_USER_ALIGN (type) && !is_packed) >> || targetm.vectorize.support_vector_misalignment (mode, type, >> DR_MISALIGNMENT (dr), is_packed)) >> /* Can't software pipeline the loads, but can at least do them. */ >> return dr_unaligned_supported; >> >> For little endian case: >> arm_builtin_support_vector_misalignment() is called with >> V2SF mode and misalignment == -1, and the following condition >> becomes true: >> /* If the misalignment is unknown, we should be able to handle the access >> so long as it is not to a member of a packed data structure. */ >> if (misalignment == -1) >> return true; >> >> Since the hook returned true we enter the condition above in >> vect_supportable_dr_alignment() and return dr_unaligned_supported; >> >> For big-endian: >> arm_builtin_support_vector_misalignment() is called with V2SF mode. >> The following condition that gates the entire function body fails: >> if (TARGET_NEON && !BYTES_BIG_ENDIAN && unaligned_access) >> and the default hook gets called with V2SF mode and the default hook >> returns false because >> movmisalign_optab does not exist for V2SF mode. >> >> So the condition above in vect_supportable_dr_alignment() fails >> and we come here: >> /* Unsu
Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option
On Tue, 7 Jun 2016, Jakub Jelinek wrote: > On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote: > > > > This fixes PR61564 by diagnosing (and ignoring) options not marked with > > 'Optimization' being applied to #pragma GCC optimize or via the > > optimize attribute. > > > > The reason is that while we save/restore option state for 'Optimize' > > marked options we don't do that for other options. Thus while such > > options do not end up in the per-function optimize state applying them > > still clobbers the global state. > > > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > > > Ok for trunk? > > Ok (though it surprises me we haven't done that from the beginning). Yes, I was surprised by that as well... Richard. > > 2016-06-07 Richard Biener > > > > PR c/61564 > > * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION > > options and warn about others. > > > > * gcc.dg/Wpragmas-1.c: New testcase. > > * gcc.dg/Wattributes-4.c: Likewise.
Re: [PATCH] Fix PR61564 - optimize attribute/pragma accepting any option
On Tue, Jun 07, 2016 at 10:15:39AM +0200, Richard Biener wrote: > > This fixes PR61564 by diagnosing (and ignoring) options not marked with > 'Optimization' being applied to #pragma GCC optimize or via the > optimize attribute. > > The reason is that while we save/restore option state for 'Optimize' > marked options we don't do that for other options. Thus while such > options do not end up in the per-function optimize state applying them > still clobbers the global state. > > Bootstrap and regtest running on x86_64-unknown-linux-gnu. > > Ok for trunk? Ok (though it surprises me we haven't done that from the beginning). > 2016-06-07 Richard Biener > > PR c/61564 > * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION > options and warn about others. > > * gcc.dg/Wpragmas-1.c: New testcase. > * gcc.dg/Wattributes-4.c: Likewise. Jakub
Re: [PATCH][AArch64] Model CSEL instruction in Cortex-A57 scheduling model
On 06/06/16 17:10, Kyrill Tkachov wrote: > Hi all, > > This small patch adds handling of the CSEL-type instructions to the > Cortex-A57 scheduling model. > It is treated the same as simple ALU instructions. > > With this patch I didn't see any overall differences in SPEC2006. > > Bootstrapped and tested on arm-none-linux-gnueabihf and aarch64-linux-gnu. > > Ok for trunk? > > The patch is very simple and the csel value isn't used in any arm > instructions so I think > just an aarch64 approval for this should be enough. > > Thanks, > Kyrill > > 2016-06-06 Kyrylo Tkachov > > * config/arm/cortex-a57.md (cortex_a57_alu): > Handle csel type. This is ok by me - in these sorts of situations I think we should let the aarch64 approvals be sufficient. regards Ramana
Re: [PATCH][ARM] Add initial support for Cortex-A73
On 06/06/16 17:18, Kyrill Tkachov wrote: > Hi all, > > This patch adds initial support for the Cortex-A73 processor through the > cortex-a73, cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to > -mcpu and -mtune. > > The Cortex-A73 is an ARMv8-A processor. > > Bootstrapped and tested on arm-none-linux-gnueabihf with an appropriately > patched binutils that understands the relevant -mcpu argument. > > Ok for trunk? > Ok. Any reason why we can't move the ARM port also to a world where we work with .arch directives instead of .cpu directives like the aarch64 port.? Thanks, Ramana > Thanks, > Kyrill > > 2016-06-06 Kyrylo Tkachov > > * config/arm/arm.c (arm_cortex_a73_tune): New struct. > * config/arm/arm-cores.def (cortex-a73): New entry. > (cortex-a73.cortex-a35): Likewise. > (cortex-a73.cortex-a53): Likewise. > * config/arm/arm-tables.opt: Regenerate. > * config/arm/arm-tune.md: Likewise. > * config/arm/bpabi.h (BE8_LINK_SPEC): Handle mcpu=cortex-a73, > mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53. > * config/arm/t-aprofile: Handle mcpu=cortex-a73, > mcpu=cortex-a73.cortex-a35 and mcpu=cortex-a73.cortex-a53. > * doc/invoke.texi (ARM Options): Document cortex-a73, > cortex-a73.cortex-a35 and cortex-a73.cortex-a53.
[PATCH] Fix PR61564 - optimize attribute/pragma accepting any option
This fixes PR61564 by diagnosing (and ignoring) options not marked with 'Optimization' being applied to #pragma GCC optimize or via the optimize attribute. The reason is that while we save/restore option state for 'Optimize' marked options we don't do that for other options. Thus while such options do not end up in the per-function optimize state applying them still clobbers the global state. Bootstrap and regtest running on x86_64-unknown-linux-gnu. Ok for trunk? Note this also points to the unfortunate ways the FEs apply the pragma to functions which seems to be via adding optimize attributes literally rather than copying a tree optimization node constructable by parse_optimize_options. That results in excessive diagnostics also reproducable without the patch if you use sth like #pragma GCC optimize ("-Xhost"). Thanks, Richard. 2016-06-07 Richard Biener PR c/61564 * c-common.c (parse_optimize_options): Only apply CL_OPTIMIZATION options and warn about others. * gcc.dg/Wpragmas-1.c: New testcase. * gcc.dg/Wattributes-4.c: Likewise. Index: gcc/c-family/c-common.c === *** gcc/c-family/c-common.c (revision 237167) --- gcc/c-family/c-common.c (working copy) *** parse_optimize_options (tree args, bool *** 9580,9585 --- 9580,9608 decode_cmdline_options_to_array_default_mask (opt_argc, opt_argv, &decoded_options, &decoded_options_count); + /* Drop non-Optimization options. */ + unsigned j = 1; + for (i = 1; i < decoded_options_count; ++i) + { + if (! (cl_options[decoded_options[i].opt_index].flags & CL_OPTIMIZATION)) + { + ret = false; + if (attr_p) + warning (OPT_Wattributes, +"bad option %s to optimize attribute", +decoded_options[i].orig_option_with_args_text); + else + warning (OPT_Wpragmas, +"bad option %s to pragma attribute", +decoded_options[i].orig_option_with_args_text); + continue; + } + if (i != j) + decoded_options[j] = decoded_options[i]; + j++; + } + decoded_options_count = j; + /* And apply them. */ decode_options (&global_options, &global_options_set, decoded_options, decoded_options_count, input_location, global_dc); Index: gcc/testsuite/gcc.dg/Wpragmas-1.c === *** gcc/testsuite/gcc.dg/Wpragmas-1.c (revision 0) --- gcc/testsuite/gcc.dg/Wpragmas-1.c (working copy) *** *** 0 --- 1,11 + /* { dg-do compile } */ + + #pragma GCC push_options + #pragma GCC optimize ("-fno-lto") /* { dg-warning "bad option" } */ + int main(void){return 0;} + #pragma GCC pop_options + + /* ??? The FEs still apply the pragma string as optimize attribute to +all functions thus the diagnostic will be repeated for each function +affected. */ + /* { dg-message "bad option" "" { target *-*-* } 0 } */ Index: gcc/testsuite/gcc.dg/Wattributes-4.c === *** gcc/testsuite/gcc.dg/Wattributes-4.c(revision 0) --- gcc/testsuite/gcc.dg/Wattributes-4.c(working copy) *** *** 0 --- 1,3 + /* { dg-do compile } */ + + int __attribute__((optimize("no-lto"))) main(void){return 0;} /* { dg-warning "bad option" } */
PING: [PATCH] PR33661 Fix problem with register asm in templates
Hi Jason, could you please have a look? https://gcc.gnu.org/ml/gcc-patches/2015-06/msg00904.html Thanks! -Andreas-
Re: [PATCH][3/3][RTL ifcvt] PR middle-end/37780: Conditional expression with __builtin_clz() should be optimized out
Kyrill Tkachov writes: > +static rtx > +simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val) > +{ > + if (cmp_code != EQ && cmp_code != NE) > +return NULL_RTX; > + > + /* Result on X == 0 and X !=0 respectively. */ > + rtx on_zero, on_nonzero; > + if (cmp_code == EQ) > +{ > + on_zero = true_val; > + on_nonzero = false_val; > +} > + else > +{ > + on_zero = false_val; > + on_nonzero = true_val; > +} > + > + rtx_code op_code = GET_CODE (on_nonzero); > + if ((op_code != CLZ && op_code != CTZ) > + || !rtx_equal_p (XEXP (on_nonzero, 0), x) > + || !CONST_INT_P (on_zero)) > +return NULL_RTX; > + > + HOST_WIDE_INT op_val; > + machine_mode mode = GET_MODE (on_nonzero); > + if (((op_code == CLZ && CLZ_DEFINED_VALUE_AT_ZERO (mode, op_val)) > + || (op_code == CTZ && CTZ_DEFINED_VALUE_AT_ZERO (mode, op_val))) > + && op_val == INTVAL (on_zero)) > +return on_nonzero; > + > + return NULL_RTX; > +} ../../gcc/simplify-rtx.c: In function 'rtx_def* simplify_cond_clz_ctz(rtx, rtx_code, rtx, rtx)': ../../gcc/simplify-rtx.c:5304:16: error: unused variable 'mode' [-Werror=unused-variable] machine_mode mode = GET_MODE (on_nonzero); ^~~~ Andreas. -- Andreas Schwab, SUSE Labs, sch...@suse.de GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7 "And now for something completely different."
Re: [PATCH,rs6000] Add built-in function support for new Power9 vector absolute difference unsigned instructions
On Tue, Jun 7, 2016 at 1:58 AM, Kelvin Nilsen wrote: > > This patch adds built-in function support for the ISA 3.0 vabsub, > vabsduh, and vabsduw instructions. > > I have bootstrapped and tested on powerpc64le-unkonwn-linux-gnu with no > regressions. Is this ok for the trunk? > > I have also tested against the gcc-6 branch without regressions. Is > this ok for backporting to gcc6 after a few days of burn-in time on the > trunk? It sounds like these match SAD_EXPR and thus should allow vectorizing gcc.dg/vect/slp-reduc-sad.c and gcc.dg/vect/vect-reduc-sad.c using SAD? Richard. > gcc/testsuite/ChangeLog: > > 2016-06-06 Kelvin Nilsen > > * gcc.target/powerpc/vadsdu-0.c: New test. > * gcc.target/powerpc/vadsdu-1.c: New test. > * gcc.target/powerpc/vadsdu-2.c: New test. > * gcc.target/powerpc/vadsdu-3.c: New test. > * gcc.target/powerpc/vadsdu-4.c: New test. > * gcc.target/powerpc/vadsdu-5.c: New test. > * gcc.target/powerpc/vadsdub-1.c: New test. > * gcc.target/powerpc/vadsdub-2.c: New test. > * gcc.target/powerpc/vadsduh-1.c: New test. > * gcc.target/powerpc/vadsduh-2.c: New test. > * gcc.target/powerpc/vadsduw-1.c: New test. > * gcc.target/powerpc/vadsduw-2.c: New test. > > > gcc/ChangeLog: > > 2016-06-06 Kelvin Nilsen > > * config/rs6000/altivec.h (vec_adu): New macro for vector absolute > difference unsigned. > (vec_adub): New macro for vector absolute difference unsigned > byte. > (vec_aduh): New macro for vector absolute difference unsigned > half-word. > (vec_aduw): New macro for vector absolute difference unsigned word. > * config/rs6000/altivec.md (UNSPEC_VADU): New value. > (vadu3): New insn. > (*p9_vadu3): New insn. > * config/rs6000/rs6000-builtin.def (vadub): New built-in > definition. > (vaduh): New built-in definition. > (vaduw): New built-in definition. > (vadu): New overloaded built-in definition. > (vadub): New overloaded built-in definition. > (vaduh): New overloaded built-in definition. > (vaduw): New overloaded built-in definition. > * config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add > overloaded vector absolute difference unsigned functions. > * doc/extend.texi (PowerPC AltiVec Built-in Functions): Document > the ISA 3.0 vector absolute difference unsigned built-in functions. > > Index: gcc/config/rs6000/altivec.h > === > --- gcc/config/rs6000/altivec.h (revision 237045) > +++ gcc/config/rs6000/altivec.h (working copy) > @@ -401,6 +401,11 @@ > #define vec_vprtybq __builtin_vec_vprtybq > #endif > > +#define vec_adu __builtin_vec_vadu > +#define vec_adub __builtin_vec_vadub > +#define vec_aduh __builtin_vec_vaduh > +#define vec_aduw __builtin_vec_vaduw > + > #define vec_slv __builtin_vec_vslv > #define vec_srv __builtin_vec_vsrv > #endif > Index: gcc/config/rs6000/altivec.md > === > --- gcc/config/rs6000/altivec.md(revision 237045) > +++ gcc/config/rs6000/altivec.md(working copy) > @@ -114,6 +114,7 @@ > UNSPEC_STVLXL > UNSPEC_STVRX > UNSPEC_STVRXL > + UNSPEC_VADU > UNSPEC_VSLV > UNSPEC_VSRV > UNSPEC_VMULWHUB > @@ -3464,6 +3465,25 @@ >[(set_attr "length" "4") > (set_attr "type" "vecsimple")]) > > +;; Vector absolute difference unsigned > +(define_expand "vadu3" > + [(set (match_operand:VI 0 "register_operand" "") > +(unspec:VI [(match_operand:VI 1 "register_operand" "") > + (match_operand:VI 2 "register_operand" "")] > + UNSPEC_VADU))] > + "TARGET_P9_VECTOR") > + > +;; Vector absolute difference unsigned > +(define_insn "*p9_vadu3" > + [(set (match_operand:VI 0 "register_operand" "=v") > +(unspec:VI [(match_operand:VI 1 "register_operand" "v") > + (match_operand:VI 2 "register_operand" "v")] > + UNSPEC_VADU))] > + "TARGET_P9_VECTOR" > + "vabsdu %0, %1, %2" > + [(set_attr "type" "add") > + (set_attr "length" "4")]) > + > ;; Vector count trailing zeros > (define_insn "*p9v_ctz2" >[(set (match_operand:VI2 0 "register_operand" "=v") > Index: gcc/config/rs6000/rs6000-builtin.def > === > --- gcc/config/rs6000/rs6000-builtin.def(revision 237045) > +++ gcc/config/rs6000/rs6000-builtin.def(working copy) > @@ -1757,6 +1757,17 @@ BU_P9V_AV_2 (VSRV, "vsrv", > CONST, vsrv) > BU_P9V_OVERLOAD_2 (VSLV, "vslv") > BU_P9V_OVERLOAD_2 (VSRV, "vsrv") > > +/* 2 argument vector functions added in ISA 3.0 (power9). */ > +BU_P9V_AV_2 (VADUB,"vadub",CONST, vaduv16qi3) > +BU_P9V_AV_2 (VADUH,"vaduh",
[PATCH] Fix PR71428
This fixes BIT_FIELD_REF handling in bswap pass where we need to distinguish an "operation" vs. a load properly. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2016-06-07 Richard Biener PR tree-optimization/71428 * tree-ssa-math-opts.c (perform_symbolic_merge): Properly distinguish BIT_FIELD_REF op vs. load. * gcc.dg/torture/pr71428.c: New testcase. Index: gcc/tree-ssa-math-opts.c === *** gcc/tree-ssa-math-opts.c(revision 237117) --- gcc/tree-ssa-math-opts.c(working copy) *** perform_symbolic_merge (gimple *source_s *** 2164,2173 struct symbolic_number *n_start; tree rhs1 = gimple_assign_rhs1 (source_stmt1); ! if (TREE_CODE (rhs1) == BIT_FIELD_REF) rhs1 = TREE_OPERAND (rhs1, 0); tree rhs2 = gimple_assign_rhs1 (source_stmt2); ! if (TREE_CODE (rhs2) == BIT_FIELD_REF) rhs2 = TREE_OPERAND (rhs2, 0); /* Sources are different, cancel bswap if they are not memory location with --- 2164,2175 struct symbolic_number *n_start; tree rhs1 = gimple_assign_rhs1 (source_stmt1); ! if (TREE_CODE (rhs1) == BIT_FIELD_REF ! && TREE_CODE (TREE_OPERAND (rhs1, 0)) == SSA_NAME) rhs1 = TREE_OPERAND (rhs1, 0); tree rhs2 = gimple_assign_rhs1 (source_stmt2); ! if (TREE_CODE (rhs2) == BIT_FIELD_REF ! && TREE_CODE (TREE_OPERAND (rhs2, 0)) == SSA_NAME) rhs2 = TREE_OPERAND (rhs2, 0); /* Sources are different, cancel bswap if they are not memory location with Index: gcc/testsuite/gcc.dg/torture/pr71428.c === *** gcc/testsuite/gcc.dg/torture/pr71428.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr71428.c (working copy) *** *** 0 --- 1,20 + /* { dg-do run } */ + /* { dg-additional-options "-fno-tree-forwprop -Wno-psabi -w" } */ + + typedef unsigned short v64u16 __attribute__ ((vector_size (64))); + + v64u16 + foo (v64u16 p1) + { + p1[31] |= p1[1]; + return p1; + } + + int + main () + { + v64u16 x = foo ((v64u16){ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1 }); + if (x[31] != 1) + __builtin_abort(); + return 0; + }
[PATCH] Fix PR71423
This fixes an omission when moving tree-ssa-forwprop.c code to match.pd in GCC 5 times. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2016-06-07 Richard Biener PR middle-end/71423 * match.pd ((X | ~Y) -> Y <= X): Properly invert the comparison for signed ops. * gcc.dg/torture/pr71423.c: New testcase. Index: gcc/match.pd === *** gcc/match.pd(revision 237117) --- gcc/match.pd(working copy) *** DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) *** 900,911 (ne (bit_and:c (bit_not @0) @1) integer_zerop) (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) && TYPE_PRECISION (TREE_TYPE (@1)) == 1) !(lt @0 @1))) (simplify (ne (bit_ior:c (bit_not @0) @1) integer_zerop) (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) && TYPE_PRECISION (TREE_TYPE (@1)) == 1) !(le @0 @1))) /* ~~x -> x */ (simplify --- 900,915 (ne (bit_and:c (bit_not @0) @1) integer_zerop) (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) && TYPE_PRECISION (TREE_TYPE (@1)) == 1) !(if (TYPE_UNSIGNED (TREE_TYPE (@1))) ! (lt @0 @1) ! (gt @0 @1 (simplify (ne (bit_ior:c (bit_not @0) @1) integer_zerop) (if (INTEGRAL_TYPE_P (TREE_TYPE (@1)) && TYPE_PRECISION (TREE_TYPE (@1)) == 1) !(if (TYPE_UNSIGNED (TREE_TYPE (@1))) ! (le @0 @1) ! (ge @0 @1 /* ~~x -> x */ (simplify Index: gcc/testsuite/gcc.dg/torture/pr71423.c === *** gcc/testsuite/gcc.dg/torture/pr71423.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr71423.c (working copy) *** *** 0 --- 1,20 + /* { dg-do run } */ + + struct S1 + { + int f1:1; + }; + + volatile struct S1 b = { 0 }; + + int + main () + { + char c = b.f1; + b.f1 = 1; + + if (b.f1 > -1 || c) + __builtin_abort (); + + return 0; + }
Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)
On Mon, 6 Jun 2016, Jakub Jelinek wrote: > On Mon, Jun 06, 2016 at 10:05:57AM +0200, Richard Biener wrote: > > So this ends up generating { a ? -1 : 0, b ? -1 : 0, ... }. That > > Yes, that is already what we do now for loop vectorization. > > > might be less optimal than doing { a, b, ... } ? { -1, -1 ... } : { 0, 0, > > .. } > > Well, it would need to be > { a, b, ... } != { 0, 0, ... } ? { -1, -1, ... } : { 0, 0, ... } > then, doesn't VEC_COND_EXPR assume the condition is in canonical > VECTOR_BOOLEAN_TYPE_P form? > > Anyway, if something like the above would be faster, perhaps generic vector > lowering or some similar pass could detect that case post-vectorization and > optimize? Yeah. OTOH an "ideal" vectorizer would already consider the smaller prologue cost for its cost modeling. Richard.
Re: loop-ch tweek
On Mon, 6 Jun 2016, Jan Hubicka wrote: > Hi, > does this look better? Can you make the argument of gimple_inexpensive_call_p a gcall * please? Ok with that change. Thanks, Richard. > Honza > > * gimple.c: Include builtins.h > (gimple_inexpensive_call_p): New function. > * gimple.h (gimple_inexpensive_call_p): Declare. > * tree-ssa-loop-ch.c (should_duplicate_loop_header_p): Use it. > * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Likewise. > Index: gimple.c > === > --- gimple.c (revision 237101) > +++ gimple.c (working copy) > @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. > #include "gimple-walk.h" > #include "gimplify.h" > #include "target.h" > +#include "builtins.h" > > > /* All the tuples have their operand vector (if present) at the very bottom > @@ -3018,3 +3019,16 @@ maybe_remove_unused_call_args (struct fu >update_stmt_fn (fn, stmt); > } > } > + > +/* Return false if STMT will likely expand to real function call. */ > + > +bool > +gimple_inexpensive_call_p (gimple *stmt) > +{ > + if (gimple_call_internal_p (stmt)) > +return true; > + tree decl = gimple_call_fndecl (stmt); > + if (decl && is_inexpensive_builtin (decl)) > +return true; > + return false; > +} > Index: gimple.h > === > --- gimple.h (revision 237101) > +++ gimple.h (working copy) > @@ -1525,6 +1525,7 @@ extern void preprocess_case_label_vec_fo > extern void gimple_seq_set_location (gimple_seq, location_t); > extern void gimple_seq_discard (gimple_seq); > extern void maybe_remove_unused_call_args (struct function *, gimple *); > +extern bool gimple_inexpensive_call_p (gimple *); > > /* Formal (expression) temporary table handling: multiple occurrences of > the same scalar expression are evaluated into the same temporary. */ > Index: tree-ssa-loop-ch.c > === > --- tree-ssa-loop-ch.c(revision 237101) > +++ tree-ssa-loop-ch.c(working copy) > @@ -118,7 +118,8 @@ should_duplicate_loop_header_p (basic_bl >if (is_gimple_debug (last)) > continue; > > - if (is_gimple_call (last)) > + if (gimple_code (last) == GIMPLE_CALL > + && !gimple_inexpensive_call_p (last)) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, > Index: tree-ssa-loop-ivcanon.c > === > --- tree-ssa-loop-ivcanon.c (revision 237101) > +++ tree-ssa-loop-ivcanon.c (working copy) > @@ -339,15 +339,11 @@ tree_estimate_loop_size (struct loop *lo >for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi)) > { > gimple *stmt = gsi_stmt (gsi); > - if (gimple_code (stmt) == GIMPLE_CALL) > + if (gimple_code (stmt) == GIMPLE_CALL > + && !gimple_inexpensive_call_p (stmt)) > { > int flags = gimple_call_flags (stmt); > - tree decl = gimple_call_fndecl (stmt); > - > - if (decl && DECL_IS_BUILTIN (decl) > - && is_inexpensive_builtin (decl)) > - ; > - else if (flags & (ECF_PURE | ECF_CONST)) > + if (flags & (ECF_PURE | ECF_CONST)) > size->num_pure_calls_on_hot_path++; > else > size->num_non_pure_calls_on_hot_path++; > > -- Richard Biener SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)