Re: [PATCH] Add arm_cortex_m7_tune.
On Tue, Oct 21, 2014 at 10:48 AM, Hale Wang hale.w...@arm.com wrote: Hi, This patch is used to tune the gcc for Cortex-M7. The performance of Dhrystone can be improved by 1%. The performance of Coremark can be improved by 2.3%. Patch also attached for convenience. Is it ok for trunk? Thanks and Best Regards, Hale Wang Ok - For the future please include [Patch ARM] in your subject line. Otherwise my filters won't catch these as requiring attention for the ARM backend. regards Ramana gcc/ChangeLog 2014-10-11 Hale Wang hale.w...@arm.com * config/arm/arm.c: Add cortex-m7 tune. * config/arm/arm-cores.def: Use cortex-m7 tune. diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def index 56ec7fd..3b34173 100644 --- a/gcc/config/arm/arm-cores.def +++ b/gcc/config/arm/arm-cores.def @@ -149,7 +149,7 @@ ARM_CORE(cortex-r4, cortexr4, cortexr4, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r4f,cortexr4f, cortexr4f, 7R, FL_LDSCHED, cortex) ARM_CORE(cortex-r5, cortexr5, cortexr5, 7R, FL_LDSCHED | FL_ARM_DIV, cortex) ARM_CORE(cortex-r7, cortexr7, cortexr7, 7R, FL_LDSCHED | FL_ARM_DIV, cortex) -ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, v7m) +ARM_CORE(cortex-m7, cortexm7, cortexm7, 7EM, FL_LDSCHED, cortex_m7) ARM_CORE(cortex-m4, cortexm4, cortexm4, 7EM, FL_LDSCHED, v7m) ARM_CORE(cortex-m3, cortexm3, cortexm3, 7M, FL_LDSCHED, v7m) ARM_CORE(marvell-pj4, marvell_pj4, marvell_pj4, 7A, FL_LDSCHED, 9e) diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index 93b989d..834b13a 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -2003,6 +2003,27 @@ const struct tune_params arm_v7m_tune = 8 /* Maximum insns to inline memset. */ }; +/* Cortex-M7 tuning. */ + +const struct tune_params arm_cortex_m7_tune = +{ + arm_9e_rtx_costs, + v7m_extra_costs, + NULL, /* Sched adj cost. */ + 0, /* Constant limit. */ + 0, /* Max cond insns. */ + ARM_PREFETCH_NOT_BENEFICIAL, + true, /* Prefer constant pool. */ + arm_cortex_m_branch_cost, + false, /* Prefer LDRD/STRD. */ + {true, true}, /* Prefer non short circuit. */ + arm_default_vec_cost,/* Vectorizer costs. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false, /* Prefer 32-bit encodings. */ + false, /* Prefer Neon for stringops. */ + 8 /* Maximum insns to inline memset. */ +}; + /* The arm_v6m_tune is duplicated from arm_cortex_tune, rather than arm_v6t2_tune. It is used for cortex-m0, cortex-m1 and cortex-m0plus. */ const struct tune_params arm_v6m_tune =
Re: [1/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target
On Tue, Oct 21, 2014 at 10:18 AM, Terry Guo terry@arm.com wrote: Hi There, This is the first patch to enable GCC generate UAL assembly code for Thumb1 target. This new option enables user to specify which syntax is used in their inline assembly code. If the inline assembly code uses UAL format, then gcc does nothing because gcc generates UAL code as well. If the inline assembly code uses non-UAL, then gcc will insert some directives in final assembly code. Is it ok to trunk? BR, Terry 2014-10-21 Terry Guo terry@arm.com * config/arm/arm.h (TARGET_UNIFIED_ASM): Also include thumb1. (ASM_APP_ON): Redefined. * config/arm/arm.c (arm_option_override): Thumb2 always uses UAL for inline assembly code. * config/arm/arm.opt (masm-syntax-unified): New option. * doc/invoke.texi (-masm-syntax-unified): Document new option. Minor tweaks to documentation are required about ARM state, you should also state that currently we default to divided syntax in ARM state too. Otherwise I'm ok with this. This deserves a news article entry - so please consider writing something up for wwwdocs. Ramana
Re: [2/2][PATCH,ARM]Generate UAL assembly code for Thumb-1 target
On Tue, Oct 21, 2014 at 12:19 PM, Terry Guo terry@arm.com wrote: Hi there, Attached patch intends to enable GCC generate UAL format code for Thumb1 target. Tested with regression test and no regressions. Is it OK to trunk? This is OK - Please don't commit it until we have sorted out patch 1/2 with the options. I'd encourage a follow-up patch to switch ARM state also to UAL :) ... Ramana BR, Terry 2014-10-21 Terry Guo terry@arm.com * config/arm/arm.c (arm_output_mi_thunk): Use UAL for Thumb1 target. * config/arm/thumb1.md: Likewise. gcc/testsuite 2014-10-21 Terry Guo terry@arm.com * gcc.target/arm/anddi_notdi-1.c: Match with UAL format. * gcc.target/arm/pr40956.c: Likewise. * gcc.target/arm/thumb1-Os-mult.c: Likewise. * gcc.target/arm/thumb1-load-64bit-constant-3.c: Likewise.
Re: [libgomp, libiberty, libobjc] Fix gnu11 fallout on Solaris 10+
On 11/03/2014 05:22 PM, Rainer Orth wrote: 2014-10-22 Rainer Orth r...@cebitec.uni-bielefeld.de libobjc: * thr.c (_XOPEN_SOURCE): Define as 600. libiberty: * sigsetmask.c (_POSIX_SOURCE): Remove. libgomp: * config/posix/lock.c (_XOPEN_SOURCE) Define as 600. Ok. r~
Re: [COMMITTED][PATCH][ARM]Add ACLE 2.0 predefined marco __ARM_FEATURE_IDIV
On Thu, Oct 23, 2014 at 11:30 AM, Renlin Li renlin...@arm.com wrote: Are you sure that the ACLE documents this with trailing underscores ? The copy that I have doesn't. You are right, it's my incaution. I have double checked, the macro should be __ARM_FEATURE_IDIV. Could you please do a obvious fix? Thank you so much! applied this as obvious. Ramana 2014-11-04 Ramana Radhakrishnan ramana.radhakrish...@arm.com * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Fix typo in definition of __ARM_FEATURE_IDIV. Kind regards, Renlin Index: gcc/config/arm/arm.h === --- gcc/config/arm/arm.h(revision 217070) +++ gcc/config/arm/arm.h(working copy) @@ -166,7 +166,7 @@ if (TARGET_IDIV)\ { \ builtin_define (__ARM_ARCH_EXT_IDIV__); \ -builtin_define (__ARM_FEATURE_IDIV__); \ +builtin_define (__ARM_FEATURE_IDIV); \ } \ } while (0)
Re: [RFC PATCH, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses
On Mon, Nov 3, 2014 at 10:00 PM, Uros Bizjak ubiz...@gmail.com wrote: Following patch fixes PR 63538, where the data in the large data section was accessed through 32bit address. The patch unifies places where large data sections are determined and passes all declarations to ix86_in_large_data_p only. The patch fixes the testcase form the PR. Also, the code from several tests involving various -mlarge-data-threshold= settings looks consistent now. 2014-11-03 Uros Bizjak ubiz...@gmail.com PR target/63538 * config/i386/i386.c (ix86_encode_section_info): Do not check TREE_STATIC and DECL_EXTERNAL when setting SYMBOL_FLAG_FAR_ADDR flag. (x86_64_elf_select_section): Do not check ix86_cmodel here. (x86_64_elf_unique_section): Ditto. The patch was bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. However, the -mcmodel testcases are virtually non-existent, this is the reason for the RFC status of the patch. I forgot to add the original reporter to CC ... We probably want to avoid putting automatic variables (and STRING_CSTs ?) to large data section. However, it is important that we use the same condition in ix86_encode_section_info and x86_64_elf_select_section (and likes), otherwise the data can go into large section, while the access to the data will be via limited 32bit pointers. The attached alternative patch rejects automatic variables and STRING_CSTs from large data sections. Please note, that it still access the data with the correct instructions. Honza, can you please comment the approach from the ABI perspective - should we reject automatic variables and STRING_CSTs from going into large section? Uros. Index: config/i386/i386.c === --- config/i386/i386.c (revision 217069) +++ config/i386/i386.c (working copy) @@ -5066,6 +5066,14 @@ ix86_in_large_data_p (tree exp) if (TREE_CODE (exp) == FUNCTION_DECL) return false; + /* Strings are never large data. */ + if (TREE_CODE (exp) == STRING_CST) +return false; + + /* Automatic variables are never large data. */ + if (TREE_CODE (exp) == VAR_DECL !is_global_var (exp)) +return false; + if (TREE_CODE (exp) == VAR_DECL DECL_SECTION_NAME (exp)) { const char *section = DECL_SECTION_NAME (exp); @@ -5099,8 +5107,7 @@ ATTRIBUTE_UNUSED static section * x86_64_elf_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align) { - if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) { const char *sname = NULL; unsigned int flags = SECTION_WRITE; @@ -5186,8 +5193,7 @@ x86_64_elf_section_type_flags (tree decl, const ch static void ATTRIBUTE_UNUSED x86_64_elf_unique_section (tree decl, int reloc) { - if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) { const char *prefix = NULL; /* We only need to use .gnu.linkonce if we don't have COMDAT groups. */ @@ -44230,9 +44236,7 @@ ix86_encode_section_info (tree decl, rtx rtl, int { default_encode_section_info (decl, rtl, first); - if (TREE_CODE (decl) == VAR_DECL - (TREE_STATIC (decl) || DECL_EXTERNAL (decl)) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FAR_ADDR; }
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
.. thanks a lot Jon! (after all this parallel mode is still useful for something ;) Paolo.
Re: [AArch64] Fix predicate and constraint mismatch in logical atomic operations
On 25 September 2014 04:45, Michael Collison michael.colli...@linaro.org wrote: On certain patterns in atomics.md the constraint 'n' is used in combination with the predicate atomic_op_operand. The constraint is too general and allows constants that are disallowed by the predicate. This causes an ICE In final_scan_insn when the insn cannot be split because the constraint and predicate do not match. Tested on aarch64-none-elf, aarch64-linux-gnu. Additionally the originally reporter of the bug, (d...@ubuntu.com), applied the patch and successfully bootstrapped and tested with no regressions. 2014-09-23 Michael Collison michael.colli...@linaro.org * config/aarch64/iterators.md (lconst_atomic): New mode attribute to support constraints for CONST_INT in atomic operations. * config/aarch64/atomics.md (atomic_atomic_optabmode): Use lconst_atomic constraint. (atomic_nandmode): Likewise. (atomic_fetch_atomic_optabmode): Likewise. (atomic_fetch_nandmode): Likewise. (atomic_atomic_optab_fetchmode): Likewise. (atomic_nand_fetchmode): Likewise. OK Thanks. /Marcus
Re: [AArch64] Make gentune.sh also generate generic_sched attribute
On 25 September 2014 14:37, James Greenhalgh james.greenha...@arm.com wrote: Hi, This patch fixes an annoying gotcha when adding new cores or piepline models in builds for AArch64. The generic_sched attribute also needs updating in addition to aarch64-tune.md. I see no good reason for this, we can generate that attribute in gentune.sh quite easily. For testing, I built an aarch64-none-elf toolchain with no issues. OK? Thanks, James --- 2014-09-25 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.md (generic_sched): Don't define here. * config/aarch64/gentune.sh: Also generate generic_sched attribute. * config/aarch64/aarch64-tune.md: Regenerate. OK /Marcus
Re: [PATCH][AArch64] Fix PR63293
On 25 September 2014 17:32, Jiong Wang jiong.w...@arm.com wrote: patch updated, please review. 2014-09-25 Jiong Wang jiong.w...@arm.com 2014-09-25 Wilco Dijkstra wilco.dijks...@arm.com gcc/ PR target/63293 * config/aarch64/aarch64.c (aarch64_expand_epiloue): Add barriers before stack adjustment. OK /Marcus
[PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)
On Sat, 2014-10-18 12:54:33 +0200, Oleg Endo oleg.e...@t-online.de wrote: Hi, As discussed in the PR, this adds two new SH built-in functions __builtin_sh_get_fpscr __builtin_sh_set_fpscr. Tested on r216173 with make -k check RUNTESTFLAGS=--target_board=sh-sim\{-m4/-ml,-m4/-mb} and no new failures. Committed as r216424. Cheers, Oleg gcc/ChangeLog: PR target/53513 * config/sh/sh-modes.def (PSI): Remove. * config/sh/sh-protos.h (get_fpscr_rtx): Remove. * config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove. (sh_reorg): Remove commented out FPSCR code. (fpscr_set_from_mem): Use SImode instead of PSImode. Emit lds_fpscr insn instead of move insn. (sh_hard_regno_mode_ok): Return SImode for FPSCR. (sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode handling. (sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns. (sh1_builtin_p): Uncomment. (SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros. (bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr. Change to emit_fpu_switch() is missing here. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c(revision 216350) +++ gcc/config/sh/sh.c(working copy) This chunk is in emit_fpu_switch(), which drops every reference to `dst': @@ -10055,13 +10032,12 @@ emit_move_insn (scratch, XEXP (src, 0)); if (index != 0) emit_insn (gen_addsi3 (scratch, scratch, GEN_INT (index * 4))); - src = adjust_automodify_address (src, PSImode, scratch, index * 4); + src = adjust_automodify_address (src, SImode, scratch, index * 4); } else -src = adjust_address (src, PSImode, index * 4); +src = adjust_address (src, SImode, index * 4); - dst = get_fpscr_rtx (); - emit_move_insn (dst, src); + emit_insn (gen_lds_fpscr (src)); } static rtx get_free_reg (HARD_REG_SET); ...which in turn results in fall-out when simply building it with config-list.mk, see eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=372472 : [...] g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../gcc/gcc/../libbacktrace-o sh.o -MT sh.o -MMD -MP -MF ./.deps/sh.TPo ../../../gcc/gcc/config/sh/sh.c ../../../gcc/gcc/config/sh/sh.c: In function ‘void emit_fpu_switch(rtx, int)’: ../../../gcc/gcc/config/sh/sh.c:10027:7: error: unused variable ‘dst’ [-Werror=unused-variable] rtx dst, src; ^ cc1plus: all warnings being treated as errors make[2]: *** [sh.o] Error 1 This should fix it, ok for mainline? 2014-11-04 Jan-Benedict Glaw jbg...@lug-owl.de * config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable. diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c index 3a4f5e9..be944da 100644 --- a/gcc/config/sh/sh.c +++ b/gcc/config/sh/sh.c @@ -10024,7 +10024,7 @@ static GTY(()) tree fpscr_values; static void emit_fpu_switch (rtx scratch, int index) { - rtx dst, src; + rtx src; if (fpscr_values == NULL) { MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html the second : signature.asc Description: Digital signature
Re: [PATCH][ARM] Fix names of some rounding intrinsics, impement vrndx_f32 and vrndxq_f32
Phew, This one slipped through the cracks. Ping? https://gcc.gnu.org/ml/gcc-patches/2014-09/msg01981.html Thanks, Kyrill On 23/09/14 16:25, Kyrill Tkachov wrote: On 23/09/14 16:07, Kyrill Tkachov wrote: Hi all, Some intrinsics had the wrong name (inconsistent with the NEON intrinsics spec). This patch fixes that and adds the vrndx_f32 and vrndxq_f32 intrinsics that were missing. For reference, the NEON intrinsics spec can be found at: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0073a/IHI0073A_arm_neon_intrinsics_ref.pdf Kyrill These map down to vrintx.f32 NEON instructions (d and q forms). We already had builtins defined for them, just the intrinsics were not wired up to them properly. Tested arm-none-eabi Ok for trunk? 2014-09-23 Kyrylo Tkachov kyrylo.tkac...@arm.com * config/arm/arm_neon.h (vrndqn_f32): Rename to... (vrndnq_f32): ... this. (vrndqa_f32): Rename to... (vrndaq_f32): ... this. (vrndqp_f32): Rename to... (vrndpq_f32): ... this. (vrndqm_f32): Rename to... (vrndmq_f32): ... this. (vrndx_f32): New intrinsic. (vrndxq_f32): Likewise. 2014-09-23 Kyrylo Tkachov kyrylo.tkac...@arm.com * gcc.target/arm/simd/neon-vrndx_f32_1.c: New test. * gcc.target/arm/simd/neon-vrndxq_f32_1.c: Likewise. * gcc.target/arm/neon/vrndqaf32.c: Rename to... * gcc.target/arm/neon/vrndaqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqmf32.c: Rename to... * gcc.target/arm/neon/vrndmqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqnf32.c: Rename to... * gcc.target/arm/neon/vrndnqf32.c: ... This. Update intrinsic names. * gcc.target/arm/neon/vrndqpf32.c: Rename to... * gcc.target/arm/neon/vrndpqf32.c: ... This. Update intrinsic names.
Re: [PATCH][AArch64] LR register not used in leaf functions
On 01/10/14 09:00, Kugan wrote: On 01/10/14 01:00, Jiong Wang wrote: On 27/09/14 22:20, Kugan wrote: On 23/09/14 01:58, Jiong Wang wrote: On 22/09/14 16:43, Kugan wrote: AArch64 has the same issue ARM had where the LR register was not used in leaf functions. This was reported in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42017. In AArch64, this test-case need to be added with more live ranges for the need for the LR_REGNUM. i.e test-case in the PR needs additional loops up to r31 for the case AArch64 to see this. The same fix (from the thread https://gcc.gnu.org/ml/gcc-patches/2011-04/msg02191.html) which went into ARM should apply to AArch64 as well. Regression tested on qemu for aarch64-none-linux-gnu with no new regressions. Is this OK for trunk? This still be a partial fix. LR should be a caller-saved register free to use in case it's saved properly to across function call. Indeed. This should be improved from the generic code. Right now, if a hard register is used in EPILOGUE_USES, it conflicts with all the live ranges till a call site kills. I think we should have this patch till the generic code can be improved. below is my local patch. LR is treated as free register, and strictly following AArch64 ABI, frame should always be created, FP maintained properly if LR clobbered under -fno-omit-frame-pointer. Thanks Jiong. Sorry I missed your point. As for the additions in your patch: I am really sorry, just noticed the reply... /* ... and any callee saved register that dataflow says is live. */ for (regno = R0_REGNUM; regno = R30_REGNUM; regno++) if (df_regs_ever_live_p (regno) -!call_used_regs[regno]) +(regno == R30_REGNUM + || !call_used_regs[regno])) cfun-machine-frame.reg_offset[regno] = SLOT_REQUIRED; AArch64 CALL_USED_REGISTERS defines R30_REGNUM to be zero. Therefore shouldn’t this be redundant? that's because the patch also modified r30 to be caller saved. -0, 0, 0, 0, 0, 1, 0, 1, /* R24 - R30, SP */ \ +0, 0, 0, 0, 0, 1, 1, 1, /* R24 - R30, SP */ \ thus, we don't need ad-hoc code in pro/epi to save/restore LR. on the other hand, we want LR to be a callee-saved register that it could be used as free register is some scenario. you can't compile the testcase lr_free_2.c for details. } + else +{ + /* If we decided that we didn't need a leaf frame pointer but then used +LR in the function, then we'll want a frame pointer after all, so +prevent this elimination to ensure a frame pointer is used. */ + if (to == STACK_POINTER_REGNUM + flag_omit_leaf_frame_pointer + df_regs_ever_live_p (LR_REGNUM)) + return false; +} aarch64_frame_pointer_required makes frame pointer needed when flag_omit_leaf_frame_pointer and df_regs_ever_live_p (LR_REGNUM). Is this addition still needed? it's needed. the problem is the df_reg_ever_live_p (LR_REGNUM) in aarch64_frame_pointer_required if (flag_omit_leaf_frame_pointer (!crtl-is_leaf || df_regs_ever_live_p (LR_REGNUM))) is invoked before register allocation that it can only catch those LR_REGNUM alive scenarios produced by inline assembly clobber, for example __asm__ (:::x30), written by user explicitly in code, while it can't catch the scenarios that LR_REGNUM allocated by register allocator. so, we need to add another check in aarch64_can_eliminate which is invoked after register allocation to catch those scenarios where LR allocated by register allocator. thanks. Thanks, Kugan
Re: [PATCH 8/11][ARM] Migrate to new reduc_[us](min|max)_scal_optab
On Fri, Oct 24, 2014 at 1:01 PM, Alan Lawrence alan.lawre...@arm.com wrote: Similarly to last patch. Tested, in combination with previous patch: bootstrap on arm-none-linux-gnueabihf cross-tested check-gcc on arm-none-eabi. gcc/ChangeLog: config/arm/neon.md (reduc_smin_mode *2): Rename to... (reduc_smin_scal_mode *2): ...this; extract scalar result. (reduc_smax_mode *2): Rename to... (reduc_smax_scal_mode *2): ...this; extract scalar result. (reduc_umin_mode *2): Rename to... (reduc_umin_scal_mode *2): ...this; extract scalar result. (reduc_umax_mode *2): Rename to... (reduc_umax_scal_mode *2): ...this; extract scalar result. Ok. Ramana
Re: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.
On 29 October 2014 10:34, Tejas Belagod tejas.bela...@arm.com wrote: On 10/10/14 15:48, David Sherwood wrote: I have a fix (originally written by Tejas Belagod) for the following bug: In which case you should add his name along side yours in the ChangeLog entry... ChangeLog: gcc/: 2014-10-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64-protos.h (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): New decls. * config/aarch64/iterators.md (UNSPEC_REV_REGLIST): New enum. * config/aarch64/iterators.md (insn_count): New mode_attr. * config/aarch64/aarch64-simd.md (vec_store_lanes(o/c/x)i, vec_load_lanes(o/c/x)i): Fixed to work for Big Endian. * config/aarch64/aarch64-simd.md (aarch64_rev_reglist, aarch64_simd_(ld/st)(2/3/4)): Added. * config/aarch64/aarch64.c (aarch64_simd_attr_length_rglist, aarch64_reverse_mask): Added. + RTVEC_ELT (v, (i * usize + j)) = GEN_INT (((i+1) * usize) - 1 - j); s/i+1/i + 1. Remove extra parentheses. Tejas. OK with the fix up requested by Tejas. /Marcus
Re: [AArch64] [BE] Fix vector load/stores to not use ld1/st1
On 10 October 2014 16:19, Alan Hayward alan.hayw...@arm.com wrote: This patch is dependant on [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe.” It fixes up movoi/ci/xi for Big Endian, so that we end up with the lsb of a big-endian integer to be in the low byte of the highest-numbered register. movoi uses stp/ldp movxi needs a split version (which is shared with register-register movxi), which just splits to two new movs movci can then split to three movs. A future patch will instead split to an movoi and a movti. There are no changes for LE. Ran whole of check with both parts of Make large opaque integer modes endianness-safe”. No regressions. ChangeLog: gcc/: 2014-10-10 Alan Hayward alan.hayw...@arm.com * config/aarch64/aarch64.c (aarch64_classify_address): Allow extra addressing modes for BE. (aarch64_print_operand): new operand for printing a q register+1. (aarch64_simd_emit_reg_reg_move): replacement for aarch64_simd_disambiguate_copy that plants the required mov. * config/aarch64/aarch64-protos.h (aarch64_simd_emit_reg_reg_move): replacement for aarch64_simd_disambiguate_copy. * config/aarch64/aarch64-simd.md (define_split) Use new aarch64_simd_emit_reg_reg_move. (define_expand movmode) less restrictive predicates. (define_insn *aarch64_movmode) Simplify and only allow for LE. (define_insn *aarch64_be_movoi) New. BE only. Plant ldp or stp. (define_insn *aarch64_be_movci) New. BE only. No instructions. (define_insn *aarch64_be_movxi) New. BE only. No instructions. (define_split) OI mov. Use new aarch64_simd_emit_reg_reg_move. (define_split) CI mov. Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. (define_split) XI mov. Use new aarch64_simd_emit_reg_reg_move. On BE plant movs for reg to/from mem case. OK /Marcus
Re: Recent go changes broke alpha bootstrap
On Tue, Nov 4, 2014 at 1:00 AM, Ian Taylor i...@golang.org wrote: On Mon, Nov 3, 2014 at 9:02 AM, Dominik Vogt v...@linux.vnet.ibm.com wrote: On Thu, Oct 30, 2014 at 08:05:14AM -0700, Ian Taylor wrote: On Thu, Oct 30, 2014 at 5:46 AM, Dominik Vogt v...@linux.vnet.ibm.com wrote: I'm not quite sure about the best approach. The attempt to represent C unions in the right way is ultimately futile as Go does not have the types necessary for it. For example, the handling of anonymous bit fields will never be right as it's undefinied. On the other hand I could fix the issue at hand by changing the way anonymous unions are represented in Go. Example: struct { int8_t x; union { int16_t y; int 32_t z; }; }; Was represented (before the patch) as struct { X byte; int16 Y; } which had size 4, alignment 2 and y at offset 2 but should had have size 8, alignment 4 and y at offset 4. With the current patch the Go layout is struct { X byte; artificial_name struct { y [2]byte; align [0]int32; } } with the proper size, alignment and offset, but y is addressed as .artificial_name.y insted of just .y, and y is a byte array and not an int16. I could remove the artificial_name struct and add padding before and after y instead: struct { X byte; pad_0 [3]byte; Y int16; pad_1 [2]byte; align [0]int32; } What do you think? Sounds good to me. Basically the fields of the anonymous union should be promoted to become fields of the struct. We can't do it in general, but we can do it for the first field. That addresses the actual uses of anonymous unions. The attached patch fixes this, at least if the first element of the union is not a bitfield. Bitfields can really not be represented properly in Go (think about constructs like struct { int : 1; int bf : 1; }), I'd rather not try to represent them in a predictable way. The patched code may or may not give them a name, and reserves the proper amount of padding where required in structs. If a union begins with an anonymous bitfield (which makes no sense), that is ignored. If a union begins with a named bitfield (possibly after unnamed ones), this may or may not be used as the (sole) element of the Go struct. Thanks. I committed your patch. I have checked that the patch fixes alpha bootstrap with libgo. Thanks, Uros.
Re: [AArch64] [BE] [2/2] Make large opaque integer modes endianness-safe.
On 13 October 2014 11:01, David Sherwood david.sherw...@arm.com wrote: Hi, This is the second patch of the work to fix: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59810 and removes the CANNOT_CHANGE_MODE_CLASS macro, which now permits subregs of vector registers to work correctly on aarch64_be. NOTE: This patch depends upon the following: [AArch64] [BE] [1/2] Make large opaque integer modes endianness-safe. [AArch64] [BE] Fix vector load/stores to not use ld1/st1 Thanks, David. ChangeLog: gcc/: 2014-13-10 David Sherwood david.sherw...@arm.com * config/aarch64/aarch64.h (CLEAR_INSN_CACHE): Removed. CANNOT_CHANGE_MODE_CLASS rather than CLEAR_INSN_CACHE. * config/aarch64/aarch64.c (aarch64_cannot_change_mode_class): Removed. * config/aarch64/aarch64-protos.h (aarch64_cannot_change_mode_class): Removed. The first line of the ChangeLog entry should also reference the PR number it resolves. OK with corrected ChangeLog entry. Thanks /Marcus
[match-and-simplify] Merge from trunk
2014-11-04 Richard Biener rguent...@suse.de Merge from trunk r216941 through r217074. Brings back next merge piece.
Re: [gomp4] acc reductions with multiple variables
Hi! On Fri, 31 Oct 2014 16:35:27 -0700, Cesar Philippidis ce...@codesourcery.com wrote: This patch also resolves an issue when reductions are preformed on the host, i.e. ACC_DEVICE_TYPE=host. Additional cleanup has now been possible; r217078: commit a10c5e14ec563fffa45d24366f95f8cba62af4fd Author: tschwinge tschwinge@138bc75d-0d04-0410-961f-82ee72b054a4 Date: Tue Nov 4 11:50:58 2014 + libgomp testsuite: Remove special-casing for acc_device_host_nonshm. libgomp/ * testsuite/libgomp.oacc-c/reduction-initial-1.c [ACC_DEVICE_TYPE_host_nonshm]: Remove special-casing for N. git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gomp-4_0-branch@217078 138bc75d-0d04-0410-961f-82ee72b054a4 --- libgomp/ChangeLog.gomp | 5 + libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c | 7 +-- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git libgomp/ChangeLog.gomp libgomp/ChangeLog.gomp index 35322f0..c70850b 100644 --- libgomp/ChangeLog.gomp +++ libgomp/ChangeLog.gomp @@ -1,3 +1,8 @@ +2014-11-04 Thomas Schwinge tho...@codesourcery.com + + * testsuite/libgomp.oacc-c/reduction-initial-1.c + [ACC_DEVICE_TYPE_host_nonshm]: Remove special-casing for N. + 2014-11-03 Cesar Philippidis ce...@codesourcery.com Thomas Schwinge tho...@codesourcery.com diff --git libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c index 0f66a39..81cf865 100644 --- libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c +++ libgomp/testsuite/libgomp.oacc-c/reduction-initial-1.c @@ -4,12 +4,7 @@ int main(void) { #define I 5 -/* TODO */ -#ifdef ACC_DEVICE_TYPE_host_nonshm -# define N 1 -#else -# define N 11 -#endif +#define N 11 #define A 8 int a = A; Grüße, Thomas pgpKmWaICt1ll.pgp Description: PGP signature
Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version
On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote: While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is unused, so clean that up too. That's not a clean-up. This pertains to PR 39350. Which, incidentally, this hookization completely ignores, entrenching the conflation of move expander and move cost estimates. Thus, can_move_by_pieces gives the wrong result for purposes of rtl optimizations when a target-specific movmem etc expander emits target-specific code. The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt shows a number of call sites that are affected. arc only implements MOVE_BY_PIECES_P, wiring it to false. Mirror that behaviour, and use the default hook for other by_pieces operations. I tried building a compiler but no amount of fiddling with target strings got me to a sensible result, so this patch is completely untested. You could just pick one of the configs in contrib/config-list.mk
Re: [PATCH][PPC] Skip gcc.target tests with conflicting -mcpu
On 03/11/14 18:36, Mike Stump wrote: On Oct 30, 2014, at 10:25 AM, Andrew Stubbs a...@codesourcery.com wrote: Many of the tests in gcc.target/powerpc specify an explicit -mcpu option with dg-options. This is a problem for multilib configurations that use -mcpu in their definition OK to commit? Given the discussion, I think the patch as is is fine. Thanks. Committed, thanks. Andrew
Re: libgomp testsuite: (not) using a specific driver for C++, Fortran?
Hi! On Wed, 15 Oct 2014 17:46:48 +0200, I wrote: No matter whether it's C, C++, or Fortran source code, the libgomp testsuite always uses (for build-tree testing) gcc/xgcc, or (for installed testing) GCC_UNDER_TEST. It doesn't make use of GXX_UNDER_TEST, GFORTRAN_UNDER_TEST. To support the latter two languages' needs, some -l[...] flags are then added via lang_link_flags. For example, for Fortran this is -lgfortran. This is, however, not what would happen if using the gfortran driver to build (which is what a user would be doing -- which we should replicate as much as possible at least for installed testing): the gfortran driver also adds -lquadmath, if applicable. Now, I wonder why to re-invent all that in the libgomp testsuite, if the respective driver already has that knowledge, via spec files, for example? (Also, the regular GCC compiler tests, gcc/testsuite/, are doing the right thing.) Why is libgomp testsuite implemented this way -- just a legacy of the past, or is there a need for that (that I'm not seeing)? Maybe the question also is: why isn't the libgomp testsuite using more of the infrastructure for specific languages, that is already implemented in gcc/testsuite/lib/? (That is, why does libgomp have to use libgomp_target_compile, instead of [language]_target_compile, for example.) (I decided not to look into that latter idea, at the moment.) And maybe that problem also applied to additional target libraries' testsuites; I have not yet looked. (It does, but I'm not addressing that with the following patches.) Anyway, here is a prototype patch to describe how I began to address this for the issue I stumbled upon, which is that the linker complained: Executing on host: x86_64-none-linux-gnu-gcc [...]/libgomp/testsuite/libgomp.fortran/aligned1.f03 [...] -fopenmp -O0 -fopenmp -fcray-pointer -lgfortran -lm -o ./aligned1.exe(timeout = 300) [...]/ld: warning: libquadmath.so.0, needed by [...]/libgfortran.so, not found (try using -rpath or -rpath-link) [...]/libgfortran.so: undefined reference to `logq@QUADMATH_1.0' [...] (That goes away if I add -lquadmath to the command line, but that's not the point I'm making here.) Am I on the right track with the following? Nobody commented, which also means nobody disagreed -- so, here are first a bunch of cleanup patches, refactoring, and then a patch to enable usage of GXX_UNDER_TEST, GFORTRAN_UNDER_TEST for installed testing. OK to commit all that to trunk? I tested (that is, diffed the libgomp.log file) each step incrementally, both in non-installed and in installed testing scenarios. commit 6229e75038b47a09638454a812fb9eff5f31d761 Author: Thomas Schwinge tho...@codesourcery.com Date: Mon Nov 3 09:58:38 2014 +0100 libgomp testsuite: Only use blddir if set. libgomp/ * testsuite/lib/libgomp.exp (libgomp_init): Only use blddir if set. * testsuite/libgomp.c++/c++.exp: Likewise. (It is unclear to me why the current working directory needs to be in LD_LIBRARY_PATH.) --- libgomp/testsuite/lib/libgomp.exp | 5 +++-- libgomp/testsuite/libgomp.c++/c++.exp | 3 ++- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git libgomp/testsuite/lib/libgomp.exp libgomp/testsuite/lib/libgomp.exp index 094e5ed..4234d4f 100644 --- libgomp/testsuite/lib/libgomp.exp +++ libgomp/testsuite/lib/libgomp.exp @@ -53,7 +53,6 @@ proc libgomp_init { args } { global srcdir blddir objdir tool_root_dir global libgomp_initialized global tmpdir -global blddir global gluefile wrap_flags global ALWAYS_CFLAGS global CFLAGS @@ -105,7 +104,7 @@ proc libgomp_init { args } { } # Compute what needs to be put into LD_LIBRARY_PATH -set always_ld_library_path .:${blddir}/.libs +set always_ld_library_path . # Compute what needs to be added to the existing LD_LIBRARY_PATH. if {$gccdir != } { @@ -139,6 +138,8 @@ proc libgomp_init { args } { lappend ALWAYS_CFLAGS additional_flags=-B${blddir}/.libs lappend ALWAYS_CFLAGS additional_flags=-I${blddir} lappend ALWAYS_CFLAGS ldflags=-L${blddir}/.libs + + append always_ld_library_path :${blddir}/.libs } lappend ALWAYS_CFLAGS additional_flags=-I${srcdir}/.. diff --git libgomp/testsuite/libgomp.c++/c++.exp libgomp/testsuite/libgomp.c++/c++.exp index a9cf41a..c2288c8 100644 --- libgomp/testsuite/libgomp.c++/c++.exp +++ libgomp/testsuite/libgomp.c++/c++.exp @@ -53,7 +53,8 @@ if { $lang_test_file_found } { set_ld_library_path_env_vars set flags_file ${blddir}/../libstdc++-v3/scripts/testsuite_flags -if { [file exists $flags_file] } { +if { $blddir != \ + [file exists $flags_file] } { set libstdcxx_includes [exec sh $flags_file --build-includes] } else { set libstdcxx_includes commit 6d8f949c98e04966e6757c80094a63ce5eb2b334 Author: Thomas Schwinge
[PATCH 1/4] Gccgo port to s390[x] -- part II
See commit comment and ChangeLog for details. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany ChangeLog 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/go/syscall/libcall_linux_s390.go: New file for s390 support. * libgo/go/syscall/syscall_linux_s390.go: Ditto. * libgo/go/syscall/libcall_linux_s390x.go: New file for s390x support. * libgo/go/syscall/syscall_linux_s390x.go: Ditto. * libgo/go/runtime/pprof/pprof.go (printStackRecord): Support s390 and s390x. * libgo/runtime/runtime.c (runtime_cputicks): Add support for s390 and s390x * libgo/mksysinfo.sh: Ditto. (upcase_fields): New helper function 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/go/debug/elf/file.go (applyRelocations): Implement relocations on s390x. (applyRelocationsS390x): Ditto. (DWARF): Ditto. * libgo/go/debug/elf/elf.go (R_390): New constants for S390 relocations. (r390Strings): Ditto. (String): Helper function for S390 relocations. (GoString): Ditto. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/go/reflect/makefuncgo_s390.go: New file. (S390MakeFuncStubGo): Implementation of s390 abi. * libgo/go/reflect/makefuncgo_s390x.go: New file. (S390xMakeFuncStubGo): Implementation of s390x abi. * libgo/go/reflect/makefunc_s390.c: New file. (makeFuncStub): s390 and s390x specific implementation of function. * libgo/go/reflect/makefunc.go (MakeFunc): Add support for s390 and s390x. (makeMethodValue): Ditto. (makeValueMethod): Ditto. * libgo/Makefile.am (go_reflect_makefunc_s_file): Ditto. (go_reflect_makefunc_file): Ditto. * libgo/go/reflect/makefunc_dummy.c: Ditto. * libgo/runtime/runtime.h (__go_makefunc_can_recover): Export prototype for use in makefunc_s390.c. (__go_makefunc_returning): Ditto. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/go/syscall/exec_linux.go (forkAndExecInChild): Fix order of the arguments of the clone system call for s390[x]. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/configure.ac (is_s390): New variable. (is_s390x): Ditto (LIBGO_IS_S390): Ditto. (LIBGO_IS_S390X): Ditto. (GOARCH): Support s390 and s390x. * libgo/go/go/build/build.go (cgoEnabled): Ditto. * libgo/go/go/build/syslist.go (goarchList): Ditto. From 27c07fd6b61dfe86169e42ffbd63d24db31aea6c Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Tue, 4 Nov 2014 10:13:26 +0100 Subject: [PATCH 1/4] libgo: Port to s390[x]. --- libgo/Makefile.am | 15 ++ libgo/configure.ac | 16 ++ libgo/go/debug/elf/elf.go | 66 + libgo/go/debug/elf/file.go | 45 +++- libgo/go/go/build/build.go | 2 + libgo/go/go/build/syslist.go| 2 +- libgo/go/reflect/makefunc.go| 2 + libgo/go/reflect/makefunc_s390.c| 92 +++ libgo/go/reflect/makefuncgo_s390.go | 454 libgo/go/reflect/makefuncgo_s390x.go| 436 ++ libgo/go/runtime/pprof/pprof.go | 5 + libgo/go/syscall/exec_linux.go | 7 +- libgo/go/syscall/libcall_linux_s390.go | 7 + libgo/go/syscall/libcall_linux_s390x.go | 7 + libgo/go/syscall/syscall_linux_s390.go | 21 ++ libgo/go/syscall/syscall_linux_s390x.go | 21 ++ libgo/mksysinfo.sh | 53 +++- libgo/runtime/runtime.c | 8 + 18 files changed, 1248 insertions(+), 11 deletions(-) create mode 100644 libgo/go/reflect/makefunc_s390.c create mode 100644 libgo/go/reflect/makefuncgo_s390.go create mode 100644 libgo/go/reflect/makefuncgo_s390x.go create mode 100644 libgo/go/syscall/libcall_linux_s390.go create mode 100644 libgo/go/syscall/libcall_linux_s390x.go create mode 100644 libgo/go/syscall/syscall_linux_s390.go create mode 100644 libgo/go/syscall/syscall_linux_s390x.go diff --git a/libgo/Makefile.am b/libgo/Makefile.am index 3a42250..79cfdd8 100644 --- a/libgo/Makefile.am +++ b/libgo/Makefile.am @@ -943,11 +943,26 @@ go_reflect_makefunc_file = \ go_reflect_makefunc_s_file = \ go/reflect/makefunc_386.S else +if LIBGO_IS_S390 +go_reflect_makefunc_file = \ + go/reflect/makefuncgo_s390.go +go_reflect_makefunc_s_file = \ + go/reflect/makefunc_s390.c +else +if LIBGO_IS_S390X +go_reflect_makefunc_file = \ + go/reflect/makefuncgo_s390x.go \ + go/reflect/makefuncgo_s390.go +go_reflect_makefunc_s_file = \ + go/reflect/makefunc_s390.c +else go_reflect_makefunc_file = go_reflect_makefunc_s_file = \ go/reflect/makefunc_dummy.c endif endif +endif +endif go_reflect_files = \ go/reflect/deepequal.go \ diff --git a/libgo/configure.ac b/libgo/configure.ac index 0469b89..d651827
[PATCH 0/4] Gccgo port to s390[x] -- part II
This is the second set of patches to support s390[x] with Gccgo, containing mostly architecture dependent parts that affect the following directories: gcc/testsuite/go.test (golang) gcc/testsuite/go.test/go-test.exp (Gcc?) libgo (Gcc?) libgo/go (gofrontend) libgo/runtime (gofrontend) It seemed infeasible to split this patch series and send the patches only to the right places. All patches are required for proper s390[x] support. With this series, s390[x] support is basically complete. However, I have another patch that adds complex type support in the reflection library, but that requires the latest libffi code. I'll look into merging that back into gcc next. Summary of this series: --- 0001: Ports libgo to s390[x]. 0002: Adapts some tests in libgo but also fixes some bugs in the tests. 0003: Allow //+ build for the go tests in Gcc. 0004: s390[x] specific fixes for some tests in golang. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
[PATCH 2/4] Gccgo port to s390[x] -- part II
See commit comment and ChangeLog for details. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany ChangeLog 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * libgo/go/sync/atomic/atomic_test.go: Use LoadInt32() to access sync variables. * libgo/go/math/all_test.go (tolerance): Fix bug in calculation. (alike): Less strict precision requirements. (TestLog2): Likewise. From f62844a6e4a0f76c30fe3f1cffc0933f21bbf43d Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Tue, 4 Nov 2014 10:13:16 +0100 Subject: [PATCH 2/4] libgo: Test fixes for s390[x]. 1) libgo/math: Fix TestLog2 failures. Fixes necessary to successfully run the test on s390x. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63269 for details. 2) libgo/sync/atomic: Fix tests for sync_atomic.CompareAndSwap...(). *addr must not be accessed directly as the compiler can optimise the access to memory away, or make mulitple accesses (which it actually does on s390x). As Go does have no way to express that the memory is volatile, read it with the sync_atomic.Load...() function family instead. --- libgo/go/math/all_test.go | 9 ++--- libgo/go/sync/atomic/atomic_test.go | 16 2 files changed, 14 insertions(+), 11 deletions(-) diff --git a/libgo/go/math/all_test.go b/libgo/go/math/all_test.go index 0d8b10f..0d19e14 100644 --- a/libgo/go/math/all_test.go +++ b/libgo/go/math/all_test.go @@ -1671,8 +1671,8 @@ func tolerance(a, b, e float64) bool { d = -d } - if a != 0 { - e = e * a + if b != 0 { + e = e * b if e 0 { e = -e } @@ -1687,6 +1687,9 @@ func alike(a, b float64) bool { switch { case IsNaN(a) IsNaN(b): return true + case a == 0 !IsNaN(b) !IsInf(b, 0): + // allow deviations when the expected value is zero + return true case a == b: return Signbit(a) == Signbit(b) } @@ -2284,7 +2287,7 @@ func TestLog2(t *testing.T) { for i := -1074; i = 1023; i++ { f := Ldexp(1, i) l := Log2(f) - if l != float64(i) { + if !veryclose(l, float64(i)) { t.Errorf(Log2(2**%d) = %g, want %d, i, l, i) } } diff --git a/libgo/go/sync/atomic/atomic_test.go b/libgo/go/sync/atomic/atomic_test.go index d2af4f4..eaa3b6b 100644 --- a/libgo/go/sync/atomic/atomic_test.go +++ b/libgo/go/sync/atomic/atomic_test.go @@ -858,7 +858,7 @@ func hammerCompareAndSwapInt32(uaddr *uint32, count int) { addr := (*int32)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadInt32(addr) if CompareAndSwapInt32(addr, v, v+1) { break } @@ -869,7 +869,7 @@ func hammerCompareAndSwapInt32(uaddr *uint32, count int) { func hammerCompareAndSwapUint32(addr *uint32, count int) { for i := 0; i count; i++ { for { - v := *addr + v := LoadUint32(addr) if CompareAndSwapUint32(addr, v, v+1) { break } @@ -883,7 +883,7 @@ func hammerCompareAndSwapUintptr32(uaddr *uint32, count int) { addr := (*uintptr)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadUintptr(addr) if CompareAndSwapUintptr(addr, v, v+1) { break } @@ -897,7 +897,7 @@ func hammerCompareAndSwapPointer32(uaddr *uint32, count int) { addr := (*unsafe.Pointer)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadPointer(addr) if CompareAndSwapPointer(addr, v, unsafe.Pointer(uintptr(v)+1)) { break } @@ -1039,7 +1039,7 @@ func hammerCompareAndSwapInt64(uaddr *uint64, count int) { addr := (*int64)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadInt64(addr) if CompareAndSwapInt64(addr, v, v+1) { break } @@ -1050,7 +1050,7 @@ func hammerCompareAndSwapInt64(uaddr *uint64, count int) { func hammerCompareAndSwapUint64(addr *uint64, count int) { for i := 0; i count; i++ { for { - v := *addr + v := LoadUint64(addr) if CompareAndSwapUint64(addr, v, v+1) { break } @@ -1064,7 +1064,7 @@ func hammerCompareAndSwapUintptr64(uaddr *uint64, count int) { addr := (*uintptr)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadUintptr(addr) if CompareAndSwapUintptr(addr, v, v+1) { break } @@ -1078,7 +1078,7 @@ func hammerCompareAndSwapPointer64(uaddr *uint64, count int) { addr := (*unsafe.Pointer)(unsafe.Pointer(uaddr)) for i := 0; i count; i++ { for { - v := *addr + v := LoadPointer(addr) if CompareAndSwapPointer(addr, v, unsafe.Pointer(uintptr(v)+1)) { break } -- 1.8.4.2
[PATCH 3/4] Gccgo port to s390[x] -- part II
See commit comment and ChangeLog for details. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/testsuite/ChangeLog 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * go.test/go-test.exp: Add handling of // +build. From 12f9861253b1c9b8758b5356daedad98bfc044c4 Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Tue, 4 Nov 2014 10:13:39 +0100 Subject: [PATCH 3/4] go.test: Add handling of // +build to go-test.exp. --- gcc/testsuite/go.test/go-test.exp | 21 +++-- 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp index fd83722..71272a3 100644 --- a/gcc/testsuite/go.test/go-test.exp +++ b/gcc/testsuite/go.test/go-test.exp @@ -449,13 +449,22 @@ proc go-gc-tests { } { } if { [ string match // +build * $test_line ] } { - if { [ string match *[getenv GOARCH]* $test_line ] } { - continue - } - if { [ string match *linux* $test_line ] } { - continue + set matches_pos 0 + set matches_neg 0 + if { [ regexp -line \[ \][getenv GOARCH]\(\[ \]\|\$\) $test_line ] } { + set matches_pos 1 + } elseif { [ regexp -line \[ \]\![getenv GOARCH]\(\[ \]\|\$\) $test_line ] } { + set matches_neg 1 + } elseif { [ regexp -line \[ \]linux\(\[ \]\|\$\) $test_line ] } { + set matches_pos 1 + } elseif { [ regexp -line \[ \]\!linux\(\[ \]\|\$\) $test_line ] } { + set matches_neg 1 + } elseif { [ regexp -line \[ \]\!windows\(\[ \]\|\$\) $test_line ] } { + set matches_pos 1 + } elseif { [ regexp -line \[ \]windows\(\[ \]\|\$\) $test_line ] } { + set matches_neg 1 } - if { [ string match *!windows* $test_line ] } { + if { $matches_pos == 1 $matches_neg == 0 } { continue } close $fd -- 1.8.4.2
[PATCH 4/4] Gccgo port to s390[x] -- part II
See commit comment and ChangeLog for details. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany gcc/testsuite/ChangeLog 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * go.test/test/ken/cplx2.go: Fix s390 test failures. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * go.test/test/nilptr_s390.go: Port nilptr.go to s390. * go.test/test/nilptr_s390x.go: Port nilptr.go to s390x. * go.test/test/nilptr.go: Do not run on s390/s390x. * go.test/go-test.exp: // +build matches complete words without whitespace. This is rnecessary to distinguish s390 from s390x. It also handles negation with the '!' prefix. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * go.test/go-test.exp (go-set-goarch): Enable tests on s390[x]. 2014-11-04 Dominik Vogt v...@linux.vnet.ibm.com * go.test/go-test.exp: Add handling of // +build. From 5b0eaf73d005bd152fff5a1a922f5618da4be939 Mon Sep 17 00:00:00 2001 From: Dominik Vogt v...@linux.vnet.ibm.com Date: Tue, 4 Nov 2014 10:13:22 +0100 Subject: [PATCH 4/4] go.test: Changes required for s390[x] port. 1) Add Go architectures s390 and s390x. 2) Do not run test nilptr.go on s390 - platform specific test. * Detects word boundaries to distinguish between s390 and s390x. (Switch to using regular expressions.) * Implement the Prefix '!' to exclude targets from build. 3) Fix go/test/ken/cplx2.go test failures. --- gcc/testsuite/go.test/go-test.exp | 6 + gcc/testsuite/go.test/test/ken/cplx2.go| 20 ++- gcc/testsuite/go.test/test/nilptr.go | 1 + gcc/testsuite/go.test/test/nilptr_s390.go | 190 + gcc/testsuite/go.test/test/nilptr_s390x.go | 190 + 5 files changed, 405 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/go.test/test/nilptr_s390.go create mode 100644 gcc/testsuite/go.test/test/nilptr_s390x.go diff --git a/gcc/testsuite/go.test/go-test.exp b/gcc/testsuite/go.test/go-test.exp index 71272a3..25e405b 100644 --- a/gcc/testsuite/go.test/go-test.exp +++ b/gcc/testsuite/go.test/go-test.exp @@ -244,6 +244,12 @@ proc go-set-goarch { } { set goarch ppc64 } } + s390-*-* { + set goarch s390 + } + s390x-*-* { + set goarch s390x + } sparc*-*-* { if [check_effective_target_ilp32] { set goarch sparc diff --git a/gcc/testsuite/go.test/test/ken/cplx2.go b/gcc/testsuite/go.test/test/ken/cplx2.go index eb1da7b..d11e33c 100644 --- a/gcc/testsuite/go.test/test/ken/cplx2.go +++ b/gcc/testsuite/go.test/test/ken/cplx2.go @@ -97,13 +97,29 @@ func main() { } cd := c5 / c6 - if cd != Cd { + dr := real(Cd) - real(cd) + if dr 0 { + dr = -dr + } + di := imag(Cd) - imag(cd) + if di 0 { + di = -di + } + if dr .00059604644775390625 || di 0 { println(opcode x, cd, Cd) panic(fail) } ce := cd * c6 - if ce != Ce { + dr = real(Ce) - real(ce) + if dr 0 { + dr = -dr + } + di = imag(Ce) - imag(ce) + if di 0 { + di = -di + } + if dr 0 || di 0.0095367431640625 { println(opcode x, ce, Ce) panic(fail) } diff --git a/gcc/testsuite/go.test/test/nilptr.go b/gcc/testsuite/go.test/test/nilptr.go index 9631d16..574d662 100644 --- a/gcc/testsuite/go.test/test/nilptr.go +++ b/gcc/testsuite/go.test/test/nilptr.go @@ -1,3 +1,4 @@ +// +build !s390 !s390x // run // Copyright 2011 The Go Authors. All rights reserved. diff --git a/gcc/testsuite/go.test/test/nilptr_s390.go b/gcc/testsuite/go.test/test/nilptr_s390.go new file mode 100644 index 000..d79916d --- /dev/null +++ b/gcc/testsuite/go.test/test/nilptr_s390.go @@ -0,0 +1,190 @@ +// +build s390 +// run + +// Copyright 2011 The Go Authors. All rights reserved. +// Use of this source code is governed by a BSD-style +// license that can be found in the LICENSE file. + +// Test that the implementation catches nil ptr indirection in a +// large address space. + +package main + +import unsafe + +// Having a big address space means that indexing at a large +// offset from a nil pointer might not cause a memory access +// fault. This test checks that Go is doing the correct explicit +// checks to catch these nil pointer accesses, not just relying on +// the hardware. +// +// Give us a big address space somewhere near min_bss_offset. +const in_mem_size uintptr = 256 20 // 256 MiB +const min_bss_offset uintptr = 1 22 // 0x0040 +const maxlen uintptr = (1 31) - 2 // 0x7ffe +var dummy [in_mem_size]byte + +func main() { + // The test only tests what we intend to test if dummy + // starts near 0x004. Otherwise there might not be + // anything mapped at the address that might be + // accidentally dereferenced below. + if uintptr(unsafe.Pointer(dummy)) in_mem_size + min_bss_offset { + panic(dummy too far out) + } else if uintptr(unsafe.Pointer(dummy)) min_bss_offset { + panic(dummy too close) + } + + shouldPanic(p1) + shouldPanic(p2) + shouldPanic(p3) + shouldPanic(p4) +
Re: [PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)
On 4 Nov 2014, at 11:50, Jan-Benedict Glaw jbg...@lug-owl.de wrote: On Sat, 2014-10-18 12:54:33 +0200, Oleg Endo oleg.e...@t-online.de wrote: Hi, As discussed in the PR, this adds two new SH built-in functions __builtin_sh_get_fpscr __builtin_sh_set_fpscr. Tested on r216173 with make -k check RUNTESTFLAGS=--target_board=sh-sim\{-m4/-ml,-m4/-mb} and no new failures. Committed as r216424. Cheers, Oleg gcc/ChangeLog: PR target/53513 * config/sh/sh-modes.def (PSI): Remove. * config/sh/sh-protos.h (get_fpscr_rtx): Remove. * config/sh/sh.c (fpscr_rtx, get_fpscr_rtx): Remove. (sh_reorg): Remove commented out FPSCR code. (fpscr_set_from_mem): Use SImode instead of PSImode. Emit lds_fpscr insn instead of move insn. (sh_hard_regno_mode_ok): Return SImode for FPSCR. (sh_legitimate_address_p, sh_legitimize_reload_address): Remove PSImode handling. (sh_emit_mode_set): Emit lds_fpscr and sts_fpscr insns. (sh1_builtin_p): Uncomment. (SH_BLTIN_UV 25, SH_BLTIN_VU 26): New macros. (bdesc): Add __builtin_sh_get_fpscr and __builtin_sh_set_fpscr. Change to emit_fpu_switch() is missing here. Index: gcc/config/sh/sh.c === --- gcc/config/sh/sh.c(revision 216350) +++ gcc/config/sh/sh.c(working copy) This chunk is in emit_fpu_switch(), which drops every reference to `dst': @@ -10055,13 +10032,12 @@ emit_move_insn (scratch, XEXP (src, 0)); if (index != 0) emit_insn (gen_addsi3 (scratch, scratch, GEN_INT (index * 4))); - src = adjust_automodify_address (src, PSImode, scratch, index * 4); + src = adjust_automodify_address (src, SImode, scratch, index * 4); } else -src = adjust_address (src, PSImode, index * 4); +src = adjust_address (src, SImode, index * 4); - dst = get_fpscr_rtx (); - emit_move_insn (dst, src); + emit_insn (gen_lds_fpscr (src)); } static rtx get_free_reg (HARD_REG_SET); ...which in turn results in fall-out when simply building it with config-list.mk, see eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=372472 : [...] g++ -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../../gcc/gcc -I../../../gcc/gcc/. -I../../../gcc/gcc/../include -I../../../gcc/gcc/../libcpp/include -I/opt/cfarm/mpc/include -I../../../gcc/gcc/../libdecnumber -I../../../gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I../../../gcc/gcc/../libbacktrace-o sh.o -MT sh.o -MMD -MP -MF ./.deps/sh.TPo ../../../gcc/gcc/config/sh/sh.c ../../../gcc/gcc/config/sh/sh.c: In function ‘void emit_fpu_switch(rtx, int)’: ../../../gcc/gcc/config/sh/sh.c:10027:7: error: unused variable ‘dst’ [-Werror=unused-variable] rtx dst, src; ^ cc1plus: all warnings being treated as errors make[2]: *** [sh.o] Error 1 This should fix it, ok for mainline? Yes, OK. Thanks for catching it. Cheers, Oleg 2014-11-04 Jan-Benedict Glaw jbg...@lug-owl.de * config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable. diff --git a/gcc/config/sh/sh.c b/gcc/config/sh/sh.c index 3a4f5e9..be944da 100644 --- a/gcc/config/sh/sh.c +++ b/gcc/config/sh/sh.c @@ -10024,7 +10024,7 @@ static GTY(()) tree fpscr_values; static void emit_fpu_switch (rtx scratch, int index) { - rtx dst, src; + rtx src; if (fpscr_values == NULL) { MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of:http://www.chiark.greenend.org.uk/~sgtatham/bugs.html the second :
Re: [PATCH] Update libgcc.texi to match implementation in libgcc/libgcc2.c
ping! On Tue, Oct 28, 2014 at 11:19 AM, Kito Cheng kito.ch...@gmail.com wrote: Hi all: This patch update `Bit operations` section in libgcc.text, most bit operation function is take an unsigned integer instead of signed integer in libgcc/libgcc2.c [1], and it seem more make sense :) ChangeLog 2014-10-28 Kito Cheng k...@0xlab.org * doc/libgcc.texi: Update text to match implementation in libgcc/libgcc2.c [1] https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=blob;f=libgcc/libgcc2.c;h=46d6a2ef030ff98a944935f77529cee9c05fb326;hb=HEAD
Re: nvptx offloading patches [3/n], i386 bits RFD
On 11/03/2014 11:27 PM, Jeff Law wrote: On 11/01/14 05:57, Bernd Schmidt wrote: This is not against current trunk; it applies to gomp-4_0-branch where it is one of the necessary parts to make offloading x86-nvptx work. The issue is that the LTO file format depends on the machine_modes enum, it needs to match between host and offload target. The easiest way to do this is to just use the host-modes.def when compiling an offload compiler. Ports that want to be hosts for offloading may need to modify their modes.def. The patch below contains changes to i386-modes.def which modifies XFmode depending on a target switch. I'm not actually entirely sure what to do about this. Do we want to make this flag an error when offloading is enabled? Or maybe add float format support to the -foffload-abi option? Thoughts? Ok for the first part of the patch once the other offloading patches have gone in (bootstrapped and tested on x86_64-linux)? It feels like we've got another real distinction to make. We've had host, build target and they're all independent. It feels like we need offload target and better separate between target and offload target. Then we need to figure out the places where we've got bleed-out. Is this a question of terminology? I agree that saying offload host when we'd normally be calling it the target is confusing, but it's difficult to come up with better names. Offload host and target are not quite independent - when it's implemented through LTO at least there has to be a fairly close agreement even down to machine modes, which is why a patch like this is necessary. Not sure how to deal with this any further out than the immediate term than using a hack like this. Though I'd prefer to avoid the #ifdef as it seems to me this shouldn't be baked in at build/configure time. Yeah, I'm not expecting the i386 part to go in quite as-is. For reference I'm including the offload-abi patch - Ilya is submitting this along with other option changes. One possibility would be to print and recognize strings such as lp64D128 or lp64D96 which would include information about the size of long double. Somehow though I can't really bring myself to believe that -mlong-double128 is a real use case with offloading so we might just disallow the combination. CCing Uros in case he has an opinion. Bernd Index: gcc/common.opt === --- gcc/common.opt.orig +++ gcc/common.opt @@ -1601,6 +1601,19 @@ fnon-call-exceptions Common Report Var(flag_non_call_exceptions) Optimization Support synchronous non-call exceptions +foffload-abi= +Common Joined RejectNegative Enum(offload_abi) Var(flag_offload_abi) Init(OFFLOAD_ABI_UNSET) +-foffload-abi=[lp64|ilp32] Set the ABI to use in an offload compiler + +Enum +Name(offload_abi) Type(enum offload_abi) UnknownError(unknown offload ABI %qs) + +EnumValue +Enum(offload_abi) String(ilp32) Value(OFFLOAD_ABI_ILP32) + +EnumValue +Enum(offload_abi) String(lp64) Value(OFFLOAD_ABI_LP64) + fomit-frame-pointer Common Report Var(flag_omit_frame_pointer) Optimization When possible do not generate stack frames Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c.orig +++ gcc/config/i386/i386.c @@ -4261,6 +4261,15 @@ ix86_option_override (void) register_pass (insert_vzeroupper_info); } +/* Implement the TARGET_OFFLOAD_OPTIONS hook. */ +static char * +ix86_offload_options (void) +{ + if (TARGET_LP64) +return xstrdup (-foffload-abi=lp64); + return xstrdup (-foffload-abi=ilp32); +} + /* Update register usage after having seen the compiler flags. */ static void @@ -47142,6 +47151,10 @@ ix86_atomic_assign_expand_fenv (tree *ho #define TARGET_FLOAT_EXCEPTIONS_ROUNDING_SUPPORTED_P \ ix86_float_exceptions_rounding_supported_p +#undef TARGET_OFFLOAD_OPTIONS +#define TARGET_OFFLOAD_OPTIONS \ + ix86_offload_options + struct gcc_target targetm = TARGET_INITIALIZER; #include gt-i386.h Index: gcc/coretypes.h === --- gcc/coretypes.h.orig +++ gcc/coretypes.h @@ -111,6 +111,12 @@ enum tls_model { TLS_MODEL_LOCAL_EXEC }; +enum offload_abi { + OFFLOAD_ABI_UNSET, + OFFLOAD_ABI_LP64, + OFFLOAD_ABI_ILP32 +}; + /* Types of unwind/exception handling info that can be generated. */ enum unwind_info_type Index: gcc/doc/tm.texi === --- gcc/doc/tm.texi.orig +++ gcc/doc/tm.texi @@ -11446,3 +11446,12 @@ Used when offloaded functions are seen i sections are available. It is called once for each symbol that must be recorded in the offload function and variable table. @end deftypefn + +@deftypefn {Target Hook} {char *} TARGET_OFFLOAD_OPTIONS (void) +Used when writing out the list of options into an LTO file. It should +translate any relevant target-specific options (such as the ABI in use) +into one
Re: [PATCH, BUILDROBOT] SH: Fix unused variable warning (was: [SH][committed] PR 53513 - Add __builtin_sh_get_fpscr, __builtin_sh_set_fpscr)
On Tue, 2014-11-04 13:21:24 +0100, Oleg Endo oleg.e...@t-online.de wrote: On 4 Nov 2014, at 11:50, Jan-Benedict Glaw jbg...@lug-owl.de wrote: 2014-11-04 Jan-Benedict Glaw jbg...@lug-owl.de * config/sh/sh.c (emit_fpu_switch): Drop unused automatic variable. [...] This should fix it, ok for mainline? Yes, OK. Thanks for catching it. Thanks. Committed as r217082. MfG, JBG -- Jan-Benedict Glaw jbg...@lug-owl.de +49-172-7608481 Signature of: Fortschritt bedeutet, einen Schritt so zu machen, the second : daß man den nächsten auch noch machen kann. signature.asc Description: Digital signature
Re: [match-and-simplify] support operator list
On Mon, Nov 3, 2014 at 4:50 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Nov 03, 2014 at 04:46:43PM +0100, Richard Biener wrote: On Mon, Nov 3, 2014 at 2:01 PM, Jakub Jelinek ja...@redhat.com wrote: On Mon, Nov 03, 2014 at 06:26:12PM +0530, Prathamesh Kulkarni wrote: --- gcc/match-comparison.pd (revision 216916) +++ gcc/match-comparison.pd (working copy) @@ -1,5 +1,8 @@ /* From fold_binary. */ +(define_operator_list eq_ops eq ne) +(define_operator_list cc eq_ops lt le gt ge) I think cc is a bad name for the macro, that usually stands for condition code register. OTOH it is a perfect match for 'condition code'. So eqcodes and ccodes, or comp_code, ... ? Saving a few keystrokes there can be a problem for readability. Not to mention that there are various other tcc_comparison codes (lggt, unordered, ordered, un{lt,le,gt,ge,eq}). Sure. Let's use ccodes then - or tcc_comparison (though I thought that was quite long). Well, the patch should be mostly about the new syntax of course. Richard. Jakub
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
Hi again, On 11/04/2014 05:34 AM, Jonathan Wakely wrote: On 04/11/14 03:41 +, Jonathan Wakely wrote: On 03/11/14 22:07 +, Jonathan Wakely wrote: On 3 November 2014 17:51, Paolo Carlini paolo.carl...@oracle.com wrote: .. other than the above issue, I see a segmentation fault for: performance/ext/pb_ds/multimap_text_insert_mem_large.cc Not a big deal of course, but unfortunately today I'm seeing *two* segfaults for pb_ds: performance/ext/pb_ds/multimap_text_insert_mem_large.cc .../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME performance/ext/pb_ds/multimap_text_insert_mem_small.cc .../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME Paolo.
Re: [AArch64, Docs, Patch] Add reference to ACLE in docs.
On 03/11/14 17:58, Joseph Myers wrote: On Mon, 3 Nov 2014, Tejas Belagod wrote: If I mention in a couple of sentences the level of ACLE support there is in GCC currently, this section will need to be updated every time there is an improvement in ACLE support - I guess we'll just have to remember to remove parts of this section as we do that. Yes, it's generally the case when adding new user-visible features that documentation needs updating. The release notes (gcc-N/changes.html in htdocs) should be updated for any significant new features as well. Thanks. The AArch64 ACLE CRC32 intrinsics were introduced in 4.9, so its not new in 5.0 - https://gcc.gnu.org/gcc-4.9/changes.html. But we've improved AArch64 NEON Intrinsics in 5.0 significantly which deserves a mention. I'll do that in a separate patch. Revised patch to fix extend.texi for ACLE attached. OK for trunk? Thanks, Tejas. 2014-11-04 Tejas Belagod tejas.bela...@arm.com gcc/ * Makefile.in (TEXI_GCC_FILES): Remove arm-acle-intrinsics.texi, arm-neon-intrinsics.texi, aarch64-acle-intrinsics.texi. * doc/aarch64-acle-intrinsics.texi: Remove. * doc/arm-acle-intrinsics.texi: Remove. * doc/arm-neon-intrinsics.texi: Remove. * doc/extend.texi: Consolidate sections AArch64 intrinsics, ARM NEON Intrinsics, ARM ACLE Intrinsics into one ARM C Language Extension section. Add references to public ACLE specification. intrinsics-doc-3.txt.gz Description: application/gzip
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
.. and also: performance/ext/pb_ds/priority_queue_text_pop_mem.cc .../libstdc++-v3/scripts/check_performance: line 41: 16905 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME Paolo.
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Am 11/04/2014 10:55 AM, schrieb Paolo Carlini: .. thanks a lot Jon! (after all this parallel mode is still useful for something ;) Paolo. Sorry for the terse message. I'm under heavy workload at the moment. But AFAIS now everything looks good again. Thanks a lot, Paolo and Jon! -BEGIN PGP SIGNATURE- Version: GnuPG v2.0.22 (GNU/Linux) Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/ iQEcBAEBAgAGBQJUWNVLAAoJEB3HOsWs+KJb24YIAJ9WUYMXxfVfQetYQjh60rKv VGJaJ30imyhx5i+G06XI9vPpds0zICnAm/CPNp4bQKAR8XkO0CzI7qbP63KYTPOA nRBDG1sbhxOpXYWbx7CJ52MM860bYF4MuvVSX1aXVmE2at1MqiYxSscARYGJx5b8 e7+7C12TuyocdU9l3edv3j1eA2eW0R4A9Ae1XECsUQg8llgnhiT5p+Jd1fffwUSH g+v8TDXwqcvmdt/aTfyP1C/Gl3YWh/uZARqcbiShq+dlb2je7n2s1l/ZpXc26vXN Z10BIt7x3FRM61qQuUuiRHlRgEcEeaFpy4uPoYKLfbsAToPYcPKbY8YE7qQGNKo= =SdTc -END PGP SIGNATURE-
Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version
On Tue, Nov 04, 2014 at 12:07:56PM +, Joern Rennecke wrote: On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote: While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is unused, so clean that up too. That's not a clean-up. This pertains to PR 39350. Well, it is a clean-up in the sense that this macro is completely unused in the compiler and has no effect, but please revert this hunk if that is your preference. Which, incidentally, this hookization completely ignores, entrenching the conflation of move expander and move cost estimates. No, I have to disagree. The use_by_pieces_infrastructure_p hook doesn't conflate anything - it gives a response to the question Should the by_pieces infrastructure be used?. A target specific movmem pattern - though it might itself choose to move things by pieces, is categorically not using the move_by_pieces infrastructure. If we want to keep a clean separation of concerns here, we would want a similar target hook asking the single question will your movmem/setmem expander succeed?. Thus, can_move_by_pieces gives the wrong result for purposes of rtl optimizations when a target-specific movmem etc expander emits target-specific code. The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt shows a number of call sites that are affected. can_move_by_pieces (likewise can_store_by_pieces) gives the right result, the RTL expanders are using it wrong. I disagree with the approach taken in your patch as it overloads the purpose of can_move_by_pieces. However, I would support a patch pulling this out in to two hooks, so the call in value-prof.c:gimple_stringops_transform would change from: if (!can_move_by_pieces (val, MIN (dest_align, src_align))) return false; to something like: if (!can_move_by_pieces (val, MIN (dest_align, src_align)) !targetm.can_expand_mem_op_p (val, MIN (dest_align, src_align), MOVE_BY_PIECES)) return false; But let's not confuse the use of what should be a simple hook! arc only implements MOVE_BY_PIECES_P, wiring it to false. Mirror that behaviour, and use the default hook for other by_pieces operations. I tried building a compiler but no amount of fiddling with target strings got me to a sensible result, so this patch is completely untested. You could just pick one of the configs in contrib/config-list.mk Digging in to this, my scripts like to integrate a GDB build - which doesn't work for arc-unknown-elf. I've been following the builds on Jan Benedict-Glaw's since I put the patches in on Saturday morning, and it doesn't look like I broke anything for arc. If I have any more arc patches I'll keep this in mind. Thanks, James
Re: [match-and-simplify] support operator list
On Mon, Nov 3, 2014 at 1:56 PM, Prathamesh Kulkarni bilbotheelffri...@gmail.com wrote: (had sent it earlier by private mail). The attached patch supports operator-list and it's use in for. For now, operator-list is rejected in expression. Ok. This patch also allows user-defined operator to be used as operator-list (user-defined ops are really temporary or scoped operator-lists). (for op (plus minus) op2 (op...) (simplify ...)) Should that be supported or rejected ? I've made it supported, but not in the way you implemented it. Instead I am only expanding operator lists. Thus (for op (plus minus) (for op2 (minus op) ... will iterate (op, op2) (plus, minus) (minus, minus) (plus, plus) (minus, minus). I think that makes more sense(?) but it's maybe too confusing as well so eventually we should reject this. I've adapted the patch slightly, using safe_splice, adding a comment and avoiding a long line. I've also used an example operator list in match-builtin.pd instead where it shows the obvious improvement we're going to implement. Committed. Thanks, Richard. 2014-11-04 Prathamesh Kulkarni bilbotheelffri...@gmail.com * genmatch.c (user_id): Add new member is_oper_list. (user_id::user_id): Add new default argument. (parser::parse_operator_list): New function. (parser::parse_for): Allow operator-list. (parser::parse_pattern): Call parser::parse_operator_list. (parser::parse_operation): Reject operator-list. * match-builtin.pd: Define operator lists POWs, CBRTs and SQRTs. (gmail is too stupid to attach stuff it seems) * genmatch.c (user_id): Add new member is_oper_list. (user_id::user_id): Add new default argument. (parser::parse_operator_list): New function. (parser::parse_for): Allow operator-list. (parser::parse_pattern): Call parser::parse_operator_list. (parser::parse_operation): Reject operator-list. * match-comparison.pd Define operator-lists eq_ops and cc and use them in patterns. Thanks, Prathamesh
RFC: Building a minimal libgfortran for nvptx
The ptx port by its nature is lacking features that are expected on normal machines, such as alloca and indirect jumps. We have a subset of the C library which contains functions that can be implemented on the target (excluding things like file I/O other than printf which is a ptx builtin). It would be good to be able to also build as much of libgfortran as possible, and the following patch is what I've been using so far. It recognizes the target at configure time and restricts the list of compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef tests. There's also a new file minimal.c which contains alternative implementations of some functionality (using printf to write error messages rather than fprintf and such). Constructors are currently unimplemented on ptx and therefore the init function is commented out. Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). Bernd Index: libgfortran/Makefile.am === --- libgfortran/Makefile.am.orig +++ libgfortran/Makefile.am @@ -77,6 +77,14 @@ AM_CFLAGS += $(SECTION_FLAGS) AM_CFLAGS += $(IEEE_FLAGS) AM_FCFLAGS += $(IEEE_FLAGS) +if LIBGFOR_MINIMAL +AM_CFLAGS += -DLIBGFOR_MINIMAL +endif + +if LIBGFOR_MINIMAL +gfor_io_src= \ +io/size_from_kind.c +else gfor_io_src= \ io/close.c \ io/file_pos.c \ @@ -94,6 +102,7 @@ io/unit.c \ io/unix.c \ io/write.c \ io/fbuf.c +endif gfor_io_headers= \ io/io.h \ @@ -101,6 +110,41 @@ io/fbuf.h \ io/format.h \ io/unix.h +if LIBGFOR_MINIMAL +gfor_helper_src= \ +intrinsics/associated.c \ +intrinsics/abort.c \ +intrinsics/args.c \ +intrinsics/bit_intrinsics.c \ +intrinsics/cshift0.c \ +intrinsics/eoshift0.c \ +intrinsics/eoshift2.c \ +intrinsics/erfc_scaled.c \ +intrinsics/extends_type_of.c \ +intrinsics/fnum.c \ +intrinsics/ierrno.c \ +intrinsics/ishftc.c \ +intrinsics/iso_c_generated_procs.c \ +intrinsics/iso_c_binding.c \ +intrinsics/malloc.c \ +intrinsics/mvbits.c \ +intrinsics/move_alloc.c \ +intrinsics/pack_generic.c \ +intrinsics/selected_char_kind.c \ +intrinsics/size.c \ +intrinsics/spread_generic.c \ +intrinsics/string_intrinsics.c \ +intrinsics/rand.c \ +intrinsics/random.c \ +intrinsics/reshape_generic.c \ +intrinsics/reshape_packed.c \ +intrinsics/selected_int_kind.f90 \ +intrinsics/selected_real_kind.f90 \ +intrinsics/transpose_generic.c \ +intrinsics/unpack_generic.c \ +runtime/in_pack_generic.c \ +runtime/in_unpack_generic.c +else gfor_helper_src= \ intrinsics/associated.c \ intrinsics/abort.c \ @@ -165,6 +209,7 @@ intrinsics/unlink.c \ intrinsics/unpack_generic.c \ runtime/in_pack_generic.c \ runtime/in_unpack_generic.c +endif if IEEE_SUPPORT @@ -181,6 +226,15 @@ gfor_ieee_src= endif +if LIBGFOR_MINIMAL +gfor_src= \ +runtime/bounds.c \ +runtime/compile_options.c \ +runtime/memory.c \ +runtime/minimal.c \ +runtime/string.c \ +runtime/select.c +else gfor_src= \ runtime/backtrace.c \ runtime/bounds.c \ @@ -195,6 +249,7 @@ runtime/pause.c \ runtime/stop.c \ runtime/string.c \ runtime/select.c +endif i_all_c= \ $(srcdir)/generated/all_l1.c \ Index: libgfortran/Makefile.in === --- libgfortran/Makefile.in.orig +++ libgfortran/Makefile.in @@ -54,10 +54,11 @@ POST_UNINSTALL = : build_triplet = @build@ host_triplet = @host@ target_triplet = @target@ -@IEEE_SUPPORT_TRUE@am__append_1 = ieee/ieee_helper.c +@LIBGFOR_MINIMAL_TRUE@am__append_1 = -DLIBGFOR_MINIMAL +@IEEE_SUPPORT_TRUE@am__append_2 = ieee/ieee_helper.c # dummy sources for libtool -@onestep_TRUE@am__append_2 = libgfortran_c.c libgfortran_f.f90 +@onestep_TRUE@am__append_3 = libgfortran_c.c libgfortran_f.f90 subdir = . DIST_COMMON = ChangeLog $(srcdir)/Makefile.in $(srcdir)/Makefile.am \ $(top_srcdir)/configure $(am__configure_deps) \ @@ -121,9 +122,13 @@ libcaf_single_la_LIBADD = am_libcaf_single_la_OBJECTS = single.lo libcaf_single_la_OBJECTS = $(am_libcaf_single_la_OBJECTS) libgfortran_la_LIBADD = -am__objects_1 = backtrace.lo bounds.lo compile_options.lo \ - convert_char.lo environ.lo error.lo fpu.lo main.lo memory.lo \ - pause.lo stop.lo string.lo select.lo +@LIBGFOR_MINIMAL_FALSE@am__objects_1 = backtrace.lo bounds.lo \ +@LIBGFOR_MINIMAL_FALSE@ compile_options.lo convert_char.lo \ +@LIBGFOR_MINIMAL_FALSE@ environ.lo error.lo fpu.lo main.lo \ +@LIBGFOR_MINIMAL_FALSE@ memory.lo pause.lo stop.lo string.lo \ +@LIBGFOR_MINIMAL_FALSE@ select.lo +@LIBGFOR_MINIMAL_TRUE@am__objects_1 = bounds.lo compile_options.lo \ +@LIBGFOR_MINIMAL_TRUE@ memory.lo minimal.lo string.lo select.lo am__objects_2 = all_l1.lo all_l2.lo all_l4.lo all_l8.lo all_l16.lo am__objects_3 = any_l1.lo any_l2.lo any_l4.lo any_l8.lo any_l16.lo am__objects_4 = count_1_l.lo count_2_l.lo count_4_l.lo
[PATCH][C-Family] Disable bogus shortening of DFP vs FP compare
On the match-and-simplify branch we expose an issue in shorten_compare which happily transforms (double) float-var != (double) dfp-float-var to (float) float-var != (float) dfp-float-var which is wrong and causes FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++11 execution test FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++1y execution test FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++98 execution test you can expose this latent bug by making get_narrower also treat CONVERT_EXPRs as conversions - on trunk the DFP value happens to be converted to double using that. You then also run into the issue that the C frontend rejects the compare via its common_type implementation which complains about a mixed FP - DFP compare. So the following patch which I am testing right now disables the (premature) shorten-compare optimization on mixed FP - DFP operands. It also exposes the issue on trunk by improving get_narrower. Ok for trunk either with or without the tree.c hunk? (I'm going to remove that if it exposes more issues elsewhere revealed by testing) Thanks, Richard. 2014-11-04 Richard Biener rguent...@suse.de * tree.c (get_narrower): Also look through CONVERT_EXPRs. c-family/ * c-common.c (shorten_compare): Do not shorten mixed DFP and non-DFP compares. Index: gcc/tree.c === --- gcc/tree.c (revision 217049) +++ gcc/tree.c (working copy) @@ -8494,7 +8494,7 @@ get_narrower (tree op, int *unsignedp_pt tree win = op; bool integral_p = INTEGRAL_TYPE_P (TREE_TYPE (op)); - while (TREE_CODE (op) == NOP_EXPR) + while (CONVERT_EXPR_P (op)) { int bitschange = (TYPE_PRECISION (TREE_TYPE (op)) Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 217049) +++ gcc/c-family/c-common.c (working copy) @@ -4314,9 +4314,15 @@ shorten_compare (location_t loc, tree *o /* If either arg is decimal float and the other is float, find the proper common type to use for comparison. */ else if (real1 real2 + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0))) + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1 +type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1)); + + /* If either arg is decimal float and the other is float, fail. */ + else if (real1 real2 (DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0))) || DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1) -type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1)); +return 0; else if (real1 real2 (TYPE_PRECISION (TREE_TYPE (primop0))
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
On 4 November 2014 13:13, Paolo Carlini paolo.carl...@oracle.com wrote: Hi again, On 11/04/2014 05:34 AM, Jonathan Wakely wrote: On 04/11/14 03:41 +, Jonathan Wakely wrote: On 03/11/14 22:07 +, Jonathan Wakely wrote: On 3 November 2014 17:51, Paolo Carlini paolo.carl...@oracle.com wrote: .. other than the above issue, I see a segmentation fault for: performance/ext/pb_ds/multimap_text_insert_mem_large.cc Not a big deal of course, but unfortunately today I'm seeing *two* segfaults for pb_ds: performance/ext/pb_ds/multimap_text_insert_mem_large.cc .../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME performance/ext/pb_ds/multimap_text_insert_mem_small.cc .../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME They were both caused by the infinite recursion in the operator== for tracker_allocator, which should be fixed. However, my change to std::deque's move constructor introduced some new failures, mostly due to dg-error line numbers that need to change, but one real failure in 23_containers/deque/allocator/move_assign.cc I'm not sure whether the right approach is to violate the requirement that deque's move constructor move constructs the allocator, (which I changed to copy construction, to meet the exception safety guarantee that the RHS of the move construction maintains its invariants) or to change the invariant so that a deque doesn't always have a map allocated. I'll fix it asap.
Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare
On Tue, 4 Nov 2014, Richard Biener wrote: On the match-and-simplify branch we expose an issue in shorten_compare which happily transforms (double) float-var != (double) dfp-float-var to (float) float-var != (float) dfp-float-var which is wrong and causes FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++11 execution test FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++1y execution test FAIL: c-c++-common/dfp/convert-bfp-12.c -std=c++98 execution test you can expose this latent bug by making get_narrower also treat CONVERT_EXPRs as conversions - on trunk the DFP value happens to be converted to double using that. You then also run into the issue that the C frontend rejects the compare via its common_type implementation which complains about a mixed FP - DFP compare. So the following patch which I am testing right now disables the (premature) shorten-compare optimization on mixed FP - DFP operands. It also exposes the issue on trunk by improving get_narrower. Ok for trunk either with or without the tree.c hunk? (I'm going to remove that if it exposes more issues elsewhere revealed by testing) I removed it - it fails very early in stage2 with a warning about an always-true comparison. No time to sort that out at this point. So consider this a c-common.c change only. Thanks, Richard. 2014-11-04 Richard Biener rguent...@suse.de * tree.c (get_narrower): Also look through CONVERT_EXPRs. c-family/ * c-common.c (shorten_compare): Do not shorten mixed DFP and non-DFP compares. Index: gcc/tree.c === --- gcc/tree.c(revision 217049) +++ gcc/tree.c(working copy) @@ -8494,7 +8494,7 @@ get_narrower (tree op, int *unsignedp_pt tree win = op; bool integral_p = INTEGRAL_TYPE_P (TREE_TYPE (op)); - while (TREE_CODE (op) == NOP_EXPR) + while (CONVERT_EXPR_P (op)) { int bitschange = (TYPE_PRECISION (TREE_TYPE (op)) Index: gcc/c-family/c-common.c === --- gcc/c-family/c-common.c (revision 217049) +++ gcc/c-family/c-common.c (working copy) @@ -4314,9 +4314,15 @@ shorten_compare (location_t loc, tree *o /* If either arg is decimal float and the other is float, find the proper common type to use for comparison. */ else if (real1 real2 + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0))) + DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1 +type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1)); + + /* If either arg is decimal float and the other is float, fail. */ + else if (real1 real2 (DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop0))) || DECIMAL_FLOAT_MODE_P (TYPE_MODE (TREE_TYPE (primop1) -type = common_type (TREE_TYPE (primop0), TREE_TYPE (primop1)); +return 0; else if (real1 real2 (TYPE_PRECISION (TREE_TYPE (primop0)) -- Richard Biener rguent...@suse.de SUSE LINUX GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendoerffer, HRB 21284 (AG Nuernberg) Maxfeldstrasse 5, 90409 Nuernberg, Germany
Re: The nvptx port [1/11+] indirect jumps
On 10/20/2014 04:19 PM, Bernd Schmidt wrote: ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be defined. Add a sorry. Looking back through all the mails it turns out this one wasn't approved yet. Ping? Bernd
[PATCH] Fix dump scan in test devirt-40.C
Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks, Martin gcc/testsuite/ 2014-11-04 Martin Jambor mjam...@suse.cz * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in function body. Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C === --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr UnicodeString a[2]; } -/* { dg-final { scan-tree-dump converting indirect call to function virtual UnicodeString fre2 } } */ +/* { dg-final { scan-tree-dump-not \\n OBJ_TYPE_REF fre2 } } */ /* { dg-final { cleanup-tree-dump fre2 } } */
Re: RFC: Building a minimal libgfortran for nvptx
On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote: The ptx port by its nature is lacking features that are expected on normal machines, such as alloca and indirect jumps. We have a subset of the C library which contains functions that can be implemented on the target (excluding things like file I/O other than printf which is a ptx builtin). It would be good to be able to also build as much of libgfortran as possible, and the following patch is what I've been using so far. It recognizes the target at configure time and restricts the list of compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef tests. There's also a new file minimal.c which contains alternative implementations of some functionality (using printf to write error messages rather than fprintf and such). Constructors are currently unimplemented on ptx and therefore the init function is commented out. Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. -- steve
Re: The nvptx port [1/11+] indirect jumps
On 11/04/2014 04:32 PM, Bernd Schmidt wrote: On 10/20/2014 04:19 PM, Bernd Schmidt wrote: ptx doesn't have indirect jumps, so CODE_FOR_indirect_jump may not be defined. Add a sorry. Looking back through all the mails it turns out this one wasn't approved yet. Ping? Ok. r~
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
Hi, On 11/04/2014 04:32 PM, Jonathan Wakely wrote: Not a big deal of course, but unfortunately today I'm seeing *two* segfaults for pb_ds: performance/ext/pb_ds/multimap_text_insert_mem_large.cc .../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME performance/ext/pb_ds/multimap_text_insert_mem_small.cc .../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME They were both caused by the infinite recursion in the operator== for tracker_allocator, which should be fixed. To be clear, I see (deterministically, apparently) the three pb_ds fails which I reported earlier today with a *current* tree. But the pb_ds issues are definitely low priority, unless they point to something serious elsewhere. Thanks for looking into that. Paolo.
Re: [PATCH] Fix dump scan in test devirt-40.C
On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote: Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks, Martin gcc/testsuite/ 2014-11-04 Martin Jambor mjam...@suse.cz * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in function body. Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C === --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr UnicodeString a[2]; } -/* { dg-final { scan-tree-dump converting indirect call to function virtual UnicodeString fre2 } } */ +/* { dg-final { scan-tree-dump-not \\n OBJ_TYPE_REF fre2 } } */ What's the odd newline and spaces here? /* { dg-final { cleanup-tree-dump fre2 } } */
Re: [PATCH] Fix dump scan in test devirt-40.C
Hi, On 11/04/2014 04:37 PM, Martin Jambor wrote: Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks. Just to confirm, devirt-42.C is also failing, that is also known, right? FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-tree-dump-times optimized return 2 2 Thanks again, Paolo.
Re: [PATCH] Fix dump scan in test devirt-40.C
On 04/11/14 15:51, Paolo Carlini wrote: Hi, On 11/04/2014 04:37 PM, Martin Jambor wrote: Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks. Just to confirm, devirt-42.C is also failing, +1, on my local arm/aarch64 test environment. FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-ipa-dump-times inline Discovered a virtual call to a known target 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-ipa-dump-times inline First type is base of second 3 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-ipa-dump-times inline First type is base of second 3 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-ipa-dump-times inline Discovered a virtual call to a known target 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-ipa-dump-times inline Discovered a virtual call to a known target 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-ipa-dump-times inline First type is base of second 3 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-tree-dump-times optimized return 2 2 that is also known, right? FAIL: g++.dg/ipa/devirt-42.C -std=gnu++11 scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++1y scan-tree-dump-times optimized return 2 2 FAIL: g++.dg/ipa/devirt-42.C -std=gnu++98 scan-tree-dump-times optimized return 2 2 Thanks again, Paolo.
Re: [PATCH] Fix dump scan in test devirt-40.C
On Tue, Nov 04, 2014 at 04:47:18PM +0100, Richard Biener wrote: On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote: Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks, Martin gcc/testsuite/ 2014-11-04 Martin Jambor mjam...@suse.cz * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in function body. Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C === --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr UnicodeString a[2]; } -/* { dg-final { scan-tree-dump converting indirect call to function virtual UnicodeString fre2 } } */ +/* { dg-final { scan-tree-dump-not \\n OBJ_TYPE_REF fre2 } } */ What's the odd newline and spaces here? That only matches an OBJ_TYPE_REF as a statement in the function body and specifically does not match Determining dynamic type for call: OBJ_TYPE_REF and Starting walk at: OBJ_TYPE_REF that are present in the dump file above the dump function. An alternative would be of course to just scan the optimized dump if this is considered too much of a hack. But at least I though that for a testcas it would do nicely. Martin
Re: RFC: Building a minimal libgfortran for nvptx
On 11/04/2014 04:41 PM, Steve Kargl wrote: It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. I don't have the Fortran standard, but I assume that missing pieces in the library would be a violation. However, the alternative is no Fortran (library) support at all, which doesn't seem like an improvement. The target simply does not allow full language support, even for something like C. Note that the intention is not to support Fortran (or any other language) directly targetting ptx code. The only way it's supposed to be used is as an accelerator for OpenACC offloading. Bernd
Re: RFC: Building a minimal libgfortran for nvptx
Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). I’m afraid I don’t really see the point. Maybe I’ll say it differently that Steve has: what would be the state of Fortran on such platform, and would it have users? It looks that, at the very least, your target wouldn’t be able to do any I/O. Unless we have a good reason to think there will be users, because the state of support will be good, I don’t see that we should add to the library maintainance burden by special-casing targets. Also: if other targets come along that have this need, how does your strategy scale up? Thanks, FX
Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare
On Tue, 4 Nov 2014, Richard Biener wrote: c-family/ * c-common.c (shorten_compare): Do not shorten mixed DFP and non-DFP compares. OK. I think it's also wrong for get_narrower to strip conversions between binary and decimal floating point at all, as all such conversions for supported pairs of types can change values. (_Decimal128 can represent all values of __fp16, but no target currently supports both types.) And if flag_signaling_nans (strictly, if HONOR_SNANS (source-mode)), no floating-point conversions should be stripped in get_narrower at all. And even without signaling NaNs, TYPE_PRECISION may not be a reliable indication of whether a conversion is widening - the only issue applying at present might be the corner case that a conversion of a subnormal from __float80 to __float128 is exact but still raises underflow if traps on underflow are enabled, but when you have both binary128 and IBM long double supported together then neither has a set of values that is a superset of the other (it seems https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01086.html misses get_narrower in the places that need addressing). -- Joseph S. Myers jos...@codesourcery.com
Re: [PATCH][C-Family] Disable bogus shortening of DFP vs FP compare
On Tue, 4 Nov 2014, Joseph Myers wrote: On Tue, 4 Nov 2014, Richard Biener wrote: c-family/ * c-common.c (shorten_compare): Do not shorten mixed DFP and non-DFP compares. OK. I think it's also wrong for get_narrower to strip conversions between binary and decimal floating point at all, as all such conversions for supported pairs of types can change values. (_Decimal128 can represent all values of __fp16, but no target currently supports both types.) And if flag_signaling_nans (strictly, if HONOR_SNANS (source-mode)), no floating-point conversions should be stripped in get_narrower at all. And even without signaling NaNs, TYPE_PRECISION may not be a reliable indication of whether a conversion is widening - the only issue applying at present might be the corner case that a conversion of a subnormal from __float80 to __float128 is exact but still raises underflow if traps on underflow are enabled, but when you have both binary128 and IBM long double supported together then neither has a set of values that is a superset of the other (it seems https://gcc.gnu.org/ml/gcc-patches/2014-07/msg01086.html misses get_narrower in the places that need addressing). From this and past issues it looks like most (if not all) conversions between FP types use CONVERT_EXPR, not NOP_EXPR and thus may work as expected here. That's of course somewhat unreliable as the middle-end happily uses NOP_EXPR for those. Richard.
Re: RFC: Building a minimal libgfortran for nvptx
On 11/04/2014 04:59 PM, FX wrote: Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). I’m afraid I don’t really see the point. Maybe I’ll say it differently that Steve has: what would be the state of Fortran on such platform, and would it have users? It looks that, at the very least, your target wouldn’t be able to do any I/O. It would be used through OpenACC - and that I do believe that does have Fortran users, with other compilers. We can make Fortran OpenACC work for simple testcases without any runtime library, but IMO it would be better to go as far as possible in supporting whatever can be made to work. I/O is the major piece that is missing, but you'd expect that to be done on the host rather than on the accelerator. Bernd
Re: RFC: Building a minimal libgfortran for nvptx
On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote: On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote: The ptx port by its nature is lacking features that are expected on normal machines, such as alloca and indirect jumps. We have a subset of the C library which contains functions that can be implemented on the target (excluding things like file I/O other than printf which is a ptx builtin). It would be good to be able to also build as much of libgfortran as possible, and the following patch is what I've been using so far. It recognizes the target at configure time and restricts the list of compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef tests. There's also a new file minimal.c which contains alternative implementations of some functionality (using printf to write error messages rather than fprintf and such). Constructors are currently unimplemented on ptx and therefore the init function is commented out. Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. Jakub
Re: RFC: Building a minimal libgfortran for nvptx
The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. Do we have the architecture for that in place in GCC in general, and in the Fortran front-end in particular? I’d be interested to see how it works… FX
Re: [PATCH 12/27] New file: gcc/jit/jit-recording.h
On Mon, 2014-11-03 at 14:27 -0700, Jeff Law wrote: On 10/31/14 11:02, David Malcolm wrote: This file declares the gcc::jit::recording internal API, so that libgccjit.c can record the calls that are made to the public API, for later playback by the dummy frontend. gcc/jit/ * jit-recording.h: New. --- gcc/jit/jit-recording.h | 1593 +++ 1 file changed, 1593 insertions(+) create mode 100644 gcc/jit/jit-recording.h diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h new file mode 100644 index 000..bb1a2ee --- /dev/null +++ b/gcc/jit/jit-recording.h [ ... ] + +private: + void validate (); So give the complexities in interfacing with the guts of GCC, would it make sense to expose the validate method? Most of the error-checking in the API happens in the API calls in libgccjit.c, testing that the individual pieces are sane. The validate method tests the things that can only be verified as a whole, when the context is about to be compiled: are there unreachable blocks? is every block terminated? etc I can't quite see why client code might want to perform the latter kind of validation without actually doing a compile, so I don't plan to expose this at this time. It's trivial to do so if someone needs it. +/* or just use std::string? */ +class string : public memento Is there some reason not to use std::string? I really like using standard components rather than rolling our own. I think I was trying to avoid std::string for some reason, but I'm not quite sure why, perhaps out of a misremembered idea that libstdc++ was verboten (I currently use std:: in one place, in jit-playback.h, where a playback::context has a: vecstd::pairtree, location * m_cached_locations; ). In any case recording::string is an implementation detail hidden within the library. It is a recording::memento and hence has the same lifetime as the recording::context. I can't think of a reason off the top of my head why a std::string wouldn't work instead, but the existing code works, and has been through a fair amount of debugging. One thing I do make use of is that it's a pointer, for this field within recording::memento: string *m_debug_string; so that I can use NULL for the common case of no debug string has been built for this thing yet - though I suspect I could use a std::string * for that. OK for the trunk. Thanks.
Re: RFC: Building a minimal libgfortran for nvptx
On Tue, Nov 04, 2014 at 05:15:52PM +0100, FX wrote: The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. Do we have the architecture for that in place in GCC in general, and in the Fortran front-end in particular? I’d be interested to see how it works… See https://gcc.gnu.org/wiki/Offloading and kyukhin/gomp4-offload and branches/gomp-4_0-branch branches. Both are in the process of being merged into trunk these days. Jakub
Re: [PATCH] Fix dump scan in test devirt-40.C
On Tue, Nov 4, 2014 at 4:57 PM, Martin Jambor mjam...@suse.cz wrote: On Tue, Nov 04, 2014 at 04:47:18PM +0100, Richard Biener wrote: On Tue, Nov 4, 2014 at 4:37 PM, Martin Jambor mjam...@suse.cz wrote: Hi, since revision 216728, testsuite/g++.dg/ipa/devirt-40.C is failing because although the tested-for devirtualization does happen, it is probably being done earlier and the string we are trying to match is not emitted. But the important thing is that the tested devirtualization takes place. Patch has been pre-approved on IRC by Honza, I have tested it with make -k check RUNTESTFLAGS=dg.exp=ipa/devirt-40.C which I hope is sufficient and I have tested that the pattern matches a call to an OBJ_TYPE_REF in function body in an earlier dump file. I will commit the patch shortly. Thanks, Martin gcc/testsuite/ 2014-11-04 Martin Jambor mjam...@suse.cz * devirt-40.C: Changed dump to not matching OBJ_TYPE_REF in function body. Index: src/gcc/testsuite/g++.dg/ipa/devirt-40.C === --- src.orig/gcc/testsuite/g++.dg/ipa/devirt-40.C +++ src/gcc/testsuite/g++.dg/ipa/devirt-40.C @@ -19,5 +19,5 @@ A::m_fn1 (UnicodeString , int p2, UErr UnicodeString a[2]; } -/* { dg-final { scan-tree-dump converting indirect call to function virtual UnicodeString fre2 } } */ +/* { dg-final { scan-tree-dump-not \\n OBJ_TYPE_REF fre2 } } */ What's the odd newline and spaces here? That only matches an OBJ_TYPE_REF as a statement in the function body and specifically does not match Determining dynamic type for call: OBJ_TYPE_REF and Starting walk at: OBJ_TYPE_REF that are present in the dump file above the dump function. Ah ok. Thanks, Richard. An alternative would be of course to just scan the optimized dump if this is considered too much of a hack. But at least I though that for a testcas it would do nicely. Martin
Re: [Patch 3/7 arc] Deprecate *_BY_PIECES_P, move to hookized version
On 4 November 2014 14:24, James Greenhalgh james.greenha...@arm.com wrote: On Tue, Nov 04, 2014 at 12:07:56PM +, Joern Rennecke wrote: On 31 October 2014 15:10, James Greenhalgh james.greenha...@arm.com wrote: While I am there, arc defines a macro CAN_MOVE_BY_PIECES, which is unused, so clean that up too. That's not a clean-up. This pertains to PR 39350. Well, it is a clean-up in the sense that this macro is completely unused in the compiler and has no effect, but please revert this hunk if that is your preference. Which, incidentally, this hookization completely ignores, entrenching the conflation of move expander and move cost estimates. No, I have to disagree. The use_by_pieces_infrastructure_p hook doesn't conflate anything - it gives a response to the question Should the by_pieces infrastructure be used?. A target specific movmem pattern - though it might itself choose to move things by pieces, is categorically not using the move_by_pieces infrastructure. If we want to keep a clean separation of concerns here, we would want a similar target hook asking the single question will your movmem/setmem expander succeed?. That would not be helpful. What the rtl optimizers actually want to know is will this block copy / memset be cheap? . A movmem expander might succeed (or not) for various reasons. The one that's interesting for the above question is if the call has been inlined with a fast set of instructions. Thus, can_move_by_pieces gives the wrong result for purposes of rtl optimizations when a target-specific movmem etc expander emits target-specific code. The patch at https://gcc.gnu.org/ml/gcc-patches/2009-03/txt00018.txt shows a number of call sites that are affected. can_move_by_pieces (likewise can_store_by_pieces) gives the right result, the RTL expanders are using it wrong. I could agree with that view if there was a good strategy agreed what the rtl expanders should do instead. I disagree with the approach taken in your patch as it overloads the purpose of can_move_by_pieces. However, I would support a patch pulling this out in to two hooks, so the call in value-prof.c:gimple_stringops_transform would change from: if (!can_move_by_pieces (val, MIN (dest_align, src_align))) return false; to something like: if (!can_move_by_pieces (val, MIN (dest_align, src_align)) !targetm.can_expand_mem_op_p (val, MIN (dest_align, src_align), MOVE_BY_PIECES)) return false; But this goes back to the problem that it's not about if we can expand the mem op at all, but if we have a fast expansion. We can always expand via libcall (the middle end does this as a fall-back). Also, the target might do some target-specific slow expansion, e.g. call a function with another name and maybe a modified ABI, but still relatively slow to work. So, either the new hook would answer the wrong question, or it would be misnamed, in which case it's likely that the semantics will sooner or later follow the name. it will gravitate to answer the wrong question again. But let's not confuse the use of what should be a simple hook! What would that be? TARGET_RTX_COST is unsuitable because the RTL for the call hasn't been made yet, and it it was, it would tend to be multiple instructions, maybe even a loop. Should we have an analogous TARGET_TREE_COST hook, so that you can ask the target what it thinks the cost of a tree will be once it's expanded?
[patch] Provide a can_compare_and_swap_p target hook.
java/builtins.c needs to call can_compare_and_swap, which happens to be provided via optabs.h. As a result, we see the following in java/builtins.c: /* FIXME: All these headers are necessary for sync_compare_and_swap. Front ends should never have to look at that. */ #include rtl.h #include insn-codes.h #include expr.h #include optabs.h by providing a target hook and a default which simply calls can_compare_and_swap_p(), we can remove all that include crap from the front end. None of those 4 include files are now directly included by any front end files. Bootstraps on x86_64-unknown-linux-gnu, and no new testsuite failures. OK for trunk? Andrew * target.def (can_compare_and_swap_p): Define target hook. * targhooks.c (default_can_compare_and_swap_p): Provide default hook. * targhooks.h (default_can_compare_and_swap_p): Provide prototype. * doc/tm.texi (target_can_compare_and_swap_p): Provide description. * doc/tm.texi.in (TARGET_CAN_COMPARE_AND_SWAP_P): Provide location. * java/builtins.c. Remove backend includes, include target.h. (compareAndSwapInt_builtin, compareAndSwapLong_builtin, compareAndSwapObject_builtin, VMSupportsCS8_builtin): Use compare_and_swap_p target hook. Index: target.def === *** target.def (revision 217057) --- target.def (working copy) *** specific target options and the caller d *** 5185,5190 --- 5185,5199 bool, (tree caller, tree callee), default_target_can_inline_p) + DEFHOOK + (can_compare_and_swap_p, + This macro returns true if the target can perform an atomic\n\ + compare_and_swap operation on an object of machine mode @var{mode}.\n\ + @var{allow_libcall} is a boolean which is used to indicate whether the\n\ + operation is allowed to be performed with a libcall., + bool, (machine_mode mode, bool allow_libcall), + default_can_compare_and_swap_p) + HOOK_VECTOR_END (target_option) /* For targets that need to mark extra registers as live on entry to Index: targhooks.c === *** targhooks.c (revision 217057) --- targhooks.c (working copy) *** default_target_can_inline_p (tree caller *** 1322,1327 --- 1322,1334 return ret; } + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } + + #ifndef HAVE_casesi # define HAVE_casesi 0 #endif Index: targhooks.h === *** targhooks.h (revision 217057) --- targhooks.h (working copy) *** extern tree std_gimplify_va_arg_expr (tr *** 220,224 --- 220,226 extern bool can_use_doloop_if_innermost (const widest_int , const widest_int , unsigned int, bool); + extern bool default_can_compare_and_swap_p (machine_mode, bool); + #endif /* GCC_TARGHOOKS_H */ Index: doc/tm.texi === *** doc/tm.texi (revision 217057) --- doc/tm.texi (working copy) *** default, inlining is not allowed if the *** 9742,9747 --- 9742,9754 specific target options and the caller does not use the same options. @end deftypefn + @deftypefn {Target Hook} bool TARGET_CAN_COMPARE_AND_SWAP_P (machine_mode @var{mode}, bool @var{allow_libcall}) + This macro returns true if the target can perform an atomic + compare_and_swap operation on an object of machine mode @var{mode}. + @var{allow_libcall} is a boolean which is used to indicate whether the + operation is allowed to be performed with a libcall. + @end deftypefn + @node Emulated TLS @section Emulating TLS @cindex Emulated TLS Index: doc/tm.texi.in === *** doc/tm.texi.in (revision 217057) --- doc/tm.texi.in (working copy) *** on this implementation detail. *** 7227,7232 --- 7227,7234 @hook TARGET_CAN_INLINE_P + @hook TARGET_CAN_COMPARE_AND_SWAP_P + @node Emulated TLS @section Emulating TLS @cindex Emulated TLS Index: java/builtins.c === *** java/builtins.c (revision 217057) --- java/builtins.c (working copy) *** The Free Software Foundation is independ *** 37,49 #include flags.h #include langhooks.h #include java-tree.h ! ! /* FIXME: All these headers are necessary for sync_compare_and_swap. !Front ends should never have to look at that. */ ! #include rtl.h ! #include insn-codes.h ! #include expr.h ! #include optabs.h static tree max_builtin (tree, tree); static tree min_builtin (tree, tree); --- 37,43 #include flags.h #include langhooks.h #include java-tree.h ! #include target.h static tree max_builtin (tree, tree); static tree min_builtin (tree, tree); *** compareAndSwapInt_builtin
Re: [PATCH 13/27] New file: gcc/jit/jit-recording.c
On Mon, 2014-11-03 at 15:03 -0700, Jeff Law wrote: On 10/31/14 11:02, David Malcolm wrote: Implementation of the gcc::jit::recording internal API, so that libgccjit.c can record the calls that are made to the public API, for later playback by the dummy frontend. gcc/jit/ * jit-recording.c: New. Cursory review since it's the JIT implementation directory which you'll likely end up owning... + +/* Assuming that this block has been terminated, get the number of + successor blocks, which will be 0, 1 or 2, for return, unconditional + jump, and conditional jump respectively. + NEXT1 and NEXT2 must be non-NULL. The first successor block (if any) + is written to NEXT1, and the second (if any) to NEXT2. + + Used when validating functions, and when dumping dot representations + of them. */ So presumably no multi-way branches yet? Might be better to build the API in such a way that it handles multi-way branches from the get-go rather than retrofitting later. Your call. This is an internal API. If the external API needs to support building switch statements from client code maybe with something like this: extern void gcc_jit_block_end_with_switch (gcc_jit_block *block, gcc_jit_location *loc, gcc_jit_rvalue *expr, int num_cases, gcc_jit_rvalue **case_values, gcc_jit_block **case_blocks, gcc_jit_block *default_block); then this internal API would obviously need to change, maybe becoming: virtual int get_num_successor_blocks () const; virtual block *get_successor_block (int index) const; but that won't affect the public API. That said it's not clear to me that we do need to support switch statements; no JIT implementation I've seen has needed them (I put them in the Probably not needed section of TODO.rst). Otherwise I don't see anything objectionable. OK for the trunk. Thanks Dave
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
On 11/04/2014 04:46 PM, Paolo Carlini wrote: Hi, On 11/04/2014 04:32 PM, Jonathan Wakely wrote: Not a big deal of course, but unfortunately today I'm seeing *two* segfaults for pb_ds: performance/ext/pb_ds/multimap_text_insert_mem_large.cc .../libstdc++-v3/scripts/check_performance: line 41: 16173 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME performance/ext/pb_ds/multimap_text_insert_mem_small.cc .../libstdc++-v3/scripts/check_performance: line 41: 16217 Segmentation fault ./$EXE_NAME tmp.$FILE_NAME They were both caused by the infinite recursion in the operator== for tracker_allocator, which should be fixed. To be clear, I see (deterministically, apparently) the three pb_ds fails which I reported earlier today with a *current* tree. But the pb_ds issues are definitely low priority, unless they point to something serious elsewhere. Thanks for looking into that. Ah! The testsuite_allocator.h fix of yours is still unapplied, didn't know that, sorry ;) Paolo.
Re: [PATCH 14/27] New files: gcc/jit/jit-builtins.{c|h}
On Mon, 2014-11-03 at 14:04 -0700, Jeff Law wrote: On 10/31/14 11:02, David Malcolm wrote: These files implement support for builtins, for the gcc_jit_context_get_builtin_function API entrypoint. Only a subset of builtins are currently supported, based on those that I needed when porting GNU Octave's JIT. Attempts to use other builtins may lead to an error: unimplemented primitive type for builtin being emitted on the gcc_jit_context. gcc/jit/ * jit-builtins.c: New. * jit-builtins.h: New. So how do you envision maintenance on this in the future? Reality is folks are regularly adding new builtins, are they going to have to update the JIT too? I believe the jit will only need updating if someone adds a new *macro* to builtin-types.def e.g. if someone adds a new DEF_FUNCTION_TYPE_9 or somesuch. Everything is keyed off the .def files, so GCC developers adding a new builtin shouldn't need to touch the jit. As I noted above, only a subset of the types are currently supported from the jit; attempts to use builtins of other types from the jit will fail at runtime with the type error I pasted above (and can be fixed up on a case-by-case basis if the users file bug reports). OK for the trunk. Thanks Dave
Re: The nvptx port [10/11+] Target files
On 10/28/2014 03:56 PM, Bernd Schmidt wrote: +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote) +{ + switch (mode) +{ +case BLKmode: + return .b8; +case BImode: + return .pred; +case QImode: + if (promote) + return .u32; + else + return .u8; +case HImode: + return .u16; Promote here too? Or does this have nothing to do with +static enum machine_mode +arg_promotion (enum machine_mode mode) +{ + if (mode == QImode || mode == HImode) +return SImode; + return mode; +} r~
Re: The nvptx port [10/11+] Target files
On 11/04/2014 05:48 PM, Richard Henderson wrote: On 10/28/2014 03:56 PM, Bernd Schmidt wrote: +nvptx_ptx_type_from_mode (enum machine_mode mode, bool promote) +{ + switch (mode) +{ +case BLKmode: + return .b8; +case BImode: + return .pred; +case QImode: + if (promote) + return .u32; + else + return .u8; +case HImode: + return .u16; Promote here too? Or does this have nothing to do with +static enum machine_mode +arg_promotion (enum machine_mode mode) +{ + if (mode == QImode || mode == HImode) +return SImode; + return mode; +} No, these are different problems - the one in arg promotion is purely about KR C and trying to match untyped function decls with calls, while the type_from_mode bit was about some ptx ideosyncracy. Although I forget what the problem was, that code is more than a year old - I'll see if I can get rid of this. Bernd
Re: libstdc++ testsuite make targets check-parallel and check-performance don't work anymore
On 4 November 2014 16:34, Paolo Carlini wrote: Ah! The testsuite_allocator.h fix of yours is still unapplied, didn't know that, sorry ;) My bad - I thought I'd committed it! Done now.
Re: [PATCH 10/27] New file: gcc/jit/libgccjit.c
On Mon, 2014-11-03 at 13:32 -0700, Jeff Law wrote: On 10/31/14 11:02, David Malcolm wrote: This file implements the entrypoints of the library's public API. It performs error-checking at this boundary, before calling into the jit-recording.h internal API. gcc/jit/ * libgccjit.c: New. --- gcc/jit/libgccjit.c | 1506 +++ 1 file changed, 1506 insertions(+) create mode 100644 gcc/jit/libgccjit.c + +#define IS_ASCII_ALPHA(CHAR) \ + (\ +((CHAR) = 'a' (CHAR) ='z')\ +|| \ +((CHAR) = 'A' (CHAR) = 'Z') \ + ) + +#define IS_ASCII_DIGIT(CHAR) \ + ((CHAR) = '0' (CHAR) ='9') + +#define IS_ASCII_ALNUM(CHAR) \ + (IS_ASCII_ALPHA (CHAR) || IS_ASCII_DIGIT (CHAR)) Can't we rely on the C library to give us equivalents? I've been burned in the past by the C library using locales, in particular the two lowercase i variants in Turkish. These macros are used by gcc_jit_context_new_function to enforce C's naming restrictions, to avoid errors from the assembler. The comment I put there was: /* The assembler can only handle certain names, so for now, enforce C's rules for identifiers upon the name. Eventually we'll need some way to interact with e.g. C++ name mangling. */ Am I right in thinking that for the assembler we need to enforce the C naming rules specifically on *ASCII*. (clearly another comment is needed here). + +/* TODO: mark failure branches as unlikely? */ + Not likely worth the effort. And it'd be better to somehow mark jit_error such that anytime a path unconditionally calls jit_error, the whole path is considered unlikely. (nods) I think it was Ball Larus that had a predictor of this nature, I don't recall its effectiveness offhand. But I'd rather be marking the function than sprinlkling unlikely all over the actual codepaths. Presumably by marking it with __attribute__((cold)) ? (with a suitable macro in case we're not being compiled with a gcc that supports it). I suspect doing so will be of more use in teaching me about gcc internals and the effect on generated code that in measurable benefits to said code in this case :) +static bool +compatible_types (gcc::jit::recording::type *ltype, + gcc::jit::recording::type *rtype) All function definitions should have a block comment describing the function and its arguments. This comment applies throughout the code and needs to be addressed prior to commit. In fact I really can't even continue review of this code due to the lack of comments -- I rely heavily on them. Sorry. I'll post a followup with comments added. Many of the functions are public API entrypoints, where there's a comment in the public header. Should I simply duplicate the comment from there into the .c file, or put a comment like: /* Public entrypoint. See description in libgccjit.h. */ for each of these? (perhaps with additional text giving implementation notes?) Thanks for all the reviews. Looks like this and patch 16 are now the only non-approved parts of the kit (I didn't see a review of 16). Dave
Re: RFC: Building a minimal libgfortran for nvptx
On Tue, Nov 04, 2014 at 04:54:54PM +0100, Bernd Schmidt wrote: On 11/04/2014 04:41 PM, Steve Kargl wrote: It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. I don't have the Fortran standard, but I assume that missing pieces in the library would be a violation. However, the alternative is no Fortran (library) support at all, which doesn't seem like an improvement. The target simply does not allow full language support, even for something like C. Note that the intention is not to support Fortran (or any other language) directly targetting ptx code. The only way it's supposed to be used is as an accelerator for OpenACC offloading. I see. I get nervous when a patch appears that throws away a part of the runtime library. There are typically unintended consequences, which then becomes a support issue. -- Steve
[RFC PATCH v4] Fix PR56480 aka DR374. Allow explicit specialization in enclosing namespace.
On 2014.11.03 at 08:47 -0600, Jason Merrill wrote: On 11/03/2014 05:27 AM, Markus Trippelsdorf wrote: BTW both EDG and clang reject g++.dg/template/spec17.C: namespace io { template typename int foo(); } using namespace io; template int fooint(); But I think it is a reasonable extension to accept it. We should reject it, too. We can only leave out the namespace qualification with inline namespaces. +// { dg-options -pedantic -pedantic-errors } You shouldn't need these dg-options lines; if a testcase has no dg-options line, -pedantic-errors is the default. The following patch passes testing and fixes both issues. Please let me know if you're OK with the general direction, before I write a full Changelog. Thanks. diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index 320c39f636a2..ddfa5ebfd968 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -7855,7 +7855,8 @@ grokfndecl (tree ctype, decl = check_explicit_specialization (orig_declarator, decl, template_count, 2 * funcdef_flag + - 4 * (friendp != 0)); + 4 * (friendp != 0) + + 8 * (in_namespace != 0)); if (decl == error_mark_node) return NULL_TREE; diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 2cf10f442f68..5cc320ed2fcf 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -755,57 +755,58 @@ end_explicit_instantiation (void) processing_explicit_instantiation = false; } -/* An explicit specialization or partial specialization of TMPL is being - declared. Check that the namespace in which the specialization is - occurring is permissible. Returns false iff it is invalid to - specialize TMPL in the current namespace. */ +/* If specialization is true, an explicit specialization or partial + specialization of TMPL is being declared. Check that the namespace + in which the specialization is occurring is permissible. Returns + false iff it is invalid to specialize TMPL in the current namespace. + If specialization is false, tmpl is an explicit instantiation. + Check that it is valid to perform this explicit instantiation in + the current namespace. */ static bool -check_specialization_namespace (tree tmpl) +check_instant_or_special_namespace (tree tmpl, bool specialization) { tree tpl_ns = decl_namespace_context (tmpl); - /* [tmpl.expl.spec] - - An explicit specialization shall be declared in the namespace of - which the template is a member, or, for member templates, in the - namespace of which the enclosing class or enclosing class - template is a member. An explicit specialization of a member - function, member class or static data member of a class template - shall be declared in the namespace of which the class template is - a member. */ - if (current_scope() != DECL_CONTEXT (tmpl) - !at_namespace_scope_p ()) -{ - error (specialization of %qD must appear at namespace scope, tmpl); - return false; -} - if (is_associated_namespace (current_namespace, tpl_ns)) -/* Same or super-using namespace. */ -return true; - else + if (specialization) { - permerror (input_location, specialization of %qD in different namespace, tmpl); - permerror (input_location, from definition of %q+#D, tmpl); - return false; -} -} + /* [tmpl.expl.spec] -/* SPEC is an explicit instantiation. Check that it is valid to - perform this explicit instantiation in the current namespace. */ - -static void -check_explicit_instantiation_namespace (tree spec) -{ - tree ns; +An explicit specialization shall be declared in the namespace of +which the template is a member, or, for member templates, in the +namespace of which the enclosing class or enclosing class +template is a member. An explicit specialization of a member +function, member class or static data member of a class template +shall be declared in the namespace of which the class template is +a member. */ + if (current_scope () != DECL_CONTEXT (tmpl) !at_namespace_scope_p ()) + { + error (specialization of %qD must appear at namespace scope, tmpl); + return false; + } + if (is_associated_namespace (current_namespace, tpl_ns)) + /* Same or super-using namespace. */ + return true; + /* DR374: explicit specializations of templates in an enclosing +namespace are allowed. */ + else if (cxx_dialect cxx11) + { + pedwarn (input_location, OPT_Wpedantic, + specialization of %qD in different namespace, tmpl); + pedwarn (input_location, OPT_Wpedantic, from definition of %q+#D, + tmpl); + } +} /* DR 275: An explicit instantiation shall appear in an enclosing namespace of its
Re: RFC: Building a minimal libgfortran for nvptx
See https://gcc.gnu.org/wiki/Offloading and kyukhin/gomp4-offload and branches/gomp-4_0-branch branches. Both are in the process of being merged into trunk these days. Thanks for the link, I’ll look into it. I suppose then it makes sense to provide partial libgfortran support, assuming someone is volunteering to make offloading work nicely for Fortran code and maintain it. FX
Re: RFC: Building a minimal libgfortran for nvptx
On 11/04/14 09:11, Jakub Jelinek wrote: On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote: On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote: The ptx port by its nature is lacking features that are expected on normal machines, such as alloca and indirect jumps. We have a subset of the C library which contains functions that can be implemented on the target (excluding things like file I/O other than printf which is a ptx builtin). It would be good to be able to also build as much of libgfortran as possible, and the following patch is what I've been using so far. It recognizes the target at configure time and restricts the list of compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef tests. There's also a new file minimal.c which contains alternative implementations of some functionality (using printf to write error messages rather than fprintf and such). Constructors are currently unimplemented on ptx and therefore the init function is commented out. Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. Also note there's a reasonable chance that the GPUs will continue to evolve and will be able to support more of the standard language features. Not sure if they'll ever do the IO side of Fortran, but they could always surprise us. jeff
Re: [patch] Provide a can_compare_and_swap_p target hook.
On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. r~
Re: RFC: Building a minimal libgfortran for nvptx
On Tue, Nov 04, 2014 at 10:20:53AM -0700, Jeff Law wrote: On 11/04/14 09:11, Jakub Jelinek wrote: On Tue, Nov 04, 2014 at 07:41:42AM -0800, Steve Kargl wrote: On Tue, Nov 04, 2014 at 03:34:43PM +0100, Bernd Schmidt wrote: The ptx port by its nature is lacking features that are expected on normal machines, such as alloca and indirect jumps. We have a subset of the C library which contains functions that can be implemented on the target (excluding things like file I/O other than printf which is a ptx builtin). It would be good to be able to also build as much of libgfortran as possible, and the following patch is what I've been using so far. It recognizes the target at configure time and restricts the list of compiled files, as well as providing a LIBGFOR_MINIMAL define for #ifdef tests. There's also a new file minimal.c which contains alternative implementations of some functionality (using printf to write error messages rather than fprintf and such). Constructors are currently unimplemented on ptx and therefore the init function is commented out. Comments on the approach, do the Fortran maintainers have a preference how this should look? The whole thing is good enough to substantially reduce the number of failures when trying to run the Fortran testsuites on nvptx (although many remain). It is unclear to me from reading the diff whether this patch cause gfortran on ptx to knowingly violate the fortran standard. If the answer is yes, this patch causes gfortran on ptx to violate the standard, then the patch is IMHO unacceptable. The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. Also note there's a reasonable chance that the GPUs will continue to evolve and will be able to support more of the standard language features. Not sure if they'll ever do the IO side of Fortran, but they could always surprise us. Thanks for the explanation. Certainly mapping Fortran's array syntax and many of the intrinsic subprograms to an accelerator may provide a nice speed improvement. It wasn't clear to me from Bernd original message that the patch was intended for OpenACC offloading. I was assuming that ptx was a new cpu architecture, which had limited capabilites. -- Steve
Go patch committed: untyped bool for logical operators
The Go language was tweaked to permit logical operators to return an untyped boolean value, which means that code like type myBool bool var b myBool = a b || c d is now permitted (previously the || operator would return the named type bool and an explicit conversion was required for this code). This patch from Chris Manghane implements this in gccgo, simply by removing the old code that prevented it from working. This requires adjusting one existing testcase. This is http://golang.org/issue/6671 . Bootstrapped and ran Go testsuite on x86_64-unknown-linux-gnu. Committed to mainline. Ian Index: gcc/go/gofrontend/expressions.cc === --- gcc/go/gofrontend/expressions.cc(revision 217049) +++ gcc/go/gofrontend/expressions.cc(working copy) @@ -5314,15 +5314,6 @@ Binary_expression::do_determine_type(con subcontext.type = NULL; } - if (this-op_ == OPERATOR_ANDAND || this-op_ == OPERATOR_OROR) -{ - // For a logical operation, the context does not determine the - // types of the operands. The operands must be some boolean - // type but if the context has a boolean type they do not - // inherit it. See http://golang.org/issue/3924. - subcontext.type = NULL; -} - // Set the context for the left hand operand. if (is_shift_op) { Index: gcc/testsuite/go.test/test/fixedbugs/issue3924.go === --- gcc/testsuite/go.test/test/fixedbugs/issue3924.go (revision 217049) +++ gcc/testsuite/go.test/test/fixedbugs/issue3924.go (working copy) @@ -1,4 +1,4 @@ -// errorcheck +// compile // Copyright 2012 The Go Authors. All rights reserved. // Use of this source code is governed by a BSD-style @@ -9,5 +9,5 @@ package foo type mybool bool var x, y = 1, 2 -var _ mybool = x y x y // ERROR cannot use -var _ mybool = x y || x y // ERROR cannot use +var _ mybool = x y x y +var _ mybool = x y || x y
Re: [patch] Provide a can_compare_and_swap_p target hook.
On 11/04/2014 12:25 PM, Richard Henderson wrote: On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. yeah, except it uses some of the optab table stuff that is static to optabs.c... so the basic functionality remains there. we could stick a wrapper somewhere else... but thats effectively what this does. we are asking a question about target behaviour... Im open to other suggestions.. Andrew
Re: [patch] Provide a can_compare_and_swap_p target hook.
On 11/04/2014 06:56 PM, Andrew MacLeod wrote: On 11/04/2014 12:25 PM, Richard Henderson wrote: On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. yeah, except it uses some of the optab table stuff that is static to optabs.c... so the basic functionality remains there. I said move the prototype. Of course the implementation remains where it is. r~
Re: RFC: Building a minimal libgfortran for nvptx
On 11/04/14 08:54, Bernd Schmidt wrote: Note that the intention is not to support Fortran (or any other language) directly targetting ptx code. The only way it's supposed to be used is as an accelerator for OpenACC offloading. Right. To reiterate for everyone, offloading is the goal of the nvptx port. Normal C, C++, Fortran code really isn't interesting, though the direction the hardware is going should allow more and more C, C++ Fortran code to be used as-is on the GPU. Bernd has done a fair amount of work to allow normal-ish C, C++ Fortran code to run, that's really been to allow running GCC's testsuite to shake out the first level bugs in the ptx support. Jeff
Re: Recent go changes broke alpha bootstrap
On Tue, Nov 4, 2014 at 12:20 PM, Uros Bizjak ubiz...@gmail.com wrote: Bitfields can really not be represented properly in Go (think about constructs like struct { int : 1; int bf : 1; }), I'd rather not try to represent them in a predictable way. The patched code may or may not give them a name, and reserves the proper amount of padding where required in structs. If a union begins with an anonymous bitfield (which makes no sense), that is ignored. If a union begins with a named bitfield (possibly after unnamed ones), this may or may not be used as the (sole) element of the Go struct. Thanks. I committed your patch. I have checked that the patch fixes alpha bootstrap with libgo. Also passes gcc.misc-tests/godump-1.c test. 2014-11-04 Uros Bizjak ubiz...@gmail.com * gcc.misc-tests/godump-1.c (dg-skip-if): Add alpha*-*-*. Tested on alphaev68-linux-gnu and committed to mainline SVN. Uros. Index: gcc.misc-tests/godump-1.c === --- gcc.misc-tests/godump-1.c (revision 217089) +++ gcc.misc-tests/godump-1.c (working copy) @@ -2,7 +2,7 @@ /* { dg-options -c -fdump-go-spec=godump-1.out } */ /* { dg-do compile } */ -/* { dg-skip-if not supported for target { ! s390*-*-* i?86-*-* x86_64-*-* } } */ +/* { dg-skip-if not supported for target { ! alpha*-*-* s390*-*-* i?86-*-* x86_64-*-* } } */ /* { dg-skip-if not supported for target { ! lp64 } } */ #include stdint.h
Re: [patch] Provide a can_compare_and_swap_p target hook.
On 11/04/2014 12:57 PM, Richard Henderson wrote: On 11/04/2014 06:56 PM, Andrew MacLeod wrote: On 11/04/2014 12:25 PM, Richard Henderson wrote: On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. yeah, except it uses some of the optab table stuff that is static to optabs.c... so the basic functionality remains there. I said move the prototype. Of course the implementation remains where it is. prototype is in optabs.h where it belongs since its defined in optabs.c. :-) I'm not sure why this is much different than something like the targhook for builtin_support_vector_misalignment(), other than we are calling the routine in optabs.c rather than putting the actual code in targhooks.c. from targhooks.c: bool default_builtin_support_vector_misalignment (machine_mode mode, const_tree type, ...) { if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing) return true; return false; } the idea is to move all the functionality that front ends need into well defined and controlled places so we can increase the separation. can perform a compare_and_swap operation is clearly a target specific question isn't it? Andrew
Re: libgomp testsuite: (not) using a specific driver for C++, Fortran?
On Nov 4, 2014, at 4:13 AM, Thomas Schwinge tho...@codesourcery.com wrote: On Wed, 15 Oct 2014 17:46:48 +0200, I wrote: No matter whether it's C, C++, or Fortran source code, the libgomp testsuite always uses (for build-tree testing) gcc/xgcc, or (for installed testing) GCC_UNDER_TEST. It doesn't make use of GXX_UNDER_TEST, GFORTRAN_UNDER_TEST. To support the latter two languages' needs, some -l[...] flags are then added via lang_link_flags. For example, for Fortran this is -lgfortran. This is, however, not what would happen if using the gfortran driver to build (which is what a user would be doing -- which we should replicate as much as possible at least for installed testing): the gfortran driver also adds -lquadmath, if applicable. Now, I wonder why to re-invent all that in the libgomp testsuite, if the respective driver already has that knowledge, via spec files, for example? (Also, the regular GCC compiler tests, gcc/testsuite/, are doing the right thing.) Why is libgomp testsuite implemented this way -- just a legacy of the past, or is there a need for that (that I'm not seeing)? Maybe the question also is: why isn't the libgomp testsuite using more of the infrastructure for specific languages, that is already implemented in gcc/testsuite/lib/? (That is, why does libgomp have to use libgomp_target_compile, instead of [language]_target_compile, for example.) (I decided not to look into that latter idea, at the moment.) And maybe that problem also applied to additional target libraries' testsuites; I have not yet looked. (It does, but I'm not addressing that with the following patches.) Anyway, here is a prototype patch to describe how I began to address this for the issue I stumbled upon, which is that the linker complained: Executing on host: x86_64-none-linux-gnu-gcc [...]/libgomp/testsuite/libgomp.fortran/aligned1.f03 [...] -fopenmp -O0 -fopenmp -fcray-pointer -lgfortran -lm -o ./aligned1.exe(timeout = 300) [...]/ld: warning: libquadmath.so.0, needed by [...]/libgfortran.so, not found (try using -rpath or -rpath-link) [...]/libgfortran.so: undefined reference to `logq@QUADMATH_1.0' [...] (That goes away if I add -lquadmath to the command line, but that's not the point I'm making here.) Am I on the right track with the following? Nobody commented, which also means nobody disagreed :-) OK to commit all that to trunk? Ok, thanks. If you could, one minor point, s/puts/verbose “boa bla“ N/ I know it was preexisting. I wonder if any variables that you set that need to be cleared out are, that seems a defect in the api or conventions we use and it can screw following tests with a wrong environment in subtle ways. I could miss accidental bleed over from the .exp you’re modifying into other unrelated parts of the test suite. Usually, we just try and be careful and clean it up if it breaks. Watch for any review points from the libgomp people, they might trickle a few in. I don’t mean to cut short any review points from them. Also, please watch for breakage.
Re: [PATCH] Optimize UBSAN_NULL checks, add sanopt.c
On Mon, Nov 03, 2014 at 04:35:00PM +0100, Jakub Jelinek wrote: Well, in theory you could just script removing them one by one and just make sanopt.o after each step to see if the header is or is not needed, perhaps with some manual tweeks. I pruned those headers some more. Perhaps in preparation for future optimizations (other UBSAN_* calls, and ASAN_CHECK and tsan builtins), you should consider putting the hash_map into some structure and pass address of that structure instead, so that you have all the pass context at the same spot. You could put asan_num_accesses in there too, see below. Done. I think it would be better to walk the vector backwards instead of forward. 1) if you have the same SSA_NAME there multiple times, ignoring the already unnecessary stmts, the only case where you'd have multiple stmts is if the earlier stmts dominate the following stmts and for some reason weren't optimized away; that for some reason currently should be just higher required alignment in the dominated stmt (e.g. say have UBSAN_NULL (foo_23, 0); UBSAN_NULL (foo_23, 2); UBSAN_NULL (foo_23, 4); where the first stmt dominates the second two and second stmt dominates the last one. 2) All the si-visited_p stmts should be always at the end of the vector IMHO, preceeded only by !si-visited_p stmts. Thus, when walking backwards, first remove the stmts with bb-visited_p, once you hit one that doesn't have it set, I believe there shouldn't be any further. And then in theory it should be fine to just compare the last stmt in the vector that was left (if any). I agree that should work - so changed. Perhaps you could return the asan_num_accesses value computed during sanopt_optimize (just count IFN_ASAN_CHECK calls that you don't optimize away), and do this only in else if (i.e. when sanopt_optimize has not been run). That way, you'd save one extra IL walk. Done. Bootstrap-ubsan/regtest passed on x86_64-linux, ok for trunk? 2014-11-04 Marek Polacek pola...@redhat.com * Makefile.in (OBJS): Add sanopt.o. (GTFILES): Add sanopt.c. * asan.h (asan_expand_check_ifn): Declare. * asan.c (asan_expand_check_ifn): No longer static. (class pass_sanopt, pass_sanopt::execute, make_pass_sanopt): Move... * sanopt.c: ...here. New file. testsuite/ * c-c++-common/ubsan/align-2.c: Remove dg-output. * c-c++-common/ubsan/align-4.c: Likewise. * g++.dg/ubsan/null-1.C: Likewise. * g++.dg/ubsan/null-2.C: Likewise. diff --git gcc/Makefile.in gcc/Makefile.in index 9c67fe2..f383032 100644 --- gcc/Makefile.in +++ gcc/Makefile.in @@ -1376,6 +1376,7 @@ OBJS = \ asan.o \ tsan.o \ ubsan.o \ + sanopt.o \ tree-call-cdce.o \ tree-cfg.o \ tree-cfgcleanup.o \ @@ -2305,6 +2306,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/asan.c \ $(srcdir)/ubsan.c \ $(srcdir)/tsan.c \ + $(srcdir)/sanopt.c \ $(srcdir)/ipa-devirt.c \ $(srcdir)/internal-fn.h \ @all_gtfiles@ diff --git gcc/asan.c gcc/asan.c index 8f146d2..79dede7 100644 --- gcc/asan.c +++ gcc/asan.c @@ -2497,7 +2497,7 @@ asan_finish_file (void) /* Expand the ASAN_{LOAD,STORE} builtins. */ -static bool +bool asan_expand_check_ifn (gimple_stmt_iterator *iter, bool use_calls) { gimple g = gsi_stmt (*iter); @@ -2800,114 +2800,4 @@ make_pass_asan_O0 (gcc::context *ctxt) return new pass_asan_O0 (ctxt); } -/* Perform optimization of sanitize functions. */ - -namespace { - -const pass_data pass_data_sanopt = -{ - GIMPLE_PASS, /* type */ - sanopt, /* name */ - OPTGROUP_NONE, /* optinfo_flags */ - TV_NONE, /* tv_id */ - ( PROP_ssa | PROP_cfg | PROP_gimple_leh ), /* properties_required */ - 0, /* properties_provided */ - 0, /* properties_destroyed */ - 0, /* todo_flags_start */ - TODO_update_ssa, /* todo_flags_finish */ -}; - -class pass_sanopt : public gimple_opt_pass -{ -public: - pass_sanopt (gcc::context *ctxt) -: gimple_opt_pass (pass_data_sanopt, ctxt) - {} - - /* opt_pass methods: */ - virtual bool gate (function *) { return flag_sanitize; } - virtual unsigned int execute (function *); - -}; // class pass_sanopt - -unsigned int -pass_sanopt::execute (function *fun) -{ - basic_block bb; - - int asan_num_accesses = 0; - if (flag_sanitize SANITIZE_ADDRESS) -{ - gimple_stmt_iterator gsi; - FOR_EACH_BB_FN (bb, fun) - for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (gsi)) - { - gimple stmt = gsi_stmt (gsi); - if (is_gimple_call (stmt) gimple_call_internal_p (stmt) -gimple_call_internal_fn (stmt) == IFN_ASAN_CHECK) - ++asan_num_accesses; - } -} - - bool use_calls = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD INT_MAX - asan_num_accesses = ASAN_INSTRUMENTATION_WITH_CALL_THRESHOLD; - - FOR_EACH_BB_FN (bb, fun) -{ -
Re: RFC: Building a minimal libgfortran for nvptx
On Nov 4 2014, Jeff Law wrote: On 11/04/14 09:11, Jakub Jelinek wrote: The point is, if the target can implement just a subset of the Fortran (or C or C++) standards, then ideally if you use anything that is not supported would just cause always host fallback, the code will still work, but will not be offloaded. So even supporting a subset of the standard is worthwhile, usually one will just offload the most performance critical parts of his code. As I see it, this isn't a free-standing compilation environment, but a component of one for heterogeneous architectures. There are similar issues for some embedded systems, in several languages. That doesn't fit well with the current build model, unfortunately :-( Also note there's a reasonable chance that the GPUs will continue to evolve and will be able to support more of the standard language features. Not sure if they'll ever do the IO side of Fortran, but they could always surprise us. I am almost certain that the current situation is going to change significantly, probably by 2020, but there is as yet no indication of how. And it wouldn't be entirely reasonable to say nothing should be done for this sort of use until the situation becomes clear. Regards, Nick Maclaren.
[PATCH v2, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses
On Tue, Nov 4, 2014 at 10:46 AM, Uros Bizjak ubiz...@gmail.com wrote: Following patch fixes PR 63538, where the data in the large data section was accessed through 32bit address. The patch unifies places where large data sections are determined and passes all declarations to ix86_in_large_data_p only. The patch fixes the testcase form the PR. Also, the code from several tests involving various -mlarge-data-threshold= settings looks consistent now. [...] We probably want to avoid putting automatic variables (and STRING_CSTs ?) to large data section. However, it is important that we use the same condition in ix86_encode_section_info and x86_64_elf_select_section (and likes), otherwise the data can go into large section, while the access to the data will be via limited 32bit pointers. The attached alternative patch rejects automatic variables and STRING_CSTs from large data sections. Please note, that it still access the data with the correct instructions. Considering that STRING_CSTs always have TREE_STATIC set, the patch should allow them, similar to the patch at [1]. Attached is the final patch. 2014-11-04 Uros Bizjak ubiz...@gmail.com PR target/63538 * config/i386/i386.c (in_large_data_p): Reject automatic variables. (ix86_encode_section_info): Do not check for non-automatic varibles when setting SYMBOL_FLAG_FAR_ADDR flag. (x86_64_elf_select_section): Do not check ix86_cmodel here. (x86_64_elf_unique_section): Ditto. (x86_elf_aligned_common): Emit tab before .largecomm. testsuite/ChangeLog: 2014-11-03 Uros Bizjak ubiz...@gmail.com PR target/63538 * gcc.target/i386/pr63538.c: New test. I plan to commit the patch in a couple of days to mainline and 4.9 branch. [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01963.html Uros. Index: gcc/config/i386/i386.c === --- gcc/config/i386/i386.c (revision 217089) +++ gcc/config/i386/i386.c (working copy) @@ -5066,6 +5066,10 @@ ix86_in_large_data_p (tree exp) if (TREE_CODE (exp) == FUNCTION_DECL) return false; + /* Automatic variables are never large data. */ + if (TREE_CODE (exp) == VAR_DECL !is_global_var (exp)) +return false; + if (TREE_CODE (exp) == VAR_DECL DECL_SECTION_NAME (exp)) { const char *section = DECL_SECTION_NAME (exp); @@ -5099,8 +5103,7 @@ ATTRIBUTE_UNUSED static section * x86_64_elf_select_section (tree decl, int reloc, unsigned HOST_WIDE_INT align) { - if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) { const char *sname = NULL; unsigned int flags = SECTION_WRITE; @@ -5186,8 +5189,7 @@ x86_64_elf_section_type_flags (tree decl, const ch static void ATTRIBUTE_UNUSED x86_64_elf_unique_section (tree decl, int reloc) { - if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) { const char *prefix = NULL; /* We only need to use .gnu.linkonce if we don't have COMDAT groups. */ @@ -5256,7 +5258,7 @@ x86_elf_aligned_common (FILE *file, { if ((ix86_cmodel == CM_MEDIUM || ix86_cmodel == CM_MEDIUM_PIC) size (unsigned int)ix86_section_threshold) -fputs (.largecomm\t, file); +fputs (\t.largecomm\t, file); else fputs (COMMON_ASM_OP, file); assemble_name (file, name); @@ -44230,9 +44232,7 @@ ix86_encode_section_info (tree decl, rtx rtl, int { default_encode_section_info (decl, rtl, first); - if (TREE_CODE (decl) == VAR_DECL - (TREE_STATIC (decl) || DECL_EXTERNAL (decl)) - ix86_in_large_data_p (decl)) + if (ix86_in_large_data_p (decl)) SYMBOL_REF_FLAGS (XEXP (rtl, 0)) |= SYMBOL_FLAG_FAR_ADDR; } Index: gcc/testsuite/gcc.target/i386/pr63538.c === --- gcc/testsuite/gcc.target/i386/pr63538.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr63538.c (working copy) @@ -0,0 +1,13 @@ +/* PR target/63538 */ +/* { dg-do compile } */ +/* { dg-require-effective-target lp64 } */ +/* { dg-options -O2 -mcmodel=medium -mlarge-data-threshold=0 } */ + +static char *str = Hello World; + +char *foo () +{ + return str; +} + +/* { dg-final { scan-assembler movabs } } */
Re: [PATCH v2, i386]: Fix PR 63538, With -mcmodel=medium .lrodata accesses do not use 64-bit addresses
On Tue, Nov 4, 2014 at 10:47 AM, Uros Bizjak ubiz...@gmail.com wrote: On Tue, Nov 4, 2014 at 10:46 AM, Uros Bizjak ubiz...@gmail.com wrote: Following patch fixes PR 63538, where the data in the large data section was accessed through 32bit address. The patch unifies places where large data sections are determined and passes all declarations to ix86_in_large_data_p only. The patch fixes the testcase form the PR. Also, the code from several tests involving various -mlarge-data-threshold= settings looks consistent now. [...] We probably want to avoid putting automatic variables (and STRING_CSTs ?) to large data section. However, it is important that we use the same condition in ix86_encode_section_info and x86_64_elf_select_section (and likes), otherwise the data can go into large section, while the access to the data will be via limited 32bit pointers. The attached alternative patch rejects automatic variables and STRING_CSTs from large data sections. Please note, that it still access the data with the correct instructions. Considering that STRING_CSTs always have TREE_STATIC set, the patch should allow them, similar to the patch at [1]. Yes, it would be nice to be able to have STRING_CSTs in .lrodata. Thanks for the patch. Sri Attached is the final patch. 2014-11-04 Uros Bizjak ubiz...@gmail.com PR target/63538 * config/i386/i386.c (in_large_data_p): Reject automatic variables. (ix86_encode_section_info): Do not check for non-automatic varibles when setting SYMBOL_FLAG_FAR_ADDR flag. (x86_64_elf_select_section): Do not check ix86_cmodel here. (x86_64_elf_unique_section): Ditto. (x86_elf_aligned_common): Emit tab before .largecomm. testsuite/ChangeLog: 2014-11-03 Uros Bizjak ubiz...@gmail.com PR target/63538 * gcc.target/i386/pr63538.c: New test. I plan to commit the patch in a couple of days to mainline and 4.9 branch. [1] https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01963.html Uros.
Re: nvptx offloading patches [3/n], i386 bits RFD
On Tue, Nov 4, 2014 at 1:35 PM, Bernd Schmidt ber...@codesourcery.com wrote: Not sure how to deal with this any further out than the immediate term than using a hack like this. Though I'd prefer to avoid the #ifdef as it seems to me this shouldn't be baked in at build/configure time. Yeah, I'm not expecting the i386 part to go in quite as-is. For reference I'm including the offload-abi patch - Ilya is submitting this along with other option changes. One possibility would be to print and recognize strings such as lp64D128 or lp64D96 which would include information about the size of long double. Somehow though I can't really bring myself to believe that -mlong-double128 is a real use case with offloading so we might just disallow the combination. CCing Uros in case he has an opinion. -mlong-double-128 was introduced for Android in: 2014-02-03 H.J. Lu hongjiu...@intel.com * config/i386/i386.c (flag_opts): Add -mlong-double-128. (ix86_option_override_internal): Default long double to 64-bit for 32-bit Bionic and to 128-bit for 64-bit Bionic. [...] IMO, if it troubles offloading, anything else than the default -mlong-double-80 should be disallowed. Uros.
Re: [PATCH] Optimize UBSAN_NULL checks, add sanopt.c
On Tue, Nov 04, 2014 at 07:36:26PM +0100, Marek Polacek wrote: + FOR_EACH_VEC_ELT_REVERSE (v, i, g) + { + /* Remove statements for BBs that have been + already processed. */ + sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux; + if (si-visited_p) + { + v.unordered_remove (i); + continue; + } + /* At this point we shouldn't have any statements + that aren't dominating the current BB. */ + tree align = gimple_call_arg (g, 2); + remove = tree_int_cst_le (cur_align, align); + break; + } As you only remove last item, that is pop. So, how about while (!v.is_empty ()) { gimple g = v.last (); /* Remove statements for BBs that have been already processed. */ sanopt_info *si = (sanopt_info *) gimple_bb (g)-aux; if (si-visited_p) v.pop (); else { /* At this point we shouldn't have any statements that aren't dominating the current BB. */ tree align = gimple_call_arg (g, 2); remove = tree_int_cst_le (cur_align, align); break; } } ? The patch is ok either way. Jakub
Re: [patch] Provide a can_compare_and_swap_p target hook.
On November 4, 2014 7:30:18 PM CET, Andrew MacLeod amacl...@redhat.com wrote: On 11/04/2014 12:57 PM, Richard Henderson wrote: On 11/04/2014 06:56 PM, Andrew MacLeod wrote: On 11/04/2014 12:25 PM, Richard Henderson wrote: On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. yeah, except it uses some of the optab table stuff that is static to optabs.c... so the basic functionality remains there. I said move the prototype. Of course the implementation remains where it is. prototype is in optabs.h where it belongs since its defined in optabs.c. :-) I'm not sure why this is much different than something like the targhook for builtin_support_vector_misalignment(), other than we are calling the routine in optabs.c rather than putting the actual code in targhooks.c. from targhooks.c: bool default_builtin_support_vector_misalignment (machine_mode mode, const_tree type, ...) { if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing) return true; return false; } the idea is to move all the functionality that front ends need into well defined and controlled places so we can increase the separation. can perform a compare_and_swap operation is clearly a target specific question isn't it? I would rather question what is so special about java that it needs to ask that and other frontends not. Don't we have generic atomics support now? Richard. Andrew
[committed] Remove unused variables
I forgot to remove two unused variables. Applying to trunk as obvious. 2014-11-04 Marek Polacek pola...@redhat.com * sanopt.c (sanopt_optimize_walker): Remove unused variables. diff --git gcc/sanopt.c gcc/sanopt.c index f1d51d1..4483ff7 100644 --- gcc/sanopt.c +++ gcc/sanopt.c @@ -117,9 +117,6 @@ sanopt_optimize_walker (basic_block bb, struct sanopt_ctx *ctx) for this pointer. Perhaps we can drop this one. But only if this check doesn't specify stricter alignment. */ - int i; - gimple g; - while (!v.is_empty ()) { gimple g = v.last (); Marek
Re: [patch] Provide a can_compare_and_swap_p target hook.
On 11/04/2014 02:53 PM, Richard Biener wrote: On November 4, 2014 7:30:18 PM CET, Andrew MacLeod amacl...@redhat.com wrote: On 11/04/2014 12:57 PM, Richard Henderson wrote: On 11/04/2014 06:56 PM, Andrew MacLeod wrote: On 11/04/2014 12:25 PM, Richard Henderson wrote: On 11/04/2014 05:28 PM, Andrew MacLeod wrote: + bool + default_can_compare_and_swap_p (machine_mode mode, bool allow_libcall) + { + return can_compare_and_swap_p (mode, allow_libcall); + } This is silly. I think the problem you point out can be better fixed by moving the can_compare_and_swap_p prototype elsewhere. yeah, except it uses some of the optab table stuff that is static to optabs.c... so the basic functionality remains there. I said move the prototype. Of course the implementation remains where it is. prototype is in optabs.h where it belongs since its defined in optabs.c. :-) I'm not sure why this is much different than something like the targhook for builtin_support_vector_misalignment(), other than we are calling the routine in optabs.c rather than putting the actual code in targhooks.c. from targhooks.c: bool default_builtin_support_vector_misalignment (machine_mode mode, const_tree type, ...) { if (optab_handler (movmisalign_optab, mode) != CODE_FOR_nothing) return true; return false; } the idea is to move all the functionality that front ends need into well defined and controlled places so we can increase the separation. can perform a compare_and_swap operation is clearly a target specific question isn't it? I would rather question what is so special about java that it needs to ask that and other frontends not. Don't we have generic atomics support now? Richard. True... I don't know if this is a thing that simply predates our current level of support or if it is something else that is java specific for its builtins. Don't know enough about java to comment. aph? Looks like you wrote the originals in 2006... Can the java CAS builtins simply use our current atomic calls rather than doing their own thing and querying whether the target has a sync compare and swap operation? Andrew
Re: [x86, 2/n] Replace builtins with vector extensions
Ping? https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01808.html On Sat, 18 Oct 2014, Marc Glisse wrote: Hello, this time, +-* for 128 bit integer vectors. I am using an unsigned type so the compiler knows that we expect wrapping. I don't know why Intel's description of mullo insists that the multiplication is signed, that only matters for the high part... Next parts (waiting for approval for this one) should be: - same thing with 256 and 512 bit integer vectors - | ^ (integer only) Maybe (or it can wait until the next release): - == abs min max (integer only) 2014-10-20 Marc Glisse marc.gli...@inria.fr * config/i386/emmintrin.h (__v2du, __v4su, __v8hu, __v16qu): New typedefs. (_mm_add_epi8, _mm_add_epi16, _mm_add_epi32, _mm_add_epi64, _mm_sub_epi8, _mm_sub_epi16, _mm_sub_epi32, _mm_sub_epi64, _mm_mullo_epi16): Use vector extensions instead of builtins. * config/i386/smmintrin.h (_mm_mullo_epi32): Likewise. -- Marc Glisse
Re: nvptx offloading patches [1/n]
On 11/03/14 16:07, Bernd Schmidt wrote: On 11/03/2014 11:22 PM, Jeff Law wrote: On 11/01/14 05:47, Bernd Schmidt wrote: This is one of the patches required to make offloading via the LTO path work when the machines involved differ. x86 requires bigger alignments for some types than nvptx does, which becomes an issue when reading LTO produced by the host compiler. The problem with having a variable with DECL_ALIGN larger than the stack alignment is that gcc will try to align the variable dynamically with an alloca/rounding operation, and there isn't a working alloca on nvptx. Besides, the overhead would be pointless. The patch below restricts the alignments to the maximum possible when reading in LTO data in an offload compiler. Unfortunately BIGGEST_ALIGNMENT isn't suitable for this, as it can vary at runtime with attribute((target)), and because vector modes can exceed it, so a limit based on BIGGEST_ALIGNMENT would be unsuitable for some ports. Instead I've added a hook called limit_offload_alignment which is called when reading LTO on an offload compiler. It does nothing anywhere except on ptx where it limits alignments to 64 bit. Bootstrapped and tested on x86_64-linux. Ok? Not ideal. Doesn't this affect our ability to pass data back and forth between the host and GPU? Or is this strictly a problem with stack objects and thus lives entirely on the GPU? Communication between host and GPU is all done via some form of memcpy, so I wouldn't expect this to be a problem. They still need to agree on the layout of the structure. And assuming it'll always be memcpy perhaps isn't wise. Consider the possibility that one day (perhaps soon) the host and GPU may share address space memory. Structure layouts and such are decided by the host compiler and since that uses higher alignments, they should work fine on the GPU. I believe the only thing this really does is relax the requirements when allocating storage on the GPU side. But if the structure has a higher alignment on the host and that structure is embedded in another structure or array, then that higher alignment affects the composite object's layout. But again, if this just affects stack objects, then it shouldn't be a problem. Jeff
Re: [PATCH 12/27] New file: gcc/jit/jit-recording.h
On 11/04/14 09:12, David Malcolm wrote: So give the complexities in interfacing with the guts of GCC, would it make sense to expose the validate method? Most of the error-checking in the API happens in the API calls in libgccjit.c, testing that the individual pieces are sane. The validate method tests the things that can only be verified as a whole, when the context is about to be compiled: are there unreachable blocks? is every block terminated? etc I can't quite see why client code might want to perform the latter kind of validation without actually doing a compile, so I don't plan to expose this at this time. It's trivial to do so if someone needs it. Yea, I saw all the border checking -- I was thinking mostly about things that require larger context. In particular I was thinking block contents, the cfg and such. Thinking about how I tend to work and might use this, I'd be likely to start building up statements/blocks and want to verify them without going all the way through compilation. But that may be an artifact of living in a world where we have many points (between each pass) where a verification step for key data structures is useful. I certainly don't see it as a blocking issue, just wanted to raise the possibility that exposing the verification step in the ABI might be useful. If you don't want to do that right now, I can live with it. I think I was trying to avoid std::string for some reason, but I'm not quite sure why, perhaps out of a misremembered idea that libstdc++ was verboten (I currently use std:: in one place, in jit-playback.h, where a playback::context has a: vecstd::pairtree, location * m_cached_locations; ). I can't see any reason why we wouldn't use basic capabilities of the C++ runtime system. To use an example we both know and understand, switching EXPR_LIST to a forward_list would be something I would look favorably upon simply because everyone doing C++ knows what a forward_list is, it's proprties, strenghts weaknesses. Only GCC junkies happen to know that EXPR_LIST is just a hand-rolled forward list :-) In any case recording::string is an implementation detail hidden within the library. It is a recording::memento and hence has the same lifetime as the recording::context. I can't think of a reason off the top of my head why a std::string wouldn't work instead, but the existing code works, and has been through a fair amount of debugging. How about as a follow-up? I don't see this as being big enough to warrant blocking the work. jeff
Re: [PATCH 13/27] New file: gcc/jit/jit-recording.c
On 11/04/14 09:24, David Malcolm wrote: So presumably no multi-way branches yet? Might be better to build the API in such a way that it handles multi-way branches from the get-go rather than retrofitting later. Your call. This is an internal API. If the external API needs to support building switch statements from client code maybe with something like this: extern void gcc_jit_block_end_with_switch (gcc_jit_block *block, gcc_jit_location *loc, gcc_jit_rvalue *expr, int num_cases, gcc_jit_rvalue **case_values, gcc_jit_block **case_blocks, gcc_jit_block *default_block); then this internal API would obviously need to change, maybe becoming: virtual int get_num_successor_blocks () const; virtual block *get_successor_block (int index) const; but that won't affect the public API. That said it's not clear to me that we do need to support switch statements; no JIT implementation I've seen has needed them (I put them in the Probably not needed section of TODO.rst). Fair enough :-) jeff
Re: [PATCH 10/27] New file: gcc/jit/libgccjit.c
On 11/04/14 09:57, David Malcolm wrote: +#define IS_ASCII_DIGIT(CHAR) \ + ((CHAR) = '0' (CHAR) ='9') + +#define IS_ASCII_ALNUM(CHAR) \ + (IS_ASCII_ALPHA (CHAR) || IS_ASCII_DIGIT (CHAR)) Can't we rely on the C library to give us equivalents? I've been burned in the past by the C library using locales, in particular the two lowercase i variants in Turkish. These macros are used by gcc_jit_context_new_function to enforce C's naming restrictions, to avoid errors from the assembler. The comment I put there was: /* The assembler can only handle certain names, so for now, enforce C's rules for identifiers upon the name. Eventually we'll need some way to interact with e.g. C++ name mangling. */ Am I right in thinking that for the assembler we need to enforce the C naming rules specifically on *ASCII*. (clearly another comment is needed here). I guess you've got to do it somewhere. Presumably there isn't something already in GCC that enforces an input character set? I guess I just dislike seeing something that feels like it ought to already be available. Presumably by marking it with __attribute__((cold)) ? (with a suitable macro in case we're not being compiled with a gcc that supports it). Yup. That's precisely what you want since that gives the predictors enough information to mark paths as unlikely without having to mark each path yourself. Sorry. I'll post a followup with comments added. Thanks. I probably rely more on those for this kind of review than anything, so the lack of them really stood out. Many of the functions are public API entrypoints, where there's a comment in the public header. Should I simply duplicate the comment from there into the .c file, or put a comment like: Good question. Normally in the past we'd have you duplicate the comment, but with this new usage scenario that may not make a lot of sense since one or the other will likely get out of sync at some point. At this point a snarky comment about generating documentation and the interface from a single definition would be appropriate. /* Public entrypoint. See description in libgccjit.h. */ for each of these? (perhaps with additional text giving implementation notes?) Let's go with this. If folks want the comment duplicated, they can argue for it after the fact :-) Thanks for all the reviews. Looks like this and patch 16 are now the only non-approved parts of the kit (I didn't see a review of 16). Right. I didn't get to #16 yesterday. jeff
Re: nvptx offloading patches [3/n], i386 bits RFD
On 11/04/14 05:35, Bernd Schmidt wrote: Ports that want to be hosts for offloading may need to modify their modes.def. The patch below contains changes to i386-modes.def which modifies XFmode depending on a target switch. I'm not actually entirely sure what to do about this. Do we want to make this flag an error when offloading is enabled? Or maybe add float format support to the -foffload-abi option? Thoughts? Ok for the first part of the patch once the other offloading patches have gone in (bootstrapped and tested on x86_64-linux)? It feels like we've got another real distinction to make. We've had host, build target and they're all independent. It feels like we need offload target and better separate between target and offload target. Then we need to figure out the places where we've got bleed-out. Is this a question of terminology? I agree that saying offload host when we'd normally be calling it the target is confusing, but it's difficult to come up with better names. No, I don't think it's terminology. It's really that in effect we have two targets. One is a normal CPU, the other is a GPU. ie, there's nothing that says we won't have a GPU that's being driven by an ARM or PPC. What I want to avoid is GPU-isms getting sprinkled into the x86 (or any other) backend. The problem is we don't have any infrastructure in place for this kind of situation. So we start off with a few hacks and hopefully we're able to see some commonality and start to see how to handle the multi-architecture target issues a bit better. Jeff
Re: [PATCH] x86: allow to suppress default clobbers added to asm()s
On 11/03/14 01:11, Jan Beulich wrote: I really don't like having an option that's globally applied for this feature. THough I am OK with having a mechanism to avoid implicit clobbers on specific ASMs. Why not? That way, for projects/components knowing all their asm()s have all clobbers explicitly specified they could avoid having to touch all of them and instead just pass the new option. That said - if the option isn't being liked, I'm fine dropping it. In general, I doubt projects are going to achieve and maintain that level of knowledge about their asms. It's just asking for subtle bugs in the long run. Why use negative numbers for the hard register numbers? I wouldn't be at all surprised if lots of random code assumes register numbers are always positive. I think I want through all relevant places, and fixed the one where the assumption was wrongly made. The nice thing about using negative numbers here is that there is already at least one place where negative numbers aren't valid, and we want them to not be valid there. I don't like adding new registers with special names like !foo. Instead I think that listing !cc or something similar in the asm itself if it doesn't clobber the cc register would be better. You mean interpreting the ! in generic code? Doable, but not very nice imo considering that the default addition of clobbers is limited to very few architectures, plus dealing with this in a generic way would - afaict - make the change quite a bit more intrusive. yes, precisely. This is a generic concept and while more invasive, I think what we get in the end is going to ultimately be better. I also suspect other targets ought to be defining those implicit clobbers, but don't :( jeff