[PATCH] Fix PR54733 Optimize endian independent load/store
Hi everybody,

*** Motivation ***

Currently gcc is capable of replacing a hand-crafted implementation of byteswap by a suitable instruction thanks to the bswap optimization pass. The patch proposed here aims at extending this pass to also optimize loads in a specific endianness, independent of the host endianness.

*** Methodology ***

The patch adds support for dealing with a memory source (array or structure) and detects whether the result of a bitwise operation happens to be equivalent to a big endian or little endian load. If so, it replaces the operation by a load, or by a load and a byteswap, according to the host endianness.

The original code used the concept of a symbolic number: a number where the value of each byte indicates its position (in terms of weight) before the bitwise manipulation. After performing the bit manipulation on that symbolic number, the result tells how the bytes were shuffled (see variable cmp in function find_bswap). Detecting an operation resulting in a number in the host endianness is thus pretty straightforward: look if the symbolic number has *not* changed.

As to supporting reads from arrays and structures, there is some logic to recognize the base of the array/structure and the offset of the entries/fields accessed, to check if the range of memory accessed would fit in an integer. Each entry is initially treated independently, and when entries are ORed together the values in the symbolic number are updated according to the host endianness: the entry at the higher address would see its values incremented on a little endian machine.

Note that as it stands the patch does not work for arrays indexed with a variable (such as tab[a] | (tab[a+1] << 8)) because fold_const does not fold (a + 1) - a. If such cases were folded, the number of cases detected would automatically increase, due to the use of fold_build2 to compare two offsets.

This patch also adds a few testcases to check both (i) that the optimization works as expected and (ii) that the results are correct. It also defines new effective targets (bswap16, bswap32 and bswap64) to centralize the information about which targets support byte swap instructions for the testsuite, and modifies existing tests to use these new effective targets.

The patch is quite big but could be split if necessary. A big part of the code added is for handling memory sources and it would be difficult to split, but the variable renaming and the introduction of the bswapXX effective targets could be made separately to reduce the noise. The patch is too big so it is only attached to this email.

The ChangeLog entries are as follows:

*** gcc/ChangeLog ***

2014-03-19  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* tree-ssa-math-opts.c (find_bswap_1): Renamed to ...
	(find_bswap_or_nop_1): This.  Also add support for memory source.
	(find_bswap): Renamed to ...
	(find_bswap_or_nop): This.  Also add support for memory source and
	detection of noop bitwise operations.
	(execute_optimize_bswap): Likewise.

*** gcc/testsuite/ChangeLog ***

2014-03-19  Thomas Preud'homme  thomas.preudho...@arm.com

	PR tree-optimization/54733
	* lib/target-supports.exp: New effective targets for architectures
	capable of performing byte swap.
	* gcc.dg/optimize-bswapdi-1.c: Convert to new bswap target.
	* gcc.dg/optimize-bswapdi-2.c: Likewise.
	* gcc.dg/optimize-bswapsi-1.c: Likewise.
	* gcc.dg/optimize-bswapdi-3.c: New test to check extension of bswap
	optimization to support memory sources.
	* gcc.dg/optimize-bswaphi-1.c: Likewise.
	* gcc.dg/optimize-bswapsi-2.c: Likewise.
	* gcc.c-torture/execute/bswap-2.c: Likewise.

Is this ok for stage 1?

Best regards,

Thomas

gcc32rm-84.3.diff
Description: Binary data
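For readers unfamiliar with the idiom, here is a minimal sketch of the kind of endian-independent load the message describes. The function names load_le32 and load_be32 are hypothetical and not taken from the patch. On a little endian host the first function is equivalent to a plain 32-bit load and the second to a load followed by a byteswap; on a big endian host the roles are reversed.

```c
#include <stdint.h>

/* Hypothetical helper: read a 32-bit value stored in little endian
   order, regardless of the host endianness.  */
static uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: read a 32-bit value stored in big endian
   order, regardless of the host endianness.  */
static uint32_t load_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         | (uint32_t)p[3];
}
```

Both functions give the same result on any host, which is why such code is commonly written by hand and why recognizing it in the compiler is attractive.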
[committed] Fix lto build if WCONTINUED is not defined (PR lto/60571)
Hi!

WCONTINUED is (recent) Linux specific, so it doesn't have to be defined on other hosts, or could be missing even on older Linux distros (e.g. glibc 2.3.2 doesn't have it).

Fixed thusly, committed as obvious.

2014-03-19  Jakub Jelinek  ja...@redhat.com

	PR lto/60571
	* lto.c (wait_for_child): Define WCONTINUED if not defined to 0.
	Fix formatting.

--- gcc/lto/lto.c.jj	2014-03-03 08:24:32.0 +0100
+++ gcc/lto/lto.c	2014-03-19 08:12:39.235144361 +0100
@@ -2476,7 +2476,10 @@ wait_for_child ()
   int status;
   do
     {
-      int w = waitpid(0, &status, WUNTRACED | WCONTINUED);
+#ifndef WCONTINUED
+#define WCONTINUED 0
+#endif
+      int w = waitpid (0, &status, WUNTRACED | WCONTINUED);
       if (w == -1)
 	fatal_error ("waitpid failed");
 
@@ -2485,7 +2488,7 @@ wait_for_child ()
       else if (WIFSIGNALED (status))
 	fatal_error ("streaming subprocess was killed by signal");
     }
-  while (!WIFEXITED(status) && !WIFSIGNALED(status));
+  while (!WIFEXITED (status) && !WIFSIGNALED (status));
 }
 #endif
 

	Jakub
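As a standalone illustration of the fallback, the sketch below uses the same #ifndef guard in a small POSIX program fragment. The helper run_child_and_wait is hypothetical and is not part of lto.c; it only shows that a flag value of 0 leaves the wait loop working on hosts without WCONTINUED.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hosts that do not provide WCONTINUED get a flag value with no effect.  */
#ifndef WCONTINUED
#define WCONTINUED 0
#endif

/* Hypothetical helper: fork a child that exits with CODE, wait until it
   has exited or was killed, and return its exit status (or -1).  */
static int run_child_and_wait(int code)
{
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0)
        _exit(code);

    int status;
    do {
        pid_t w = waitpid(pid, &status, WUNTRACED | WCONTINUED);
        if (w == -1)
            return -1;
    } while (!WIFEXITED(status) && !WIFSIGNALED(status));

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```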
[PATCH, ARM] Fix ICE due to out of bound.
Hi,

GCC ICEs in arm_dwarf_register_span when compiling gcc.target/arm/neon-modes-3.c with -g, since parts[8] is out of bounds for XImode. GET_MODE_SIZE (XImode) / 4 is 16, so rtx parts[8] cannot hold all the registers. According to arm-modes.def, 16 should be the biggest number, so the patch updates parts to rtx parts[16].

Bootstrapped with no make check regressions on an ARM Chromebook.

OK for trunk?

Thanks!
-Zhenqiang

ChangeLog:
2014-03-19  Zhenqiang Chen  zhenqiang.c...@linaro.org

	* config/arm/arm.c (arm_dwarf_register_span): Update the element
	number of parts.

testsuite/ChangeLog:
2014-03-19  Zhenqiang Chen  zhenqiang.c...@linaro.org

	* gcc.target/arm/neon-modes-3.c: Add -g option.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index a68ed8d..c4466c1 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -28692,7 +28692,7 @@ arm_dwarf_register_span (rtx rtl)
 {
   enum machine_mode mode;
   unsigned regno;
-  rtx parts[8];
+  rtx parts[16];
   int nregs;
   int i;
 
diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-3.c b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
index fe81875..f3e4f33 100644
--- a/gcc/testsuite/gcc.target/arm/neon-modes-3.c
+++ b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target arm_neon_ok } */
-/* { dg-options "-O" } */
+/* { dg-options "-O -g" } */
 /* { dg-add-options arm_neon } */
 
 #include <arm_neon.h>
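The arithmetic behind the fix can be spelled out in a few lines. This is only a sketch: the constants and helper names below are hypothetical, and the 64-byte size is inferred from the statement that GET_MODE_SIZE (XImode) / 4 is 16.

```c
#include <stddef.h>

/* Assumed values for illustration only: a 64-byte mode described in
   4-byte pieces, as implied by "GET_MODE_SIZE (XImode) / 4 is 16".  */
enum { XI_MODE_SIZE_BYTES = 64, PIECE_SIZE_BYTES = 4 };

/* Hypothetical helper: number of array slots needed for a mode.  */
static size_t pieces_needed(size_t mode_size_bytes)
{
    return mode_size_bytes / PIECE_SIZE_BYTES;
}

/* Hypothetical helper: nonzero if an array of CAPACITY slots is enough.  */
static int pieces_fit(size_t mode_size_bytes, size_t capacity)
{
    return pieces_needed(mode_size_bytes) <= capacity;
}
```

With these assumptions an 8-element array is too small for XImode, while a 16-element array is just large enough.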
[PATCH] Fix PR59543
This fixes PR59543 (confirmed by Jakub for the testcase at least) by not dropping debug stmts during WPA phase.

LTO profiled-bootstrapped on x86_64-unknown-linux-gnu, applied.

Honza - you can always come up with a better fix for 4.10.

Richard.

2014-03-19  Richard Biener  rguent...@suse.de

	PR lto/59543
	* lto-streamer-in.c (input_function): In WPA stage do not drop
	debug stmts.

Index: lto-streamer-in.c
===================================================================
--- lto-streamer-in.c	(revision 208642)
+++ lto-streamer-in.c	(working copy)
@@ -988,7 +988,7 @@ input_function (tree fn_decl, struct dat
 	     We can't remove them earlier because this would cause uid
 	     mismatches in fixups, but we can do it at this point, as
 	     long as debug stmts don't require fixups.  */
-	  if (!MAY_HAVE_DEBUG_STMTS && is_gimple_debug (stmt))
+	  if (!MAY_HAVE_DEBUG_STMTS && !flag_wpa && is_gimple_debug (stmt))
 	    {
 	      gimple_stmt_iterator gsi = bsi;
 	      gsi_next (&bsi);
Re: [PATCH, ARM] Fix ICE due to out of bound.
On Wed, 19 Mar 2014, Ramana Radhakrishnan wrote:

> On 03/19/14 08:42, Zhenqiang Chen wrote:
> > Hi,
> >
> > ICE when compiling gcc.target/arm/neon-modes-3.c with -g in
> > arm_dwarf_register_span since parts[8] is out of bound for XImode.
> > GET_MODE_SIZE (XImode) / 4 is 16. rtx parts[8] can not hold all the
> > registers. According to arm-modes.def, 16 should be the biggest
> > number. So the patch updates parts to rtx parts[16];
> >
> > Bootstrap and no make check regression on ARM Chrome book.
> >
> > OK for trunk?
>
> It may be time in 4.10 or 5.0 (whatever we call it :)), to deal with
> the FIXME in arm_dwarf_register_span to deal with DW_OP_piece. I'm
> surprised that it's taken so long to hit this.
>
> This is OK for stage4 - it looks sane to me but this needs an RM ack
> before applying.

Ok (it can't possibly break anything).

Richard.

> regards
> Ramana
>
> > Thanks!
> > -Zhenqiang
> >
> > ChangeLog:
> > 2014-03-19  Zhenqiang Chen  zhenqiang.c...@linaro.org
> >
> > 	* config/arm/arm.c (arm_dwarf_register_span): Update the element
> > 	number of parts.
> >
> > testsuite/ChangeLog:
> > 2014-03-19  Zhenqiang Chen  zhenqiang.c...@linaro.org
> >
> > 	* gcc.target/arm/neon-modes-3.c: Add -g option.
> >
> > diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> > index a68ed8d..c4466c1 100644
> > --- a/gcc/config/arm/arm.c
> > +++ b/gcc/config/arm/arm.c
> > @@ -28692,7 +28692,7 @@ arm_dwarf_register_span (rtx rtl)
> >  {
> >    enum machine_mode mode;
> >    unsigned regno;
> > -  rtx parts[8];
> > +  rtx parts[16];
> >    int nregs;
> >    int i;
> >
> > diff --git a/gcc/testsuite/gcc.target/arm/neon-modes-3.c b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > index fe81875..f3e4f33 100644
> > --- a/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > +++ b/gcc/testsuite/gcc.target/arm/neon-modes-3.c
> > @@ -1,6 +1,6 @@
> >  /* { dg-do compile } */
> >  /* { dg-require-effective-target arm_neon_ok } */
> > -/* { dg-options "-O" } */
> > +/* { dg-options "-O -g" } */
> >  /* { dg-add-options arm_neon } */
> >
> >  #include <arm_neon.h>

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer
[ARM/AArch64][0/3] Handle bitwise/bytewise reverse operations more effectively
Hi all,

This patch series attempts to improve code generation on arm and aarch64 for various bitwise operations that can be expressed with rev16 instructions in those architectures. In particular, expressions of the form:

((x & 0x00ff00ff) << 8) | ((x & 0xff00ff00) >> 8)

These can appear in places like the Linux kernel and can be directly mapped to a single rev16 instruction.

This series has 3 parts:

[1/3] Add a new field to the rtx costs tables to represent the latency of the rev* group of instructions, which will be used to accurately model the cost of these operations. Use it to properly cost existing patterns that generate rev16 (for bswap operations).

[2/3] Add aarch64 combine patterns to recognise the above bitwise operations and map them to rev16. Model the cost appropriately and add helper functions that can be reused by the arm backend.

[3/3] Define similar combine patterns for arm and reuse the helper functions introduced in patch 2/3 to properly cost them.

I'm proposing these for next stage-1 of course.

Thanks,
Kyrill
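As a concrete illustration of the expression targeted by the series, here is a minimal C sketch. The function names are hypothetical; the input and expected values are the ones used by the arm testcase later in this thread. The second function applies the shifts before the masks, which computes the same result.

```c
#include <stdint.h>

/* Swap the two bytes inside each 16-bit halfword: mask first, then shift.  */
static uint32_t rev16_mask_then_shift(uint32_t x)
{
    return ((x & 0x00ff00ffu) << 8) | ((x & 0xff00ff00u) >> 8);
}

/* Same operation with the shifts applied before the masks.  */
static uint32_t rev16_shift_then_mask(uint32_t x)
{
    return ((x << 8) & 0xff00ff00u) | ((x >> 8) & 0x00ff00ffu);
}
```

For the input 0x12345678 both functions return 0x34127856: the bytes 0x12 and 0x34 swap within the upper halfword, and 0x56 and 0x78 swap within the lower one.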
[PATCH][ARM][1/3] Add rev field to rtx cost tables
Hi all,

In order to properly cost the rev16 instruction we need a new field in the cost tables. This patch adds that and specifies its value for the existing cost tables. Since rev16 is used to implement the BSWAP operation we add handling of that in the rtx cost function using the new field.

Tested on arm-none-eabi and bootstrapped on an arm linux target.

Does it look ok for stage1?

Thanks,
Kyrill

2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/aarch-common-protos.h (alu_cost_table): Add rev field.
	* config/arm/aarch-cost-tables.h (generic_extra_costs): Specify
	rev cost.
	(cortex_a53_extra_costs): Likewise.
	(cortex_a57_extra_costs): Likewise.
	* config/arm/arm.c (cortexa9_extra_costs): Likewise.
	(cortexa7_extra_costs): Likewise.
	(cortexa12_extra_costs): Likewise.
	(cortexa15_extra_costs): Likewise.
	(v7m_extra_costs): Likewise.
	(arm_new_rtx_costs): Handle BSWAP.

commit 13b2976a9448565beabc41055fdcbd209cde949f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date: Wed Feb 26 15:55:13 2014 +

    Add rev field to rtx costs.

diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index 2b33626..4ff18cd 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -54,6 +54,7 @@ struct alu_cost_table
   const int bfi;		/* Bit-field insert.  */
   const int bfx;		/* Bit-field extraction.  */
   const int clz;		/* Count Leading Zeros.  */
+  const int rev;		/* Reverse bits/bytes.  */
   const int non_exec;		/* Extra cost when not executing insn.  */
   const bool non_exec_costs_exec; /* True if non-execution must add the exec
 				     cost.  */
diff --git a/gcc/config/arm/aarch-cost-tables.h b/gcc/config/arm/aarch-cost-tables.h
index c30ea2f..adf8708 100644
--- a/gcc/config/arm/aarch-cost-tables.h
+++ b/gcc/config/arm/aarch-cost-tables.h
@@ -39,6 +39,7 @@ const struct cpu_cost_table generic_extra_costs =
     0,			/* bfi.  */
     0,			/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     COSTS_N_INSNS (1),	/* non_exec.  */
     false		/* non_exec_costs_exec.  */
   },
@@ -139,6 +140,7 @@ const struct cpu_cost_table cortexa53_extra_costs =
     COSTS_N_INSNS (1),	/* bfi.  */
     COSTS_N_INSNS (1),	/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
@@ -239,6 +241,7 @@ const struct cpu_cost_table cortexa57_extra_costs =
     COSTS_N_INSNS (1),	/* bfi.  */
     0,			/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index e69911c..a72ee1e 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -982,6 +982,7 @@ const struct cpu_cost_table cortexa9_extra_costs =
     COSTS_N_INSNS (1),	/* bfi.  */
     COSTS_N_INSNS (1),	/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
@@ -1083,6 +1084,7 @@ const struct cpu_cost_table cortexa7_extra_costs =
     COSTS_N_INSNS (1),	/* bfi.  */
     COSTS_N_INSNS (1),	/* bfx.  */
     COSTS_N_INSNS (1),	/* clz.  */
+    COSTS_N_INSNS (1),	/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
@@ -1184,6 +1186,7 @@ const struct cpu_cost_table cortexa12_extra_costs =
     0,			/* bfi.  */
     COSTS_N_INSNS (1),	/* bfx.  */
     COSTS_N_INSNS (1),	/* clz.  */
+    COSTS_N_INSNS (1),	/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
@@ -1284,6 +1287,7 @@ const struct cpu_cost_table cortexa15_extra_costs =
     COSTS_N_INSNS (1),	/* bfi.  */
     0,			/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     0,			/* non_exec.  */
     true		/* non_exec_costs_exec.  */
   },
@@ -1384,6 +1388,7 @@ const struct cpu_cost_table v7m_extra_costs =
     0,			/* bfi.  */
     0,			/* bfx.  */
     0,			/* clz.  */
+    0,			/* rev.  */
     COSTS_N_INSNS (1),	/* non_exec.  */
     false		/* non_exec_costs_exec.  */
   },
@@ -9334,6 +9339,47 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
       *cost = LIBCALL_COST (2);
       return false;
 
+    case BSWAP:
+      if (arm_arch6)
+        {
+          if (mode == SImode)
+            {
+              *cost = COSTS_N_INSNS (1);
+              if (speed_p)
+                *cost += extra_cost->alu.rev;
+
+              return false;
+            }
+        }
+      else
+        {
+        /* No rev instruction available.  Look at arm_legacy_rev
+           and thumb_legacy_rev for the form of RTL used then.  */
+          if (TARGET_THUMB)
+            {
+              *cost = COSTS_N_INSNS (10);
+
+              if (speed_p)
+                {
+                  *cost += 6 * extra_cost->alu.shift;
+                  *cost += 3 * extra_cost->alu.logical;
+
[PATCH][AArch64][2/3] Recognise rev16 operations on SImode and DImode data
Hi all,

This patch adds a recogniser for the bitmask,shift,orr sequence of instructions that can be used to reverse the bytes in 16-bit halfwords (for the sequence itself look at the testcase included in the patch). This can be implemented with a rev16 instruction. Since the shifts can occur in any order and there are no canonicalisation rules for where they appear in the expression we have to have two patterns to match both cases.

The rtx costs function is updated to recognise the pattern and cost it appropriately by using the rev field of the cost tables introduced in patch [1/3]. The rtx costs helper functions that are used to recognise those bitwise operations are placed in config/arm/aarch-common.c so that they can be reused by both arm and aarch64.

I've added an execute testcase but no scan-assembler tests since conceptually in the future the combiner might decide to not use a rev instruction due to rtx costs. We can at least test that the code generated is functionally correct though.

Tested aarch64-none-elf.

Ok for stage1?

[gcc/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/aarch64/aarch64.md (rev16<mode>2): New pattern.
	(rev16<mode>2_alt): Likewise.
	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle rev16 case.
	* config/arm/aarch-common.c (aarch_rev16_shright_mask_imm_p): New.
	(aarch_rev16_shleft_mask_imm_p): Likewise.
	(aarch_rev16_p_1): Likewise.
	(aarch_rev16_p): Likewise.
	* config/arm/aarch-common-protos.h (aarch_rev16_p): Declare extern.
	(aarch_rev16_shright_mask_imm_p): Likewise.
	(aarch_rev16_shleft_mask_imm_p): Likewise.

[gcc/testsuite/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* gcc.target/aarch64/rev16_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index ebd58c0..41761ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4682,6 +4682,16 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
       return false;
 
     case IOR:
+      if (aarch_rev16_p (x))
+        {
+          *cost = COSTS_N_INSNS (1);
+
+          if (speed)
+            *cost += extra_cost->alu.rev;
+
+          return true;
+        }
+    /* Fall through.  */
     case XOR:
     case AND:
     cost_logic:
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 99a6ac8..a23452b 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3173,6 +3173,38 @@
   [(set_attr "type" "rev")]
 )
 
+;; There are no canonicalisation rules for the position of the lshiftrt, ashift
+;; operations within an IOR/AND RTX, therefore we have two patterns matching
+;; each valid permutation.
+
+(define_insn "rev16<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+                                      (const_int 8))
+                          (match_operand:GPI 3 "const_int_operand" "n"))
+                 (and:GPI (lshiftrt:GPI (match_dup 1)
+                                        (const_int 8))
+                          (match_operand:GPI 2 "const_int_operand" "n"))))]
+  "aarch_rev16_shleft_mask_imm_p (operands[3], <MODE>mode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], <MODE>mode)"
+  "rev16\\t%<w>0, %<w>1"
+  [(set_attr "type" "rev")]
+)
+
+(define_insn "rev16<mode>2_alt"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+        (ior:GPI (and:GPI (lshiftrt:GPI (match_operand:GPI 1 "register_operand" "r")
+                                        (const_int 8))
+                          (match_operand:GPI 2 "const_int_operand" "n"))
+                 (and:GPI (ashift:GPI (match_dup 1)
+                                      (const_int 8))
+                          (match_operand:GPI 3 "const_int_operand" "n"))))]
+  "aarch_rev16_shleft_mask_imm_p (operands[3], <MODE>mode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], <MODE>mode)"
+  "rev16\\t%<w>0, %<w>1"
+  [(set_attr "type" "rev")]
+)
+
 ;; zero_extend version of above
 (define_insn "*bswapsi2_uxtw"
   [(set (match_operand:DI 0 "register_operand" "=r")
diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h
index d97ee61..08c4c7a 100644
--- a/gcc/config/arm/aarch-common-protos.h
+++ b/gcc/config/arm/aarch-common-protos.h
@@ -23,6 +23,9 @@
 #ifndef GCC_AARCH_COMMON_PROTOS_H
 #define GCC_AARCH_COMMON_PROTOS_H
 
+extern bool aarch_rev16_p (rtx);
+extern bool aarch_rev16_shleft_mask_imm_p (rtx, enum machine_mode);
+extern bool aarch_rev16_shright_mask_imm_p (rtx, enum machine_mode);
 extern int arm_early_load_addr_dep (rtx, rtx);
 extern int arm_early_store_addr_dep (rtx, rtx);
 extern int arm_mac_accumulator_is_mul_result (rtx, rtx);
diff --git a/gcc/config/arm/aarch-common.c b/gcc/config/arm/aarch-common.c
index c11f7e9..75ed3fd 100644
--- a/gcc/config/arm/aarch-common.c
+++ b/gcc/config/arm/aarch-common.c
@@ -155,6 +155,79 @@ arm_get_set_operands (rtx producer, rtx consumer,
   return 0;
 }
 
+bool
[PATCH][ARM][3/3] Recognise bitwise operations leading to SImode rev16
Hi all,

This is the arm equivalent of patch [2/3] in the series that adds combine patterns for the bitwise operations leading to a rev16 instruction. It reuses the functions that were put in aarch-common.c to properly cost these operations.

I tried matching a DImode rev16 (with the intent of splitting it into two rev16 ops) like aarch64 but combine wouldn't try to match that bitwise pattern in DImode like aarch64 does. Instead it tries various exotic combinations with subregs.

Tested arm-none-eabi, bootstrap on arm-none-linux-gnueabihf.

Ok for stage1?

[gcc/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/arm/arm.md (arm_rev16si2): New pattern.
	(arm_rev16si2_alt): Likewise.
	* config/arm/arm.c (arm_new_rtx_costs): Handle rev16 case.

[gcc/testsuite/]
2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* gcc.target/arm/rev16.c: New test.

commit 04e60723bd1fa2f8e2adcfeed676390643ffec0c
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date: Tue Feb 25 15:26:52 2014 +

    [ARM] Implement SImode rev16

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 8d1d721..ed603f0 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -9716,8 +9716,17 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code,
       /* Vector mode?  */
       *cost = LIBCALL_COST (2);
       return false;
+    case IOR:
+      if (mode == SImode && arm_arch6 && aarch_rev16_p (x))
+        {
+          *cost = COSTS_N_INSNS (1);
+          if (speed_p)
+            *cost += extra_cost->alu.rev;
 
-    case AND: case XOR: case IOR:
+          return true;
+        }
+    /* Fall through.  */
+    case AND: case XOR:
       if (mode == SImode)
 	{
 	  enum rtx_code subcode = GET_CODE (XEXP (x, 0));
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 4df24a2..47bc747 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -12668,6 +12668,44 @@
    (set_attr "type" "rev")]
 )
 
+;; There are no canonicalisation rules for the position of the lshiftrt, ashift
+;; operations within an IOR/AND RTX, therefore we have two patterns matching
+;; each valid permutation.
+
+(define_insn "arm_rev16si2"
+  [(set (match_operand:SI 0 "register_operand" "=l,l,r")
+        (ior:SI (and:SI (ashift:SI (match_operand:SI 1 "register_operand" "l,l,r")
+                                   (const_int 8))
+                        (match_operand:SI 3 "const_int_operand" "n,n,n"))
+                (and:SI (lshiftrt:SI (match_dup 1)
+                                     (const_int 8))
+                        (match_operand:SI 2 "const_int_operand" "n,n,n"))))]
+  "arm_arch6
+   && aarch_rev16_shleft_mask_imm_p (operands[3], SImode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], SImode)"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t1,t2,32")
+   (set_attr "length" "2,2,4")
+   (set_attr "type" "rev")]
+)
+
+(define_insn "arm_rev16si2_alt"
+  [(set (match_operand:SI 0 "register_operand" "=l,l,r")
+        (ior:SI (and:SI (lshiftrt:SI (match_operand:SI 1 "register_operand" "l,l,r")
+                                     (const_int 8))
+                        (match_operand:SI 2 "const_int_operand" "n,n,n"))
+                (and:SI (ashift:SI (match_dup 1)
+                                   (const_int 8))
+                        (match_operand:SI 3 "const_int_operand" "n,n,n"))))]
+  "arm_arch6
+   && aarch_rev16_shleft_mask_imm_p (operands[3], SImode)
+   && aarch_rev16_shright_mask_imm_p (operands[2], SImode)"
+  "rev16\\t%0, %1"
+  [(set_attr "arch" "t1,t2,32")
+   (set_attr "length" "2,2,4")
+   (set_attr "type" "rev")]
+)
+
 (define_expand "bswaphi2"
   [(set (match_operand:HI 0 "s_register_operand" "=r")
 	(bswap:HI (match_operand:HI 1 "s_register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/arm/rev16.c b/gcc/testsuite/gcc.target/arm/rev16.c
new file mode 100644
index 000..1c869b3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/rev16.c
@@ -0,0 +1,35 @@
+/* { dg-options "-O2" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+typedef unsigned int __u32;
+
+__u32
+__rev16_32_alt (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0xff00ff00UL) >> 8)
+         | (((__u32)(x) & (__u32)0x00ff00ffUL) << 8);
+}
+
+__u32
+__rev16_32 (__u32 x)
+{
+  return (((__u32)(x) & (__u32)0x00ff00ffUL) << 8)
+         | (((__u32)(x) & (__u32)0xff00ff00UL) >> 8);
+}
+
+int
+main (void)
+{
+  volatile __u32 in32 = 0x12345678;
+  volatile __u32 expected32 = 0x34127856;
+
+  if (__rev16_32 (in32) != expected32)
+    abort ();
+
+  if (__rev16_32_alt (in32) != expected32)
+    abort ();
+
+  return 0;
+}
[PATCH][AArch64] Add handling of bswap operations in rtx costs
Hi all,

This patch depends on the series started at http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00933.html but is not really a part of it. It just adds costing of the bswap operation using the new rev field in the rtx cost tables since we have patterns in aarch64.md that handle bswap by generating rev16 instructions.

Tested aarch64-none-elf.

Ok for stage1 after that series goes in?

2014-03-19  Kyrylo Tkachov  kyrylo.tkac...@arm.com

	* config/aarch64/aarch64.c (aarch64_rtx_costs): Handle BSWAP.

commit b9771a71dbf62522d423e16ce03353624c1ccd5a
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date: Thu Feb 27 11:55:27 2014 +

    [AArch64] Cost bswap operations properly

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 901ad3d..28c8841 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4678,6 +4678,14 @@ aarch64_rtx_costs (rtx x, int code, int outer ATTRIBUTE_UNUSED,
 
       return false;
 
+    case BSWAP:
+      *cost = COSTS_N_INSNS (1);
+
+      if (speed)
+        *cost += extra_cost->alu.rev;
+
+      return false;
+
     case IOR:
     case XOR:
     case AND:
stray warning from gcc's cpp
I observe the following minor annoyance on FreeBSD systems where cpp is GCC's cpp. If a DTrace script has the following shebang line:

#!/usr/sbin/dtrace -Cs

then the following warning is produced when the script is run:

cc1: warning: is shorter than expected

Some details. dtrace(1) first forks. Then the child seeks on the file descriptor associated with the script file, so that the shebang line is skipped (because otherwise it would confuse cpp). Then the child makes the file descriptor its standard input and execs cpp. cpp performs fstat(2) on its standard input descriptor and determines that it points to a regular file. Then it verifies that the number of bytes it reads from the file is the same as the size of the file. The check makes sense if the file is opened by cpp itself, but it does not always make sense for stdin, as described above.

The following patch seems to fix the issue, but perhaps there is a better / smarter alternative.

--- a/libcpp/files.c
+++ b/libcpp/files.c
@@ -601,7 +601,8 @@ read_file_guts (cpp_reader *pfile, _cpp_file *file)
       return false;
     }
 
-  if (regular && total != size && STAT_SIZE_RELIABLE (file->st))
+  if (regular && total != size && file->fd != 0
+      && STAT_SIZE_RELIABLE (file->st))
     cpp_error (pfile, CPP_DL_WARNING,
 	       "%s is shorter than expected", file->path);
 

-- 
Andriy Gapon
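The mismatch can be reproduced outside of cpp with a few POSIX calls. The sketch below is hypothetical and unrelated to libcpp's actual code: it writes a script with a shebang line to a temporary file, seeks past the shebang the way the message says dtrace does, and then compares the fstat size against the number of bytes actually read.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: read FD to end of file and return the fstat size
   minus the number of bytes read, or -1 on error.  */
static long stat_size_minus_bytes_read(int fd)
{
    struct stat st;
    char buf[256];
    long total = 0;
    ssize_t n;

    if (fstat(fd, &st) == -1)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;
    if (n == -1)
        return -1;
    return (long)st.st_size - total;
}

/* Hypothetical demo: the reader starts after the shebang line, so it sees
   fewer bytes than fstat reports.  Returns the shortfall, or -1 on error.  */
static long demo_shebang_skip(void)
{
    static const char script[] = "#!/usr/sbin/dtrace -Cs\nBEGIN { exit(0); }\n";
    const size_t len = sizeof script - 1;
    const off_t skip = (off_t)(strchr(script, '\n') - script) + 1;
    long result = -1;

    FILE *tmp = tmpfile();
    if (tmp == NULL)
        return -1;
    int fd = fileno(tmp);
    if (write(fd, script, len) == (ssize_t)len
        && lseek(fd, skip, SEEK_SET) == skip)
        result = stat_size_minus_bytes_read(fd);
    fclose(tmp);
    return result;
}
```

The shortfall equals the length of the skipped shebang line, which is the situation in which the size check reports the file as shorter than expected.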
Re: [AArch64] 64-bit float vreinterpret implemention
On 28 February 2014 10:30, Alex Velenko alex.vele...@arm.com wrote:

> Hi Richard,
>
> Thank you for your suggestion. Attached is a patch that includes
> implementation of your proposition. A testsuite was run on LE and BE
> compilers with no regressions.
>
> Here is the description of the patch:
>
> This patch introduces vreinterpret implementation for vectors with
> 64-bit float lanes and adds testcase for those intrinsics.

The aarch64_init_simd_builtins() infrastructure requires the presence of named RTL patterns in order to construct the types of the SIMD intrinsics even when an intrinsic is emitted as tree. This seems rather ugly to me. At some point we should figure out how to clean up this aspect of aarch64_init_simd_builtins() and remove the otherwise unused .md patterns.

This aside I think your patch is fine as it stands and can be committed in stage-1.

Cheers
/Marcus
Re: [PATCH, ARM] Fix ICE due to out of bound.
On Wed, Mar 19, 2014 at 09:46:56AM +, Ramana Radhakrishnan wrote:

> On 03/19/14 08:42, Zhenqiang Chen wrote:
> > ICE when compiling gcc.target/arm/neon-modes-3.c with -g in
> > arm_dwarf_register_span since parts[8] is out of bound for XImode.
> > GET_MODE_SIZE (XImode) / 4 is 16. rtx parts[8] can not hold all the
> > registers. According to arm-modes.def, 16 should be the biggest
> > number. So the patch updates parts to rtx parts[16];
> >
> > Bootstrap and no make check regression on ARM Chrome book.
> >
> > OK for trunk?
>
> It may be time in 4.10 or 5.0 (whatever we call it :)), to deal with
> the FIXME in arm_dwarf_register_span to deal with DW_OP_piece. I'm
> surprised that it's taken so long to hit this.
>
> This is OK for stage4 - it looks sane to me but this needs an RM ack
> before applying.

Ok.

	Jakub
[PATCH] Avoid ggc_collect () after WPA forking
This patch avoids calling ggc_collect after we possibly forked during WPA phase as that necessarily causes a lot of page unsharing. I have verified that during a LTO bootstrap we do not allocate GC memory during (or after) lto_wpa_write_files, thus the effect on memory use should be positive (the patch below contains checking code making sure that we don't alloc).

LTO bootstrapped on x86_64-unknown-linux-gnu, will apply shortly (without the checking code of course).

That should fix the WPA memory explosion Martin sees with building Chromium.

Richard.

2014-03-19  Richard Biener  rguent...@suse.de

	* lto.c (lto_wpa_write_files): Move call to
	lto_promote_cross_file_statics ...
	(do_whole_program_analysis): ... here, into the partitioning
	block.  Do not ggc_collect after lto_wpa_write_files but
	for a last time before it.

Index: gcc/ggc-page.c
===================================================================
--- gcc/ggc-page.c	(revision 208642)
+++ gcc/ggc-page.c	(working copy)
@@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s
   return size;
 }
 
+int may_alloc = 1;
+
 /* Allocate a chunk of memory of SIZE bytes.  Its contents are undefined.  */
 
 void *
@@ -1208,6 +1210,9 @@ ggc_internal_alloc_stat (size_t size MEM
   struct page_entry *entry;
   void *result;
 
+  if (!may_alloc)
+    fatal_error ("allocating GC memory");
+
   ggc_round_alloc_size_1 (size, &order, &object_size);
 
   /* If there are non-full pages for this size allocation, they are at
Index: gcc/lto/lto.c
===================================================================
--- gcc/lto/lto.c	(revision 208642)
+++ gcc/lto/lto.c	(working copy)
@@ -2565,11 +2566,6 @@ lto_wpa_write_files (void)
   FOR_EACH_VEC_ELT (ltrans_partitions, i, part)
     lto_stats.num_output_symtab_nodes += lto_symtab_encoder_size (part->encoder);
 
-  /* Find out statics that need to be promoted
-     to globals with hidden visibility because they are accessed from multiple
-     partitions.  */
-  lto_promote_cross_file_statics ();
-
   timevar_pop (TV_WHOPR_WPA);
 
   timevar_push (TV_WHOPR_WPA_IO);
@@ -3281,11 +3277,25 @@ do_whole_program_analysis (void)
     node->aux = NULL;
 
   lto_stats.num_cgraph_partitions += ltrans_partitions.length ();
+
+  /* Find out statics that need to be promoted
+     to globals with hidden visibility because they are accessed from multiple
+     partitions.  */
+  lto_promote_cross_file_statics ();
   timevar_pop (TV_WHOPR_PARTITIONING);
 
   timevar_stop (TV_PHASE_OPT_GEN);
-  timevar_start (TV_PHASE_STREAM_OUT);
 
+  /* Collect a last time - in lto_wpa_write_files we may end up forking
+     with the idea that this doesn't increase memory usage.  So we
+     absoultely do not want to collect after that.  */
+  ggc_collect ();
+{
+  extern int may_alloc;
+  may_alloc = 0;
+}
+
+  timevar_start (TV_PHASE_STREAM_OUT);
   if (!quiet_flag)
     {
       fprintf (stderr, "\nStreaming out");
@@ -3294,10 +3304,8 @@ do_whole_program_analysis (void)
   lto_wpa_write_files ();
   if (!quiet_flag)
     fprintf (stderr, "\n");
-
   timevar_stop (TV_PHASE_STREAM_OUT);
 
-  ggc_collect ();
   if (post_ipa_mem_report)
     {
       fprintf (stderr, "Memory consumption after IPA\n");
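The checking code in the patch follows a simple pattern: a global flag is cleared at the point after which no allocation is expected, and the allocator tests the flag on every call. The sketch below shows that pattern in isolation. All names are hypothetical, it wraps malloc rather than the GC allocator, and it returns NULL where the patch calls fatal_error.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical flag: true while allocations are still allowed.  */
static bool allocations_allowed = true;

/* Hypothetical wrapper: refuse to allocate once the flag is cleared.
   The patch aborts with fatal_error instead of returning NULL.  */
static void *guarded_alloc(size_t size)
{
    if (!allocations_allowed)
        return NULL;
    return malloc(size);
}

/* Hypothetical marker for the point after which no allocation may happen,
   for example right before forking worker processes.  */
static void freeze_allocations(void)
{
    allocations_allowed = false;
}
```

A check like this is only useful as temporary verification, which matches the note that the patch is applied without the checking code.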
[PATCH] Fix ubsan ICE (PR sanitizer/60569)
Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, so check that it exists before accessing it. Note that the test has to be run; only compiling wasn't enough to provoke the ICE.

Ran ubsan testsuite on x86_64-linux, ok for trunk?

2014-03-19  Marek Polacek  pola...@redhat.com

	PR sanitizer/60569
	* ubsan.c (ubsan_type_descriptor): Check that DECL_NAME is nonnull
	before accessing it.

testsuite/
	* g++.dg/ubsan/pr60569.C: New test.

diff --git gcc/testsuite/g++.dg/ubsan/pr60569.C gcc/testsuite/g++.dg/ubsan/pr60569.C
index e69de29..df6b7a4 100644
--- gcc/testsuite/g++.dg/ubsan/pr60569.C
+++ gcc/testsuite/g++.dg/ubsan/pr60569.C
@@ -0,0 +1,21 @@
+// PR sanitizer/60569
+// { dg-do run }
+// { dg-require-effective-target lto }
+// { dg-options "-fsanitize=undefined -flto" }
+
+struct A
+{
+  void foo ();
+  struct
+  {
+    int i;
+    void bar () { i = 0; }
+  } s;
+};
+
+void A::foo () { s.bar (); }
+
+int
+main ()
+{
+}
diff --git gcc/ubsan.c gcc/ubsan.c
index 7c7a893..22470da 100644
--- gcc/ubsan.c
+++ gcc/ubsan.c
@@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool want_pointer_type_p)
     {
       if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE)
 	tname = IDENTIFIER_POINTER (TYPE_NAME (type2));
-      else
+      else if (DECL_NAME (TYPE_NAME (type2)) != NULL)
 	tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2)));
     }
 

	Marek
Re: [PATCH] Fix ubsan ICE (PR sanitizer/60569)
On Wed, Mar 19, 2014 at 12:13:57PM +0100, Marek Polacek wrote:

> Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, so
> check that it exists before accessing it. Note that the test has to
> be run; only compiling wasn't enough to provoke the ICE.

?? Shouldn't // { dg-do link } be sufficient?

> --- gcc/ubsan.c
> +++ gcc/ubsan.c
> @@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool want_pointer_type_p)
>      {
>        if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE)
>  	tname = IDENTIFIER_POINTER (TYPE_NAME (type2));
> -      else
> +      else if (DECL_NAME (TYPE_NAME (type2)) != NULL)
>  	tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2)));
>      }

This looks good to me.

	Jakub
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, Mar 19, 2014 at 12:10 PM, Richard Biener wrote: Index: gcc/ggc-page.c === --- gcc/ggc-page.c (revision 208642) +++ gcc/ggc-page.c (working copy) @@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s return size; } +int may_alloc = 1; bool may_alloc? Ciao! Steven
Re: [testsuite] Fix gcc.dg/tls/pr58595.c on Solaris 9
Jakub Jelinek ja...@redhat.com writes: On Tue, Mar 18, 2014 at 11:19:52AM +0100, Rainer Orth wrote: The new gcc.dg/tls/pr58595.c testcase FAILs on Solaris 9: FAIL: gcc.dg/tls/pr58595.c (test for excess errors) Excess errors: Undefined first referenced symbol in file ___tls_get_addr /var/tmp//ccuBbAna.o ld: fatal: Symbol referencing errors. No output written to ./pr58595.exe WARNING: gcc.dg/tls/pr58595.c compilation failed to produce executable Fixed as follows, tested with the appropriate runtest invocation on i386-pc-solaris2.9, i386-pc-solaris2.11, and x86_64-unknown-linux-gnu, installed on mainline. Can you please also change /* { dg-require-effective-target tls } */ to /* { dg-require-effective-target tls_runtime } */ ? Sure, done as follows after retesting as before: 2014-03-19 Rainer Orth r...@cebitec.uni-bielefeld.de * gcc.dg/tls/pr58595.c: Require tls_runtime instead of tls. changeset: 13384:d1c2de35507e tag: tip user:Rainer Orth r...@cebitec.uni-bielefeld.de date:Wed Mar 19 13:04:36 2014 +0100 summary: Require tls_runtime in gcc.dg/tls/pr58595.c diff --git a/gcc/testsuite/gcc.dg/tls/pr58595.c b/gcc/testsuite/gcc.dg/tls/pr58595.c --- a/gcc/testsuite/gcc.dg/tls/pr58595.c +++ b/gcc/testsuite/gcc.dg/tls/pr58595.c @@ -3,7 +3,7 @@ /* { dg-options -O2 } */ /* { dg-additional-options -fpic { target fpic } } */ /* { dg-add-options tls } */ -/* { dg-require-effective-target tls } */ +/* { dg-require-effective-target tls_runtime } */ /* { dg-require-effective-target sync_int_long } */ struct S { unsigned long a, b; }; BTW, don't know if dg-add-options tls can come before that or not. It can: the tls_runtime check takes care of adding the options itself. Rainer -- - Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [PATCH] Fix ubsan ICE (PR sanitizer/60569)
On Wed, Mar 19, 2014 at 12:17:19PM +0100, Jakub Jelinek wrote: On Wed, Mar 19, 2014 at 12:13:57PM +0100, Marek Polacek wrote: Apparently with LTO we can get a TYPE_NAME without a DECL_NAME, so check that it exists before accessing it. Note that the test has to be run; only compiling wasn't enough to provoke the ICE. ?? Shouldn't // { dg-do link } be sufficient? Ah, forgot about that, it is sufficient. Ok with dg-do link instead of dg-do run? --- gcc/ubsan.c +++ gcc/ubsan.c @@ -318,7 +318,7 @@ ubsan_type_descriptor (tree type, bool want_pointer_type_p) { if (TREE_CODE (TYPE_NAME (type2)) == IDENTIFIER_NODE) tname = IDENTIFIER_POINTER (TYPE_NAME (type2)); - else + else if (DECL_NAME (TYPE_NAME (type2)) != NULL) tname = IDENTIFIER_POINTER (DECL_NAME (TYPE_NAME (type2))); } This looks good to me. Thanks. Marek
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Steven Bosscher wrote: On Wed, Mar 19, 2014 at 12:10 PM, Richard Biener wrote: Index: gcc/ggc-page.c === --- gcc/ggc-page.c (revision 208642) +++ gcc/ggc-page.c (working copy) @@ -1199,6 +1199,8 @@ ggc_round_alloc_size (size_t requested_s return size; } +int may_alloc = 1; bool may_alloc? It's only checking code I didn't commit. We may of course alloc but I wanted to prove we don't. Richard.
[PATCH] Reduce GC walk recursion depth for types
This reduces GC walk recursion depth in two ways. First by re-ordering tree_type_common members to move 'name' last and 'canonical' before 'next_variant'. That makes us first recurse downward (type, pointer_to/reference_to), then on the same level (canonical, next_variant, main_variant) and finally upward (context, name-decl_context). For TS_TYPE_NON_COMMON we still walk down afterwards via values, on the same level via minval/maxval and upwards via binfo, so that the patch helps is maybe too much handwaving? (but it helps a reduced testcase without doing the 2nd part) Second by choosing sth different for chain_next for types than TREE_CHAIN (which is TYPE_STUB_DECL, no chain at all). That makes the unreduced testcase work and apart from the issue below should be obvious enough (though there usually shouldn't be so many type variants - still if for every type we save two or three recursions that still helps). Martin verified this fixes PR60553. I've changed chain_next only for the LTO frontend as while (ggc_test_and_set_mark (xlimit)) xlimit = (CODE_CONTAINS_STRUCT (TREE_CODE ((*xlimit).generic), TS_TYPE_COMMON) ? ((union lang_tree_node *) (*xlimit).generic.type_common.next_variant) : CODE_CONTAINS_STRUCT (TREE_CODE ((*xlimit).generic), TS_COMMON) ? ((union lang_tree_node *) (*xlimit).generic.common.chain) : NULL); likely doesn't create great code ... (note duplicate tree checks with checking here for other frontends, fixed LTO with the patch below). LTO bootstrap running on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Richard. 2014-03-19 Richard Biener rguent...@suse.de PR middle-end/60553 * tree-core.h (tree_type_common): Re-order pointer members to reduce recursion depth during GC walks. lto/ * lto-tree.h (lang_tree_node): For types use TYPE_NEXT_VARIANT instead of TREE_CHAIN as chain_next. 
Index: gcc/tree-core.h === --- gcc/tree-core.h (revision 208642) +++ gcc/tree-core.h (working copy) @@ -1265,11 +1265,11 @@ struct GTY(()) tree_type_common { const char * GTY ((tag (TYPE_SYMTAB_IS_POINTER))) pointer; struct die_struct * GTY ((tag (TYPE_SYMTAB_IS_DIE))) die; } GTY ((desc (debug_hooks-tree_type_symtab_field))) symtab; - tree name; + tree canonical; tree next_variant; tree main_variant; tree context; - tree canonical; + tree name; }; struct GTY(()) tree_type_with_lang_specific { Index: gcc/lto/lto-tree.h === --- gcc/lto/lto-tree.h (revision 208642) +++ gcc/lto/lto-tree.h (working copy) @@ -48,7 +48,7 @@ enum lto_tree_node_structure_enum { }; union GTY((desc (lto_tree_node_structure (%h)), - chain_next (CODE_CONTAINS_STRUCT (TREE_CODE (%h.generic), TS_COMMON) ? ((union lang_tree_node *) TREE_CHAIN (%h.generic)) : NULL))) + chain_next (CODE_CONTAINS_STRUCT (TREE_CODE (%h.generic), TS_TYPE_COMMON) ? ((union lang_tree_node *) %h.generic.type_common.next_variant) : CODE_CONTAINS_STRUCT (TREE_CODE (%h.generic), TS_COMMON) ? ((union lang_tree_node *) %h.generic.common.chain) : NULL))) lang_tree_node { union tree_node GTY ((tag (TS_LTO_GENERIC),
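To illustrate why choosing a real chain for chain_next matters, here is a simplified standalone sketch. The node layout and both marking functions are hypothetical and much simpler than the walkers gengtype generates; the only point is that following the sibling chain in a loop keeps recursion depth independent of the chain length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node: CHILD models a "downward" pointer, NEXT models a
   same-level chain such as next_variant.  Not the real tree layout.  */
struct node
{
  int marked;
  struct node *child;
  struct node *next;
};

/* Deepest recursion level seen by the last walk.  */
int max_depth_seen;

/* Naive walk: recurse on every pointer, so a chain of N nodes costs
   N stack frames.  */
void
mark_naive (struct node *n, int depth)
{
  assert (depth > 0);
  if (n == NULL || n->marked)
    return;
  n->marked = 1;
  if (depth > max_depth_seen)
    max_depth_seen = depth;
  mark_naive (n->child, depth + 1);
  mark_naive (n->next, depth + 1);
}

/* Chained walk: iterate along NEXT and recurse only on CHILD, so the
   chain itself adds no recursion depth.  */
void
mark_chained (struct node *n, int depth)
{
  assert (depth > 0);
  for (; n != NULL && !n->marked; n = n->next)
    {
      n->marked = 1;
      if (depth > max_depth_seen)
	max_depth_seen = depth;
      mark_chained (n->child, depth + 1);
    }
}
```

With 1000 nodes linked only through `next`, the naive walk reaches depth 1000 while the chained walk stays at depth 1.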
Re: [PATCH] Reduce GC walk recursion depth for types
On Wed, Mar 19, 2014 at 02:02:10PM +0100, Richard Biener wrote: LTO bootstrap running on x86_64-unknown-linux-gnu. Ok for trunk? Thanks, Richard. 2014-03-19 Richard Biener rguent...@suse.de PR middle-end/60553 * tree-core.h (tree_type_common): Re-order pointer members to reduce recursion depth during GC walks. lto/ * lto-tree.h (lang_tree_node): For types use TYPE_NEXT_VARIANT instead of TREE_CHAIN as chain_next. LGTM. Jakub
[Fortran][PATCH][gomp4]: Transform OpenACC loop directive
Hi Tobias! This patch implements transformation of OpenACC loop directive from Fortran AST to GENERIC. Successfully bootstrapped and tested with no new regressions on x86_64-unknown-linux-gnu. OK for gomp4 branch? -- Ilmir. From de2dd5ba0c48500e8e9084bd46cbfac2f21352fe Mon Sep 17 00:00:00 2001 From: Ilmir Usmanov i.usma...@samsung.com Date: Wed, 19 Mar 2014 15:12:36 +0400 Subject: [PATCH] Transform OpenACC loop directive from fortran AST to GENERIC --- * gcc/fortran/trans-openmp.c (gfc_trans_oacc_loop): New function. (gfc_trans_oacc_combined_directive): Call it. (gfc_trans_oacc_directive): Likewise. * gcc/tree-pretty-print (dump_omp_clause): Fix WORKER and VECTOR. * gcc/testsuite/gfortran.dg/goacc/loop-tree.f95: New test. diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c index 29364f4..cb7c970 100644 --- a/gcc/fortran/trans-openmp.c +++ b/gcc/fortran/trans-openmp.c @@ -1571,11 +1571,181 @@ typedef struct dovar_init_d { tree init; } dovar_init; + +static tree +gfc_trans_oacc_loop (gfc_code *code, stmtblock_t *pblock, + gfc_omp_clauses *loop_clauses) +{ + gfc_se se; + tree dovar, stmt, from, to, step, type, init, cond, incr; + tree count = NULL_TREE, cycle_label, tmp, omp_clauses; + stmtblock_t block; + stmtblock_t body; + gfc_omp_clauses *clauses = code-ext.omp_clauses; + int i, collapse = clauses-collapse; + vecdovar_init inits = vNULL; + dovar_init *di; + unsigned ix; + + if (collapse = 0) +collapse = 1; + + code = code-block-next; + gcc_assert (code-op == EXEC_DO || code-op == EXEC_DO_CONCURRENT); + + init = make_tree_vec (collapse); + cond = make_tree_vec (collapse); + incr = make_tree_vec (collapse); + + if (pblock == NULL) +{ + gfc_start_block (block); + pblock = block; +} + + omp_clauses = gfc_trans_omp_clauses (pblock, loop_clauses, code-loc); + + for (i = 0; i collapse; i++) +{ + int simple = 0; + + /* Evaluate all the expressions in the iterator. 
*/ + gfc_init_se (se, NULL); + gfc_conv_expr_lhs (se, code-ext.iterator-var); + gfc_add_block_to_block (pblock, se.pre); + dovar = se.expr; + type = TREE_TYPE (dovar); + gcc_assert (TREE_CODE (type) == INTEGER_TYPE); + + gfc_init_se (se, NULL); + gfc_conv_expr_val (se, code-ext.iterator-start); + gfc_add_block_to_block (pblock, se.pre); + from = gfc_evaluate_now (se.expr, pblock); + + gfc_init_se (se, NULL); + gfc_conv_expr_val (se, code-ext.iterator-end); + gfc_add_block_to_block (pblock, se.pre); + to = gfc_evaluate_now (se.expr, pblock); + + gfc_init_se (se, NULL); + gfc_conv_expr_val (se, code-ext.iterator-step); + gfc_add_block_to_block (pblock, se.pre); + step = gfc_evaluate_now (se.expr, pblock); + + /* Special case simple loops. */ + if (TREE_CODE (dovar) == VAR_DECL) + { + if (integer_onep (step)) + simple = 1; + else if (tree_int_cst_equal (step, integer_minus_one_node)) + simple = -1; + } + + /* Loop body. */ + if (simple) + { + TREE_VEC_ELT (init, i) = build2_v (MODIFY_EXPR, dovar, from); + /* The condition should not be folded. */ + TREE_VEC_ELT (cond, i) = build2_loc (input_location, simple 0 + ? LE_EXPR : GE_EXPR, + boolean_type_node, dovar, to); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, PLUS_EXPR, + type, dovar, step); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, + MODIFY_EXPR, + type, dovar, + TREE_VEC_ELT (incr, i)); + } + else + { + /* STEP is not 1 or -1. Use: + for (count = 0; count (to + step - from) / step; count++) + { + dovar = from + count * step; + body; + cycle_label:; + } */ + tmp = fold_build2_loc (input_location, MINUS_EXPR, type, step, from); + tmp = fold_build2_loc (input_location, PLUS_EXPR, type, to, tmp); + tmp = fold_build2_loc (input_location, TRUNC_DIV_EXPR, type, tmp, + step); + tmp = gfc_evaluate_now (tmp, pblock); + count = gfc_create_var (type, count); + TREE_VEC_ELT (init, i) = build2_v (MODIFY_EXPR, count, + build_int_cst (type, 0)); + /* The condition should not be folded. 
*/ + TREE_VEC_ELT (cond, i) = build2_loc (input_location, LT_EXPR, + boolean_type_node, + count, tmp); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, PLUS_EXPR, + type, count, + build_int_cst (type, 1)); + TREE_VEC_ELT (incr, i) = fold_build2_loc (input_location, + MODIFY_EXPR, type, count, + TREE_VEC_ELT (incr, i)); + + /* Initialize DOVAR. */ + tmp = fold_build2_loc (input_location, MULT_EXPR, type, count, step); + tmp = fold_build2_loc (input_location, PLUS_EXPR, type, from, tmp); + dovar_init e = {dovar, tmp}; + inits.safe_push (e); + } + + if (i + 1 collapse) + code = code-block-next; +} + + if (pblock != block) +{ + pushlevel (); +
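The non-simple branch above rewrites a loop with an arbitrary step into a counted loop, as described in the patch comment (the comparison operator there appears to have been lost in the archive; it is assumed to be `<`). A small standalone sketch of that normalization, using plain `long` values and hypothetical helper names rather than GENERIC trees:

```c
#include <assert.h>

/* Number of iterations of "DO i = from, to, step", computed as in the
   patch comment: (to + step - from) / step with truncating division.
   A non-positive result means the body never runs.  */
long
trip_count (long from, long to, long step)
{
  assert (step != 0);
  long n = (to + step - from) / step;
  return n > 0 ? n : 0;
}

/* Run the normalized loop, storing each value of the loop variable in
   OUT (at most CAP entries).  Returns the number of values stored.  */
long
run_normalized (long from, long to, long step, long *out, long cap)
{
  long n = trip_count (from, to, step);
  long count;
  for (count = 0; count < n && count < cap; count++)
    out[count] = from + count * step;   /* dovar = from + count * step */
  return count;
}
```

For example, `DO i = 1, 10, 4` gives a count of (10 + 4 - 1) / 4 = 3 and the values 1, 5, 9, while `DO i = 10, 1, -3` gives 4 iterations: 10, 7, 4, 1.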
Re: [C++ PATCH] [gomp4] Initial OpenACC support to C++ front-end
Ping. On 13.03.2014 21:05, Ilmir Usmanov wrote: On 07.03.2014 15:37, Ilmir Usmanov wrote: Hi Thomas! I prepared simple patch to add support of OpenACC data, kernels and parallel constructs to C++ FE. It adds support of data clauses too. OK to gomp4 branch? Fixed subject: changed file extensions of tests and fixed comments. OK to gomp4 branch? -- Ilmir.
Re: [patch] gcc fstack-protector-explicit
Well, finally I have the assignment, could you please review this patch? On Wed, Nov 20, 2013 at 4:13 PM, Jeff Law l...@redhat.com wrote: On 11/19/13 07:04, Marcos Díaz wrote: My employer is working on the signature of the papers. Could someone please do the review meanwhile? I'd prefer to wait until the assignment process is complete. If something were to happen and we can't use your code the review time would have been wasted (and such things have certainly happened in the past). Once the assignment is recorded, please ping this patch. Jeff -- __ Marcos Díaz Software Engineer San Lorenzo 47, 3rd Floor, Office 5 Córdoba, Argentina Phone: +54 351 4217888 / +54 351 4218211/ +54 351 7617452 Skype: markdiaz22
Re: [patch] gcc fstack-protector-explicit
On 03/19/14 08:06, Marcos Díaz wrote: Well, finally I have the assignment, could you please review this patch? Thanks. I'll take a look once we open up stage1 development again (should be soon as 4.9 is getting close to being ready). jeff
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Martin Liška wrote: There are stats for Firefox with LTO and -O2. According to graphs it looks that memory consumption for parallel WPA phase is similar. When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory footprint is similar to parallel WPA that reduces libxul.so linking by ~10%. Ok, so I suppose this tracks RSS, not virtual memory use (what is used and what is active)? And it is WPA plus LTRANS stages, WPA ends where memory use first goes down to zero? I wonder if you can identify the point where parallel streaming starts and where it ends ... ;) Btw, I have another patch in my local tree, limiting the exponential growth of blocks we allocate when outputting sections. But it shouldn't be _that_ bad ... maybe you can try if it has any effect? Thanks, Richard. Index: gcc/lto-section-out.c === --- gcc/lto-section-out.c (revision 208642) +++ gcc/lto-section-out.c (working copy) @@ -99,13 +99,19 @@ lto_end_section (void) } +/* We exponentially grow the size of the blocks as we need to make + room for more data to be written. Start with a single page and go up + to 2MB pages for this. */ +#define FIRST_BLOCK_SIZE 4096 +#define MAX_BLOCK_SIZE (2 * 1024 * 1024) + /* Write all of the chars in OBS to the assembler. Recycle the blocks in obs as this is being done. */ void lto_write_stream (struct lto_output_stream *obs) { - unsigned int block_size = 1024; + unsigned int block_size = FIRST_BLOCK_SIZE; struct lto_char_ptr_base *block; struct lto_char_ptr_base *next_block; if (!obs->first_block) @@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre else lang_hooks.lto.append_data (base, num_chars, block); block_size *= 2; + block_size = MIN (MAX_BLOCK_SIZE, block_size); } } @@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre { /* This is the first time the stream has been written into. 
*/ - obs->block_size = 1024; + obs->block_size = FIRST_BLOCK_SIZE; new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); obs->first_block = new_block; } @@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre /* Get a new block that is twice as big as the last block and link it into the list. */ obs->block_size *= 2; + obs->block_size = MIN (MAX_BLOCK_SIZE, obs->block_size); new_block = (struct lto_char_ptr_base*) xmalloc (obs->block_size); /* The first bytes of the block are reserved as a pointer to the next block. Set the chain of the full block to the
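The growth policy in the patch above can be sketched in isolation. The two constants are taken from the patch; the helper names `next_block_size` and `blocks_needed` are hypothetical, and the sketch ignores the few bytes each real block reserves for the next-block pointer.

```c
#include <assert.h>
#include <stddef.h>

#define FIRST_BLOCK_SIZE 4096
#define MAX_BLOCK_SIZE (2 * 1024 * 1024)

/* Double the block size, but never go beyond MAX_BLOCK_SIZE.  */
size_t
next_block_size (size_t current)
{
  assert (current > 0);
  size_t doubled = current * 2;
  return doubled < MAX_BLOCK_SIZE ? doubled : MAX_BLOCK_SIZE;
}

/* Count how many blocks are allocated before TOTAL bytes fit, starting
   at FIRST_BLOCK_SIZE and growing with next_block_size.  */
size_t
blocks_needed (size_t total)
{
  size_t size = FIRST_BLOCK_SIZE;
  size_t held = 0;
  size_t n = 0;
  while (held < total)
    {
      held += size;
      n++;
      size = next_block_size (size);
    }
  return n;
}
```

Starting from 4096 bytes, the tenth block is the first to reach 2MB, and every later block stays at that size, so the largest single allocation no longer grows with the section size.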
Re: [PATCH] Avoid ggc_collect () after WPA forking
On 03/19/2014 03:55 PM, Richard Biener wrote: On Wed, 19 Mar 2014, Martin Liška wrote: There are stats for Firefox with LTO and -O2. According to graphs it looks that memory consumption for parallel WPA phase is similar. When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory footprint is similar to parallel WPA that reduces libxul.so linking by ~10%. Ok, so I suppose this tracks RSS, not virtual memory use (what is used and what is active)? Data are given by vmstat, according to: http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory *Active memory*is memory that is being used by a particular process. *Inactive memory*is memory that was allocated to a process that is no longer running. So please follow just 'blue' line that displays really used memory. According to man, vmstat tracks virtual memory statistics. And it is WPA plus LTRANS stages, WPA ends where memory use first goes down to zero? I wonder if you can identify the point where parallel streaming starts and where it ends ... ;) Exactly, WPA ends when it goes to zero. Btw, I have another patch in my local tree, limiting the exponential growth of blocks we allocate when outputting sections. But it shouldn't be _that_ bad ... maybe you can try if it has any effect? I can apply it. Martin Thanks, Richard. Index: gcc/lto-section-out.c === --- gcc/lto-section-out.c (revision 208642) +++ gcc/lto-section-out.c (working copy) @@ -99,13 +99,19 @@ lto_end_section (void) } +/* We exponentially grow the size of the blocks as we need to make + room for more data to be written. Start with a single page and go up + to 2MB pages for this. */ +#define FIRST_BLOCK_SIZE 4096 +#define MAX_BLOCK_SIZE (2 * 1024 * 1024) + /* Write all of the chars in OBS to the assembler. Recycle the blocks in obs as this is being done. 
*/ void lto_write_stream (struct lto_output_stream *obs) { - unsigned int block_size = 1024; + unsigned int block_size = FIRST_BLOCK_SIZE; struct lto_char_ptr_base *block; struct lto_char_ptr_base *next_block; if (!obs-first_block) @@ -135,6 +141,7 @@ lto_write_stream (struct lto_output_stre else lang_hooks.lto.append_data (base, num_chars, block); block_size *= 2; + block_size = MIN (MAX_BLOCK_SIZE, block_size); } } @@ -152,7 +159,7 @@ lto_append_block (struct lto_output_stre { /* This is the first time the stream has been written into. */ - obs-block_size = 1024; + obs-block_size = FIRST_BLOCK_SIZE; new_block = (struct lto_char_ptr_base*) xmalloc (obs-block_size); obs-first_block = new_block; } @@ -162,6 +169,7 @@ lto_append_block (struct lto_output_stre /* Get a new block that is twice as big as the last block and link it into the list. */ obs-block_size *= 2; + obs-block_size = MIN (MAX_BLOCK_SIZE, obs-block_size); new_block = (struct lto_char_ptr_base*) xmalloc (obs-block_size); /* The first bytes of the block are reserved as a pointer to the next block. Set the chain of the full block to the
Re: [ARM] [Trivial] Fix shortening of field name extend.
On Mon, Feb 24, 2014 at 09:13:45AM +, James Greenhalgh wrote: *ping*, CCing Jakub. *ping x2* This was OKed by Ramana, but we wanted release manager approval. I would have committed the patch as obvious if we were not in stage 4. Thanks, James On Wed, Feb 12, 2014 at 12:43:10PM +, Ramana Radhakrishnan wrote: On 02/12/14 12:19, James Greenhalgh wrote: Hi, In aarch-common-protos.h we define a field in alu_cost_table: extnd On its own this is an upsetting optimization of the English language, but this trouble is compounded by the comment attached to this field throughout the cost tables themselves: /* Extend. */ This patch fixes the spelling of extend to match that in the comments. I've checked that AArch64 and AArch32 build with this patch applied. OK for trunk/stage-1 (I don't mind which)? I am happy for this to go in now - Jakub ? regards Ramana 2014-03-19 James Greenhalgh james.greenha...@arm.com * config/arm/aarch-common-protos.h (alu_cost_table): Fix spelling of extend. * config/arm/arm.c (arm_new_rtx_costs): Fix spelling of extend. diff --git a/gcc/config/arm/aarch-common-protos.h b/gcc/config/arm/aarch-common-protos.h index 056fe56..a5ff6b4 100644 --- a/gcc/config/arm/aarch-common-protos.h +++ b/gcc/config/arm/aarch-common-protos.h @@ -48,8 +48,8 @@ struct alu_cost_table const int arith_shift_reg; /* ... and when the shift is by a reg. */ const int log_shift; /* Additional when logic also shifts... */ const int log_shift_reg; /* ... and when the shift is by a reg. */ - const int extnd; /* Zero/sign extension. */ - const int extnd_arith; /* Extend and arith. */ + const int extend; /* Zero/sign extension. */ + const int extend_arith; /* Extend and arith. */ const int bfi; /* Bit-field insert. */ const int bfx; /* Bit-field extraction. */ const int clz; /* Count Leading Zeros. 
*/ diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index a68ed8d..31df089 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -9594,7 +9594,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, { /* UXTA[BH] or SXTA[BH]. */ if (speed_p) - *cost += extra_cost->alu.extnd_arith; + *cost += extra_cost->alu.extend_arith; *cost += (rtx_cost (XEXP (XEXP (x, 0), 0), ZERO_EXTEND, 0, speed_p) + rtx_cost (XEXP (x, 1), PLUS, 0, speed_p)); @@ -10311,7 +10311,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = COSTS_N_INSNS (1); *cost += rtx_cost (XEXP (x, 0), code, 0, speed_p); if (speed_p) - *cost += extra_cost->alu.extnd; + *cost += extra_cost->alu.extend; } else if (GET_MODE (XEXP (x, 0)) != SImode) { @@ -10364,7 +10364,7 @@ arm_new_rtx_costs (rtx x, enum rtx_code code, enum rtx_code outer_code, *cost = COSTS_N_INSNS (1); *cost += rtx_cost (XEXP (x, 0), code, 0, speed_p); if (speed_p) - *cost += extra_cost->alu.extnd; + *cost += extra_cost->alu.extend; } else if (GET_MODE (XEXP (x, 0)) != SImode) {
Re: [ARM] [Trivial] Fix shortening of field name extend.
On Wed, Mar 19, 2014 at 03:13:40PM +, James Greenhalgh wrote: On Mon, Feb 24, 2014 at 09:13:45AM +, James Greenhalgh wrote: *ping*, CCing Jakub. *ping x2* This was OKed by ramana, but we wanted release manager approval. I would have committed the patch as obvious if we were not in stage 4. This is ok even in stage4. Jakub
Re: [Patch AArch64] Define TARGET_FLAGS_REGNUM
On 28 February 2014 09:32, Ramana Radhakrishnan ramra...@arm.com wrote: Hi, This defines TARGET_FLAGS_REGNUM for AArch64 to be CC_REGNUM. Noticed this turns on the cmpelim pass after reload and in a few examples and a couple of benchmarks I noticed a number of comparisons getting deleted. A similar patch for AArch32 is being tested. Tested cross with aarch64-none-elf on a model with no regressions. Ok for stage1 ? OK /Marcus
Re: [PATCH] Avoid ggc_collect () after WPA forking
On Wed, 19 Mar 2014, Martin Liška wrote: On 03/19/2014 03:55 PM, Richard Biener wrote: On Wed, 19 Mar 2014, Martin Liška wrote: There are stats for Firefox with LTO and -O2. According to graphs it looks that memory consumption for parallel WPA phase is similar. When I disable parallel WPA, wpa footprint is ~4GB, but ltrans memory footprint is similar to parallel WPA that reduces libxul.so linking by ~10%. Ok, so I suppose this tracks RSS, not virtual memory use (what is used and what is active)? Data are given by vmstat, according to: http://stackoverflow.com/questions/18529723/what-is-active-memory-and-inactive-memory *Active memory*is memory that is being used by a particular process. *Inactive memory*is memory that was allocated to a process that is no longer running. So please follow just 'blue' line that displays really used memory. According to man, vmstat tracks virtual memory statistics. But 'blue' is neither active nor inactive ... what is 'used'? Does it correspond to 'swpd'? If it is virtual memory in use then this is expected to grow when fork()ing as the virtual memory space is obviously copied (just the pages are still shared). For me allocating a GB memory and clearing it increases active by 1GB and then forking doesn't increase any of the metrics vmstat -a outputs in any significant way. And it is WPA plus LTRANS stages, WPA ends where memory use first goes down to zero? I wonder if you can identify the point where parallel streaming starts and where it ends ... ;) Exactly, WPA ends when it goes to zero. So the difference isn't that big (8GB vs. 7.2GB), and is likely attributed to heap memory we allocate during the stream-out. For example we need some for the tree-ref-encoders (I remember that can be a significant amount of memory, but I improved that already as far as possible...). So yes, we _do_ allocate memory during stream-out and that is now required N times. 
Btw, I have another patch in my local tree, limiting the exponential growth of blocks we allocate when outputting sections. But it shouldn't be _that_ bad ... maybe you can try if it has any effect? I can apply it. Thanks, Richard.
PATCH: PR testsuite/60590: Can't recreate the same executable in testsuite
On Wed, Mar 19, 2014 at 8:41 AM, H.J. Lu hongjiu...@intel.com wrote: GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. set_ld_library_path_env_vars sets a few environment variables including LD_RUN_PATH. This patch logs all environment variables set by set_ld_library_path_env_vars so that one can recreate the same executable as make check run. OK to install? Thanks. H.J. --- 2014-03-19 H.J. Lu hongjiu...@intel.com PR testsuite/60590 * lib/target-libpath.exp (set_ld_library_path_env_vars): Log LD_LIBRARY_PATH, LD_RUN_PATH, SHLIB_PATH, LD_LIBRARY_PATH_32, LD_LIBRARY_PATH_64 and DYLD_LIBRARY_PATH. diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp index 603ed8a..1891088 100644 --- a/gcc/testsuite/lib/target-libpath.exp +++ b/gcc/testsuite/lib/target-libpath.exp @@ -155,7 +155,12 @@ proc set_ld_library_path_env_vars { } { setenv DYLD_LIBRARY_PATH $ld_library_path } - verbose -log set_ld_library_path_env_vars: ld_library_path=$ld_library_path + verbose -log LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH] + verbose -log LD_RUN_PATH=[getenv LD_RUN_PATH] + verbose -log SHLIB_PATH=[getenv SHLIB_PATH] + verbose -log LD_LIBRARY_PATH_32=[getenv LD_LIBRARY_PATH_32] + verbose -log LD_LIBRARY_PATH_64=[getenv LD_LIBRARY_PATH_64] + verbose -log DYLD_LIBRARY_PATH=[getenv DYLD_LIBRARY_PATH] } ### Correction. It is a testsuite issue. -- H.J.
Re: [patch testsuite]: g++.dg/abi
On Mar 18, 2014, at 6:16 AM, Kai Tietz ktiet...@googlemail.com wrote: this patch skips anon2.C and anon3.C test for mingw target. Issue here is that weak under pe-coff is different to ELF-targets and therefore test doesn't apply for So, what does the output look like? There should be a trace of weak of some sort in the output.
Re: [C++ Patch / RFC] PR 51474
OK. Jason
PATCH: PR target/60590: Can't recreate the same executable in testsuite
GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. set_ld_library_path_env_vars sets a few environment variables including LD_RUN_PATH. This patch logs all environment variables set by set_ld_library_path_env_vars so that one can recreate the same executable as make check run. OK to install? Thanks. H.J. --- 2014-03-19 H.J. Lu hongjiu...@intel.com PR target/60590 * lib/target-libpath.exp (set_ld_library_path_env_vars): Log LD_LIBRARY_PATH, LD_RUN_PATH, SHLIB_PATH, LD_LIBRARY_PATH_32, LD_LIBRARY_PATH_64 and DYLD_LIBRARY_PATH. diff --git a/gcc/testsuite/lib/target-libpath.exp b/gcc/testsuite/lib/target-libpath.exp index 603ed8a..1891088 100644 --- a/gcc/testsuite/lib/target-libpath.exp +++ b/gcc/testsuite/lib/target-libpath.exp @@ -155,7 +155,12 @@ proc set_ld_library_path_env_vars { } { setenv DYLD_LIBRARY_PATH $ld_library_path } - verbose -log set_ld_library_path_env_vars: ld_library_path=$ld_library_path + verbose -log LD_LIBRARY_PATH=[getenv LD_LIBRARY_PATH] + verbose -log LD_RUN_PATH=[getenv LD_RUN_PATH] + verbose -log SHLIB_PATH=[getenv SHLIB_PATH] + verbose -log LD_LIBRARY_PATH_32=[getenv LD_LIBRARY_PATH_32] + verbose -log LD_LIBRARY_PATH_64=[getenv LD_LIBRARY_PATH_64] + verbose -log DYLD_LIBRARY_PATH=[getenv DYLD_LIBRARY_PATH] } ###
Re: [patch testsuite]: g++.dg/abi
2014-03-19 17:23 GMT+01:00 Mike Stump mikest...@comcast.net: On Mar 18, 2014, at 6:16 AM, Kai Tietz ktiet...@googlemail.com wrote: this patch skips anon2.C and anon3.C test for mingw target. Issue here is that weak under pe-coff is different to ELF-targets and therefore test doesn't apply for So, what does the output look like? There should be a trace of weak of some sort in the output. No, there is none. Output looks like: .seh_proc _ZN2N43._91CIiE3fn2ES2_ _ZN2N43._91CIiE3fn2ES2_: .LFB11: .seh_endprologue ret .seh_endproc .globl _ZN2N41qE .data .align 8 _ZN2N41qE: .quad _ZN2N43._91CIiE3fn2ES2_ .globl _ZN2N41pE .align 8 _ZN2N41pE: .quad _ZN2N43._91CIiE3fn1ENS0_1BE .globl _ZN2N31qE .align 8 _ZN2N31qE: .quad _ZN2N31D1CIiE3fn2ES2_... The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static libraries and in a limited way. Therefore we can't (and don't) use it for COFF targets. Kai PS: I have another similarly motivated patch for g++.dg/abi/thunk5.C on my pile too.
Re: PATCH: PR target/60590: Can't recreate the same executable in testsuite
On Mar 19, 2014, at 8:41 AM, H.J. Lu hongjiu...@intel.com wrote: GNU linker sets DT_RPATH from the environment variable LD_RUN_PATH. set_ld_library_path_env_vars sets a few environment variables including LD_RUN_PATH. This patch logs all environment variables set by set_ld_library_path_env_vars so that one can recreate the same executable as make check run. OK to install? Ok. If someone complains about the log size clutter, we can consider bumping it up to higher verbosity.
[jit] Tighten up the distinction between pointers and arrays
Committed to branch dmalcolm/jit: https://github.com/davidmalcolm/pygccjit/pull/3#issuecomment-37883129 showed a problem where a parameter expecting a (char *) was passed a char[1024] cast to a (char *) as its argument, leading to an ICE: libgccjit.so: internal compiler error: in convert_move, at expr.c:320 0x7fffebea98ad convert_move(rtx_def*, rtx_def*, int) ../../src/gcc/expr.c:320 0x7fffebec31cb expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../src/gcc/expr.c:8105 0x7fffec88d768 expand_gimple_stmt_1 ../../src/gcc/cfgexpand.c:2321 0x7fffec88d9cc expand_gimple_stmt ../../src/gcc/cfgexpand.c:2381 The issue was that the recording::type::dereference method is used for both pointers and for arrays, leading to sloppiness about where lvalues and rvalues can be pointers vs arrays. This commit introduces is_pointer and is_array methods, using them to tighten up type-checking, converting the above ICE into a type-check error when the cast is attempted: libgccjit.so: error: gcc_jit_context_new_cast: cannot cast buffer from type: char[1024] to type: char * The correct way to use an array as a pointer in the JIT API is to use gcc_jit_lvalue_get_address on the array, which gives you an rvalue representing the address of the initial element, and then to cast that rvalue as necessary. gcc/jit * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: accepts_writes_from): Accept writes from pointers, but not arrays. * internal-api.h (gcc::jit::recording::type::is_pointer): New. (gcc::jit::recording::type::is_array): New. (gcc::jit::recording::memento_of_get_type::accepts_writes_from): Allow (void *) to accept writes of pointers, but not arrays. (gcc::jit::recording::memento_of_get_type::is_pointer): New. (gcc::jit::recording::memento_of_get_type::is_array): New. (gcc::jit::recording::memento_of_get_pointer::is_pointer): New. (gcc::jit::recording::memento_of_get_pointer::is_array): New. 
(gcc::jit::recording::memento_of_get_const::is_pointer): New. (gcc::jit::recording::memento_of_get_const::is_array): New. (gcc::jit::recording::memento_of_get_volatile::is_pointer): New. (gcc::jit::recording::memento_of_get_volatile::is_array): New. (gcc::jit::recording::array_type::is_pointer): New. (gcc::jit::recording::array_type::is_array): New. (gcc::jit::recording::function_type::is_pointer): New. (gcc::jit::recording::function_type::is_array): New. (gcc::jit::recording::struct_::is_pointer): New. (gcc::jit::recording::struct_::is_array): New. * libgccjit.c (gcc_jit_context_new_rvalue_from_ptr): Require the pointer_type to be a pointer, not an array. (gcc_jit_context_null): Likewise. (is_valid_cast): Require pointer casts to be between pointer types, not arrays. (gcc_jit_context_new_array_access): Update error message from not a pointer to not a pointer or array. (gcc_jit_rvalue_dereference_field): Require the pointer arg to be of pointer type, not an array. (gcc_jit_rvalue_dereference): Likewise. gcc/testsuite/ * jit.dg/test-array-as-pointer.c: New test case, verifying that there's a way to treat arrays as pointers. * jit.dg/test-combination.c: Add test-array-as-pointer.c... (create_code): ...here and... (verify_code): ...here. 
* jit.dg/test-error-array-as-pointer.c: New test case, verifying that bogus casts from array to pointer are caught by the type system, rather than leading to ICEs seen in: https://github.com/davidmalcolm/pygccjit/pull/3#issuecomment-37883129 --- gcc/jit/ChangeLog.jit | 35 +++ gcc/jit/internal-api.c | 2 +- gcc/jit/internal-api.h | 18 +++- gcc/jit/libgccjit.c| 14 +-- gcc/testsuite/ChangeLog.jit| 13 +++ gcc/testsuite/jit.dg/test-array-as-pointer.c | 101 + gcc/testsuite/jit.dg/test-combination.c| 9 ++ gcc/testsuite/jit.dg/test-error-array-as-pointer.c | 99 8 files changed, 282 insertions(+), 9 deletions(-) create mode 100644 gcc/testsuite/jit.dg/test-array-as-pointer.c create mode 100644 gcc/testsuite/jit.dg/test-error-array-as-pointer.c diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index 8244eba..efb1931 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,38 @@ +2014-03-19 David Malcolm dmalc...@redhat.com + + * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: + accepts_writes_from): Accept writes from pointers, but not arrays. + + * internal-api.h (gcc::jit::recording::type::is_pointer): New. +
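The distinction the commit draws between pointers and arrays can be pictured with a small stand-alone model. This is only a sketch under stated assumptions, not the libgccjit source: the classes `type`, `char_type`, `pointer_type` and `array_type` and the two checking functions are hypothetical stand-ins, and the "loose" check is an assumed illustration of what relying on dereference alone would permit.

```cpp
#include <cstddef>

// Hypothetical stand-ins for a recording-type hierarchy; not libgccjit code.
struct type {
    virtual ~type() {}
    virtual bool is_pointer() const { return false; }
    virtual bool is_array() const { return false; }
    // Pointee or element type, or nullptr when the type has neither.
    virtual const type *dereference() const { return nullptr; }
};

struct char_type : type {};

struct pointer_type : type {
    explicit pointer_type(const type *pointee) : m_pointee(pointee) {}
    bool is_pointer() const override { return true; }
    const type *dereference() const override { return m_pointee; }
    const type *m_pointee;
};

struct array_type : type {
    array_type(const type *element, std::size_t num_elements)
        : m_element(element), m_num_elements(num_elements) {}
    bool is_array() const override { return true; }
    const type *dereference() const override { return m_element; }
    const type *m_element;
    std::size_t m_num_elements;
};

// Assumed "loose" check: anything that can be dereferenced is accepted,
// so an array is mistaken for a pointer.
inline bool loose_pointer_cast_ok(const type &src, const type &dst) {
    return src.dereference() != nullptr && dst.dereference() != nullptr;
}

// Tightened check in the spirit of the commit: both sides must be pointers.
inline bool strict_pointer_cast_ok(const type &src, const type &dst) {
    return src.is_pointer() && dst.is_pointer();
}
```

With a `char[1024]` array type as the source and a `char *` pointer type as the destination, the loose check accepts the cast while the strict check rejects it, which corresponds to turning the ICE into a type-check error.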
Re: [patch testsuite]: g++.dg/abi
Kai Tietz ktiet...@googlemail.com writes:

2014-03-19 17:23 GMT+01:00 Mike Stump mikest...@comcast.net:

On Mar 18, 2014, at 6:16 AM, Kai Tietz ktiet...@googlemail.com wrote:

this patch skips anon2.C and anon3.C test for mingw target. Issue here is that weak under pe-coff is different to ELF-targets and therefore test doesn't apply for

So, what does the output look like? There should be a trace of weak of some sort in the output.

No, there is none. Output looks like:

        .seh_proc _ZN2N43._91CIiE3fn2ES2_
_ZN2N43._91CIiE3fn2ES2_:
.LFB11:
        .seh_endprologue
        ret
        .seh_endproc
        .globl _ZN2N41qE
        .data
        .align 8
_ZN2N41qE:
        .quad _ZN2N43._91CIiE3fn2ES2_
        .globl _ZN2N41pE
        .align 8
_ZN2N41pE:
        .quad _ZN2N43._91CIiE3fn1ENS0_1BE
        .globl _ZN2N31qE
        .align 8
_ZN2N31qE:
        .quad _ZN2N31D1CIiE3fn2ES2_
...

The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static library and in a limited way. Therefore we can't (and don't) use it for COFF targets.

In that case, it seems far better to have gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that instead of lying about weak support. This way, everything else simply falls into place; no need to special-case many individual testcases.

Rainer

--
Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [patch testsuite]: g++.dg/abi
On Mar 19, 2014, at 9:49 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote:

The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static library and in a limited way. Therefore we can't (and don't) use it for COFF targets.

In that case, it seems far better to have gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that instead of lying about weak support.

Yeah, this is the direction I was headed… :-)
[PATCH 2/2, AARCH64] Test case changes: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
Hi Marcus,

On 14 March 2014 19:42, Marcus Shawcroft marcus.shawcr...@gmail.com wrote:

Do we need a new effective target test, why is the existing fstack_protector not appropriate?

stack_protector does a run time test. It failed in cross compilation environment and these are compile only tests.

This works fine in my cross environment, how does yours fail?

Also I thought Richard suggested that I add a new option for this. ref: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03358.html

I read that comment to mean use an effective target test instead of matching triples. I don't see that re-using an existing effective target test contradicts that suggestion. Looking through the test suite I see that there are:

6 tests that use dg-do compile with dg-require-effective-target fstack_protector
4 tests that use dg-do run with dg-require-effective-target fstack_protector
2 tests that use dg-do run {target native} dg-require-effective-target fstack_protector

and finally the 2 tests we are discussing that use dg-compile with a triple test.

So there are already tests in the testsuite that use dg-do compile with the existing effective target test. I see no immediately obvious reason why the two tests that require target native require the native constraint... but I guess that is a different issue.

I used the existing dg-require-effective-target check, stack_protector, and added it on a separate line.

ChangeLog.

2014-03-19  Venkataramanan Kumar  venkataramanan.ku...@linaro.org

        * g++.dg/fstack-protector-strong.C: Add effective target check
        for stack protection.
        * gcc.dg/fstack-protector-strong.c: Likewise.

These two tests are passing now for aarch64-none-linux-gnu target under QEMU. Let me know if I can upstream these two patches.

regards,
Venkat.
Index: gcc/testsuite/g++.dg/fstack-protector-strong.C
===================================================================
--- gcc/testsuite/g++.dg/fstack-protector-strong.C (revision 208609)
+++ gcc/testsuite/g++.dg/fstack-protector-strong.C (working copy)
@@ -1,7 +1,8 @@
 /* Test that stack protection is done on chosen functions. */
 
-/* { dg-do compile { target i?86-*-* x86_64-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fstack-protector-strong" } */
+/* { dg-require-effective-target fstack_protector } */
 
 class A
 {
Index: gcc/testsuite/gcc.dg/fstack-protector-strong.c
===================================================================
--- gcc/testsuite/gcc.dg/fstack-protector-strong.c (revision 208609)
+++ gcc/testsuite/gcc.dg/fstack-protector-strong.c (working copy)
@@ -1,7 +1,8 @@
 /* Test that stack protection is done on chosen functions. */
 
-/* { dg-do compile { target i?86-*-* x86_64-*-* rs6000-*-* s390x-*-* } } */
+/* { dg-do compile } */
 /* { dg-options "-O2 -fstack-protector-strong" } */
+/* { dg-require-effective-target fstack_protector } */
 
 #include <string.h>
[C++ Patch] PR 60384
Hi,

in this minor regression we ICE during error recovery, when push_class_level_binding_1 (called by finish_member_declaration via pushdecl_class_level) gets a TEMPLATE_ID_EXPR as the name argument. It's a regression because, since r199779, invalid declarations get through more often (with TREE_TYPE an error_mark_node, like TREE_TYPE (x) in the case at issue). Thus the additional check I'm suggesting.

Tested x86_64-linux.

Thanks,
Paolo.

//

/cp
2014-03-19  Paolo Carlini  paolo.carl...@oracle.com

        PR c++/60384
        * name-lookup.c (push_class_level_binding_1): Check identifier_p
        on the name argument.

/testsuite
2014-03-19  Paolo Carlini  paolo.carl...@oracle.com

        PR c++/60384
        * g++.dg/cpp1y/pr60384.C: New.

Index: cp/name-lookup.c
===================================================================
--- cp/name-lookup.c (revision 208682)
+++ cp/name-lookup.c (working copy)
@@ -3112,7 +3112,9 @@ push_class_level_binding_1 (tree name, tree x)
   if (!class_binding_level)
     return true;
 
-  if (name == error_mark_node)
+  if (name == error_mark_node
+      /* Can happen for an erroneous declaration (c++/60384).  */
+      || !identifier_p (name))
     return false;
 
   /* Check for invalid member names.  But don't worry about a default
Index: testsuite/g++.dg/cpp1y/pr60384.C
===================================================================
--- testsuite/g++.dg/cpp1y/pr60384.C (revision 0)
+++ testsuite/g++.dg/cpp1y/pr60384.C (working copy)
@@ -0,0 +1,9 @@
+// PR c++/60384
+// { dg-do compile { target c++1y } }
+
+template<typename> int foo();
+
+struct A
+{
+  typedef auto foo(); // { dg-error "typedef declared 'auto'" }
+};
Re: [patch testsuite]: g++.dg/abi
On Wed, 19 Mar 2014, Kai Tietz wrote:

The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static library and in a limited way. Therefore we can't (and don't) use it for COFF targets.

There are already two different checks (check_weak_available and check_weak_override_available), reflecting what different testcases need. Is the requirement for these tests logically different from both of those? If so, maybe there should be a third such check (even if in fact it does the same thing as check_weak_override_available).

--
Joseph S. Myers
jos...@codesourcery.com
[PATCH 1/2, AARCH64]: Machine descriptions: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
Hi Marcus, On 14 March 2014 19:42, Marcus Shawcroft marcus.shawcr...@gmail.com wrote: Hi Venkat On 5 February 2014 10:29, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote: Hi Marcus, + ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0 + [(set_attr length 12)]) This pattern emits an opaque sequence of instructions that cannot be scheduled, is that necessary? Can we not expand individual instructions or at least split ? Almost all the ports emits a template of assembly instructions. I m not sure why they have to be generated this way. But usage of these pattern is to clear the register that holds canary value immediately after its usage. I've just read the thread Andrew pointed out, thanks, I'm happy that there is a good reason to do it this way. Andrew, thanks for providing the background. + [(set_attr length 12)]) + These patterns should also set the type attribute, a reasonable value would be multiple. I have incorporated your review comments and split the patch into two. The first patch attached here contains Aarch64 machine descriptions for the stack protect patterns. ChangeLog. 2014-03-19 Venkataramanan Kumar venkataramanan.ku...@linaro.org * config/aarch64/aarch64.md (stack_protect_set, stack_protect_test) (stack_protect_set_mode, stack_protect_test_mode): Add machine descriptions for Stack Smashing Protector. Tested for aarch64-none-linux-gnu target under QEMU . regards, Venkat. Index: gcc/config/aarch64/aarch64.md === --- gcc/config/aarch64/aarch64.md (revision 208609) +++ gcc/config/aarch64/aarch64.md (working copy) @@ -102,6 +102,8 @@ UNSPEC_TLSDESC UNSPEC_USHL_2S UNSPEC_VSTRUCTDUMMY +UNSPEC_SP_SET +UNSPEC_SP_TEST ]) (define_c_enum unspecv [ @@ -3634,6 +3636,67 @@ DONE; }) +;; Named patterns for stack smashing protection. +(define_expand stack_protect_set + [(match_operand 0 memory_operand) + (match_operand 1 memory_operand)] + +{ + enum machine_mode mode = GET_MODE (operands[0]); + + emit_insn ((mode == DImode + ? 
gen_stack_protect_set_di + : gen_stack_protect_set_si) (operands[0], operands[1])); + DONE; +}) + +(define_insn stack_protect_set_mode + [(set (match_operand:PTR 0 memory_operand =m) + (unspec:PTR [(match_operand:PTR 1 memory_operand m)] +UNSPEC_SP_SET)) + (set (match_scratch:PTR 2 =r) (const_int 0))] + + ldr\\t%x2, %1\;str\\t%x2, %0\;mov\t%x2,0 + [(set_attr length 12) + (set_attr type multiple)]) + +(define_expand stack_protect_test + [(match_operand 0 memory_operand) + (match_operand 1 memory_operand) + (match_operand 2)] + +{ + + rtx result = gen_reg_rtx (Pmode); + + enum machine_mode mode = GET_MODE (operands[0]); + + emit_insn ((mode == DImode + ? gen_stack_protect_test_di + : gen_stack_protect_test_si) (result, + operands[0], + operands[1])); + + if (mode == DImode) +emit_jump_insn (gen_cbranchdi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx), + result, const0_rtx, operands[2])); + else +emit_jump_insn (gen_cbranchsi4 (gen_rtx_EQ (VOIDmode, result, const0_rtx), + result, const0_rtx, operands[2])); + DONE; +}) + +(define_insn stack_protect_test_mode + [(set (match_operand:PTR 0 register_operand) + (unspec:PTR [(match_operand:PTR 1 memory_operand m) +(match_operand:PTR 2 memory_operand m)] +UNSPEC_SP_TEST)) + (clobber (match_scratch:PTR 3 =r))] + + ldr\t%x3, %x1\;ldr\t%x0, %x2\;eor\t%x0, %x3, %x0 + [(set_attr length 12) + (set_attr type multiple)]) + ;; AdvSIMD Stuff (include aarch64-simd.md)
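As a reading aid, the two instruction templates above can be modelled in plain C++. This is a hedged sketch, not what the compiler emits: the function names `model_stack_protect_set` and `model_stack_protect_test` are made up for illustration, and an optimizing C++ compiler is free to drop the scratch clearing that the real pattern guarantees.

```cpp
#include <cstdint>

// Models "ldr scratch, guard; str scratch, canary; mov scratch, 0":
// copy the guard value into the frame's canary slot, then clear the
// scratch register so the guard value does not linger in it.
inline void model_stack_protect_set(std::uintptr_t &canary_slot,
                                    const std::uintptr_t &guard) {
    std::uintptr_t scratch = guard;  // ldr
    canary_slot = scratch;           // str
    scratch = 0;                     // mov (only symbolic in this model)
    (void)scratch;
}

// Models "ldr scratch, canary; ldr result, guard; eor result, scratch, result":
// the result is zero exactly when the canary slot still matches the guard.
inline std::uintptr_t model_stack_protect_test(const std::uintptr_t &canary_slot,
                                               const std::uintptr_t &guard) {
    std::uintptr_t scratch = canary_slot;  // ldr
    std::uintptr_t result = guard;         // ldr
    return scratch ^ result;               // eor
}
```

The expander then branches to the "canary intact" label when the result compares equal to zero, which is what the gen_cbranchdi4 / gen_cbranchsi4 calls express.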
Re: [PATCH 1/2, AARCH64]: Machine descriptions: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
On 19 March 2014 17:11, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote:

I have incorporated your review comments and split the patch into two. The first patch attached here contains Aarch64 machine descriptions for the stack protect patterns.

ChangeLog.

2014-03-19  Venkataramanan Kumar  venkataramanan.ku...@linaro.org

        * config/aarch64/aarch64.md (stack_protect_set, stack_protect_test)
        (stack_protect_set_mode, stack_protect_test_mode): Add machine
        descriptions for Stack Smashing Protector.

Tested for aarch64-none-linux-gnu target under QEMU.

regards,
Venkat.

Hi,

This is OK for stage-1.

Thanks
/Marcus
Re: [RFA jit 2/2] introduce scoped_timevar
Trevor Saunders tsaund...@mozilla.com writes:

thanks for doing this. I wonder about naming, we already have auto_vec and while I don't really care whether we use auto_ or scoped_ it seems like being consistent would be nice.

Sounds reasonable to me, I've made this change for v2.

Tom
Re: [patch testsuite]: g++.dg/abi
2014-03-19 18:37 GMT+01:00 Joseph S. Myers jos...@codesourcery.com:

On Wed, 19 Mar 2014, Kai Tietz wrote:

The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static library and in a limited way. Therefore we can't (and don't) use it for COFF targets.

There are already two different checks (check_weak_available and check_weak_override_available), reflecting what different testcases need. Is the requirement for these tests logically different from both of those? If so, maybe there should be a third such check (even if in fact it does the same thing as check_weak_override_available).

--
Joseph S. Myers
jos...@codesourcery.com

On second thought, disabling weak-available for mingw targets seems to be wrong. Weak is actually present; it just has a different meaning. Those testcases are, as far as I understand them, actually checking that weaks are available. Nevertheless, the check here intends to probe whether weak-override is possible, as otherwise weaks make no sense here AFAICS. I don't think that we need to add a third check here. It might be enough to check for weak-override-available instead for those tests.

Kai
Re: [4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Oops. Please ignore this for now. I'm preparing a patch series and sent this one prematurely. Thanks, Bill On Wed, 2014-03-19 at 10:25 -0500, Bill Schmidt wrote: Hi, This patch (diff-le-tests) backports adjustments to a few tests for powerpc64le and the ELFv2 ABI. Thanks, Bill
[4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Hi, This patch (diff-le-tests) backports adjustments to a few tests for powerpc64le and the ELFv2 ABI. Thanks, Bill 2014-03-29 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline 2013-11-27 Bill Schmidt wschm...@linux.vnet.ibm.com * gfortran.dg/nan_7.f90: Disable for little endian PowerPC. Backport from mainline r205106: 2013-11-20 Ulrich Weigand ulrich.weig...@de.ibm.com * gcc.target/powerpc/darwin-longlong.c (msw): Make endian-safe. Backport from mainline r205046: 2013-11-19 Ulrich Weigand ulrich.weig...@de.ibm.com * gcc.target/powerpc/ppc64-abi-2.c (MAKE_SLOT): New macro to construct parameter slot value in endian-independent way. (fcevv, fciievv, fcvevv): Use it. Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c === --- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:50:39.655337721 +0100 @@ -119,6 +119,12 @@ typedef union vector int v; } vector_int_t; +#ifdef __LITTLE_ENDIAN__ +#define MAKE_SLOT(x, y) ((long)x | ((long)y 32)) +#else +#define MAKE_SLOT(x, y) ((long)y | ((long)x 32)) +#endif + /* Paramter passing. s : gpr 3 v : vpr 2 @@ -226,8 +232,8 @@ fcevv (char *s, ...) sp = __builtin_frame_address(0); sp = sp-backchain; - if (sp-slot[2].l != 0x10002ULL - || sp-slot[4].l != 0x50006ULL) + if (sp-slot[2].l != MAKE_SLOT (1, 2) + || sp-slot[4].l != MAKE_SLOT (5, 6)) abort(); } @@ -268,8 +274,8 @@ fciievv (char *s, int i, int j, ...) sp = __builtin_frame_address(0); sp = sp-backchain; - if (sp-slot[4].l != 0x10002ULL - || sp-slot[6].l != 0x50006ULL) + if (sp-slot[4].l != MAKE_SLOT (1, 2) + || sp-slot[6].l != MAKE_SLOT (5, 6)) abort(); } @@ -296,8 +302,8 @@ fcvevv (char *s, vector int x, ...) 
sp = __builtin_frame_address(0); sp = sp-backchain; - if (sp-slot[4].l != 0x10002ULL - || sp-slot[6].l != 0x50006ULL) + if (sp-slot[4].l != MAKE_SLOT (1, 2) + || sp-slot[6].l != MAKE_SLOT (5, 6)) abort(); } Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c === --- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:50:39.659337741 +0100 @@ -11,7 +11,11 @@ int msw(long long in) int i[2]; } ud; ud.ll = in; +#ifdef __LITTLE_ENDIAN__ + return ud.i[1]; +#else return ud.i[0]; +#endif } int main() Index: gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90 === --- gcc-4_8-branch.orig/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:41:32.430628909 +0100 +++ gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:50:39.662337756 +0100 @@ -2,6 +2,7 @@ ! { dg-options -fno-range-check } ! { dg-require-effective-target fortran_real_16 } ! { dg-require-effective-target fortran_integer_16 } +! { dg-skip-if { powerpc*le-*-* } { * } { } } ! PR47293 NAN not correctly read character(len=200) :: str real(16) :: r
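The idea behind MAKE_SLOT can be sketched outside the testsuite as follows. In this archive copy the shift operators in the macro appear to have been lost in extraction; the sketch assumes they were left shifts by 32. The function name `make_slot` and the run-time endianness probe are illustrative choices made here, whereas the patch itself selects the variant at compile time with `__LITTLE_ENDIAN__`.

```cpp
#include <cstdint>
#include <cstring>

// Combine two 32-bit words into one 64-bit slot so that, when the slot is
// stored to memory, `first` sits at the lower address and `second` at the
// higher address regardless of host byte order.
inline std::uint64_t make_slot(std::uint32_t first, std::uint32_t second) {
    // Probe the host byte order by looking at the lowest-addressed byte of 1.
    const std::uint16_t probe = 1;
    unsigned char lowest_addressed_byte;
    std::memcpy(&lowest_addressed_byte, &probe, 1);
    const bool little_endian = (lowest_addressed_byte == 1);

    return little_endian
        ? static_cast<std::uint64_t>(first)
              | (static_cast<std::uint64_t>(second) << 32)
        : static_cast<std::uint64_t>(second)
              | (static_cast<std::uint64_t>(first) << 32);
}
```

Comparing a parameter slot against such a value, instead of a hard-coded 64-bit constant, is what lets the same test pass on both big-endian and little-endian PowerPC.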
[RFA jit v2 1/2] introduce class toplev
This patch introduces a new class toplev and changes toplev_main and toplev_finalize to be methods of this class. Additionally, now the timevars are automatically stopped when the object is destroyed. This cleans up compile a bit and makes it simpler to reuse the toplev logic in other code. --- gcc/ChangeLog.jit | 14 + gcc/diagnostic.c | 2 +- gcc/jit/ChangeLog.jit | 5 + gcc/jit/internal-api.c | 25 +- gcc/main.c | 9 gcc/toplev.c | 56 +- gcc/toplev.h | 20 -- 7 files changed, 76 insertions(+), 55 deletions(-) diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit index 77ac44c..c590ab1 100644 --- a/gcc/ChangeLog.jit +++ b/gcc/ChangeLog.jit @@ -1,3 +1,17 @@ +2014-03-19 Tom Tromey tro...@redhat.com + + * diagnostic.c (bt_stop): Use toplev::main. + * main.c (main): Update. + * toplev.c (do_compile): Remove argument. Don't check + use_TV_TOTAL. + (toplev::toplev, toplev::~toplev, toplev::start_timevars): New + functions. + (toplev::main): Rename from toplev_main. Update. + (toplev::finalize): Rename from toplev_finalize. Update. + * toplev.h (class toplev): New. + (struct toplev_options): Remove. + (toplev_main, toplev_finalize): Don't declare. + 2014-03-11 David Malcolm dmalc...@redhat.com * gcse.c (gcse_c_finalize): New, to clear test_insn between diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 36094a1..56dc3ac 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -333,7 +333,7 @@ diagnostic_show_locus (diagnostic_context * context, static const char * const bt_stop[] = { main, - toplev_main, + toplev::main, execute_one_pass, compile_file, }; diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index efb1931..e45d38c 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,8 @@ +2014-03-19 Tom Tromey tro...@redhat.com + + * internal-api.c (compile): Use toplev, not toplev_options. + Simplify. 
+ 2014-03-19 David Malcolm dmalc...@redhat.com * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index e3ddc4d..95978bf 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -3650,7 +3650,7 @@ compile () /* Call into the rest of gcc. For now, we have to assemble command-line options to pass into - toplev_main, so that they can be parsed. */ + toplev::main, so that they can be parsed. */ /* Pass in user-provided progname, if any, so that it makes it into GCC's progname global, used in various diagnostics. */ @@ -3724,25 +3724,15 @@ compile () ADD_ARG (-fdump-ipa-all); } - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = false; + toplev toplev (false); - if (time_report || !quiet_flag || flag_detailed_statistics) -timevar_init (); - - timevar_start (TV_TOTAL); - - toplev_main (num_args, const_cast char ** (fake_args), toplev_opts); - toplev_finalize (); + toplev.main (num_args, const_cast char ** (fake_args)); + toplev.finalize (); active_playback_ctxt = NULL; if (errors_occurred ()) -{ - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return NULL; -} +return NULL; if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE)) dump_generated_code (); @@ -3765,8 +3755,6 @@ compile () if (ret) { timevar_pop (TV_ASSEMBLE); - timevar_stop (TV_TOTAL); - timevar_print (stderr); return NULL; } } @@ -3795,9 +3783,6 @@ compile () timevar_pop (TV_LOAD); } - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return result_obj; } diff --git a/gcc/main.c b/gcc/main.c index b893308..4bba041 100644 --- a/gcc/main.c +++ b/gcc/main.c @@ -1,5 +1,5 @@ /* main.c: defines main() for cc1, cc1plus, etc. - Copyright (C) 2007-2013 Free Software Foundation, Inc. + Copyright (C) 2007-2014 Free Software Foundation, Inc. This file is part of GCC. @@ -26,15 +26,14 @@ along with GCC; see the file COPYING3. 
If not see int main (int argc, char **argv); -/* We define main() to call toplev_main(), which is defined in toplev.c. +/* We define main() to call toplev::main(), which is defined in toplev.c. We do this in a separate file in order to allow the language front-end to define a different main(), if it so desires. */ int main (int argc, char **argv) { - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = true; + toplev toplev (true); - return toplev_main (argc, argv, toplev_opts); + return toplev.main (argc, argv); } diff --git a/gcc/toplev.c b/gcc/toplev.c index f1ac560..5284621 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -1,5 +1,5 @@ /* Top level of GCC compilers (cc1, cc1plus, etc.) - Copyright (C) 1987-2013 Free
[RFA jit v2 0/2] minor refactorings for reuse
Here's a second revision of my patches to the jit branch to clean up toplev and timevar uses a bit. The first revision was here: http://gcc.gnu.org/ml/gcc-patches/2014-03/msg00895.html Compared with that revision, this one hopefully includes the ChangeLog.jit entries; and I took Trevor's suggestion and renamed the timevar class to auto_timevar. Tom
[RFA jit v2 2/2] introduce auto_timevar
This introduces a new auto_timevar class. It pushes a given timevar in its constructor, and pops it in the destructor, giving a much simpler way to use timevars in the typical case where they can be scoped. --- gcc/ChangeLog.jit | 4 gcc/jit/ChangeLog.jit | 4 gcc/jit/internal-api.c | 16 +--- gcc/timevar.h | 26 +- 4 files changed, 38 insertions(+), 12 deletions(-) diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit index c590ab1..ee1df88 100644 --- a/gcc/ChangeLog.jit +++ b/gcc/ChangeLog.jit @@ -1,5 +1,9 @@ 2014-03-19 Tom Tromey tro...@redhat.com + * timevar.h (auto_timevar): New class. + +2014-03-19 Tom Tromey tro...@redhat.com + * diagnostic.c (bt_stop): Use toplev::main. * main.c (main): Update. * toplev.c (do_compile): Remove argument. Don't check diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index e45d38c..69f2412 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,9 @@ 2014-03-19 Tom Tromey tro...@redhat.com + * internal-api.c (compile): Use auto_timevar. + +2014-03-19 Tom Tromey tro...@redhat.com + * internal-api.c (compile): Use toplev, not toplev_options. Simplify. diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index 95978bf..090d351 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -3737,8 +3737,6 @@ compile () if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE)) dump_generated_code (); - timevar_push (TV_ASSEMBLE); - /* Gross hacks follow: We have a .s file; we want a .so file. We could reuse parts of gcc/gcc.c to do this. 
@@ -3746,6 +3744,8 @@ compile () */ /* FIXME: totally faking it for now, not even using pex */ { +auto_timevar assemble_timevar (TV_ASSEMBLE); + char cmd[1024]; snprintf (cmd, 1024, gcc -shared %s -o %s, m_path_s_file, m_path_so_file); @@ -3753,20 +3753,16 @@ compile () printf (cmd: %s\n, cmd); int ret = system (cmd); if (ret) - { - timevar_pop (TV_ASSEMBLE); - return NULL; - } + return NULL; } - timevar_pop (TV_ASSEMBLE); // TODO: split out assembles vs linker /* dlopen the .so file. */ { -const char *error; +auto_timevar load_timevar (TV_LOAD); -timevar_push (TV_LOAD); +const char *error; /* Clear any existing error. */ dlerror (); @@ -3779,8 +3775,6 @@ compile () result_obj = new result (handle); else result_obj = NULL; - -timevar_pop (TV_LOAD); } return result_obj; diff --git a/gcc/timevar.h b/gcc/timevar.h index dc2a8bc..f018e39 100644 --- a/gcc/timevar.h +++ b/gcc/timevar.h @@ -1,5 +1,5 @@ /* Timing variables for measuring compiler performance. - Copyright (C) 2000-2013 Free Software Foundation, Inc. + Copyright (C) 2000-2014 Free Software Foundation, Inc. Contributed by Alex Samuel sam...@codesourcery.com This file is part of GCC. @@ -110,6 +110,30 @@ timevar_pop (timevar_id_t tv) timevar_pop_1 (tv); } +// This is a simple timevar wrapper class that pushes a timevar in its +// constructor and pops the timevar in its destructor. +class auto_timevar +{ + public: + auto_timevar (timevar_id_t tv) +: m_tv (tv) + { +timevar_push (m_tv); + } + + ~auto_timevar () + { +timevar_pop (m_tv); + } + + private: + + // Private to disallow copies. + auto_timevar (const auto_timevar ); + + timevar_id_t m_tv; +}; + extern void print_time (const char *, long); #endif /* ! GCC_TIMEVAR_H */ -- 1.8.5.3
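The benefit on early-return paths can be seen in a stand-alone model. This is a sketch only: the `timevar_id` enum, the vector-backed stack and the `timevar_push` / `timevar_pop` stubs below are simplified stand-ins written for this example, not GCC's timevar implementation, and the copy operations are deleted with C++11 syntax rather than the private declaration used in the patch.

```cpp
#include <cstddef>
#include <vector>

enum timevar_id { TV_ASSEMBLE, TV_LOAD };

// Stand-in for the timevar stack; the real one lives in timevar.c.
static std::vector<timevar_id> g_timevar_stack;

static void timevar_push(timevar_id tv) { g_timevar_stack.push_back(tv); }

static void timevar_pop(timevar_id tv) {
    // Pop only if the top of the stack is the timevar being closed.
    if (!g_timevar_stack.empty() && g_timevar_stack.back() == tv)
        g_timevar_stack.pop_back();
}

// Pushes a timevar on construction and pops it on destruction.
class auto_timevar {
public:
    explicit auto_timevar(timevar_id tv) : m_tv(tv) { timevar_push(m_tv); }
    ~auto_timevar() { timevar_pop(m_tv); }

    auto_timevar(const auto_timevar &) = delete;
    auto_timevar &operator=(const auto_timevar &) = delete;

private:
    timevar_id m_tv;
};

// Every return path pops TV_ASSEMBLE without an explicit timevar_pop call.
static bool assemble(bool simulate_failure) {
    auto_timevar assemble_timevar(TV_ASSEMBLE);
    if (simulate_failure)
        return false;  // destructor still runs here
    return true;
}

// Reports how deep the stack is while a scoped timevar is alive.
static std::size_t depth_inside_scope() {
    auto_timevar load_timevar(TV_LOAD);
    return g_timevar_stack.size();
}
```

This mirrors the change to compile () above, where the explicit timevar_pop before `return NULL` could be removed once the timevar was tied to a scope.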
Re: [patch testsuite]: g++.dg/abi
On Mar 19, 2014, at 9:38 AM, Kai Tietz ktiet...@googlemail.com wrote:

2014-03-19 17:23 GMT+01:00 Mike Stump mikest...@comcast.net:

On Mar 18, 2014, at 6:16 AM, Kai Tietz ktiet...@googlemail.com wrote:

this patch skips anon2.C and anon3.C test for mingw target. Issue here is that weak under pe-coff is different to ELF-targets and therefore test doesn't apply for

So, what does the output look like? There should be a trace of weak of some sort in the output.

No, there is none.

So, does the target support weak?
Re: [PATCH 2/2, AARCH64] Test case changes: Re: [RFC] [PATCH, AARCH64] : Using standard patterns for stack protection.
On 19 March 2014 17:18, Venkataramanan Kumar venkataramanan.ku...@linaro.org wrote:

I used the existing dg-require-effective-target check, stack_protector and added it in a separate line.

ChangeLog.

2014-03-19  Venkataramanan Kumar  venkataramanan.ku...@linaro.org

        * g++.dg/fstack-protector-strong.C: Add effective target check
        for stack protection.
        * gcc.dg/fstack-protector-strong.c: Likewise.

These two tests are passing now for aarch64-none-linux-gnu target under QEMU.

Venkat, I think this change is reasonable (for stage-1) but I'd like one of the testsuite maintainers to ACK the change.

Cheers
/Marcus
Re: [patch testsuite]: g++.dg/abi
2014-03-19 17:54 GMT+01:00 Mike Stump mikest...@comcast.net:

On Mar 19, 2014, at 9:49 AM, Rainer Orth r...@cebitec.uni-bielefeld.de wrote:

The concept of weak - as present in ELF - isn't known in COFF in general. There is some weak, but it works only for static library and in a limited way. Therefore we can't (and don't) use it for COFF targets.

In that case, it seems far better to have gcc/testsuite/lib/target-support.exp (check_weak_available) reflect that instead of lying about weak support.

Yeah, this is the direction I was headed... :-)

Ok, I will send a patch for changing target-support.exp. And yes, the target supports a kind of weak, but not the expected gnu-weak.

Thanks,
Kai
[jit] Avoid shadowing progname global
Committed to branch dmalcolm/jit: gcc/jit/ * internal-api.c (gcc::jit::recording::context::add_error_va): Rename local progname to ctxt_progname to avoid shadowing the related global, for clarity. (gcc::jit::playback::context::compile): Likewise. --- gcc/jit/ChangeLog.jit | 7 +++ gcc/jit/internal-api.c | 22 -- 2 files changed, 19 insertions(+), 10 deletions(-) diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index efb1931..265242e 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,5 +1,12 @@ 2014-03-19 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::recording::context::add_error_va): + Rename local progname to ctxt_progname to avoid shadowing + the related global, for clarity. + (gcc::jit::playback::context::compile): Likewise. + +2014-03-19 David Malcolm dmalc...@redhat.com + * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: accepts_writes_from): Accept writes from pointers, but not arrays. diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index e3ddc4d..819800a 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -610,18 +610,19 @@ recording::context::add_error_va (location *loc, const char *fmt, va_list ap) char buf[1024]; vsnprintf (buf, sizeof (buf) - 1, fmt, ap); - const char *progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME); - if (!progname) -progname = libgccjit.so; + const char *ctxt_progname = +get_str_option (GCC_JIT_STR_OPTION_PROGNAME); + if (!ctxt_progname) +ctxt_progname = libgccjit.so; if (loc) fprintf (stderr, %s: %s: error: %s\n, -progname, +ctxt_progname, loc-get_debug_string (), buf); else fprintf (stderr, %s: error: %s\n, -progname, +ctxt_progname, buf); if (!m_error_count) @@ -3629,8 +3630,8 @@ playback::context:: compile () { void *handle = NULL; + const char *ctxt_progname; result *result_obj = NULL; - const char *progname; const char *fake_args[20]; unsigned int num_args; @@ -3652,10 +3653,11 @@ compile () For now, we have to assemble command-line 
options to pass into toplev_main, so that they can be parsed. */ - /* Pass in user-provided progname, if any, so that it makes it - into GCC's progname global, used in various diagnostics. */ - progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME); - fake_args[0] = progname ? progname : libgccjit.so; + /* Pass in user-provided program name as argv0, if any, so that it + makes it into GCC's progname global, used in various diagnostics. */ + ctxt_progname = get_str_option (GCC_JIT_STR_OPTION_PROGNAME); + fake_args[0] = +(ctxt_progname ? ctxt_progname : libgccjit.so); fake_args[1] = m_path_c_file; num_args = 2; -- 1.8.5.3
Re: [RFA jit v2 1/2] introduce class toplev
Tom Tromey tro...@redhat.com writes:

This patch introduces a new class toplev and changes toplev_main and toplev_finalize to be methods of this class. Additionally, now the timevars are automatically stopped when the object is destroyed. This cleans up compile a bit and makes it simpler to reuse the toplev logic in other code.

David asked me off-list to rename the field in class toplev, so here's a new patch that does this.

Tom

commit 66f92863ef55c26f673d02dd39027f340940a3bf
Author: Tom Tromey tro...@redhat.com
Date: Tue Mar 18 08:07:40 2014 -0600

    introduce class toplev

    This patch introduces a new class toplev and changes toplev_main and
    toplev_finalize to be methods of this class. Additionally, now the
    timevars are automatically stopped when the object is destroyed. This
    cleans up compile a bit and makes it simpler to reuse the toplev
    logic in other code.

diff --git a/gcc/ChangeLog.jit b/gcc/ChangeLog.jit
index 77ac44c..c590ab1 100644
--- a/gcc/ChangeLog.jit
+++ b/gcc/ChangeLog.jit
@@ -1,3 +1,17 @@
+2014-03-19  Tom Tromey  tro...@redhat.com
+
+        * diagnostic.c (bt_stop): Use toplev::main.
+        * main.c (main): Update.
+        * toplev.c (do_compile): Remove argument.  Don't check
+        use_TV_TOTAL.
+        (toplev::toplev, toplev::~toplev, toplev::start_timevars): New
+        functions.
+        (toplev::main): Rename from toplev_main.  Update.
+        (toplev::finalize): Rename from toplev_finalize.  Update.
+        * toplev.h (class toplev): New.
+        (struct toplev_options): Remove.
+        (toplev_main, toplev_finalize): Don't declare.
+ 2014-03-11 David Malcolm dmalc...@redhat.com * gcse.c (gcse_c_finalize): New, to clear test_insn between diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c index 36094a1..56dc3ac 100644 --- a/gcc/diagnostic.c +++ b/gcc/diagnostic.c @@ -333,7 +333,7 @@ diagnostic_show_locus (diagnostic_context * context, static const char * const bt_stop[] = { main, - toplev_main, + toplev::main, execute_one_pass, compile_file, }; diff --git a/gcc/jit/ChangeLog.jit b/gcc/jit/ChangeLog.jit index efb1931..e45d38c 100644 --- a/gcc/jit/ChangeLog.jit +++ b/gcc/jit/ChangeLog.jit @@ -1,3 +1,8 @@ +2014-03-19 Tom Tromey tro...@redhat.com + + * internal-api.c (compile): Use toplev, not toplev_options. + Simplify. + 2014-03-19 David Malcolm dmalc...@redhat.com * internal-api.c (gcc::jit::recording::memento_of_get_pointer:: diff --git a/gcc/jit/internal-api.c b/gcc/jit/internal-api.c index e3ddc4d..95978bf 100644 --- a/gcc/jit/internal-api.c +++ b/gcc/jit/internal-api.c @@ -3650,7 +3650,7 @@ compile () /* Call into the rest of gcc. For now, we have to assemble command-line options to pass into - toplev_main, so that they can be parsed. */ + toplev::main, so that they can be parsed. */ /* Pass in user-provided progname, if any, so that it makes it into GCC's progname global, used in various diagnostics. 
*/ @@ -3724,25 +3724,15 @@ compile () ADD_ARG (-fdump-ipa-all); } - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = false; + toplev toplev (false); - if (time_report || !quiet_flag || flag_detailed_statistics) -timevar_init (); - - timevar_start (TV_TOTAL); - - toplev_main (num_args, const_cast char ** (fake_args), toplev_opts); - toplev_finalize (); + toplev.main (num_args, const_cast char ** (fake_args)); + toplev.finalize (); active_playback_ctxt = NULL; if (errors_occurred ()) -{ - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return NULL; -} +return NULL; if (get_bool_option (GCC_JIT_BOOL_OPTION_DUMP_GENERATED_CODE)) dump_generated_code (); @@ -3765,8 +3755,6 @@ compile () if (ret) { timevar_pop (TV_ASSEMBLE); - timevar_stop (TV_TOTAL); - timevar_print (stderr); return NULL; } } @@ -3795,9 +3783,6 @@ compile () timevar_pop (TV_LOAD); } - timevar_stop (TV_TOTAL); - timevar_print (stderr); - return result_obj; } diff --git a/gcc/main.c b/gcc/main.c index b893308..4bba041 100644 --- a/gcc/main.c +++ b/gcc/main.c @@ -1,5 +1,5 @@ /* main.c: defines main() for cc1, cc1plus, etc. - Copyright (C) 2007-2013 Free Software Foundation, Inc. + Copyright (C) 2007-2014 Free Software Foundation, Inc. This file is part of GCC. @@ -26,15 +26,14 @@ along with GCC; see the file COPYING3. If not see int main (int argc, char **argv); -/* We define main() to call toplev_main(), which is defined in toplev.c. +/* We define main() to call toplev::main(), which is defined in toplev.c. We do this in a separate file in order to allow the language front-end to define a different main(), if it so desires. */ int main (int argc, char **argv) { - toplev_options toplev_opts; - toplev_opts.use_TV_TOTAL = true; + toplev toplev (true); - return toplev_main (argc, argv,
[PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
This is a follow-on patch to one already committed: http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html It implements patterns to simplify our RTL as follows: OR (Not:DI (A:DI), ZeroExtend:DI (B:SI)) -- the top half can be done with a MVN AND (Not:DI (A:DI), ZeroExtend:DI (B:SI)) -- the top half becomes zero. I've added test cases for both of these and also the existing anddi_notdi patterns. The tests all pass. Full regression runs passed. OK for stage 1? Cheers, Ian 2014-03-19 Ian Bolton ian.bol...@arm.com gcc/ * config/arm/arm.md (*anddi_notdi_zesidi): New pattern * config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern. testsuite/ * gcc.target/arm/anddi_notdi-1.c: New test. * gcc.target/arm/iordi_notdi-1.c: New test case. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 2ddda02..d2d85ee 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -2962,6 +2962,28 @@ (set_attr type multiple)] ) +(define_insn_and_split *anddi_notdi_zesidi + [(set (match_operand:DI 0 s_register_operand =r,r) +(and:DI (not:DI (match_operand:DI 2 s_register_operand 0,?r)) +(zero_extend:DI + (match_operand:SI 1 s_register_operand r,r] + TARGET_32BIT + # + TARGET_32BIT reload_completed + [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int 0))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[2] = gen_lowpart (SImode, operands[2]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + (define_insn_and_split *anddi_notsesidi_di [(set (match_operand:DI 0 s_register_operand =r,r) (and:DI (not:DI (sign_extend:DI diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 467c619..10bc8b1 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1418,6 +1418,30 @@ (set_attr type multiple)] ) +(define_insn_and_split *iordi_notdi_zesidi + [(set 
(match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (match_operand:DI 2 s_register_operand 0,?r)) + (zero_extend:DI +(match_operand:SI 1 s_register_operand r,r] + TARGET_THUMB2 + # + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (not:SI (match_dup 4)))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[4] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + (define_insn_and_split *iordi_notsesidi_di [(set (match_operand:DI 0 s_register_operand =r,r) (ior:DI (not:DI (sign_extend:DI diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c new file mode 100644 index 000..cfb33fc --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c @@ -0,0 +1,65 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +anddi_di_notdi (s64int a, s64int b) +{ + return (a ~b); +} + +s64int +anddi_di_notzesidi (s64int a, u32int b) +{ + return (a ~(u64int) b); +} + +s64int +anddi_notdi_zesidi (s64int a, u32int b) +{ + return (~a (u64int) b); +} + +s64int +anddi_di_notsesidi (s64int a, s32int b) +{ + return (a ~(s64int) b); +} + +int main () +{ + s64int a64 = 0xdeadbeefll; + s64int b64 = 0x5f470112ll; + s64int c64 = 0xdeadbeef300fll; + + u32int c32 = 0x01124f4f; + s32int d32 = 0xabbaface; + + s64int z = anddi_di_notdi (c64, b64); + if (z != 0xdeadbeef2008ll) +abort (); + + z = anddi_di_notzesidi (a64, c32); + if (z != 0xdeadbeefb0b0ll) +abort (); + + z = anddi_notdi_zesidi (c64, c32); + if (z != 
0x01104f4fll) +abort (); + + z = anddi_di_notsesidi (a64, d32); + if (z != 0x0531ll) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times bic\t 6 } } */ + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c index cda9c0e..249f080 100644 --- a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c +++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c @@ -9,19 +9,25 @@ typedef unsigned long long u64int; typedef unsigned int u32int; s64int -iordi_notdi (s64int a, s64int b) +iordi_di_notdi (s64int a, s64int b) { return (a | ~b); } s64int -iordi_notzesidi (s64int a, u32int b)
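The two simplifications described in this mail can be checked in plain C by splitting the 64-bit value into 32-bit halves. The sketch below is only an illustration of the arithmetic, not the RTL patterns themselves; the function names ending in `_split` are hypothetical, and the instruction names in the comments (BIC, ORN, MVN) are what one would expect the halves to map to, not verified compiler output.

```c
#include <stdint.h>

/* Reference 64-bit forms, matching the shape of the new patterns. */
uint64_t and_notdi_zesidi(uint64_t a, uint32_t b)
{
    return ~a & (uint64_t)b;
}

uint64_t ior_notdi_zesidi(uint64_t a, uint32_t b)
{
    return ~a | (uint64_t)b;
}

/* AND case: the zero-extended operand has a zero top half, so the
   top half of the result is always zero and only the low half needs
   a real operation (presumably a single BIC).  */
uint64_t and_notdi_zesidi_split(uint64_t a, uint32_t b)
{
    uint32_t lo = ~(uint32_t)a & b;
    uint32_t hi = 0;
    return ((uint64_t)hi << 32) | lo;
}

/* OR case: ORing with a zero top half leaves just the inverted top
   half of A (presumably an MVN); the low half is an OR-NOT
   (presumably ORN on Thumb-2).  */
uint64_t ior_notdi_zesidi_split(uint64_t a, uint32_t b)
{
    uint32_t lo = ~(uint32_t)a | b;
    uint32_t hi = ~(uint32_t)(a >> 32);
    return ((uint64_t)hi << 32) | lo;
}
```

For any inputs the split versions should agree with the reference versions, which is the property the splitters rely on.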
Re: [PATCH] Fix PR60505
On Tue, Mar 18, 2014 at 4:43 AM, Richard Biener rguent...@suse.de wrote: On Mon, 17 Mar 2014, Cong Hou wrote: On Mon, Mar 17, 2014 at 6:44 AM, Richard Biener rguent...@suse.de wrote: On Fri, 14 Mar 2014, Cong Hou wrote: On Fri, Mar 14, 2014 at 12:58 AM, Richard Biener rguent...@suse.de wrote: On Fri, 14 Mar 2014, Jakub Jelinek wrote: On Fri, Mar 14, 2014 at 08:52:07AM +0100, Richard Biener wrote: Consider this fact and if there are alias checks, we can safely remove the epilogue if the maximum trip count of the loop is less than or equal to the calculated threshold. You have to consider n % vf != 0, so an argument on only maximum trip count or threshold cannot work. Well, if you only check if maximum trip count is = vf and you know that for n vf the vectorized loop + it's epilogue path will not be taken, then perhaps you could, but it is a very special case. Now, the question is when we are guaranteed we enter the scalar versioned loop instead for n vf, is that in case of versioning for alias or versioning for alignment? I think neither - I have plans to do the cost model check together with the versioning condition but didn't get around to implement that. That would allow stronger max bounds for the epilogue loop. In vect_transform_loop(), check_profitability will be set to true if th = VF-1 and the number of iteration is unknown (we only consider unknown trip count here), where th is calculated based on the parameter PARAM_MIN_VECT_LOOP_BOUND and cost model, with the minimum value VF-1. If the loop needs to be versioned, then check_profitability with true value will be passed to vect_loop_versioning(), in which an enhanced loop bound check (considering cost) will be built. So I think if the loop is versioned and n VF, then we must enter the scalar version, and in this case removing epilogue should be safe when the maximum trip count = th+1. You mean exactly in the case where the profitability check ensures that n % vf == 0? 
Thus effectively if n == maximum trip count? That's quite a special case, no? Yes, it is a special case. But it is in this special case that those warnings are thrown out. Also, I think declaring an array with VF*N as length is not unusual. Ok, but then for the patch compute the cost model threshold once in vect_analyze_loop_2 and store it in a new LOOP_VINFO_COST_MODEL_THRESHOLD. Done. Also you have to check the return value from max_stmt_executions_int as that may return -1 if the number cannot be computed (or isn't representable in a HOST_WIDE_INT). It will be converted to unsigned type so that -1 means infinity. You also should check for LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT which should have the same effect on the cost model check. Done. The existing condition is already complicated enough - adding new stuff warrants comments before the (sub-)checks. OK. Comments added. Below is the revised patch. Bootstrapped and tested on a x86-64 machine. Cong diff --git a/gcc/ChangeLog b/gcc/ChangeLog index e1d8666..eceefb3 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,18 @@ +2014-03-11 Cong Hou co...@google.com + + PR tree-optimization/60505 + * tree-vectorizer.h (struct _stmt_vec_info): Add th field as the + threshold of number of iterations below which no vectorization will be + done. + * tree-vect-loop.c (new_loop_vec_info): + Initialize LOOP_VINFO_COST_MODEL_THRESHOLD. + * tree-vect-loop.c (vect_analyze_loop_operations): + Set LOOP_VINFO_COST_MODEL_THRESHOLD. + * tree-vect-loop.c (vect_transform_loop): + Use LOOP_VINFO_COST_MODEL_THRESHOLD. + * tree-vect-loop.c (vect_analyze_loop_2): Check the maximum number + of iterations of the loop and see if we should build the epilogue. 
+ 2014-03-10 Jakub Jelinek ja...@redhat.com PR ipa/60457 diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 41b6875..09ec1c0 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,8 @@ +2014-03-11 Cong Hou co...@google.com + + PR tree-optimization/60505 + * gcc.dg/vect/pr60505.c: New test. + 2014-03-10 Jakub Jelinek ja...@redhat.com PR ipa/60457 diff --git a/gcc/testsuite/gcc.dg/vect/pr60505.c b/gcc/testsuite/gcc.dg/vect/pr60505.c new file mode 100644 index 000..6940513 --- /dev/null +++ b/gcc/testsuite/gcc.dg/vect/pr60505.c @@ -0,0 +1,12 @@ +/* { dg-do compile } */ +/* { dg-additional-options -Wall -Werror } */ + +void foo(char *in, char *out, int num) +{ + int i; + char ovec[16] = {0}; + + for(i = 0; i num ; ++i) +out[i] = (ovec[i] = in[i]); + out[num] = ovec[num/2]; +} diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index df6ab6f..1c78e11 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -933,6 +933,7 @@ new_loop_vec_info (struct loop *loop) LOOP_VINFO_NITERS (res) = NULL;
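The discussion above turns on how a vectorized loop divides its iterations between the vector body and the scalar epilogue. The following is a minimal sketch of that division, assuming a fixed vectorization factor `VF` of 16 (an illustrative value chosen to match the `ovec[16]` array in the test case, not something the vectorizer is guaranteed to pick). All names here are hypothetical.

```c
#include <stddef.h>

enum { VF = 16 };  /* assumed vectorization factor, for illustration only */

/* Iterations covered by the vector body: the largest multiple of VF
   that does not exceed n.  */
size_t vector_body_iterations(size_t n)
{
    return n - n % VF;
}

/* Iterations left over for the scalar epilogue.  The epilogue is
   dead code exactly when this is zero for every reachable n.  */
size_t epilogue_iterations(size_t n)
{
    return n % VF;
}

/* The same split written as the two loops a vectorizer would emit:
   a body stepping by VF, then a scalar tail.  */
unsigned sum_bytes(const unsigned char *p, size_t n)
{
    unsigned sum = 0;
    size_t i = 0;

    for (; i + VF <= n; i += VF)          /* "vector" body */
        for (size_t k = 0; k < VF; k++)
            sum += p[i + k];

    for (; i < n; i++)                    /* scalar epilogue */
        sum += p[i];

    return sum;
}
```

If the only trip count that can reach the vector body is exactly `VF` (for example because smaller counts are routed to the scalar version by the threshold check and larger ones would overrun the array), `epilogue_iterations` is always zero there, which is the special case the thread is about.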
[4.8, PATCH 0/26] Backport Power8 and LE support
Hi,

Support for Power8 features and the new powerpc64le-linux-gnu target, including the ELFv2 ABI, has been developed up till now on the ibm/gcc-4_8-branch. It was appropriate to use this separate branch while the support was unstable, but this branch will not represent a particularly good support mechanism for distributions going forward. Most distros are set up to pull from the major release branches, and having a separate branch for one target is quite inconvenient. Also, the ibm/gcc-4_8-branch's original purpose is to serve as the code base for IBM's Advance Toolchain 7.0. Over time the two purposes that the branch currently serves will diverge and make things even more complicated.

The code is now tested and stable enough that we are ready to backport this support to the FSF 4.8 branch. This patch series constitutes that backport.

Almost all of the changes are specific to PowerPC portions of the code, and for those patches I am only CCing David. However, some of the patches require changes to common code, and for these I will CC Richard and Jakub. Three of these are slightly unrelated but necessary patches, one to enable decimal float ABS builtins, and two others to fix PR54537 and PR56843. In addition there are patches that update configuration files throughout for the new target, and some small changes in common call support (calls.c, expr.h, function.c) to support how the new ABI handles calls.

I realize it is unusual to backport such a large amount of code, but we have been asked by distribution partners to do this, and we feel it makes good sense for long-term support.

I have tested the patch series by applying it to a clean FSF 4.8 branch and comparing the test results against those from the IBM 4.8 branch on three systems:

 * Power8, little endian (-mcpu=power8)
 * Power8, big endian (-mcpu=power8)
 * Power7, big endian (-mcpu=power7)

I also checked a recursive diff against the two source directories to ensure that no patches were missed.

Thanks,
Bill

[ 1/26] diff-p8
[ 2/26] diff-p8-htm
[ 3/26] diff-le-config
[ 4/26] diff-le-libtool
[ 5/26] diff-le-tests
[ 6/26] diff-le-dfp
[ 7/26] diff-le-vector
[ 8/26] diff-abi-compat
[ 9/26] diff-abi-calls
[10/26] diff-abi-elfv2
[11/26] diff-abi-gotest
[12/26] diff-le-align
[13/26] diff-abi-libffi
[14/26] diff-dfp-abs
[15/26] diff-pr54537
[16/26] diff-pr56843
[17/26] diff-direct-move
[18/26] diff-le-config-2
[19/26] diff-quad-memory
[20/26] diff-lra
[21/26] diff-le-vector-api
[22/26] diff-mcall
[23/26] diff-pr60137-pr60203
[24/26] diff-reload
[25/26] diff-v1ti
[26/26] diff-trunk-missing
[4.8, PATCH 3/26] Backport Power8 and LE support: Configury bits 1
Hi, This patch (diff-le-config) backports updates to more recent config.guess and config.sub versions to support the new powerpc64le target. Thanks, Bill 2014-03-29 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r203071: 2013-10-01 Joern Rennecke joern.renne...@embecosm.com Import from savannah.gnu.org: * config.guess: Update to 2013-06-10 version. * config.sub: Update to 2013-10-01 version. Index: gcc-4_8-branch/config.guess === --- gcc-4_8-branch.orig/config.guess2013-12-28 17:41:32.765630566 +0100 +++ gcc-4_8-branch/config.guess 2013-12-28 17:50:37.995329461 +0100 @@ -1,10 +1,8 @@ #! /bin/sh # Attempt to guess a canonical system name. -# Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, -# 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, -# 2011, 2012, 2013 Free Software Foundation, Inc. +# Copyright 1992-2013 Free Software Foundation, Inc. -timestamp='2012-12-30' +timestamp='2013-06-10' # This file is free software; you can redistribute it and/or modify it # under the terms of the GNU General Public License as published by @@ -52,9 +50,7 @@ version=\ GNU config.guess ($timestamp) Originally written by Per Bothner. -Copyright (C) 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, -2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, -2012, 2013 Free Software Foundation, Inc. +Copyright 1992-2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. @@ -136,6 +132,27 @@ UNAME_RELEASE=`(uname -r) 2/dev/null` | UNAME_SYSTEM=`(uname -s) 2/dev/null` || UNAME_SYSTEM=unknown UNAME_VERSION=`(uname -v) 2/dev/null` || UNAME_VERSION=unknown +case ${UNAME_SYSTEM} in +Linux|GNU|GNU/*) + # If the system lacks a compiler, then just pick glibc. + # We could probably try harder. 
+ LIBC=gnu + + eval $set_cc_for_build + cat -EOF $dummy.c + #include features.h + #if defined(__UCLIBC__) + LIBC=uclibc + #elif defined(__dietlibc__) + LIBC=dietlibc + #else + LIBC=gnu + #endif + EOF + eval `$CC_FOR_BUILD -E $dummy.c 2/dev/null | grep '^LIBC'` + ;; +esac + # Note: order is significant - the case branches are not exclusive. case ${UNAME_MACHINE}:${UNAME_SYSTEM}:${UNAME_RELEASE}:${UNAME_VERSION} in @@ -857,21 +874,21 @@ EOF exit ;; *:GNU:*:*) # the GNU system - echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-gnu`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` + echo `echo ${UNAME_MACHINE}|sed -e 's,[-/].*$,,'`-unknown-${LIBC}`echo ${UNAME_RELEASE}|sed -e 's,/.*$,,'` exit ;; *:GNU/*:*:*) # other systems with GNU libc and userland - echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-gnu + echo ${UNAME_MACHINE}-unknown-`echo ${UNAME_SYSTEM} | sed 's,^[^/]*/,,' | tr '[A-Z]' '[a-z]'``echo ${UNAME_RELEASE}|sed -e 's/[-(].*//'`-${LIBC} exit ;; i*86:Minix:*:*) echo ${UNAME_MACHINE}-pc-minix exit ;; aarch64:Linux:*:*) - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; aarch64_be:Linux:*:*) UNAME_MACHINE=aarch64_be - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; alpha:Linux:*:*) case `sed -n '/^cpu model/s/^.*: \(.*\)/\1/p' /proc/cpuinfo` in @@ -884,59 +901,54 @@ EOF EV68*) UNAME_MACHINE=alphaev68 ;; esac objdump --private-headers /bin/sh | grep -q ld.so.1 - if test $? = 0 ; then LIBC=libc1 ; else LIBC= ; fi - echo ${UNAME_MACHINE}-unknown-linux-gnu${LIBC} + if test $? 
= 0 ; then LIBC=gnulibc1 ; fi + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} + exit ;; +arc:Linux:*:* | arceb:Linux:*:*) + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} exit ;; arm*:Linux:*:*) eval $set_cc_for_build if echo __ARM_EABI__ | $CC_FOR_BUILD -E - 2/dev/null \ | grep -q __ARM_EABI__ then - echo ${UNAME_MACHINE}-unknown-linux-gnu + echo ${UNAME_MACHINE}-unknown-linux-${LIBC} else if echo __ARM_PCS_VFP | $CC_FOR_BUILD -E - 2/dev/null \ | grep -q __ARM_PCS_VFP then - echo ${UNAME_MACHINE}-unknown-linux-gnueabi + echo ${UNAME_MACHINE}-unknown-linux-${LIBC}eabi else - echo ${UNAME_MACHINE}-unknown-linux-gnueabihf + echo ${UNAME_MACHINE}-unknown-linux-${LIBC}eabihf fi fi exit ;; avr32*:Linux:*:*) - echo
[4.8, PATCH 5/26] Backport Power8 and LE support: Test adjustments
Hi,

This patch (diff-le-tests) backports adjustments to a few tests for powerpc64le and the ELFv2 ABI.

Thanks,
Bill

2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com

	Backport from mainline
	2013-11-27  Bill Schmidt  wschm...@linux.vnet.ibm.com

	* gfortran.dg/nan_7.f90: Disable for little endian PowerPC.

	Backport from mainline r205106:
	2013-11-20  Ulrich Weigand  ulrich.weig...@de.ibm.com

	* gcc.target/powerpc/darwin-longlong.c (msw): Make endian-safe.

	Backport from mainline r205046:
	2013-11-19  Ulrich Weigand  ulrich.weig...@de.ibm.com

	* gcc.target/powerpc/ppc64-abi-2.c (MAKE_SLOT): New macro to
	construct parameter slot value in endian-independent way.
	(fcevv, fciievv, fcvevv): Use it.

Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c
===
--- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/ppc64-abi-2.c 2013-12-28 17:50:39.655337721 +0100
@@ -119,6 +119,12 @@ typedef union
   vector int v;
 } vector_int_t;
 
+#ifdef __LITTLE_ENDIAN__
+#define MAKE_SLOT(x, y) ((long)x | ((long)y << 32))
+#else
+#define MAKE_SLOT(x, y) ((long)y | ((long)x << 32))
+#endif
+
 /* Paramter passing.
    s : gpr 3
    v : vpr 2
@@ -226,8 +232,8 @@ fcevv (char *s, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[2].l != 0x100000002ULL
-      || sp->slot[4].l != 0x500000006ULL)
+  if (sp->slot[2].l != MAKE_SLOT (1, 2)
+      || sp->slot[4].l != MAKE_SLOT (5, 6))
     abort();
 }
 
@@ -268,8 +274,8 @@ fciievv (char *s, int i, int j, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[4].l != 0x100000002ULL
-      || sp->slot[6].l != 0x500000006ULL)
+  if (sp->slot[4].l != MAKE_SLOT (1, 2)
+      || sp->slot[6].l != MAKE_SLOT (5, 6))
     abort();
 }
 
@@ -296,8 +302,8 @@ fcvevv (char *s, vector int x, ...)
   sp = __builtin_frame_address(0);
   sp = sp->backchain;
 
-  if (sp->slot[4].l != 0x100000002ULL
-      || sp->slot[6].l != 0x500000006ULL)
+  if (sp->slot[4].l != MAKE_SLOT (1, 2)
+      || sp->slot[6].l != MAKE_SLOT (5, 6))
     abort();
 }
 
Index: gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c
===
--- gcc-4_8-branch.orig/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gcc.target/powerpc/darwin-longlong.c 2013-12-28 17:50:39.659337741 +0100
@@ -11,7 +11,11 @@ int msw(long long in)
     int i[2];
   } ud;
   ud.ll = in;
+#ifdef __LITTLE_ENDIAN__
+  return ud.i[1];
+#else
   return ud.i[0];
+#endif
 }
 
 int main()
Index: gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90
===
--- gcc-4_8-branch.orig/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:41:32.430628909 +0100
+++ gcc-4_8-branch/gcc/testsuite/gfortran.dg/nan_7.f90 2013-12-28 17:50:39.662337756 +0100
@@ -2,6 +2,7 @@
 ! { dg-options -fno-range-check }
 ! { dg-require-effective-target fortran_real_16 }
 ! { dg-require-effective-target fortran_integer_16 }
+! { dg-skip-if { powerpc*le-*-* } { * } { } }
 ! PR47293 NAN not correctly read
 character(len=200) :: str
 real(16) :: r
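The test changes above all deal with the same issue: when two 32-bit words share a 64-bit slot, which word sits at the lower address depends on byte order. The sketch below restates the idea in portable C, detecting byte order at run time instead of with `__LITTLE_ENDIAN__`. The helper names (`host_is_little_endian`, `msw_shift`, `msw_memory`, `make_slot`, `slot_word`) are hypothetical and are not part of the testsuite.

```c
#include <stdint.h>
#include <string.h>

/* Run-time byte-order probe: look at the first byte of the value 1. */
int host_is_little_endian(void)
{
    uint32_t one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Endian-independent: take the most significant word by shifting. */
uint32_t msw_shift(uint64_t in)
{
    return (uint32_t)(in >> 32);
}

/* Memory-order version, analogous to the union in darwin-longlong.c:
   the index of the most significant word depends on byte order.  */
uint32_t msw_memory(uint64_t in)
{
    uint32_t words[2];
    memcpy(words, &in, sizeof words);
    return host_is_little_endian() ? words[1] : words[0];
}

/* Analogue of MAKE_SLOT: build a 64-bit value whose memory image is
   the word x followed by the word y, whatever the byte order.  */
uint64_t make_slot(uint32_t x, uint32_t y)
{
    if (host_is_little_endian())
        return (uint64_t)x | ((uint64_t)y << 32);
    return (uint64_t)y | ((uint64_t)x << 32);
}

/* Read back the 32-bit word at memory index 0 or 1 of a slot. */
uint32_t slot_word(uint64_t slot, int index)
{
    uint32_t words[2];
    memcpy(words, &slot, sizeof words);
    return words[index];
}
```

With these definitions, `slot_word(make_slot(1, 2), 0)` is 1 and `slot_word(make_slot(1, 2), 1)` is 2 on either byte order, which is the invariant the rewritten comparisons depend on.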
[4.8, PATCH 8/26] Backport Power8 and LE support: PR57949
Hi, This patch (diff-abi-compat) backports the ABI compatibility fix for PR57949. Thanks, Bill [gcc] 2014-03-29 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r201750. 2013-11-15 Ulrich Weigand ulrich.weig...@de.ibm.com Note: Default setting of -mcompat-align-parm inverted! 2013-08-14 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/57949 * doc/invoke.texi: Add documentation of mcompat-align-parm option. * config/rs6000/rs6000.opt: Add mcompat-align-parm option. * config/rs6000/rs6000.c (rs6000_function_arg_boundary): For AIX and Linux, correct BLKmode alignment when 128-bit alignment is required and compatibility flag is not set. (rs6000_gimplify_va_arg): For AIX and Linux, honor specified alignment for zero-size arguments when compatibility flag is not set. [gcc/testsuite] 2014-03-29 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r201750. 2013-11-15 Ulrich Weigand ulrich.weig...@de.ibm.com Note: Default setting of -mcompat-align-parm inverted! 2013-08-14 Bill Schmidt wschm...@linux.vnet.ibm.com PR target/57949 * gcc.target/powerpc/pr57949-1.c: New. * gcc.target/powerpc/pr57949-2.c: New. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -8680,8 +8680,8 @@ rs6000_function_arg_boundary (enum machi || (type TREE_CODE (type) == VECTOR_TYPE int_size_in_bytes (type) = 16)) return 128; - else if (TARGET_MACHO - rs6000_darwin64_abi + else if (((TARGET_MACHO rs6000_darwin64_abi) +|| (DEFAULT_ABI == ABI_AIX !rs6000_compat_align_parm)) mode == BLKmode type TYPE_ALIGN (type) 64) return 128; @@ -10233,8 +10233,9 @@ rs6000_gimplify_va_arg (tree valist, tre We don't need to check for pass-by-reference because of the test above. We can return a simplifed answer, since we know there's no offset to add. 
*/ - if (TARGET_MACHO - rs6000_darwin64_abi + if (((TARGET_MACHO + rs6000_darwin64_abi) + || (DEFAULT_ABI == ABI_AIX !rs6000_compat_align_parm)) integer_zerop (TYPE_SIZE (type))) { unsigned HOST_WIDE_INT align, boundary; Index: gcc-4_8-test/gcc/config/rs6000/rs6000.opt === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.opt +++ gcc-4_8-test/gcc/config/rs6000/rs6000.opt @@ -550,6 +550,10 @@ mquad-memory Target Report Mask(QUAD_MEMORY) Var(rs6000_isa_flags) Generate the quad word memory instructions (lq/stq/lqarx/stqcx). +mcompat-align-parm +Target Report Var(rs6000_compat_align_parm) Init(1) Save +Generate aggregate parameter passing code with at most 64-bit alignment. + mupper-regs-df Target Undocumented Mask(UPPER_REGS_DF) Var(rs6000_isa_flags) Allow double variables in upper registers with -mcpu=power7 or -mvsx Index: gcc-4_8-test/gcc/doc/invoke.texi === --- gcc-4_8-test.orig/gcc/doc/invoke.texi +++ gcc-4_8-test/gcc/doc/invoke.texi @@ -17243,7 +17243,8 @@ following options: -mpopcntb -mpopcntd -mpowerpc64 @gol -mpowerpc-gpopt -mpowerpc-gfxopt -msingle-float -mdouble-float @gol -msimple-fpu -mstring -mmulhw -mdlmzb -mmfpgpr -mvsx @gol --mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory} +-mcrypto -mdirect-move -mpower8-fusion -mpower8-vector -mquad-memory @gol +-mcompat-align-parm -mno-compat-align-parm} The particular options set for any particular CPU varies between compiler versions, depending on what setting seems to produce optimal @@ -18128,6 +18129,23 @@ stack location in the function prologue a pointer on AIX and 64-bit Linux systems. If the TOC value is not saved in the prologue, it is saved just before the call through the pointer. The @option{-mno-save-toc-indirect} option is the default. + +@item -mcompat-align-parm +@itemx -mno-compat-align-parm +@opindex mcompat-align-parm +Generate (do not generate) code to pass structure parameters with a +maximum alignment of 64 bits, for compatibility with older versions +of GCC. 
+ +Older versions of GCC (prior to 4.9.0) incorrectly did not align a +structure parameter on a 128-bit boundary when that structure contained +a member requiring 128-bit alignment. This is corrected in more +recent versions of GCC. This option may be used to generate code +that is compatible with functions compiled with older versions of +GCC. + +In this version of the compiler, the @option{-mcompat-align-parm} +is the default, except when using the Linux ELFv2 ABI. @end table @node RX Options Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr57949-1.c
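To make the alignment question concrete, here is a small C11 sketch of the kind of aggregate PR57949 concerns, together with a deliberately simplified model of the boundary decision. The struct, the function names, and the `compat_align_parm` parameter are hypothetical; the model ignores the Darwin, BLKmode and vector-type conditions in the real `rs6000_function_arg_boundary` and assumes a 64-bit default parameter boundary.

```c
#include <stddef.h>

/* An aggregate with a member that requires 128-bit alignment. */
struct has_aligned_member {
    char tag;
    _Alignas(16) unsigned char payload[16];
};

/* Alignment of the aggregate in bits, as the C implementation sees it. */
size_t aggregate_alignment_bits(void)
{
    return _Alignof(struct has_aligned_member) * 8;
}

/* Simplified model (an assumption, not the GCC function): with the
   compatibility flag set, aggregates never get more than 64-bit
   alignment when passed; with it clear, an aggregate whose type
   alignment exceeds 64 bits is passed on a 128-bit boundary.  */
unsigned arg_boundary_bits(unsigned type_align_bits, int compat_align_parm)
{
    if (type_align_bits > 64 && !compat_align_parm)
        return 128;
    return 64;
}
```

Two translation units built with different settings of the flag would disagree on where such a parameter lives, which is why the option exists for compatibility with older objects.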
[4.8, PATCH 9/26] Backport Power8 and LE support: ABI call support
Hi, This patch (diff-abi-calls) backports fixes to common code to support the new ELFv2 ABI. Copying Richard and Jakub for these bits. Thanks, Bill 2014-03-29 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r204798: 2013-11-14 Ulrich Weigand ulrich.weig...@de.ibm.com Alan Modra amo...@gmail.com * function.c (assign_parms): Use all.reg_parm_stack_space instead of re-evaluating REG_PARM_STACK_SPACE target macro. (locate_and_pad_parm): New parameter REG_PARM_STACK_SPACE. Use it instead of evaluating target macro REG_PARM_STACK_SPACE every time. (assign_parm_find_entry_rtl): Update call. * calls.c (initialize_argument_information): Update call. (emit_library_call_value_1): Likewise. * expr.h (locate_and_pad_parm): Update prototype. Backport from mainline r204797: 2013-11-14 Ulrich Weigand ulrich.weig...@de.ibm.com * calls.c (store_unaligned_arguments_into_pseudos): Skip PARALLEL arguments. Backport from mainline r197003: 2013-03-23 Eric Botcazou ebotca...@adacore.com * calls.c (expand_call): Add missing guard to code handling return of non-BLKmode structures in MSB. * function.c (expand_function_end): Likewise. Index: gcc-4_8-branch/gcc/calls.c === --- gcc-4_8-branch.orig/gcc/calls.c 2013-12-28 17:41:32.056627059 +0100 +++ gcc-4_8-branch/gcc/calls.c 2013-12-28 17:50:43.356356135 +0100 @@ -983,6 +983,7 @@ store_unaligned_arguments_into_pseudos ( for (i = 0; i num_actuals; i++) if (args[i].reg != 0 ! args[i].pass_on_stack +GET_CODE (args[i].reg) != PARALLEL args[i].mode == BLKmode MEM_P (args[i].value) (MEM_ALIGN (args[i].value) @@ -1327,6 +1328,7 @@ initialize_argument_information (int num #else args[i].reg != 0, #endif +reg_parm_stack_space, args[i].pass_on_stack ? 0 : args[i].partial, fndecl, args_size, args[i].locate); #ifdef BLOCK_REG_PADDING @@ -3171,7 +3173,9 @@ expand_call (tree exp, rtx target, int i group load/store machinery below. 
*/ if (!structure_value_addr !pcc_struct_value + TYPE_MODE (rettype) != VOIDmode TYPE_MODE (rettype) != BLKmode + REG_P (valreg) targetm.calls.return_in_msb (rettype)) { if (shift_return_value (TYPE_MODE (rettype), false, valreg)) @@ -3734,7 +3738,8 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - 0, NULL_TREE, args_size, argvec[count].locate); + reg_parm_stack_space, 0, + NULL_TREE, args_size, argvec[count].locate); if (argvec[count].reg == 0 || argvec[count].partial != 0 || reg_parm_stack_space 0) @@ -3821,7 +3826,7 @@ emit_library_call_value_1 (int retval, r #else argvec[count].reg != 0, #endif - argvec[count].partial, + reg_parm_stack_space, argvec[count].partial, NULL_TREE, args_size, argvec[count].locate); args_size.constant += argvec[count].locate.size.constant; gcc_assert (!argvec[count].locate.size.var); Index: gcc-4_8-branch/gcc/function.c === --- gcc-4_8-branch.orig/gcc/function.c 2013-12-28 17:41:32.056627059 +0100 +++ gcc-4_8-branch/gcc/function.c 2013-12-28 17:50:43.362356165 +0100 @@ -2507,6 +2507,7 @@ assign_parm_find_entry_rtl (struct assig } locate_and_pad_parm (data-promoted_mode, data-passed_type, in_regs, + all-reg_parm_stack_space, entry_parm ? data-partial : 0, current_function_decl, all-stack_args_size, data-locate); @@ -3485,11 +3486,7 @@ assign_parms (tree fndecl) /* Adjust function incoming argument size for alignment and minimum length. */ -#ifdef REG_PARM_STACK_SPACE - crtl-args.size = MAX (crtl-args.size, - REG_PARM_STACK_SPACE (fndecl)); -#endif - + crtl-args.size = MAX (crtl-args.size, all.reg_parm_stack_space); crtl-args.size = CEIL_ROUND (crtl-args.size, PARM_BOUNDARY / BITS_PER_UNIT); @@ -3693,6 +3690,9 @@ gimplify_parameters (void) IN_REGS is nonzero if the argument will be passed in registers. It will never be set if REG_PARM_STACK_SPACE is not defined. + REG_PARM_STACK_SPACE is the number of bytes of stack space reserved + for arguments which are passed in registers. 
+ FNDECL is the function in which the argument was defined. There are two types of
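The `assign_parms` hunk above replaces a conditional macro evaluation with a value computed once, then rounds the incoming argument size up to the parameter boundary. The arithmetic can be sketched as below; the function name and its parameters are hypothetical, and the rounding assumes the boundary is a power of two, as the mask-based form requires.

```c
/* Sketch of the size adjustment: take at least the reserved
   register-parameter area, then round up to the boundary.
   parm_boundary_bytes must be a power of two.  */
unsigned incoming_args_size(unsigned args_size,
                            unsigned reg_parm_stack_space,
                            unsigned parm_boundary_bytes)
{
    /* MAX (args_size, reg_parm_stack_space) */
    if (args_size < reg_parm_stack_space)
        args_size = reg_parm_stack_space;

    /* Round up to a multiple of the boundary. */
    return (args_size + parm_boundary_bytes - 1)
           & ~(parm_boundary_bytes - 1);
}
```

Passing the reserved size as a plain value rather than re-evaluating a target macro means every caller sees the same number for a given function, which appears to be the point of threading `reg_parm_stack_space` through `locate_and_pad_parm`.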
Re: [RFA jit v2 1/2] introduce class toplev
On Wed, 2014-03-19 at 12:10 -0600, Tom Tromey wrote:
> "Tom" == Tom Tromey tro...@redhat.com writes:
>
> Tom> This patch introduces a new class toplev and changes toplev_main and
> Tom> toplev_finalize to be methods of this class. Additionally, now the
> Tom> timevars are automatically stopped when the object is destroyed. This
> Tom> cleans up compile a bit and makes it simpler to reuse the toplev
> Tom> logic in other code.
>
> David asked me off-list to rename the field in class toplev, so here's a
> new patch that does this.

Thanks! (yes, I greatly prefer having member data of a class to have a m_ prefix, and for the ctor params to have equivalent names, without the prefix, which this patch does, for toplev).

> Tom
>
> commit 66f92863ef55c26f673d02dd39027f340940a3bf
> Author: Tom Tromey tro...@redhat.com
> Date: Tue Mar 18 08:07:40 2014 -0600
>
>     introduce class toplev
>
>     This patch introduces a new class toplev and changes toplev_main and
>     toplev_finalize to be methods of this class. Additionally, now the
>     timevars are automatically stopped when the object is destroyed. This
>     cleans up compile a bit and makes it simpler to reuse the toplev
>     logic in other code.

OK. Are you able to push this to my branch, or do you need me to do this?
[4.8, PATCH 11/26] Backport Power8 and LE support: gotest
Hi,

This patch (diff-abi-gotest) backports enablement of the Go testsuite for powerpc64le.

Thanks,
Bill

2014-03-29  Bill Schmidt  wschm...@linux.vnet.ibm.com

	Backport from mainline r205000.

	2013-11-19  Ulrich Weigand  ulrich.weig...@de.ibm.com

	gotest: Recognize PPC ELF v2 function pointers in text section.

Index: gcc-4_8-branch/libgo/testsuite/gotest
===
--- gcc-4_8-branch.orig/libgo/testsuite/gotest 2013-12-28 17:41:31.783625708 +0100
+++ gcc-4_8-branch/libgo/testsuite/gotest 2013-12-28 17:50:45.671367653 +0100
@@ -369,7 +369,7 @@ localname() {
 {
 	text=T
 	case $GOARCH in
-	ppc64) text=D ;;
+	ppc64) text=[TD] ;;
 	esac
 
 	symtogo='sed -e s/_test/XXXtest/ -e s/.*_\([^_]*\.\)/\1/ -e s/XXXtest/_test/'
[4.8, PATCH 12/26] Backport Power8 and LE support: Defaults
Hi,

This patch (diff-le-align) sets some miscellaneous defaults for little endian support.

Thanks,
Bill


2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Apply mainline r205060.

        2013-11-20  Alan Modra  <amo...@gmail.com>

        * config/rs6000/sysv4.h (CC1_ENDIAN_LITTLE_SPEC): Define as empty.
        * config/rs6000/rs6000.c (rs6000_option_override_internal): Default
        to strict alignment on older processors when little-endian.
        * config/rs6000/linux64.h (PROCESSOR_DEFAULT64): Default to power8
        for ELFv2.

Index: gcc-4_8-branch/gcc/config/rs6000/linux64.h
===================================================================
--- gcc-4_8-branch.orig/gcc/config/rs6000/linux64.h 2013-12-28 17:50:44.252360594 +0100
+++ gcc-4_8-branch/gcc/config/rs6000/linux64.h 2013-12-28 17:50:46.356371060 +0100
@@ -71,7 +71,11 @@ extern int dot_symbols;
 #undef  PROCESSOR_DEFAULT
 #define PROCESSOR_DEFAULT PROCESSOR_POWER7
 #undef  PROCESSOR_DEFAULT64
+#ifdef LINUX64_DEFAULT_ABI_ELFv2
+#define PROCESSOR_DEFAULT64 PROCESSOR_POWER8
+#else
 #define PROCESSOR_DEFAULT64 PROCESSOR_POWER7
+#endif

 /* We don't need to generate entries in .fixup, except when
    -mrelocatable or -mrelocatable-lib is given.  */
Index: gcc-4_8-branch/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-branch.orig/gcc/config/rs6000/rs6000.c 2013-12-28 17:50:44.219360429 +0100
+++ gcc-4_8-branch/gcc/config/rs6000/rs6000.c 2013-12-28 17:50:46.369371125 +0100
@@ -3206,6 +3206,12 @@ rs6000_option_override_internal (bool gl
        }
     }

+  /* If little-endian, default to -mstrict-align on older processors.
+     Testing for htm matches power8 and later.  */
+  if (!BYTES_BIG_ENDIAN
+      && !(processor_target_table[tune_index].target_enable & OPTION_MASK_HTM))
+    rs6000_isa_flags |= ~rs6000_isa_flags_explicit & OPTION_MASK_STRICT_ALIGN;
+
   /* Add some warnings for VSX.  */
   if (TARGET_VSX)
     {
Index: gcc-4_8-branch/gcc/config/rs6000/sysv4.h
===================================================================
--- gcc-4_8-branch.orig/gcc/config/rs6000/sysv4.h 2013-12-28 17:50:44.243360549 +0100
+++ gcc-4_8-branch/gcc/config/rs6000/sysv4.h 2013-12-28 17:50:46.374371150 +0100
@@ -538,12 +538,7 @@ ENDIAN_SELECT(" -mbig", " -mlittle", DEF

 #define CC1_ENDIAN_BIG_SPEC ""

-#define CC1_ENDIAN_LITTLE_SPEC "\
-%{!mstrict-align: %{!mno-strict-align: \
-    %{!mcall-i960-old: \
-        -mstrict-align \
-    } \
-}}"
+#define CC1_ENDIAN_LITTLE_SPEC ""

 #define CC1_ENDIAN_DEFAULT_SPEC "%(cc1_endian_big)"

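The defaulting rule in the rs6000.c hunk above can be tried out in isolation. The following is only a sketch, not GCC code: the mask values and the function name apply_le_defaults are made up for illustration, and only the bit expression mirrors the patch. It shows that -mstrict-align is turned on by default for little-endian on pre-power8 tuning, but never overrides a choice the user made explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mask values, chosen only for this sketch; the real
   OPTION_MASK_* constants live in GCC's generated option headers.  */
#define MASK_HTM          (UINT64_C (1) << 0)
#define MASK_STRICT_ALIGN (UINT64_C (1) << 1)

/* Return the ISA flags after applying the little-endian default.
   FLAGS are the current flags, EXPLICIT_FLAGS marks the bits the user
   set on the command line, and TUNE_ENABLE is the feature set of the
   processor being tuned for.  */
uint64_t
apply_le_defaults (uint64_t flags, uint64_t explicit_flags,
                   uint64_t tune_enable, bool big_endian)
{
  /* Testing for HTM stands in for "power8 or later", as in the patch.  */
  if (!big_endian && !(tune_enable & MASK_HTM))
    flags |= ~explicit_flags & MASK_STRICT_ALIGN;
  return flags;
}
```

With these assumed masks, a little-endian build tuned for an older processor gets MASK_STRICT_ALIGN set unless the bit is marked explicit, which is the case when the user passed -mno-strict-align.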
[4.8, PATCH 14/26] Backport Power8 and LE support: DFP absolute value
Hi,

This patch (diff-dfp-abs) backports some unrelated but necessary work to enable the DFP absolute value builtins. Copying Jakub who was involved with the original patch.

Thanks,
Bill


2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-08-19  Peter Bergner  <berg...@vnet.ibm.com>
                    Jakub Jelinek  <ja...@redhat.com>

        * builtins.def (BUILT_IN_FABSD32): New DFP ABS builtin.
        (BUILT_IN_FABSD64): Likewise.
        (BUILT_IN_FABSD128): Likewise.
        * builtins.c (expand_builtin): Add support for new DFP ABS builtins.
        (fold_builtin_1): Likewise.
        * config/rs6000/dfp.md (*abstd2_fpr): Handle non-overlapping
        destination and source operands.
        (*nabstd2_fpr): Likewise.

2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-08-19  Peter Bergner  <berg...@vnet.ibm.com>

        * gcc.target/powerpc/dfp-dd-2.c: New test.
        * gcc.target/powerpc/dfp-td-2.c: Likewise.
        * gcc.target/powerpc/dfp-td-3.c: Likewise.

Index: gcc-4_8-test/gcc/builtins.c
===================================================================
--- gcc-4_8-test.orig/gcc/builtins.c
+++ gcc-4_8-test/gcc/builtins.c
@@ -5861,6 +5861,9 @@ expand_builtin (tree exp, rtx target, rt
   switch (fcode)
     {
     CASE_FLT_FN (BUILT_IN_FABS):
+    case BUILT_IN_FABSD32:
+    case BUILT_IN_FABSD64:
+    case BUILT_IN_FABSD128:
       target = expand_builtin_fabs (exp, target, subtarget);
       if (target)
        return target;
@@ -10313,6 +10316,9 @@ fold_builtin_1 (location_t loc, tree fnd
       return fold_builtin_strlen (loc, type, arg0);

     CASE_FLT_FN (BUILT_IN_FABS):
+    case BUILT_IN_FABSD32:
+    case BUILT_IN_FABSD64:
+    case BUILT_IN_FABSD128:
       return fold_builtin_fabs (loc, arg0, type);

     case BUILT_IN_ABS:
Index: gcc-4_8-test/gcc/builtins.def
===================================================================
--- gcc-4_8-test.orig/gcc/builtins.def
+++ gcc-4_8-test/gcc/builtins.def
@@ -252,6 +252,9 @@ DEF_C99_BUILTIN(BUILT_IN_EXPM1L,
 DEF_LIB_BUILTIN(BUILT_IN_FABS, "fabs", BT_FN_DOUBLE_DOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSF, "fabsf", BT_FN_FLOAT_FLOAT, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_C90RES_BUILTIN (BUILT_IN_FABSL, "fabsl", BT_FN_LONGDOUBLE_LONGDOUBLE, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_FABSD32, "fabsd32", BT_FN_DFLOAT32_DFLOAT32, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_FABSD64, "fabsd64", BT_FN_DFLOAT64_DFLOAT64, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_FABSD128, "fabsd128", BT_FN_DFLOAT128_DFLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_C99_BUILTIN(BUILT_IN_FDIM, "fdim", BT_FN_DOUBLE_DOUBLE_DOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_FDIMF, "fdimf", BT_FN_FLOAT_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO)
 DEF_C99_BUILTIN(BUILT_IN_FDIML, "fdiml", BT_FN_LONGDOUBLE_LONGDOUBLE_LONGDOUBLE, ATTR_MATHFN_FPROUNDING_ERRNO)
Index: gcc-4_8-test/gcc/config/rs6000/dfp.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/dfp.md
+++ gcc-4_8-test/gcc/config/rs6000/dfp.md
@@ -148,18 +148,24 @@
   "")

 (define_insn "*abstd2_fpr"
-  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
-       (abs:TD (match_operand:TD 1 "gpc_reg_operand" "d")))]
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d,d")
+       (abs:TD (match_operand:TD 1 "gpc_reg_operand" "0,d")))]
   "TARGET_HARD_FLOAT && TARGET_FPRS"
-  "fabs %0,%1"
-  [(set_attr "type" "fp")])
+  "@
+   fabs %0,%1
+   fabs %0,%1\;fmr %L0,%L1"
+  [(set_attr "type" "fp")
+   (set_attr "length" "4,8")])

 (define_insn "*nabstd2_fpr"
-  [(set (match_operand:TD 0 "gpc_reg_operand" "=d")
-       (neg:TD (abs:TD (match_operand:TD 1 "gpc_reg_operand" "d"))))]
+  [(set (match_operand:TD 0 "gpc_reg_operand" "=d,d")
+       (neg:TD (abs:TD (match_operand:TD 1 "gpc_reg_operand" "0,d"))))]
   "TARGET_HARD_FLOAT && TARGET_FPRS"
-  "fnabs %0,%1"
-  [(set_attr "type" "fp")])
+  "@
+   fnabs %0,%1
+   fnabs %0,%1\;fmr %L0,%L1"
+  [(set_attr "type" "fp")
+   (set_attr "length" "4,8")])

 ;; Hardware support for decimal floating point operations.

Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/dfp-dd-2.c
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/dfp-dd-2.c
@@ -0,0 +1,26 @@
+/* Test generation of DFP instructions for POWER6.  */
+/* { dg-do compile { target { powerpc*-*-linux* && powerpc_fprs } } } */
+/* { dg-options "-std=gnu99 -O1 -mcpu=power6" } */
+
+/* { dg-final { scan-assembler-times "fneg" 1 } } */
+/* { dg-final { scan-assembler-times "fabs" 1 } } */
+/* { dg-final { scan-assembler-times "fnabs" 1 } } */
+/* { dg-final { scan-assembler-times "fmr" 0 } } */
+
+_Decimal64
+func1 (_Decimal64 a, _Decimal64 b)
+{
+  return -b;
+}
+
+_Decimal64
+func2 (_Decimal64 a, _Decimal64 b)
+{
+  return
[4.8, PATCH 17/26] Backport Power8 and LE support: Direct moves
Hi,

This patch (diff-direct-move) backports support for the Power8 direct move instructions for little endian.

Thanks,
Bill


2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-10-23  Pat Haugen  <pthau...@us.ibm.com>

        * gcc.target/powerpc/direct-move.h: Fix header for executable tests.

        Back port from mainline
        2014-01-16  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        PR target/59844
        * config/rs6000/rs6000.md (reload_vsx_from_gprsf): Add little
        endian support, remove tests for WORDS_BIG_ENDIAN.
        (p8_mfvsrd_3_<mode>): Likewise.
        (reload_gpr_from_vsx<mode>): Likewise.
        (reload_gpr_from_vsxsf): Likewise.
        (p8_mfvsrd_4_disf): Likewise.

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.md
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.md
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.md
@@ -9438,7 +9438,7 @@
        (unspec:SF [(match_operand:SF 1 "register_operand" "r")]
                   UNSPEC_P8V_RELOAD_FROM_GPR))
    (clobber (match_operand:DI 2 "register_operand" "=r"))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -9465,7 +9465,7 @@
   [(set (match_operand:DF 0 "register_operand" "=r")
        (unspec:DF [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")]
                   UNSPEC_P8V_RELOAD_FROM_VSX))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mfvsrd %0,%x1"
   [(set_attr "type" "mftgpr")])

@@ -9475,7 +9475,7 @@
                 [(match_operand:FMOVE128_GPR 1 "register_operand" "wa")]
                 UNSPEC_P8V_RELOAD_FROM_VSX))
    (clobber (match_operand:FMOVE128_GPR 2 "register_operand" "=wa"))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -9502,7 +9502,7 @@
        (unspec:SF [(match_operand:SF 1 "register_operand" "wa")]
                   UNSPEC_P8V_RELOAD_FROM_VSX))
    (clobber (match_operand:V4SF 2 "register_operand" "=wa"))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "#"
   "&& reload_completed"
   [(const_int 0)]
@@ -9524,7 +9524,7 @@
   [(set (match_operand:DI 0 "register_operand" "=r")
        (unspec:DI [(match_operand:V4SF 1 "register_operand" "wa")]
                   UNSPEC_P8V_RELOAD_FROM_VSX))]
-  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE && WORDS_BIG_ENDIAN"
+  "TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
   "mfvsrd %0,%x1"
   [(set_attr "type" "mftgpr")])

Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/direct-move.h
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/gcc.target/powerpc/direct-move.h
+++ gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/direct-move.h
@@ -1,5 +1,7 @@
 /* Test functions for direct move support.  */

+#include <math.h>
+extern void abort (void);

 #ifndef VSX_REG_ATTR
 #define VSX_REG_ATTR "wa"
@@ -111,7 +113,7 @@ const struct test_struct test_functions[
 void __attribute__((__noinline__))
 test_value (TYPE a)
 {
-  size_t i;
+  long i;

   for (i = 0; i < sizeof (test_functions) / sizeof (test_functions[0]); i++)
     {
@@ -127,8 +129,7 @@ test_value (TYPE a)
 int
 main (void)
 {
-  size_t i;
-  long j;
+  long i,j;
   union {
     TYPE value;
     unsigned char bytes[sizeof (TYPE)];
[4.8, PATCH 6/26] Backport Power8 and LE support: TDmode for LE
Hi,

This patch (diff-le-dfp) backports fixes for TDmode on a little endian target.

Thanks,
Bill


2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline r205123:

        2013-11-20  Ulrich Weigand  <ulrich.weig...@de.ibm.com>

        * config/rs6000/rs6000.c (rs6000_cannot_change_mode_class): Do not
        allow subregs of TDmode in FPRs of smaller size in little-endian.
        (rs6000_split_multireg_move): When splitting an access to TDmode in
        FPRs, do not use simplify_gen_subreg.

        Backport from mainline r204927:

        2013-11-17  Ulrich Weigand  <ulrich.weig...@de.ibm.com>

        * config/rs6000/rs6000.c (rs6000_emit_move): Use low word of
        sdmode_stack_slot also in little-endian mode.

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -7963,7 +7963,9 @@ rs6000_emit_move (rtx dest, rtx source,
            }
          else if (INT_REGNO_P (REGNO (operands[1])))
            {
-             rtx mem = adjust_address_nv (operands[0], mode, 4);
+             rtx mem = operands[0];
+             if (BYTES_BIG_ENDIAN)
+               mem = adjust_address_nv (mem, mode, 4);
              mem = eliminate_regs (mem, VOIDmode, NULL_RTX);
              emit_insn (gen_movsd_hardfloat (mem, operands[1]));
            }
@@ -7986,7 +7988,9 @@ rs6000_emit_move (rtx dest, rtx source,
            }
          else if (INT_REGNO_P (REGNO (operands[0])))
            {
-             rtx mem = adjust_address_nv (operands[1], mode, 4);
+             rtx mem = operands[1];
+             if (BYTES_BIG_ENDIAN)
+               mem = adjust_address_nv (mem, mode, 4);
              mem = eliminate_regs (mem, VOIDmode, NULL_RTX);
              emit_insn (gen_movsd_hardfloat (operands[0], mem));
            }
@@ -16082,6 +16086,13 @@ rs6000_cannot_change_mode_class (enum ma
   if (TARGET_IEEEQUAD && (to == TFmode || from == TFmode))
     return true;

+  /* TDmode in floating-mode registers must always go into a register
+     pair with the most significant word in the even-numbered register
+     to match ISA requirements.  In little-endian mode, this does not
+     match subreg numbering, so we cannot allow subregs.  */
+  if (!BYTES_BIG_ENDIAN && (to == TDmode || from == TDmode))
+    return true;
+
   if (from_size < 8 || to_size < 8)
     return true;

@@ -19028,6 +19039,39 @@ rs6000_split_multireg_move (rtx dst, rtx

   gcc_assert (reg_mode_size * nregs == GET_MODE_SIZE (mode));

+  /* TDmode residing in FP registers is special, since the ISA requires that
+     the lower-numbered word of a register pair is always the most significant
+     word, even in little-endian mode.  This does not match the usual subreg
+     semantics, so we cannnot use simplify_gen_subreg in those cases.  Access
+     the appropriate constituent registers by hand in little-endian mode.
+
+     Note we do not need to check for destructive overlap here since TDmode
+     can only reside in even/odd register pairs.  */
+  if (FP_REGNO_P (reg) && DECIMAL_FLOAT_MODE_P (mode) && !BYTES_BIG_ENDIAN)
+    {
+      rtx p_src, p_dst;
+      int i;
+
+      for (i = 0; i < nregs; i++)
+       {
+         if (REG_P (src) && FP_REGNO_P (REGNO (src)))
+           p_src = gen_rtx_REG (reg_mode, REGNO (src) + nregs - 1 - i);
+         else
+           p_src = simplify_gen_subreg (reg_mode, src, mode,
+                                        i * reg_mode_size);
+
+         if (REG_P (dst) && FP_REGNO_P (REGNO (dst)))
+           p_dst = gen_rtx_REG (reg_mode, REGNO (dst) + nregs - 1 - i);
+         else
+           p_dst = simplify_gen_subreg (reg_mode, dst, mode,
+                                        i * reg_mode_size);
+
+         emit_insn (gen_rtx_SET (VOIDmode, p_dst, p_src));
+       }
+
+      return;
+    }
+
   if (REG_P (src) && REG_P (dst) && (REGNO (src) < REGNO (dst)))
     {
       /* Move register range backwards, if we might have destructive
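The register selection in the rs6000_split_multireg_move hunk above comes down to a small index calculation. The sketch below is not GCC code; the helper name td_word_regno and its parameters are hypothetical, and the big-endian branch is included only for contrast, on the assumption that it follows the usual subreg numbering. It returns which hard register holds the word found at byte offset i * word-size of a value spread over consecutive floating-point registers.

```c
#include <assert.h>

/* Return the hard register number holding word I (counted in memory
   order) of a value that occupies NREGS consecutive FP registers
   starting at FIRST_REGNO.

   The ISA keeps the most significant word in the lower-numbered
   register.  In big-endian mode the most significant word is also
   word 0 in memory order, so the mapping is the identity.  In
   little-endian mode word 0 is the least significant one, so the
   register order has to be reversed, as the patch does with
   "REGNO (src) + nregs - 1 - i".  */
int
td_word_regno (int first_regno, int nregs, int i, int big_endian)
{
  assert (i >= 0 && i < nregs);
  return big_endian ? first_regno + i : first_regno + nregs - 1 - i;
}
```

For a TDmode value in a register pair starting at register 32, little-endian word 0 therefore lives in register 33 and word 1 in register 32.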
[4.8, PATCH 4/26] Backport Power8 and LE support: Libtool and configure bits 2
Hi,

This patch (diff-le-libtool) backports changes to use a libtool.m4 that supports powerpc64le-*linux*.

Thanks,
Bill


2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-11-22  Ulrich Weigand  <ulrich.weig...@de.ibm.com>

        * libgo/config/libtool.m4: Update to mainline version.
        * libgo/configure: Regenerate.

        2013-11-17  Ulrich Weigand  <ulrich.weig...@de.ibm.com>

        * libgo/config/libtool.m4: Update to mainline version.
        * libgo/configure: Regenerate.

        2013-11-15  Ulrich Weigand  <ulrich.weig...@de.ibm.com>

        * libtool.m4: Update to mainline version.
        * libjava/libltdl/acinclude.m4: Likewise.

        * gcc/configure: Regenerate.
        * boehm-gc/configure: Regenerate.
        * libatomic/configure: Regenerate.
        * libbacktrace/configure: Regenerate.
        * libffi/configure: Regenerate.
        * libgfortran/configure: Regenerate.
        * libgomp/configure: Regenerate.
        * libitm/configure: Regenerate.
        * libjava/configure: Regenerate.
        * libjava/libltdl/configure: Regenerate.
        * libjava/classpath/configure: Regenerate.
        * libmudflap/configure: Regenerate.
        * libobjc/configure: Regenerate.
        * libquadmath/configure: Regenerate.
        * libsanitizer/configure: Regenerate.
        * libssp/configure: Regenerate.
        * libstdc++-v3/configure: Regenerate.
        * lto-plugin/configure: Regenerate.
        * zlib/configure: Regenerate.

        Backport from mainline
        2013-09-20  Alan Modra  <amo...@gmail.com>

        * libtool.m4 (_LT_ENABLE_LOCK <ld -m flags>): Remove non-canonical
        ppc host match.  Support little-endian powerpc linux hosts.
        * configure: Regenerate.

Index: gcc-4_8-branch/gcc/configure
===================================================================
--- gcc-4_8-branch.orig/gcc/configure 2013-12-28 17:41:32.733630408 +0100
+++ gcc-4_8-branch/gcc/configure 2013-12-28 17:50:38.646332701 +0100
@@ -13589,7 +13589,7 @@ ia64-*-hpux*)
   rm -rf conftest*
   ;;

-x86_64-*kfreebsd*-gnu|x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*| \
+x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
 s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
   # Find out which ABI we are using.
   echo 'int i;' > conftest.$ac_ext
@@ -13614,7 +13614,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*
                ;;
            esac
            ;;
-         ppc64-*linux*|powerpc64-*linux*)
+         powerpc64le-*linux*)
+           LD="${LD-ld} -m elf32lppclinux"
+           ;;
+         powerpc64-*linux*)
            LD="${LD-ld} -m elf32ppclinux"
            ;;
          s390x-*linux*)
@@ -13633,7 +13636,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*
          x86_64-*linux*)
            LD="${LD-ld} -m elf_x86_64"
            ;;
-         ppc*-*linux*|powerpc*-*linux*)
+         powerpcle-*linux*)
+           LD="${LD-ld} -m elf64lppc"
+           ;;
+         powerpc-*linux*)
            LD="${LD-ld} -m elf64ppc"
            ;;
          s390*-*linux*|s390*-*tpf*)
@@ -17827,7 +17833,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17830 "configure"
+#line 17836 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
@@ -17933,7 +17939,7 @@ else
   lt_dlunknown=0; lt_dlno_uscore=1; lt_dlneed_uscore=2
   lt_status=$lt_dlunknown
   cat > conftest.$ac_ext <<_LT_EOF
-#line 17936 "configure"
+#line 17942 "configure"
 #include "confdefs.h"

 #if HAVE_DLFCN_H
Index: gcc-4_8-branch/libtool.m4
===================================================================
--- gcc-4_8-branch.orig/libtool.m4 2013-12-28 17:41:32.728630383 +0100
+++ gcc-4_8-branch/libtool.m4 2013-12-28 17:50:38.652332731 +0100
@@ -1220,7 +1220,7 @@ ia64-*-hpux*)
   rm -rf conftest*
   ;;

-x86_64-*kfreebsd*-gnu|x86_64-*linux*|ppc*-*linux*|powerpc*-*linux*| \
+x86_64-*kfreebsd*-gnu|x86_64-*linux*|powerpc*-*linux*| \
 s390*-*linux*|s390*-*tpf*|sparc*-*linux*)
   # Find out which ABI we are using.
   echo 'int i;' > conftest.$ac_ext
@@ -1241,7 +1241,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*
                ;;
            esac
            ;;
-         ppc64-*linux*|powerpc64-*linux*)
+         powerpc64le-*linux*)
+           LD="${LD-ld} -m elf32lppclinux"
+           ;;
+         powerpc64-*linux*)
            LD="${LD-ld} -m elf32ppclinux"
            ;;
          s390x-*linux*)
@@ -1260,7 +1263,10 @@ s390*-*linux*|s390*-*tpf*|sparc*-*linux*
          x86_64-*linux*)
            LD="${LD-ld} -m elf_x86_64"
            ;;
-         ppc*-*linux*|powerpc*-*linux*)
+         powerpcle-*linux*)
+           LD="${LD-ld} -m elf64lppc"
+           ;;
+         powerpc-*linux*)
            LD="${LD-ld} -m elf64ppc"
            ;;
          s390*-*linux*|s390*-*tpf*)
Index: gcc-4_8-branch/boehm-gc/configure
===================================================================
--- gcc-4_8-branch.orig/boehm-gc/configure
[4.8, PATCH 16/26] Backport Power8 and LE support: PR56843
Hi,

This patch (diff-pr56843) backports the fix for PR56843.

Thanks,
Bill


[gcc]

2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-04-05  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR target/56843
        * config/rs6000/rs6000.c (rs6000_emit_swdiv_high_precision): Remove.
        (rs6000_emit_swdiv_low_precision): Remove.
        (rs6000_emit_swdiv): Rewrite to handle between one and four
        iterations of Newton-Raphson generally; modify required number of
        iterations for some cases.
        * config/rs6000/rs6000.h (RS6000_RECIP_HIGH_PRECISION_P): Remove.

[gcc/testsuite]

2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-04-05  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        PR target/56843
        * gcc.target/powerpc/recip-1.c: Modify expected output.
        * gcc.target/powerpc/recip-3.c: Likewise.
        * gcc.target/powerpc/recip-4.c: Likewise.
        * gcc.target/powerpc/recip-5.c: Add expected output for iterations.

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -29417,54 +29417,26 @@ rs6000_emit_nmsub (rtx dst, rtx m1, rtx
   emit_insn (gen_rtx_SET (VOIDmode, dst, r));
 }

-/* Newton-Raphson approximation of floating point divide with just 2 passes
-   (either single precision floating point, or newer machines with higher
-   accuracy estimates).  Support both scalar and vector divide.  Assumes no
-   trapping math and finite arguments.  */
+/* Newton-Raphson approximation of floating point divide DST = N/D.  If NOTE_P,
+   add a reg_note saying that this was a division.  Support both scalar and
+   vector divide.  Assumes no trapping math and finite arguments.  */

-static void
-rs6000_emit_swdiv_high_precision (rtx dst, rtx n, rtx d)
+void
+rs6000_emit_swdiv (rtx dst, rtx n, rtx d, bool note_p)
 {
   enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, y1, u0, v0;
-  enum insn_code code = optab_handler (smul_optab, mode);
-  insn_gen_fn gen_mul = GEN_FCN (code);
-  rtx one = rs6000_load_constant_and_splat (mode, dconst1);
-
-  gcc_assert (code != CODE_FOR_nothing);
-
-  /* x0 = 1./d estimate */
-  x0 = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (VOIDmode, x0,
-                         gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
-                                         UNSPEC_FRES)));
-
-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);          /* e0 = 1. - (d * x0) */
-
-  e1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (e1, e0, e0, e0);           /* e1 = (e0 * e0) + e0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e1, x0, x0);           /* y1 = (e1 * x0) + x0 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y1));             /* u0 = n * y1 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);            /* v0 = n - (d * u0) */
-
-  rs6000_emit_madd (dst, v0, y1, u0);          /* dst = (v0 * y1) + u0 */
-}
+  rtx one, x0, e0, x1, xprev, eprev, xnext, enext, u, v;
+  int i;

-/* Newton-Raphson approximation of floating point divide that has a low
-   precision estimate.  Assumes no trapping math and finite arguments.  */
+  /* Low precision estimates guarantee 5 bits of accuracy.  High
+     precision estimates guarantee 14 bits of accuracy.  SFmode
+     requires 23 bits of accuracy.  DFmode requires 52 bits of
+     accuracy.  Each pass at least doubles the accuracy, leading
+     to the following.  */
+  int passes = (TARGET_RECIP_PRECISION) ? 1 : 3;
+  if (mode == DFmode || mode == V2DFmode)
+    passes++;

-static void
-rs6000_emit_swdiv_low_precision (rtx dst, rtx n, rtx d)
-{
-  enum machine_mode mode = GET_MODE (dst);
-  rtx x0, e0, e1, e2, y1, y2, y3, u0, v0, one;
   enum insn_code code = optab_handler (smul_optab, mode);
   insn_gen_fn gen_mul = GEN_FCN (code);

@@ -29478,46 +29450,44 @@ rs6000_emit_swdiv_low_precision (rtx dst
                          gen_rtx_UNSPEC (mode, gen_rtvec (1, d),
                                          UNSPEC_FRES)));

-  e0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (e0, d, x0, one);          /* e0 = 1. - d * x0 */
-
-  y1 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y1, e0, x0, x0);           /* y1 = x0 + e0 * x0 */
-
-  e1 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e1, e0, e0));            /* e1 = e0 * e0 */
-
-  y2 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y2, e1, y1, y1);           /* y2 = y1 + e1 * y1 */
-
-  e2 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (e2, e1, e1));            /* e2 = e1 * e1 */
-
-  y3 = gen_reg_rtx (mode);
-  rs6000_emit_madd (y3, e2, y2, y2);           /* y3 = y2 + e2 * y2 */
-
-  u0 = gen_reg_rtx (mode);
-  emit_insn (gen_mul (u0, n, y3));             /* u0 = n * y3 */
-
-  v0 = gen_reg_rtx (mode);
-  rs6000_emit_nmsub (v0, d, u0, n);            /* v0 = n - d * u0
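The pass-count rule quoted in the new comment (5 or 14 bits from the estimate, 23 or 52 bits required, accuracy at least doubling per pass) can be checked with a small standalone sketch. This is not the instruction sequence the patch emits: swdiv_passes and nr_divide are hypothetical names, and nr_divide is a plain textbook Newton-Raphson reciprocal refinement on doubles, included only to show how quickly the error shrinks.

```c
/* Number of refinement passes, following the rule in the patch:
   a high-precision estimate (14 bits) needs 1 pass for single
   precision (14 -> 28 >= 23), a low-precision one (5 bits) needs 3
   (5 -> 10 -> 20 -> 40 >= 23); double precision needs one more pass
   in either case to reach 52 bits.  */
int
swdiv_passes (int high_precision_estimate, int double_precision)
{
  int passes = high_precision_estimate ? 1 : 3;
  if (double_precision)
    passes++;
  return passes;
}

/* Approximate N / D starting from the reciprocal estimate X0 ~ 1/D.
   Each pass computes the error e = 1 - d * x and corrects x by e * x,
   which squares the relative error.  */
double
nr_divide (double n, double d, double x0, int passes)
{
  double x = x0;
  int i;

  for (i = 0; i < passes; i++)
    {
      double e = 1.0 - d * x;
      x = x + e * x;
    }
  return n * x;
}
```

Starting from the crude estimate 0.3 for 1/3, the relative error goes from 0.1 to roughly 1e-16 in four passes, which matches the "each pass at least doubles the accuracy" reasoning.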
[4.8, PATCH 18/26] Backport Power8 and LE support: Configure bits 2
Hi,

This patch (diff-le-config-2) backports more configure changes, particularly for multilib/multiarch targeting powerpc64le.

Thanks,
Bill


2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Apply mainline r202190, powerpc64le multilibs and multiarch dir
        2013-09-03  Alan Modra  <amo...@gmail.com>

        * config.gcc (powerpc*-*-linux*): Add support for little-endian
        multilibs to big-endian target and vice versa.
        * config/rs6000/t-linux64: Use := assignment on all vars.
        (MULTILIB_EXTRA_OPTS): Remove fPIC.
        (MULTILIB_OSDIRNAMES): Specify using mapping from multilib_options.
        * config/rs6000/t-linux64le: New file.
        * config/rs6000/t-linux64bele: New file.
        * config/rs6000/t-linux64lebe: New file.

Index: gcc-4_8-test/gcc/config.gcc
===================================================================
--- gcc-4_8-test.orig/gcc/config.gcc
+++ gcc-4_8-test/gcc/config.gcc
@@ -2081,7 +2081,7 @@ powerpc*-*-linux*)
        tmake_file="rs6000/t-fprules rs6000/t-ppcos ${tmake_file} rs6000/t-ppccomm"
        case ${target} in
            powerpc*le-*-*)
-               tm_file="${tm_file} rs6000/sysv4le.h" ;;
+               tm_file="${tm_file} rs6000/sysv4le.h" ;;
        esac
        maybe_biarch=yes
        case ${target} in
@@ -2104,6 +2104,19 @@ powerpc*-*-linux*)
                fi
                tm_file="rs6000/biarch64.h ${tm_file} rs6000/linux64.h glibc-stdint.h"
                tmake_file="$tmake_file rs6000/t-linux64"
+               case ${target} in
+                   powerpc*le-*-*)
+                       tmake_file="$tmake_file rs6000/t-linux64le"
+                       case ${enable_targets} in
+                           all | *powerpc64-* | *powerpc-*)
+                               tmake_file="$tmake_file rs6000/t-linux64lebe" ;;
+                       esac ;;
+                   *)
+                       case ${enable_targets} in
+                           all | *powerpc64le-* | *powerpcle-*)
+                               tmake_file="$tmake_file rs6000/t-linux64bele" ;;
+                       esac ;;
+               esac
                extra_options="${extra_options} rs6000/linux64.opt"
                ;;
            *)
Index: gcc-4_8-test/gcc/config/rs6000/t-linux64
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/t-linux64
+++ gcc-4_8-test/gcc/config/rs6000/t-linux64
@@ -25,8 +25,8 @@
 # it doesn't tell anything about the 32bit libraries on those systems.  Set
 # MULTILIB_OSDIRNAMES according to what is found on the target.

-MULTILIB_OPTIONS= m64/m32
-MULTILIB_DIRNAMES = 64 32
-MULTILIB_EXTRA_OPTS = fPIC
-MULTILIB_OSDIRNAMES= ../lib64$(call if_multiarch,:powerpc64-linux-gnu)
-MULTILIB_OSDIRNAMES+= $(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call if_multiarch,:powerpc-linux-gnu)
+MULTILIB_OPTIONS:= m64/m32
+MULTILIB_DIRNAMES := 64 32
+MULTILIB_EXTRA_OPTS :=
+MULTILIB_OSDIRNAMES := m64=../lib64$(call if_multiarch,:powerpc64-linux-gnu)
+MULTILIB_OSDIRNAMES += m32=$(if $(wildcard $(shell echo $(SYSTEM_HEADER_DIR))/../../usr/lib32),../lib32,../lib)$(call if_multiarch,:powerpc-linux-gnu)
Index: gcc-4_8-test/gcc/config/rs6000/t-linux64bele
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/config/rs6000/t-linux64bele
@@ -0,0 +1,7 @@
+#rs6000/t-linux64end
+
+MULTILIB_OPTIONS+= mlittle
+MULTILIB_DIRNAMES += le
+MULTILIB_OSDIRNAMES += $(subst =,.mlittle=,$(subst lible32,lib32le,$(subst lible64,lib64le,$(subst lib,lible,$(subst -linux,le-linux,$(MULTILIB_OSDIRNAMES))))))
+MULTILIB_OSDIRNAMES += $(subst $(if $(findstring 64,$(target)),m64,m32).,,$(filter $(if $(findstring 64,$(target)),m64,m32).mlittle%,$(MULTILIB_OSDIRNAMES)))
+MULTILIB_MATCHES:= ${MULTILIB_MATCHES_ENDIAN}
Index: gcc-4_8-test/gcc/config/rs6000/t-linux64le
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/config/rs6000/t-linux64le
@@ -0,0 +1,3 @@
+#rs6000/t-linux64le
+
+MULTILIB_OSDIRNAMES := $(subst -linux,le-linux,$(MULTILIB_OSDIRNAMES))
Index: gcc-4_8-test/gcc/config/rs6000/t-linux64lebe
===================================================================
--- /dev/null
+++ gcc-4_8-test/gcc/config/rs6000/t-linux64lebe
@@ -0,0 +1,7 @@
+#rs6000/t-linux64leend
+
+MULTILIB_OPTIONS+= mbig
+MULTILIB_DIRNAMES += be
+MULTILIB_OSDIRNAMES += $(subst =,.mbig=,$(subst libbe32,lib32be,$(subst libbe64,lib64be,$(subst lib,libbe,$(subst le-linux,-linux,$(MULTILIB_OSDIRNAMES))))))
+MULTILIB_OSDIRNAMES += $(subst $(if $(findstring 64,$(target)),m64,m32).,,$(filter $(if $(findstring 64,$(target)),m64,m32).mbig%,$(MULTILIB_OSDIRNAMES)))
+MULTILIB_MATCHES:= ${MULTILIB_MATCHES_ENDIAN}
Index: gcc-4_8-test/libsanitizer/configure.tgt
===================================================================
---
[4.8, PATCH 15/26] Backport Power8 and LE support: PR54537
Hi,

This patch (diff-pr54537) backports a fix for PR54537 which is unrelated but necessary. Copying Richard and Jakub for the common code.

Thanks,
Bill


[libstdc++-v3]

2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2013-08-01  Fabien Chêne  <fab...@gcc.gnu.org>

        PR c++/54537
        * include/tr1/cmath: Remove pow(double,double) overload, remove a
        duplicated comment about DR 550.  Add a comment to explain the issue.
        * testsuite/tr1/8_c_compatibility/cmath/pow_cmath.cc: New.

[gcc/cp]

2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Back port from mainline
        2013-08-01  Fabien Chêne  <fab...@gcc.gnu.org>

        PR c++/54537
        * cp-tree.h: Check OVL_USED with OVERLOAD_CHECK.
        * name-lookup.c (do_nonmember_using_decl): Make sure we have an
        OVERLOAD before calling OVL_USED.  Call diagnose_name_conflict
        instead of issuing an error without mentioning the conflicting
        declaration.

[gcc/testsuite]

2014-03-29  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Back port from mainline
        2013-08-01  Fabien Chêne  <fab...@gcc.gnu.org>
                    Peter Bergner  <berg...@vnet.ibm.com>

        PR c++/54537
        * g++.dg/overload/using3.C: New.
        * g++.dg/overload/using2.C: Adjust.
        * g++.dg/lookup/using9.C: Likewise.

Index: gcc-4_8-test/gcc/cp/cp-tree.h
===================================================================
--- gcc-4_8-test.orig/gcc/cp/cp-tree.h
+++ gcc-4_8-test/gcc/cp/cp-tree.h
@@ -331,7 +331,7 @@ typedef struct ptrmem_cst * ptrmem_cst_t
 /* If set, this was imported in a using declaration.
    This is not to confuse with being used somewhere, which
    is not important for this node.  */
-#define OVL_USED(NODE) TREE_USED (NODE)
+#define OVL_USED(NODE) TREE_USED (OVERLOAD_CHECK (NODE))
 /* If set, this OVERLOAD was created for argument-dependent lookup
    and can be freed afterward.  */
 #define OVL_ARG_DEPENDENT(NODE) TREE_LANG_FLAG_0 (OVERLOAD_CHECK (NODE))
Index: gcc-4_8-test/gcc/cp/name-lookup.c
===================================================================
--- gcc-4_8-test.orig/gcc/cp/name-lookup.c
+++ gcc-4_8-test/gcc/cp/name-lookup.c
@@ -2286,8 +2286,7 @@ push_overloaded_decl_1 (tree decl, int f
                  && compparms (TYPE_ARG_TYPES (TREE_TYPE (fn)),
                                TYPE_ARG_TYPES (TREE_TYPE (decl)))
                  && ! decls_match (fn, decl))
-               error ("%q#D conflicts with previous using declaration %q#D",
-                      decl, fn);
+               diagnose_name_conflict (decl, fn);

              dup = duplicate_decls (decl, fn, is_friend);
              /* If DECL was a redeclaration of FN -- even an invalid
@@ -2519,7 +2518,7 @@ do_nonmember_using_decl (tree scope, tre
                  if (new_fn == old_fn)
                    /* The function already exists in the current namespace.  */
                    break;
-                 else if (OVL_USED (tmp1))
+                 else if (TREE_CODE (tmp1) == OVERLOAD && OVL_USED (tmp1))
                    continue; /* this is a using decl */
                  else if (compparms (TYPE_ARG_TYPES (TREE_TYPE (new_fn)),
                                      TYPE_ARG_TYPES (TREE_TYPE (old_fn))))
@@ -2534,7 +2533,7 @@ do_nonmember_using_decl (tree scope, tre
                        break;
                      else
                        {
-                         error ("%qD is already declared in this scope", name);
+                         diagnose_name_conflict (new_fn, old_fn);
                          break;
                        }
                    }
Index: gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/g++.dg/lookup/using9.C
+++ gcc-4_8-test/gcc/testsuite/g++.dg/lookup/using9.C
@@ -21,11 +21,11 @@ void h()
   f('h');
   f(1);         // { dg-error "ambiguous" }
   // { dg-message "candidate" "candidate note" { target *-*-* } 22 }
-  void f(int);  // { dg-error "previous using declaration" }
+  void f(int);  // { dg-error "previous declaration" }
 }

 void m()
 {
   void f(int);
-  using B::f;   // { dg-error "already declared" }
+  using B::f;   // { dg-error "previous declaration" }
 }
Index: gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C
===================================================================
--- gcc-4_8-test.orig/gcc/testsuite/g++.dg/overload/using2.C
+++ gcc-4_8-test/gcc/testsuite/g++.dg/overload/using2.C
@@ -45,7 +45,7 @@ using std::C1;
   extern "C" void exit (int) throw ();
   extern "C" void *malloc (__SIZE_TYPE__) throw () __attribute__((malloc));

-  void abort (void) throw ();
+  void abort (void) throw (); // { dg-message "previous" }
   void _exit (int) throw (); // { dg-error "conflicts" "conflicts" }
   // { dg-message "void _exit" "_exit" { target *-*-* } 49 }

@@ -54,14 +54,14 @@
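The cp-tree.h and name-lookup.c changes above follow a common pattern: make the accessor check the node kind, and have callers test the kind before using the accessor. A rough standalone sketch of that pattern is given below. It is not GCC's tree machinery; the struct, the enum values and the names overload_check, OVL_USED_SKETCH and is_using_decl are all invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Invented stand-ins for tree codes and tree nodes.  */
enum node_code { FUNCTION_DECL_NODE, OVERLOAD_NODE };

struct node
{
  enum node_code code;
  bool used;
};

/* Checked accessor in the spirit of OVERLOAD_CHECK: abort if the node
   is not an overload, otherwise hand it back unchanged.  */
static inline struct node *
overload_check (struct node *n)
{
  assert (n->code == OVERLOAD_NODE);
  return n;
}

/* Counterpart of the patched OVL_USED: only valid on overload nodes.  */
#define OVL_USED_SKETCH(N) (overload_check (N)->used)

/* Counterpart of the guarded test in do_nonmember_using_decl: look at
   the node kind first, so the checked accessor is never reached for a
   plain function declaration.  */
bool
is_using_decl (struct node *n)
{
  return n->code == OVERLOAD_NODE && OVL_USED_SKETCH (n);
}
```

With the guard in place, a plain function declaration simply yields false instead of tripping the check inside the accessor.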
[4.8, PATCH 20/26] Backport Power8 and LE support: LRA
Hi,

This patch (diff-lra) backports the changes to enable -mlra for the PowerPC back end.

Thanks,
Bill


2014-03-19  Bill Schmidt  <wschm...@linux.vnet.ibm.com>

        Backport from mainline
        2014-02-04  Michael Meissner  <meiss...@linux.vnet.ibm.com>

        * config/rs6000/rs6000.opt (-mlra): Add switch to enable the LRA
        register allocator.

        * config/rs6000/rs6000.c (TARGET_LRA_P): Add support for -mlra to
        enable the LRA register allocator.  Back port the changes from the
        trunk to enable LRA.
        (rs6000_legitimate_offset_address_p): Likewise.
        (legitimate_lo_sum_address_p): Likewise.
        (use_toc_relative_ref): Likewise.
        (rs6000_legitimate_address_p): Likewise.
        (rs6000_emit_move): Likewise.
        (rs6000_secondary_memory_needed_mode): Likewise.
        (rs6000_alloc_sdmode_stack_slot): Likewise.
        (rs6000_lra_p): Likewise.

        * config/rs6000/sync.md (load_lockedti): Copy TI/PTI variables by
        64-bit parts to force the register allocator to allocate even/odd
        register pairs for the quad word atomic instructions.
        (store_conditionalti): Likewise.

Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c
===================================================================
--- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c
+++ gcc-4_8-test/gcc/config/rs6000/rs6000.c
@@ -1,5 +1,5 @@
 /* Subroutines used for code generation on IBM RS/6000.
-   Copyright (C) 1991-2013 Free Software Foundation, Inc.
+   Copyright (C) 1991-2014 Free Software Foundation, Inc.
    Contributed by Richard Kenner (ken...@vlsi1.ultra.nyu.edu)

    This file is part of GCC.
@@ -56,6 +56,7 @@
 #include "intl.h"
 #include "params.h"
 #include "tm-constrs.h"
+#include "ira.h"
 #include "opts.h"
 #include "tree-vectorizer.h"
 #include "dumpfile.h"
@@ -1563,6 +1564,9 @@ static const struct attribute_spec rs600
 #undef TARGET_MODE_DEPENDENT_ADDRESS_P
 #define TARGET_MODE_DEPENDENT_ADDRESS_P rs6000_mode_dependent_address_p

+#undef TARGET_LRA_P
+#define TARGET_LRA_P rs6000_lra_p
+
 #undef TARGET_CAN_ELIMINATE
 #define TARGET_CAN_ELIMINATE rs6000_can_eliminate

@@ -6242,7 +6246,7 @@ rs6000_legitimate_offset_address_p (enum
     return false;
   if (!reg_offset_addressing_ok_p (mode))
     return virtual_stack_registers_memory_p (x);
-  if (legitimate_constant_pool_address_p (x, mode, strict))
+  if (legitimate_constant_pool_address_p (x, mode, strict || lra_in_progress))
     return true;
   if (GET_CODE (XEXP (x, 1)) != CONST_INT)
     return false;
@@ -6383,9 +6387,21 @@ legitimate_lo_sum_address_p (enum machin

   if (TARGET_ELF || TARGET_MACHO)
     {
+      bool large_toc_ok;
+
       if (DEFAULT_ABI == ABI_V4 && flag_pic)
        return false;
-      if (TARGET_TOC)
+      /* LRA don't use LEGITIMIZE_RELOAD_ADDRESS as it usually calls
+        push_reload from reload pass code.  LEGITIMIZE_RELOAD_ADDRESS
+        recognizes some LO_SUM addresses as valid although this
+        function says opposite.  In most cases, LRA through different
+        transformations can generate correct code for address reloads.
+        It can not manage only some LO_SUM cases.  So we need to add
+        code analogous to one in rs6000_legitimize_reload_address for
+        LOW_SUM here saying that some addresses are still valid.  */
+      large_toc_ok = (lra_in_progress && TARGET_CMODEL != CMODEL_SMALL
+                     && small_toc_ref (x, VOIDmode));
+      if (TARGET_TOC && ! large_toc_ok)
        return false;
       if (GET_MODE_NUNITS (mode) != 1)
        return false;
@@ -6395,7 +6411,7 @@ legitimate_lo_sum_address_p (enum machin
               && (mode == DFmode || mode == DDmode)))
        return false;

-      return CONSTANT_P (x);
+      return CONSTANT_P (x) || large_toc_ok;
     }

   return false;
@@ -7106,7 +7122,6 @@ use_toc_relative_ref (rtx sym)
           && ASM_OUTPUT_SPECIAL_POOL_ENTRY_P (get_pool_constant (sym),
                                               get_pool_mode (sym)))
          || (TARGET_CMODEL == CMODEL_MEDIUM
-             && !CONSTANT_POOL_ADDRESS_P (sym)
              && SYMBOL_REF_LOCAL_P (sym)));
 }

@@ -7394,7 +7409,8 @@ rs6000_legitimate_address_p (enum machin
   if (reg_offset_p && legitimate_small_data_p (mode, x))
     return 1;
   if (reg_offset_p
-      && legitimate_constant_pool_address_p (x, mode, reg_ok_strict))
+      && legitimate_constant_pool_address_p (x, mode,
+                                            reg_ok_strict || lra_in_progress))
     return 1;
   /* For TImode, if we have load/store quad and TImode in VSX registers, only
      allow register indirect addresses.  This will allow the values to go in
@@ -7680,6 +7696,7 @@ rs6000_conditional_register_usage (void)
        fixed_regs[i] = call_used_regs[i] = call_really_used_regs[i] = 1;
     }
 }
+

 /* Try to output insns to set TARGET equal to the constant C if it can be
    done in less than N insns.  Do all computations in MODE.
@@ -8112,6 +8129,68 @@
[4.8, PATCH 26/26] Backport Power8 and LE support: Missing support
Hi, This patch (diff-trunk-missing) backports some LE pieces that were found not to have been backported from trunk to the IBM 4.8 branch until relatively recently. Thanks, Bill 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Back port from trunk 2013-04-25 Alan Modra amo...@gmail.com PR target/57052 * config/rs6000/rs6000.md (rotlsi3_internal7): Rename to rotlsi3_internal7le and condition on !BYTES_BIG_ENDIAN. (rotlsi3_internal8be): New BYTES_BIG_ENDIAN insn. Repeat for many other rotate/shift and mask patterns using subregs. Name lshiftrt insns. (ashrdisi3_noppc64): Rename to ashrdisi3_noppc64be and condition on WORDS_BIG_ENDIAN. 2013-06-07 Alan Modra amo...@gmail.com * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't override user -mfp-in-toc. (offsettable_ok_by_alignment): Consider just the current access rather than the whole object, unless BLKmode. Handle CONSTANT_POOL_ADDRESS_P constants that lack a decl too. (use_toc_relative_ref): Allow CONSTANT_POOL_ADDRESS_P constants for -mcmodel=medium. * config/rs6000/linux64.h (SUBSUBTARGET_OVERRIDE_OPTIONS): Don't override user -mfp-in-toc or -msum-in-toc. Default to -mno-fp-in-toc for -mcmodel=medium. 2013-06-18 Alan Modra amo...@gmail.com * config/rs6000/rs6000.h (enum data_align): New. (LOCAL_ALIGNMENT, DATA_ALIGNMENT): Use rs6000_data_alignment. (DATA_ABI_ALIGNMENT): Define. (CONSTANT_ALIGNMENT): Correct comment. * config/rs6000/rs6000-protos.h (rs6000_data_alignment): Declare. * config/rs6000/rs6000.c (rs6000_data_alignment): New function. 2013-07-11 Ulrich Weigand ulrich.weig...@de.ibm.com * config/rs6000/rs6000.md (*tls_gd_lowTLSmode:tls_abi_suffix): Require GOT register as additional operand in UNSPEC. (*tls_ld_lowTLSmode:tls_abi_suffix): Likewise. (*tls_got_dtprel_lowTLSmode:tls_abi_suffix): Likewise. (*tls_got_tprel_lowTLSmode:tls_abi_suffix): Likewise. (*tls_gdTLSmode:tls_abi_suffix): Update splitter. (*tls_ldTLSmode:tls_abi_suffix): Likewise. 
(tls_got_dtprel_TLSmode:tls_abi_suffix): Likewise. (tls_got_tprel_TLSmode:tls_abi_suffix): Likewise. 2014-01-23 Pat Haugen pthau...@us.ibm.com * config/rs6000/rs6000.c (rs6000_option_override_internal): Don't force flag_ira_loop_pressure if set via command line. 2014-02-06 Alan Modra amo...@gmail.com PR target/60032 * config/rs6000/rs6000.c (rs6000_secondary_memory_needed_mode): Only change SDmode to DDmode when lra_in_progress. Index: gcc-4_8-test/gcc/config/rs6000/linux64.h === --- gcc-4_8-test.orig/gcc/config/rs6000/linux64.h +++ gcc-4_8-test/gcc/config/rs6000/linux64.h @@ -149,8 +149,11 @@ extern int dot_symbols; SET_CMODEL (CMODEL_MEDIUM); \ if (rs6000_current_cmodel != CMODEL_SMALL)\ { \ - TARGET_NO_FP_IN_TOC = 0; \ - TARGET_NO_SUM_IN_TOC = 0; \ + if (!global_options_set.x_TARGET_NO_FP_IN_TOC) \ + TARGET_NO_FP_IN_TOC \ + = rs6000_current_cmodel == CMODEL_MEDIUM; \ + if (!global_options_set.x_TARGET_NO_SUM_IN_TOC) \ + TARGET_NO_SUM_IN_TOC = 0; \ } \ } \ } \ Index: gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-protos.h +++ gcc-4_8-test/gcc/config/rs6000/rs6000-protos.h @@ -152,6 +152,7 @@ extern void rs6000_split_logical (rtx [] #endif /* RTX_CODE */ #ifdef TREE_CODE +extern unsigned int rs6000_data_alignment (tree, unsigned int, enum data_align); extern unsigned int rs6000_special_round_type_align (tree, unsigned int, unsigned int); extern unsigned int darwin_rs6000_special_round_type_align (tree, unsigned int, Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -3031,7 +3031,8 @@ rs6000_option_override_internal (bool gl calculation works better for RTL loop invariant motion on targets with enough (= 32) registers. It is an expensive optimization. So it is on only for peak performance.
[4.8, PATCH 19/26] Backport Power8 and LE support: Quad memory atomic
Hi, This patch (diff-quad-memory) backports support for quad-memory atomic operations. Thanks, Bill [gcc/testsuite] 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Back port from mainline 2014-01-23 Michael Meissner meiss...@linux.vnet.ibm.com PR target/59909 * gcc.target/powerpc/quad-atomic.c: New file to test power8 quad word atomic functions at runtime. [gcc] 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Back port from mainline 2014-01-23 Michael Meissner meiss...@linux.vnet.ibm.com PR target/59909 * doc/invoke.texi (RS/6000 and PowerPC Options): Document -mquad-memory-atomic. Update -mquad-memory documentation to say it is only used for non-atomic loads/stores. * config/rs6000/predicates.md (quad_int_reg_operand): Allow either -mquad-memory or -mquad-memory-atomic switches. * config/rs6000/rs6000-cpus.def (ISA_2_7_MASKS_SERVER): Add -mquad-memory-atomic to ISA 2.07 support. * config/rs6000/rs6000.opt (-mquad-memory-atomic): Add new switch to separate support of normal quad word memory operations (ldq, stq) from the atomic quad word memory operations. * config/rs6000/rs6000.c (rs6000_option_override_internal): Add support to separate non-atomic quad word operations from atomic quad word operations. Disable non-atomic quad word operations in little endian mode so that we don't have to swap words after the load and before the store. (quad_load_store_p): Add comment about atomic quad word support. (rs6000_opt_masks): Add -mquad-memory-atomic to the list of options printed with -mdebug=reg. * config/rs6000/rs6000.h (TARGET_SYNC_TI): Use -mquad-memory-atomic as the test for whether we have quad word atomic instructions. (TARGET_SYNC_HI_QI): If either -mquad-memory-atomic, -mquad-memory, or -mp8-vector are used, allow byte/half-word atomic operations. * config/rs6000/sync.md (load_lockedti): Insure that the address is a proper indexed or indirect address for the lqarx instruction. 
On little endian systems, swap the hi/lo registers after the lqarx instruction. (load_lockedpti): Use indexed_or_indirect_operand predicate to insure the address is valid for the lqarx instruction. (store_conditionalti): Insure that the address is a proper indexed or indirect address for the stqcrx. instruction. On little endian systems, swap the hi/lo registers before doing the stqcrx. instruction. (store_conditionalpti): Use indexed_or_indirect_operand predicate to insure the address is valid for the stqcrx. instruction. * gcc/config/rs6000/rs6000-c.c (rs6000_target_modify_macros): Define __QUAD_MEMORY__ and __QUAD_MEMORY_ATOMIC__ based on what type of quad memory support is available. Index: gcc-4_8-test/gcc/config/rs6000/predicates.md === --- gcc-4_8-test.orig/gcc/config/rs6000/predicates.md +++ gcc-4_8-test/gcc/config/rs6000/predicates.md @@ -270,7 +270,7 @@ { HOST_WIDE_INT r; - if (!TARGET_QUAD_MEMORY) + if (!TARGET_QUAD_MEMORY !TARGET_QUAD_MEMORY_ATOMIC) return 0; if (GET_CODE (op) == SUBREG) @@ -633,6 +633,7 @@ (match_test offsettable_nonstrict_memref_p (op ;; Return 1 if the operand is suitable for load/store quad memory. +;; This predicate only checks for non-atomic loads/stores. 
(define_predicate quad_memory_operand (match_code mem) { Index: gcc-4_8-test/gcc/config/rs6000/rs6000-c.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-c.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000-c.c @@ -337,6 +337,10 @@ rs6000_target_modify_macros (bool define rs6000_define_or_undefine_macro (define_p, __HTM__); if ((flags OPTION_MASK_P8_VECTOR) != 0) rs6000_define_or_undefine_macro (define_p, __POWER8_VECTOR__); + if ((flags OPTION_MASK_QUAD_MEMORY) != 0) +rs6000_define_or_undefine_macro (define_p, __QUAD_MEMORY__); + if ((flags OPTION_MASK_QUAD_MEMORY_ATOMIC) != 0) +rs6000_define_or_undefine_macro (define_p, __QUAD_MEMORY_ATOMIC__); if ((flags OPTION_MASK_CRYPTO) != 0) rs6000_define_or_undefine_macro (define_p, __CRYPTO__); Index: gcc-4_8-test/gcc/config/rs6000/rs6000-cpus.def === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000-cpus.def +++ gcc-4_8-test/gcc/config/rs6000/rs6000-cpus.def @@ -53,7 +53,8 @@ | OPTION_MASK_CRYPTO \ | OPTION_MASK_DIRECT_MOVE \ | OPTION_MASK_HTM \ -|
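As a side note on the rs6000-c.c hunk above: the two new predefined macros let user code tell the atomic and non-atomic quad-word support apart at compile time. The following is only a minimal sketch of how such a check might look; the function name quad_memory_support is made up for illustration and is not part of the patch.

```cpp
#include <string>

// Hypothetical helper (not from the patch): reports which kind of
// quad-word memory support the compiler advertised through the
// predefined macros that rs6000_target_modify_macros sets up.
std::string quad_memory_support()
{
#if defined(__QUAD_MEMORY_ATOMIC__)
  // Atomic quad-word operations are available.
  return "atomic";
#elif defined(__QUAD_MEMORY__)
  // Only the non-atomic quad-word loads/stores are available.
  return "non-atomic";
#else
  // Neither macro is defined, e.g. on a non-Power8 or non-PowerPC target.
  return "none";
#endif
}
```

On targets where neither option is in effect the function simply returns "none", so the same source builds everywhere.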
Re: [RFA jit v2 2/2] introduce auto_timevar
On Wed, 2014-03-19 at 11:52 -0600, Tom Tromey wrote:

This introduces a new auto_timevar class. It pushes a given timevar in its constructor, and pops it in the destructor, giving a much simpler way to use timevars in the typical case where they can be scoped.
---
 gcc/ChangeLog.jit      |  4
 gcc/jit/ChangeLog.jit  |  4
 gcc/jit/internal-api.c | 16
 gcc/timevar.h          | 26
 4 files changed, 38 insertions(+), 12 deletions(-)

OK (and it fixes a bug in the earlier version of the patch in the dtor, which pushed rather than popped).

Are you able to push this to my branch yourself, or do you need me to do this?
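For readers who have not seen the patch itself, the push-in-constructor, pop-in-destructor idea can be sketched roughly as follows. This is an illustration only, not the code from gcc/timevar.h: the enum values, the timevar_stack vector and the bodies of timevar_push and timevar_pop are stand-ins invented here so that the example is self-contained.

```cpp
#include <cstddef>
#include <vector>

// Stand-in timer identifiers; the real list lives in GCC and is much longer.
enum timevar_id { TV_EXAMPLE_PARSE, TV_EXAMPLE_OPTIMIZE };

// Stand-in for the timevar machinery: just a stack of active timers.
static std::vector<timevar_id> timevar_stack;

void timevar_push(timevar_id tv) { timevar_stack.push_back(tv); }

void timevar_pop(timevar_id tv)
{
  // Only pop if the timer being popped is the innermost one.
  if (!timevar_stack.empty() && timevar_stack.back() == tv)
    timevar_stack.pop_back();
}

// RAII wrapper: the timer is active exactly as long as the object lives.
class auto_timevar
{
public:
  explicit auto_timevar(timevar_id tv) : m_tv(tv) { timevar_push(m_tv); }
  ~auto_timevar() { timevar_pop(m_tv); }  // pop, not push, in the destructor

  auto_timevar(const auto_timevar&) = delete;
  auto_timevar& operator=(const auto_timevar&) = delete;

private:
  timevar_id m_tv;
};

// Returns the stack depth seen while the scoped timer is active.
std::size_t depth_inside_scope()
{
  auto_timevar timer(TV_EXAMPLE_PARSE);
  return timevar_stack.size();
}
```

The point of the wrapper is that every exit path from the scope, including early returns, pops the timer, which is the mistake the earlier version of the patch made by hand.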
[4.8, PATCH 1/26 too big]
Hi, The main patch for this series was too large for the mailer to accept. Sorry about that. This piece is all powerpc-related and seems to have been delivered to David ok. If anyone else wants a copy of the patch, please contact me privately and I'll send it your way. Thanks, Bill
Re: [PATCH] [gomp4] Initial support of OpenACC loop directive in C front-end.
Hi Ilmir!

On Tue, 18 Mar 2014 16:37:24 +0400, Ilmir Usmanov i.usma...@samsung.com wrote:

This patch introduces support of OpenACC loop directive (and combined directives) in C front-end up to GENERIC. Currently no clause is allowed.

--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/loop-1.c
@@ -0,0 +1,89 @@
+/* { dg-do compile } */
+
+int test1()
+{
+  int i, j, k, b[10];
+  int a[30];
+  double d;
+  float r;
+  i = 0;
+  #pragma acc loop
+  for (i = 1; i < 10; i++)
+    {
+    }

Do you intend to support loop constructs that are not nested in a parallel or kernels construct? As I'm reading it, the specification is not clear on this. (I guess I'll raise this question with the OpenACC guys.)

Regards,
Thomas
[4.8, PATCH 24/26] Backport Power8 and LE support: Reload issues
Hi, This patch (diff-reload) backports fixes for a couple of problems in PowerPC reload handling. Thanks, Bill 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Apply mainline r207798 2014-02-26 Alan Modra amo...@gmail.com PR target/58675 PR target/57935 * config/rs6000/rs6000.c (rs6000_secondary_reload_inner): Use find_replacement on parts of insn rtl that might be reloaded. Backport from mainline r208287 2014-03-03 Bill Schmidt wschm...@linux.vnet.ibm.com * config/rs6000/rs6000.c (rs6000_preferred_reload_class): Disallow reload of PLUS rtx's outside of GENERAL_REGS or BASE_REGS; relax constraint on constants to permit them being loaded into GENERAL_REGS or BASE_REGS. Index: gcc-4_8-test/gcc/config/rs6000/rs6000.c === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.c +++ gcc-4_8-test/gcc/config/rs6000/rs6000.c @@ -16380,7 +16380,7 @@ rs6000_secondary_reload_inner (rtx reg, rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); rclass = REGNO_REG_CLASS (regno); - addr = XEXP (mem, 0); + addr = find_replacement (XEXP (mem, 0)); switch (rclass) { @@ -16391,19 +16391,18 @@ rs6000_secondary_reload_inner (rtx reg, if (GET_CODE (addr) == AND) { and_op2 = XEXP (addr, 1); - addr = XEXP (addr, 0); + addr = find_replacement (XEXP (addr, 0)); } if (GET_CODE (addr) == PRE_MODIFY) { - scratch_or_premodify = XEXP (addr, 0); + scratch_or_premodify = find_replacement (XEXP (addr, 0)); if (!REG_P (scratch_or_premodify)) rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); - if (GET_CODE (XEXP (addr, 1)) != PLUS) + addr = find_replacement (XEXP (addr, 1)); + if (GET_CODE (addr) != PLUS) rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); - - addr = XEXP (addr, 1); } if (GET_CODE (addr) == PLUS @@ -16411,6 +16410,8 @@ rs6000_secondary_reload_inner (rtx reg, || !rs6000_legitimate_offset_address_p (PTImode, addr, false, true))) { + /* find_replacement already recurses into both operands of +PLUS so we don't need to call it here. 
*/ addr_op1 = XEXP (addr, 0); addr_op2 = XEXP (addr, 1); if (!legitimate_indirect_address_p (addr_op1, false)) @@ -16486,7 +16487,7 @@ rs6000_secondary_reload_inner (rtx reg, || !VECTOR_MEM_ALTIVEC_P (mode))) { and_op2 = XEXP (addr, 1); - addr = XEXP (addr, 0); + addr = find_replacement (XEXP (addr, 0)); } /* If we aren't using a VSX load, save the PRE_MODIFY register and use it @@ -16498,14 +16499,13 @@ rs6000_secondary_reload_inner (rtx reg, || and_op2 != NULL_RTX || !legitimate_indexed_address_p (XEXP (addr, 1), false))) { - scratch_or_premodify = XEXP (addr, 0); + scratch_or_premodify = find_replacement (XEXP (addr, 0)); if (!legitimate_indirect_address_p (scratch_or_premodify, false)) rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); - if (GET_CODE (XEXP (addr, 1)) != PLUS) + addr = find_replacement (XEXP (addr, 1)); + if (GET_CODE (addr) != PLUS) rs6000_secondary_reload_fail (__LINE__, reg, mem, scratch, store_p); - - addr = XEXP (addr, 1); } if (legitimate_indirect_address_p (addr, false) /* reg */ @@ -16765,8 +16765,14 @@ rs6000_preferred_reload_class (rtx x, en easy_vector_constant (x, mode)) return ALTIVEC_REGS; - if (CONSTANT_P (x) reg_classes_intersect_p (rclass, FLOAT_REGS)) -return NO_REGS; + if ((CONSTANT_P (x) || GET_CODE (x) == PLUS)) +{ + if (reg_class_subset_p (GENERAL_REGS, rclass)) + return GENERAL_REGS; + if (reg_class_subset_p (BASE_REGS, rclass)) + return BASE_REGS; + return NO_REGS; +} if (GET_MODE_CLASS (mode) == MODE_INT rclass == NON_SPECIAL_REGS) return GENERAL_REGS;
[4.8, PATCH 22/26] Backport Power8 and LE support: -mcall-* endianness
Hi, This patch (diff-mcall) fixes big-endian assumptions for -mcall-aixdesc and various others. Thanks, Bill 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r207658 2014-02-06 Ulrich Weigand ulrich.weig...@de.ibm.com * config/rs6000/sysv4.h (ENDIAN_SELECT): Do not attempt to enforce big-endian mode for -mcall-aixdesc, -mcall-freebsd, -mcall-netbsd, -mcall-openbsd, or -mcall-linux. (CC1_ENDIAN_BIG_SPEC): Remove. (CC1_ENDIAN_LITTLE_SPEC): Remove. (CC1_ENDIAN_DEFAULT_SPEC): Remove. (CC1_SPEC): Remove (always empty) %cc1_endian_... spec. (SUBTARGET_EXTRA_SPECS): Remove %cc1_endian_big, %cc1_endian_little, and %cc1_endian_default. * config/rs6000/sysv4le.h (CC1_ENDIAN_DEFAULT_SPEC): Remove. Index: gcc-4_8-test/gcc/config/rs6000/sysv4.h === --- gcc-4_8-test.orig/gcc/config/rs6000/sysv4.h +++ gcc-4_8-test/gcc/config/rs6000/sysv4.h @@ -522,8 +522,6 @@ extern int fixuplabelno; #define ENDIAN_SELECT(BIG_OPT, LITTLE_OPT, DEFAULT_OPT)\ %{mlittle|mlittle-endian:LITTLE_OPT ; \ mbig|mbig-endian: BIG_OPT; \ - mcall-aixdesc|mcall-freebsd|mcall-netbsd| \ - mcall-openbsd|mcall-linux: BIG_OPT; \ mcall-i960-old:LITTLE_OPT ; \ : DEFAULT_OPT } @@ -536,20 +534,12 @@ extern int fixuplabelno; %{memb|msdata=eabi: -memb} \ ENDIAN_SELECT( -mbig, -mlittle, DEFAULT_ASM_ENDIAN) -#defineCC1_ENDIAN_BIG_SPEC - -#defineCC1_ENDIAN_LITTLE_SPEC - -#defineCC1_ENDIAN_DEFAULT_SPEC %(cc1_endian_big) - #ifndef CC1_SECURE_PLT_DEFAULT_SPEC #define CC1_SECURE_PLT_DEFAULT_SPEC #endif -/* Pass -G xxx to the compiler and set correct endian mode. */ +/* Pass -G xxx to the compiler. 
*/ #defineCC1_SPEC %{G*} %(cc1_cpu) \ - ENDIAN_SELECT( %(cc1_endian_big), %(cc1_endian_little), \ -%(cc1_endian_default)) \ %{meabi: %{!mcall-*: -mcall-sysv }} \ %{!meabi: %{!mno-eabi: \ %{mrelocatable: -meabi } \ @@ -903,9 +893,6 @@ ncrtn.o%s { link_os_netbsd, LINK_OS_NETBSD_SPEC }, \ { link_os_openbsd, LINK_OS_OPENBSD_SPEC }, \ { link_os_default, LINK_OS_DEFAULT_SPEC }, \ - { cc1_endian_big, CC1_ENDIAN_BIG_SPEC }, \ - { cc1_endian_little, CC1_ENDIAN_LITTLE_SPEC }, \ - { cc1_endian_default, CC1_ENDIAN_DEFAULT_SPEC }, \ { cc1_secure_plt_default, CC1_SECURE_PLT_DEFAULT_SPEC }, \ { cpp_os_ads, CPP_OS_ADS_SPEC }, \ { cpp_os_yellowknife, CPP_OS_YELLOWKNIFE_SPEC }, \ Index: gcc-4_8-test/gcc/config/rs6000/sysv4le.h === --- gcc-4_8-test.orig/gcc/config/rs6000/sysv4le.h +++ gcc-4_8-test/gcc/config/rs6000/sysv4le.h @@ -22,9 +22,6 @@ #undef TARGET_DEFAULT #define TARGET_DEFAULT MASK_LITTLE_ENDIAN -#undef CC1_ENDIAN_DEFAULT_SPEC -#defineCC1_ENDIAN_DEFAULT_SPEC %(cc1_endian_little) - #undef DEFAULT_ASM_ENDIAN #defineDEFAULT_ASM_ENDIAN -mlittle
[4.8, PATCH 23/26] Backport Power8 and LE support: PR60137, PR60203
Hi, This patch (diff-pr60137-pr60203) backports fixes for two little-endian vector mode problems. Thanks, Bill [gcc] 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r207699. 2014-02-11 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60137 * config/rs6000/rs6000.md (128-bit GPR splitter): Add a splitter for VSX/Altivec vectors that land in GPR registers. Backport from mainline r207808. 2014-02-15 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60203 * config/rs6000/rs6000.md (rreg): Add TFmode, TDmode constraints. (movmode_internal, TFmode/TDmode): Split TFmode/TDmode moves into 64-bit and 32-bit moves. On 64-bit moves, add support for using direct move instructions on ISA 2.07. Also adjust instruction length for 64-bit. (movmode_64bit, TFmode/TDmode): Likewise. (movmode_32bit, TFmode/TDmode): Likewise. Backport from mainline r207868. 2014-02-18 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60203 * config/rs6000/rs6000.md (movmode_64bit, TF/TDmode moves): Split 64-bit moves into 2 patterns. Do not allow the use of direct move for TDmode in little endian, since the decimal value has little endian bytes within a word, but the 64-bit pieces are ordered in a big endian fashion, and normal subreg's of TDmode are not allowed. (movmode_64bit_dm): Likewise. (movtd_64bit_nodm): Likewise. [gcc/testsuite] 2014-03-19 Bill Schmidt wschm...@linux.vnet.ibm.com Backport from mainline r207699. 2014-02-11 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60137 * gcc.target/powerpc/pr60137.c: New file. Backport from mainline r207808. 2014-02-15 Michael Meissner meiss...@linux.vnet.ibm.com PR target/60203 * gcc.target/powerpc/pr60203.c: New testsuite. 
Index: gcc-4_8-test/gcc/config/rs6000/rs6000.md === --- gcc-4_8-test.orig/gcc/config/rs6000/rs6000.md +++ gcc-4_8-test/gcc/config/rs6000/rs6000.md @@ -378,6 +378,8 @@ (define_mode_attr rreg [(SF f) (DF ws) + (TF f) + (TD f) (V4SF wf) (V2DF wd)]) @@ -8990,10 +8992,40 @@ ;; It's important to list Y-r and r-Y before r-r because otherwise ;; reload, given m-r, will try to pick r-r and reload it, which ;; doesn't make progress. -(define_insn_and_split *movmode_internal + +;; We can't split little endian direct moves of TDmode, because the words are +;; not swapped like they are for TImode or TFmode. Subregs therefore are +;; problematical. Don't allow direct move for this case. + +(define_insn_and_split *movmode_64bit_dm + [(set (match_operand:FMOVE128 0 nonimmediate_operand =m,d,d,Y,r,r,r,wm) + (match_operand:FMOVE128 1 input_operand d,m,d,r,YGHF,r,wm,r))] + TARGET_HARD_FLOAT TARGET_FPRS TARGET_POWERPC64 +(MODEmode != TDmode || WORDS_BIG_ENDIAN) +(gpc_reg_operand (operands[0], MODEmode) + || gpc_reg_operand (operands[1], MODEmode)) + # + reload_completed + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; } + [(set_attr length 8,8,8,12,12,8,8,8)]) + +(define_insn_and_split *movtd_64bit_nodm + [(set (match_operand:TD 0 nonimmediate_operand =m,d,d,Y,r,r) + (match_operand:TD 1 input_operand d,m,d,r,YGHF,r))] + TARGET_HARD_FLOAT TARGET_FPRS TARGET_POWERPC64 !WORDS_BIG_ENDIAN +(gpc_reg_operand (operands[0], TDmode) + || gpc_reg_operand (operands[1], TDmode)) + # + reload_completed + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; } + [(set_attr length 8,8,8,12,12,8)]) + +(define_insn_and_split *movmode_32bit [(set (match_operand:FMOVE128 0 nonimmediate_operand =m,d,d,Y,r,r) (match_operand:FMOVE128 1 input_operand d,m,d,r,YGHF,r))] - TARGET_HARD_FLOAT TARGET_FPRS + TARGET_HARD_FLOAT TARGET_FPRS !TARGET_POWERPC64 (gpc_reg_operand (operands[0], MODEmode) || gpc_reg_operand (operands[1], MODEmode)) # @@ -9429,6 +9461,15 @@ [(set_attr 
length 12) (set_attr type three)]) +(define_split + [(set (match_operand:FMOVE128_GPR 0 nonimmediate_operand ) + (match_operand:FMOVE128_GPR 1 input_operand ))] + reload_completed +(int_reg_operand (operands[0], MODEmode) + || int_reg_operand (operands[1], MODEmode)) + [(pc)] +{ rs6000_split_multireg_move (operands[0], operands[1]); DONE; }) + ;; Move SFmode to a VSX from a GPR register. Because scalar floating point ;; type is stored internally as double precision in the VSX registers, we have ;; to convert it from the vector format. Index: gcc-4_8-test/gcc/testsuite/gcc.target/powerpc/pr60137.c
Re: [Fortran][PATCH][gomp4]: Transform OpenACC loop directive
Hi Ilmir,

Ilmir Usmanov:

This patch implements transformation of OpenACC loop directive from Fortran AST to GENERIC.

If I followed correctly, with this patch the Fortran FE implementation of OpenACC is complete, except for:

* !$acc cache() - parsing supported, but then aborting with a not-implemented error
* OpenACC 2.0a additions.

Am I right?

Successfully bootstrapped and tested with no new regressions on x86_64-unknown-linux-gnu. OK for gomp4 branch?

I leave the review of the gcc/tree-pretty-print.c part (looks good to me) to Thomas, who might also have a comment on the Fortran part.

For a DO loop, the code looks okay. For DO CONCURRENT, it is not. I think we should really consider rejecting DO CONCURRENT with a "not permitted" error; it is currently not explicitly supported by OpenACC, and we can still worry about it when it is explicitly added to OpenACC. Otherwise, see gfc_trans_do_concurrent for how to handle the do concurrent loops.

Issues with DO CONCURRENT:

* You use code->ext.iterator->var - that's fine with DO but not with DO CONCURRENT, which uses a code->ext.forall_iterator

* Do concurrent also handles multiple variables in a single statement, such as:

    integer :: i, j, b(3,5)
    DO CONCURRENT(i=1:3, j=1:5:2)
      b(i, j) = -42
    END DO
    end

* And do concurrent also supports masks:

    logical :: my_mask(3)
    integer :: i, b(3)
    b = [5, 5, 2]
    my_mask = [.true., .false., .true.]
    do concurrent (i=1:3, b(i) == 5 .and. my_mask(i))
      b(i) = -42
    end do
    end

Tobias
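To make the two DO CONCURRENT points above concrete, here is a small sketch in C++ of what the two Fortran snippets compute. It is only a model of the semantics (several index triplets in one header, and a mask evaluated before the body runs), not anything from the gfortran sources; both function names are invented for the illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Models "DO CONCURRENT(i=1:3, j=1:5:2)": two index variables, one with a
// stride, in a single header. Returns how many (i, j) pairs are visited.
int strided_iteration_count()
{
  int count = 0;
  for (int i = 1; i <= 3; ++i)
    for (int j = 1; j <= 5; j += 2)
      ++count;
  return count;
}

// Models the masked loop: the mask is evaluated for every index first,
// and the body then runs only for the indices that passed.
std::array<int, 3> masked_do_concurrent()
{
  std::array<int, 3> b = {5, 5, 2};
  std::array<bool, 3> my_mask = {true, false, true};

  // Step 1: determine the active set from the mask expression.
  std::vector<std::size_t> active;
  for (std::size_t i = 0; i < b.size(); ++i)
    if (b[i] == 5 && my_mask[i])
      active.push_back(i);

  // Step 2: execute the body for the active indices only.
  for (std::size_t i : active)
    b[i] = -42;

  return b;
}
```

Neither case fits a single loop variable with start, end and step, which is why a translation written around one iterator cannot simply be reused for DO CONCURRENT.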
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related for implicit pure
Early *ping* - I think this wrong-code GCC 4.7/4.8/4.9 issue is pretty severe.

Tobias Burnus wrote:

This patch fixes two issues, where gfortran claims that a function is implicit pure, but it is not. That will cause a wrong-code optimization in the middle end.

First problem, cf. PR60543, is that implicit pure was not set to 0 for calls to impure intrinsic subroutines. (BTW: There are no impure intrinsic functions.) Example:

  module m
  contains
    REAL(8) FUNCTION random()
      CALL RANDOM_NUMBER(random)
    END FUNCTION random
  end module m

The second problem pops up if one adds a BLOCK ... END BLOCK around the random_number call after applying the patch of the PR, which just does: gfc_current_ns->proc_name->attr.implicit_pure = 0. The problem is that one sets only the implicit_pure of the block to 0 and not of the function. That's the reason that the patch became much longer and that I added gfc_unset_implicit_pure as new function. Thus, the suspicion I had when reviewing the OpenACC patches turned out to be founded. Cf. PR60283.

Built and regtested on x86-64-gnu-linux. OK for the trunk and for the 4.7 and 4.8 branches?

Note: I failed to create a test case.

Tobias
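The second problem can be pictured with a toy model. The sketch below is not the gfortran code: Symbol, Namespace and both functions are simplified stand-ins invented here, assuming only what the message says, namely that a BLOCK has its own namespace and that clearing the flag on that namespace's proc_name does not reach the enclosing function.

```cpp
// Toy stand-ins for the front-end data structures (hypothetical, simplified).
struct Symbol
{
  bool is_procedure;   // true for a function/subroutine, false for a BLOCK
  bool implicit_pure;  // the attribute under discussion
};

struct Namespace
{
  Symbol* proc_name;   // symbol owning this namespace
  Namespace* parent;   // enclosing namespace, or nullptr at the top
};

// What the one-line fix did: clear the flag on the innermost namespace only.
void unset_innermost_only(Namespace* ns)
{
  if (ns != nullptr && ns->proc_name != nullptr)
    ns->proc_name->implicit_pure = false;
}

// The idea behind a dedicated helper: walk outwards until a real procedure
// is found and clear the flag there.
void unset_enclosing_procedure(Namespace* ns)
{
  for (; ns != nullptr; ns = ns->parent)
    if (ns->proc_name != nullptr && ns->proc_name->is_procedure)
      {
        ns->proc_name->implicit_pure = false;
        return;
      }
}

// Builds "function containing a BLOCK" and reports whether the function is
// still marked implicit pure after an impure call inside the BLOCK.
bool function_stays_implicit_pure(bool use_fix)
{
  Symbol func = {true, true};
  Symbol block = {false, true};
  Namespace func_ns = {&func, nullptr};
  Namespace block_ns = {&block, &func_ns};

  if (use_fix)
    unset_enclosing_procedure(&block_ns);
  else
    unset_innermost_only(&block_ns);

  return func.implicit_pure;
}
```

With the innermost-only variant the function keeps its implicit_pure flag even though it calls an impure subroutine inside the BLOCK, which is the wrong-code situation described above.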
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, Mar 19, 2014 at 02:23:58PM -0500, Bill Schmidt wrote: Support for Power8 features and the new powerpc64le-linux-gnu target, including the ELFv2 ABI, has been developed up till now on the ibm/gcc-4_8-branch. It was appropriate to use this separate branch while the support was unstable, but this branch will not represent a particularly good support mechanism for distributions going forward. Most distros are set up to pull from the major release branches, and having a separate branch for one target is quite inconvenient. Also, the ibm/gcc-4_8-branch's original purpose is to serve as the code base for IBM's Advance Toolchain 7.0. Over time the two purposes that the branch currently serves will diverge and make things even more complicated. The code is now tested and stable enough that we are ready to backport this support to the FSF 4.8 branch. This patch series constitutes that backport. I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets. Jakub
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, Mar 19, 2014 at 4:05 PM, Jakub Jelinek ja...@redhat.com wrote:

On Wed, Mar 19, 2014 at 02:23:58PM -0500, Bill Schmidt wrote:

Support for Power8 features and the new powerpc64le-linux-gnu target, including the ELFv2 ABI, has been developed up till now on the ibm/gcc-4_8-branch. It was appropriate to use this separate branch while the support was unstable, but this branch will not represent a particularly good support mechanism for distributions going forward. Most distros are set up to pull from the major release branches, and having a separate branch for one target is quite inconvenient. Also, the ibm/gcc-4_8-branch's original purpose is to serve as the code base for IBM's Advance Toolchain 7.0. Over time the two purposes that the branch currently serves will diverge and make things even more complicated. The code is now tested and stable enough that we are ready to backport this support to the FSF 4.8 branch. This patch series constitutes that backport.

I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets.

Before this patch is approved, we are going to thoroughly confirm that it does not harm any other PowerPC targets (big endian PowerLinux, eABI, or AIX). Any help with testing from the PPC eABI community is appreciated.

- David
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related for implicit pure
Dear Tobias,

The patch looks OK to me. If nothing else, it offers a rationalisation of all the lines of code that unset the attribute!

I am somewhat puzzled by "Note: I failed to create a test case", whereas I find one at the end of the patch. Can you explain what you mean?

Cheers

Paul

On 19 March 2014 21:21, Tobias Burnus bur...@net-b.de wrote:

Early *ping* - I think this wrong-code GCC 4.7/4.8/4.9 issue is pretty severe.

Tobias Burnus wrote:

This patch fixes two issues, where gfortran claims that a function is implicit pure, but it is not. That will cause a wrong-code optimization in the middle end.

First problem, cf. PR60543, is that implicit pure was not set to 0 for calls to impure intrinsic subroutines. (BTW: There are no impure intrinsic functions.) Example:

  module m
  contains
    REAL(8) FUNCTION random()
      CALL RANDOM_NUMBER(random)
    END FUNCTION random
  end module m

The second problem pops up if one adds a BLOCK ... END BLOCK around the random_number call after applying the patch of the PR, which just does: gfc_current_ns->proc_name->attr.implicit_pure = 0. The problem is that one sets only the implicit_pure of the block to 0 and not of the function. That's the reason that the patch became much longer and that I added gfc_unset_implicit_pure as new function. Thus, the suspicion I had when reviewing the OpenACC patches turned out to be founded. Cf. PR60283.

Built and regtested on x86-64-gnu-linux. OK for the trunk and for the 4.7 and 4.8 branches?

Note: I failed to create a test case.

Tobias

--
The knack of flying is learning how to throw yourself at the ground and miss.
  --Hitchhikers Guide to the Galaxy
[C++ PATCH] Fix ICE in build_zero_init_1 (PR c++/60572)
Hi!

On the following testcase starting with r199779 we have a FIELD_DECL with error_mark_node type, on which we ICE. Fixed by ignoring such FIELD_DECLs.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-03-19  Jakub Jelinek  ja...@redhat.com

	PR c++/60572
	* init.c (build_zero_init_1): Ignore fields with error_mark_node
	type.

	* g++.dg/init/pr60572.C: New test.

--- gcc/cp/init.c.jj	2014-03-10 10:50:14.0 +0100
+++ gcc/cp/init.c	2014-03-19 07:43:54.077795662 +0100
@@ -192,6 +192,9 @@ build_zero_init_1 (tree type, tree nelts
 	  if (TREE_CODE (field) != FIELD_DECL)
 	    continue;
 
+	  if (TREE_TYPE (field) == error_mark_node)
+	    continue;
+
 	  /* Don't add virtual bases for base classes if they are beyond
 	     the size of the current field, that means it is present
 	     somewhere else in the object.  */
--- gcc/testsuite/g++.dg/init/pr60572.C.jj	2014-03-19 07:46:33.607894844 +0100
+++ gcc/testsuite/g++.dg/init/pr60572.C	2014-03-19 07:46:49.752804722 +0100
@@ -0,0 +1,13 @@
+// PR c++/60572
+// { dg-do compile }
+
+struct A
+{
+  A x; // { dg-error "incomplete type" }
+  virtual ~A () {}
+};
+
+struct B : A
+{
+  B () : A () {}
+};

	Jakub
Re: [RFA jit v2 1/2] introduce class toplev
David> OK. Are you able to push this to my branch, or do you need me to do
David> this?

Thanks, I was able to push them.

Tom
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote: I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets. Jakub The three pieces that are somewhat controversial for non-powerpc targets are 9/26, 10/26, 15/26. * Uli and Alan, can you speak to any concerns for 9/26? * 10/26 hits libstdc++, but only in a minor way for the extract_symvers script; it adds a sed to ignore a string added for powerpc64le, so shouldn't be a problem. * 15/26 might be one we can do without. I need to check with Peter Bergner, who originally backported Fabien's patch, but unfortunately he is on vacation. That patch fixed a problem that originated on an x86 platform. I can try respinning the patch series without this one and see what breaks, or if Peter happens to see this while he's on vacation, perhaps he can comment. For PowerPC targets, I have already checked out powerpc64-linux (big endian). As David mentioned, I need to apply the patch series on an AIX machine and test it before this can be accepted. We don't have any way of testing the eabi stuff, so community help would be very much appreciated there. Thanks, Bill
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On Wed, 2014-03-19 at 16:03 -0500, Bill Schmidt wrote: On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote: I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets. Jakub The three pieces that are somewhat controversial for non-powerpc targets are 9/26, 10/26, 15/26. I forgot to mention that these bits have all been upstream in trunk since last autumn, so there's been quite a bit of burn-in at that level. Obviously that is not the same as being burned in on 4.8, but it does help provide a bit of confidence. Bill * Uli and Alan, can you speak to any concerns for 9/26? * 10/26 hits libstdc++, but only in a minor way for the extract_symvers script; it adds a sed to ignore a string added for powerpc64le, so shouldn't be a problem. * 15/26 might be one we can do without. I need to check with Peter Bergner, who originally backported Fabien's patch, but unfortunately he is on vacation. That patch fixed a problem that originated on an x86 platform. I can try respinning the patch series without this one and see what breaks, or if Peter happens to see this while he's on vacation, perhaps he can comment. For PowerPC targets, I have already checked out powerpc64-linux (big endian). As David mentioned, I need to apply the patch series on an AIX machine and test it before this can be accepted. We don't have any way of testing the eabi stuff, so community help would be very much appreciated there. Thanks, Bill
Re: [4.8, PATCH 0/26] Backport Power8 and LE support
On 03/19/14 15:03, Bill Schmidt wrote:
> On Wed, 2014-03-19 at 21:05 +0100, Jakub Jelinek wrote:
> > I guess the most important question is what guarantees there are that it won't affect non-powerpc* ports too much (my main concern is the 9/26 patch, plus the C++ FE / libstdc++ changes), and how much does this affect code generation and overall stability of the PowerPC big endian existing targets.
> >
> > Jakub
>
> The three pieces that are somewhat controversial for non-powerpc targets are 9/26, 10/26, 15/26.
>
> * Uli and Alan, can you speak to any concerns for 9/26?

I've got no concerns about 9/26. Uli, Alan and myself worked through this pretty thoroughly. I've had those in the back of my mind as something we're going to want to make sure to pull in.

Jeff
PR libstdc++/60587
I'm debugging http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60587 and have found a number of problems.

Firstly, the bug report is correct, this overload dereferences the __other argument without checking if that is OK:

  template<typename _Iterator, typename _Sequence, typename _InputIterator>
    inline bool
    __foreign_iterator_aux3(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            _InputIterator __other,
                            std::true_type)

Secondly, in this testcase we should never even have reached that overload, because we should have gone to this overload of _aux2:

  template<typename _Iterator, typename _Sequence, typename _OtherIterator>
    inline bool
    __foreign_iterator_aux2(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            const _Safe_iterator<_OtherIterator, _Sequence>& __other,
                            std::input_iterator_tag)
    { return __it._M_get_sequence() != __other._M_get_sequence(); }

However that is not chosen by overload resolution because this is a better match when __other is non-const:

  template<typename _Iterator, typename _Sequence, typename _InputIterator>
    inline bool
    __foreign_iterator_aux2(const _Safe_iterator<_Iterator, _Sequence>& __it,
                            _InputIterator __other,
                            std::random_access_iterator_tag)

Fixing the overload resolution bug makes the testcase in the PR pass, but the underlying problem of dereferencing an invalid iterator still exists and can be shown by changing the testcase slightly:

  #define _GLIBCXX_DEBUG
  #include <vector>
  int main()
  {
    std::vector<int> a;
    std::vector<long> b;
    a.push_back(1);
    a.insert(a.end(), b.begin(), b.end());
  }

That still dereferences b.begin(), but that too can be fixed (either as suggested in the PR or by passing the begin and end iterators into the __foreign_iter function) but I think there's still another problem.
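The overload-resolution effect described above can be reproduced in a reduced form. The following is only a sketch: SafeIt and which_overload are hypothetical stand-ins, not the libstdc++ types or functions, and in this reduced overload set the deciding factor is the iterator tag argument, which may not be the whole story for the real code.

```cpp
#include <iterator>

// Hypothetical stand-in for a debug-mode safe iterator (not the libstdc++ type).
template<typename It>
struct SafeIt { const void* seq; };

// Intended overload: both arguments are safe iterators.
template<typename It, typename Other>
int which_overload(const SafeIt<It>&, const SafeIt<Other>&,
                   std::input_iterator_tag)
{ return 1; }

// Generic overload: takes any iterator type by value.
template<typename It, typename In>
int which_overload(const SafeIt<It>&, In,
                   std::random_access_iterator_tag)
{ return 2; }

int pick_random_access()
{
  SafeIt<int*> a{nullptr};
  SafeIt<long*> b{nullptr};
  // Both overloads accept 'b' equally well, but overload 1 needs a
  // derived-to-base conversion on the tag while overload 2 matches exactly,
  // so the generic overload is selected.
  return which_overload(a, b, std::random_access_iterator_tag{});
}

int pick_input()
{
  SafeIt<int*> a{nullptr};
  SafeIt<long*> b{nullptr};
  // With a plain input iterator tag only overload 1 is viable.
  return which_overload(a, b, std::input_iterator_tag{});
}
```

In this sketch a random access safe iterator never reaches the overload that compares sequences, which is the same kind of surprise as in the PR.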
I'm looking again at the code that attempts to check if we have contiguous storage:

  if (std::addressof(*(__it._M_get_sequence()->_M_base().end() - 1))
      - std::addressof(*(__it._M_get_sequence()->_M_base().begin()))
      == __it._M_get_sequence()->size() - 1)

Are we really sure that ensures contiguous iterators? What if we have a deque with three blocks laid out in memory like this:

  1XXX3XXx2XXX
  ^ ^
  begin() end()

1 is the start of the first block, 2 is the start of the second block and 3 is the start of the third block.
X is an element, x is reserved but uninitialized capacity
. is unallocated memory (or memory not used by the deque)

Here we have end() - begin() == size() but non-contiguous memory. If the __other iterator happens to point to the unallocated memory between 1 and 3 then it will appear to be part of the deque, but isn't.

I think the safe thing to do is (as I suggested at the time) to have a trait saying which iterator types refer to contiguous memory. Our debug mode only supports our own containers, so the ones which are contiguous are known.

For 4.9.0 I think the right option is simply to remove __foreign_iterator_aux3 and __foreign_iterator_aux4 completely. The fixed version of __foreign_iterator_aux2() can detect when we have iterators referring to the same sequence, which is what we really want to detect. That's what the attached patch does and what I'm going to test.

--- debug/functions.h.orig 2014-03-19 21:34:43.038647394 +
+++ debug/functions.h 2014-03-19 21:35:53.502617461 +
@@ -175,62 +175,6 @@
       return __first;
     }
 
-#if __cplusplus >= 201103L
-  // Default implementation.
-  template<typename _Iterator, typename _Sequence>
-    inline bool
-    __foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>& __it,
-                            typename _Sequence::const_pointer __begin,
-                            typename _Sequence::const_pointer __other)
-    {
-      typedef typename _Sequence::const_pointer _PointerType;
-      constexpr std::less<_PointerType> __l{};
-
-      return (__l(__other, __begin)
-              || __l(std::addressof(*(__it._M_get_sequence()->_M_base().end()
-                                      - 1)), __other));
-    }
-
-  // Fallback when address type cannot be implicitely casted to sequence
-  // const_pointer.
-  template<typename _Iterator, typename _Sequence,
-           typename _InputIterator>
-    inline bool
-    __foreign_iterator_aux4(const _Safe_iterator<_Iterator, _Sequence>&,
-                            _InputIterator, ...)
-    { return true; }
-
-  template<typename _Iterator, typename _Sequence, typename _InputIterator>
-    inline bool
-    __foreign_iterator_aux3(const _Safe_iterator<_Iterator, _Sequence>& __it,
-                            _InputIterator __other,
-                            std::true_type)
-    {
-      // Only containers with all elements in contiguous memory can have their
-      // elements passed through pointers.
-      // Arithmetics is here just to make sure we are not dereferencing
-      // past-the-end iterator.
-      if
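The concern about the contiguity check can be illustrated without a real deque. The sketch below is an assumption-laden model, not libstdc++ code: a single array stands in for the address space so that pointer arithmetic is well defined, the block positions are made up for the example (they are not the layout from the diagram above), and all function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

// One arena models memory. Three blocks of 4 ints each, placed out of order:
// block 1 at [0,4), block 3 at [8,12), block 2 at [20,24).
static int arena[24];

// Element addresses in iteration order: block 1, then block 2, then block 3.
std::vector<const int*> make_elements()
{
  std::vector<const int*> e;
  for (int i = 0; i < 4; ++i) e.push_back(&arena[i]);       // block 1
  for (int i = 0; i < 4; ++i) e.push_back(&arena[20 + i]);  // block 2
  for (int i = 0; i < 4; ++i) e.push_back(&arena[8 + i]);   // block 3
  return e;
}

// The heuristic under discussion: address of last element minus address of
// first element equals size - 1.
bool looks_contiguous(const std::vector<const int*>& e)
{
  return e.back() - e.front() == static_cast<std::ptrdiff_t>(e.size()) - 1;
}

// Address-range test in the style of the removed helper: is p between the
// first and the last element address?
bool in_address_range(const std::vector<const int*>& e, const int* p)
{
  std::less<const int*> lt;
  return !lt(p, e.front()) && !lt(e.back(), p);
}

// Ground truth: is p actually one of the elements?
bool is_element(const std::vector<const int*>& e, const int* p)
{
  return std::find(e.begin(), e.end(), p) != e.end();
}
```

With this layout the heuristic reports contiguous storage, &arena[5] falls inside the address range although it is not an element, and &arena[21] is an element although it falls outside the range, so the range test can be wrong in both directions.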
Re: [Patch, Fortran] PRs 60283/60543: Fix two wrong-code bugs related for implicit pure
Paul Richard Thomas wrote:
> The patch looks OK to me. If nothing else, it offers a rationalisation of all the lines of code that unset the attribute!
>
> I am somewhat puzzled by "Note: I failed to create a test case," whereas I find one at the end of the patch. Can you explain what you mean?

What I meant was that I failed to create a run-time test case which fails without the patch. However, after I wrote that, I saw that there is a dg-* directive which permits checking the .mod file for a string. That's why I could include a test case.

Committed to the trunk as Rev. 208687.

While looking at the patch again for backporting, I saw that I had missed the following parts. I will commit them tomorrow as obvious, unless someone protests.

Tobias

2014-03-19  Tobias Burnus  burnus@net-b.

	PR fortran/60543
	* io.c (check_io_constraints): Use gfc_unset_implicit_pure.
	* resolve.c (resolve_ordinary_assign): Ditto.

Index: gcc/fortran/io.c
===
--- gcc/fortran/io.c (revision 208687)
+++ gcc/fortran/io.c (working copy)
@@ -3259,9 +3259,8 @@ if (condition) \
 		         "an internal file in a PURE procedure",
 			 io_kind_name (k));
 
-	  if (gfc_implicit_pure (NULL) && (k == M_READ || k == M_WRITE))
-	    gfc_current_ns->proc_name->attr.implicit_pure = 0;
-
+	  if (k == M_READ || k == M_WRITE)
+	    gfc_unset_implicit_pure (NULL);
 	}
 
       if (k != M_READ)
Index: gcc/fortran/resolve.c
===
--- gcc/fortran/resolve.c (revision 208687)
+++ gcc/fortran/resolve.c (working copy)
@@ -9165,7 +9165,7 @@ resolve_ordinary_assign (gfc_code *code, gfc_names
 	  if (lhs->expr_type == EXPR_VARIABLE
 	      && lhs->symtree->n.sym != gfc_current_ns->proc_name
 	      && lhs->symtree->n.sym->ns != gfc_current_ns)
-	    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+	    gfc_unset_implicit_pure (NULL);
 
 	  if (lhs->ts.type == BT_DERIVED
 	      && lhs->expr_type == EXPR_VARIABLE
@@ -9173,11 +9173,11 @@ resolve_ordinary_assign (gfc_code *code, gfc_names
 	      && rhs->expr_type == EXPR_VARIABLE
 	      && (gfc_impure_variable (rhs->symtree->n.sym)
 		  || gfc_is_coindexed (rhs)))
-	    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+	    gfc_unset_implicit_pure (NULL);
 
 	  /* Fortran 2008, C1283.  */
 	  if (gfc_is_coindexed (lhs))
-	    gfc_current_ns->proc_name->attr.implicit_pure = 0;
+	    gfc_unset_implicit_pure (NULL);
 	}
 
       /* F2008, 7.2.1.2.  */
Re: PR libstdc++/60587
On 19/03/14 21:39 +, Jonathan Wakely wrote:
> I think the safe thing to do is (as I suggested at the time) to have a trait saying which iterator types refer to contiguous memory. Our debug mode only supports our own containers, so the ones which are contiguous are known.
>
> For 4.9.0 I think the right option is simply to remove __foreign_iterator_aux3 and __foreign_iterator_aux4 completely. The fixed version of __foreign_iterator_aux2() can detect when we have iterators referring to the same sequence, which is what we really want to detect. That's what the attached patch does and what I'm going to test.

With my suggested change we get an XPASS for testsuite/23_containers/vector/debug/57779_neg.cc

An __is_contiguous trait would solve that.
Re: PR libstdc++/60587
Hi

On 19/mar/2014, at 23:28, Jonathan Wakely jwak...@redhat.com wrote:
> On 19/03/14 21:39 +, Jonathan Wakely wrote:
> > I think the safe thing to do is (as I suggested at the time) to have a trait saying which iterator types refer to contiguous memory. Our debug mode only supports our own containers, so the ones which are contiguous are known.
> >
> > For 4.9.0 I think the right option is simply to remove __foreign_iterator_aux3 and __foreign_iterator_aux4 completely. The fixed version of __foreign_iterator_aux2() can detect when we have iterators referring to the same sequence, which is what we really want to detect. That's what the attached patch does and what I'm going to test.
>
> With my suggested change we get an XPASS for testsuite/23_containers/vector/debug/57779_neg.cc
>
> An __is_contiguous trait would solve that.

Funny, I thought we already had it...

Paolo
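For readers unfamiliar with the idea, a trait of the kind discussed in this thread could look roughly like the sketch below. It is not the libstdc++ __is_contiguous trait: the name is_contiguous_container is hypothetical, and it is keyed on the container type rather than the iterator type only because a nested iterator type cannot be deduced in a partial specialization outside the library.

```cpp
#include <array>
#include <cstddef>
#include <deque>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical trait: true only for containers known to store their
// elements in one contiguous array. Everything else defaults to false.
template<typename Container>
struct is_contiguous_container : std::false_type {};

template<typename T, typename A>
struct is_contiguous_container<std::vector<T, A>> : std::true_type {};

// vector<bool> packs bits and does not expose a contiguous array of bool.
template<typename A>
struct is_contiguous_container<std::vector<bool, A>> : std::false_type {};

template<typename C, typename Tr, typename A>
struct is_contiguous_container<std::basic_string<C, Tr, A>> : std::true_type {};

template<typename T, std::size_t N>
struct is_contiguous_container<std::array<T, N>> : std::true_type {};

static_assert(is_contiguous_container<std::vector<int>>::value,
              "vector<int> is contiguous");
static_assert(!is_contiguous_container<std::deque<int>>::value,
              "deque is made of separate blocks");
```

Because the set of supported containers is closed, an explicit list like this avoids guessing contiguity from addresses at run time.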
[patch committed SH] Fix target/60039
I've committed the attached patch to fix PR target/60039

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60039

which is a regression from 4.5 for some sh3 users. Tested on sh4-unknown-linux-gnu with -mdiv=call-div1. I'd like to backport it to 4.8 in a week or two as usual.

Regards,
	kaz
--
2014-03-19  Kaz Kojima  kkoj...@gcc.gnu.org

	PR target/60039
	* config/sh/sh.md (udivsi3_i1): Clobber R1 register.

--- ORIG/trunk/gcc/config/sh/sh.md 2014-03-02 09:49:58.0 +0900
+++ trunk/gcc/config/sh/sh.md 2014-03-18 14:43:26.515319735 +0900
@@ -2314,6 +2314,7 @@
 	(udiv:SI (reg:SI R4_REG) (reg:SI R5_REG)))
    (clobber (reg:SI T_REG))
    (clobber (reg:SI PR_REG))
+   (clobber (reg:SI R1_REG))
    (clobber (reg:SI R4_REG))
    (use (match_operand:SI 1 "arith_reg_operand" "r"))]
   "TARGET_SH1 && TARGET_DIVIDE_CALL_DIV1"
Re: [C++ PATCH] [gomp4] Initial OpenACC support to C++ front-end
On Thu, 13 Mar 2014, Ilmir Usmanov wrote:

> * gcc/testsuite/c-c++-common/goacc/deviceptr-1.c: Move to ...
> * gcc/testsuite/gcc.dg/goacc/deviceptr-1.c ... here.
> * gcc/testsuite/g++.dg/goacc/goacc.exp: New test directory.
> * gcc/testsuite/g++.dg/goacc-gomp/goacc-gomp.exp: Likewise.

The ChangeLog file is in gcc/testsuite/, so paths should be given relative to that directory (i.e. without the gcc/testsuite/ part).

> gcc/testsuite/g++.dg/goacc/
> * deviceptr-1.cpp: New test.
> * sb-1.cpp: Likewise.
> * sb-2.cpp: Likewise.

Here, each entry should contain the g++.dg/goacc/ part. And the ChangeLog entry should be updated for the change in filenames to *.C.

> + for (t = vars; t && t; t = TREE_CHAIN (t))

This use of "t && t" seems odd.

> + c_parser_omp_var_list_parens() should construct a list of

No use of "()" when referring to a function in a comment.

> +static tree
> +cp_parser_oacc_all_clauses (cp_parser *parser, omp_clause_mask mask,
> +                            const char *where, cp_token *pragma_tok,
> +                            bool finish_p = true)

No caller seems to set this finish_p argument, so I don't see a need for it.

> +/* OpenACC 2.0:
> +   # pragma acc data oacc-data-clause[optseq] new-line
> +     structured-block
> +
> +   LOC is the location of the #pragma token.
> +*/
> +static tree
> +cp_parser_oacc_data (cp_parser *parser, cp_token *pragma_tok)

There's no parameter LOC, so it seems wrong for the comment to mention one. (This applies to other functions with such a comment as well.)

Observations on the tests:

I don't see anything testing diagnostics for the case where it's a return statement that branches out of a block for which that isn't permitted (you have tests for goto and switch statements doing such branches) - is that because such tests are also missing for C?

There are questions of how OpenACC constructs interact with C++ features not present in C. I think a lot of such questions would apply more to the implementation of the routine directive than to the things in this patch (as there may well be C++ features not readily supported on an accelerator).
For the features in this patch, I suppose exception handling is another form of invalid jump out of a structured block, but it must be considered undefined behavior at runtime because it can't be detected at compile time.

I guess something to include in the testsuite is testing use of OpenACC directives within templates. Thus, you have a diagnostic for non-pointer variables being used in a deviceptr clause; the testsuite should verify that if the clause is used within a template, and the type of the variable depends on the type for which a template is instantiated, you only get the error for an instantiation giving it a non-pointer type, not if all instantiations give it a pointer type. (Generally, this applies to any check of something that can only be determined for a particular instantiation.)

-- 
Joseph S. Myers
jos...@codesourcery.com
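A template test along the lines suggested above might start from something like the following. This is only a sketch under stated assumptions: the function names are hypothetical, no dg-* directives or diagnostic wording are shown because those depend on the patch, and without OpenACC support enabled the pragma is simply ignored by the compiler.

```cpp
// The type of 'p' depends on the template argument, so whether the
// deviceptr clause is valid can only be decided per instantiation.
template<typename T>
int use_deviceptr(T p)
{
  (void)p;  // keeps 'p' used when the pragma is ignored
  int visited = 0;
#pragma acc data deviceptr(p)
  {
    visited = 1;
  }
  return visited;
}

// Pointer instantiation: no diagnostic would be expected here.
int instantiate_with_pointer()
{
  return use_deviceptr<int*>(nullptr);
}

// A non-pointer instantiation such as use_deviceptr<int>(0) is the case
// where, per the review, the deviceptr diagnostic should appear; it is left
// out so that this sketch compiles cleanly.
```

A complete test would add the non-pointer instantiation together with the expected-error annotation used by the testsuite.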