Re: [google] Backport r178628 from trunk to google/gcc-4_6 (issue5139050)
This is okay. On Mon, Sep 26, 2011 at 6:14 PM, Guozhi Wei car...@google.com wrote: Hi This patch removes the XPASS of test case 20040204-1.c on arm. Tested on arm: make check-gcc RUNTESTFLAGS="--target_board=arm-sim/thumb/arch=armv7-a tree-ssa.exp=20040204-1.c" on x86_64: make check-gcc RUNTESTFLAGS=tree-ssa.exp=20040204-1.c OK for google/gcc-4_6? thanks Carrot 2011-09-27 Guozhi Wei car...@google.com 2011-09-07 Jiangning Liu jiangning@arm.com PR tree-optimization/46021 * gcc.dg/tree-ssa/20040204-1.c: Don't XFAIL on arm*-*-*. Index: gcc.dg/tree-ssa/20040204-1.c === --- gcc.dg/tree-ssa/20040204-1.c (revision 179225) +++ gcc.dg/tree-ssa/20040204-1.c (working copy) @@ -33,5 +33,5 @@ void test55 (int x, int y) that the && should be emitted (based on BRANCH_COST). Fix this by teaching dom to look through && and register all components as true. */ -/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */ +/* { dg-final { scan-tree-dump-times "link_error" 0 "optimized" { xfail { ! "alpha*-*-* arm*-*-* powerpc*-*-* cris-*-* crisv32-*-* hppa*-*-* i?86-*-* mmix-*-* mips*-*-* m68k*-*-* moxie-*-* sparc*-*-* spu-*-* x86_64-*-*" } } } } */ /* { dg-final { cleanup-tree-dump "optimized" } } */ -- This patch is available for review at http://codereview.appspot.com/5139050
RE: [PATCH] Fix stack red zone bug (PR38644)
-static inline bool +extern bool static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif ... +/* Return true if the ABI allows red zone access. */ +extern bool static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + Uros, Thanks for your review. Accepted; the new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE #define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not clobbered by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false, which is safe and appropriate for most targets. +@end deftypefn + @deftypevr {Target Hook} {unsigned HOST_WIDE_INT} TARGET_CONST_ANCHOR On some architectures it can take multiple instructions to synthesize a constant. If there is another constant already in a register that diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in index 6783826..0fa9e10 100644 --- a/gcc/doc/tm.texi.in +++ b/gcc/doc/tm.texi.in @@ -11223,6 +11223,22 @@ to the stack.
Therefore, this hook should return true in general, but false for naked functions. The default implementation always returns true. @end deftypefn +@hook TARGET_STACK_USING_RED_ZONE +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to return false which is safe and appropriate for most targets. +@end deftypefn + @hook TARGET_CONST_ANCHOR On some architectures it can take multiple
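As a cross-check of the rs6000 hunk above, the offset test the patch rewires onto the new hook can be sketched as stand-alone C. The 0/-220/-288 cushions and the V.4-has-no-red-zone rule come from the patch itself; the enum, parameter names, and use of `long` are illustrative, not GCC code.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the rs6000 ABI selectors.  */
enum abi { ABI_V4, ABI_AIX };

/* Sketch of offset_below_red_zone_p: V.4 has no cushion below the
   stack pointer, while the AIX ABIs guarantee 220 (32-bit) or
   288 (64-bit) bytes below it are not clobbered by signals.  */
bool
offset_below_red_zone_p (long offset, enum abi abi, bool is_32bit)
{
  long cushion = (abi == ABI_V4) ? 0 : (is_32bit ? -220 : -288);
  return offset < cushion;
}
```

With this shape, replacing the `DEFAULT_ABI == ABI_V4` test by `!TARGET_STACK_USING_RED_ZONE`, as the patch does, only changes who answers the "is there a cushion at all?" question, not the arithmetic.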
[arm-embedded] Backport mainline r173371
Backport mainline r173371 to ARM/embedded-4_6-branch. Committed. 2011-09-28 Terry Guo terry@arm.com Backport r173371 from mainline 2011-05-04 Andreas Krebbel andreas.kreb...@de.ibm.com * calls.c (emit_library_call_value_1): Invoke promote_function_mode hook on libcall arguments. * explow.c (promote_function_mode, promote_mode): Handle TYPE argument being NULL. * targhooks.c (default_promote_function_mode): Likewise. * config/s390/s390.c (s390_promote_function_mode): Likewise. * config/sparc/sparc.c (sparc_promote_function_mode): Likewise. * doc/tm.texi: Document that TYPE argument might be NULL.
RE: [PATCH] Fix stack red zone bug (PR38644)
Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. -Original Message- From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org] On Behalf Of Jiangning Liu Sent: Wednesday, September 28, 2011 2:24 PM To: 'Uros Bizjak' Cc: gcc-patches@gcc.gnu.org; j...@suse.cz; geo...@geoffk.org; dje@gmail.com; r...@redhat.com; Richard Earnshaw; Matthew Gretton-Dann Subject: RE: [PATCH] Fix stack red zone bug (PR38644) -static inline bool +extern bool static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif ... +/* Return true if the ABI allows red zone access. */ +extern bool static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + Uros, Thanks for your review. Accepted; the new patch is as below. No change for ChangeLog. Thanks, -Jiangning diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c index ff8c49f..e200974 100644 --- a/gcc/config/i386/i386.c +++ b/gcc/config/i386/i386.c @@ -2631,7 +2631,7 @@ static const char *const cpu_names[TARGET_CPU_DEFAULT_max] = /* Return true if a red-zone is in use. */ -static inline bool +static bool ix86_using_red_zone (void) { return TARGET_RED_ZONE && !TARGET_64BIT_MS_ABI; @@ -35958,6 +35958,9 @@ ix86_autovectorize_vector_sizes (void) #define TARGET_STACK_PROTECT_FAIL ix86_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE ix86_using_red_zone + #undef TARGET_FUNCTION_VALUE +#define TARGET_FUNCTION_VALUE ix86_function_value diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 1ab57e5..1e64d14 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -1537,6 +1537,9 @@ static const struct attribute_spec rs6000_attribute_table[] = #define TARGET_STACK_PROTECT_FAIL rs6000_stack_protect_fail #endif +#undef TARGET_STACK_USING_RED_ZONE +#define TARGET_STACK_USING_RED_ZONE rs6000_stack_using_red_zone + /* MPC604EUM 3.5.2 Weak Consistency between Multiple Processors The PowerPC architecture requires only weak consistency among processors--that is, memory accesses between processors need not be @@ -20660,6 +20663,13 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) } } +/* Return true if the ABI allows red zone access. */ +static bool +rs6000_stack_using_red_zone (void) +{ + return (DEFAULT_ABI != ABI_V4); +} + /* Return true if OFFSET from stack pointer can be clobbered by signals. V.4 doesn't have any stack cushion, AIX ABIs have 220 or 288 bytes below stack pointer not clobbered by signals. */ @@ -20667,7 +20677,7 @@ rs6000_restore_saved_cr (rtx reg, int using_mfcr_multiple) static inline bool offset_below_red_zone_p (HOST_WIDE_INT offset) { - return offset < (DEFAULT_ABI == ABI_V4 + return offset < (!TARGET_STACK_USING_RED_ZONE ? 0 : TARGET_32BIT ? -220 : -288); } diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi index 335c1d1..c848806 100644 --- a/gcc/doc/tm.texi +++ b/gcc/doc/tm.texi @@ -11332,6 +11332,22 @@ to the stack. Therefore, this hook should return true in general, but false for naked functions.
The default implementation always returns true. @end deftypefn +@deftypefn {Target Hook} bool TARGET_STACK_USING_RED_ZONE (void) +This hook returns true if the target has a red zone (an area beyond the +current extent of the stack that cannot be modified by asynchronous events +on the processor). + +If this hook returns false then the compiler mid-end will not move an access +to memory in the stack frame past a stack adjustment insn. + +If this hook returns true then the compiler mid-end will assume that it is +safe to move an access to memory in the stack frame past a stack adjustment +insn. The target back-end must emit scheduling barrier insns when this is +unsafe. + +The default is to
Re: [ARM] Optimise handling of neon_vget_high/low
Ramana Radhakrishnan ramana.radhakrish...@linaro.org writes: On 14 September 2011 13:30, Richard Sandiford richard.sandif...@linaro.org wrote: neon_vget_high and neon_vget_low extract one half of a vector. The patterns look like: (define_insn "neon_vget_highv16qi" [(set (match_operand:V8QI 0 "s_register_operand" "=w") (vec_select:V8QI (match_operand:V16QI 1 "s_register_operand" "w") (parallel [(const_int 8) (const_int 9) (const_int 10) (const_int 11) (const_int 12) (const_int 13) (const_int 14) (const_int 15)])))] "TARGET_NEON" { int dest = REGNO (operands[0]); int src = REGNO (operands[1]); if (dest != src + 2) return "vmov\t%P0, %f1"; else return ""; } [(set_attr "neon_type" "neon_bp_simple")] ) But there's nothing here to tell the register allocator what's expected of it, so we do often get the move. The patch below makes the patterns expand to normal subreg moves instead. Unfortunately, when I first tried this, I ran across some bugs in simplify-rtx.c. They should be fixed now. Of course, I can't guarantee that there are other similar bugs elsewhere, but I'll try to fix any that crop up. The new patterns preserve the current treatment on big-endian targets. Namely, the low half is always in the lower-numbered registers (subreg byte offset 0). Tested on arm-linux-gnueabi. OK to install? This is OK. Please watch out for any fallout that comes as a result of this. It would be useful to do some big endian testing at some point but I'm not going to let that hold up this patch. While you are here can you look at the quad_halves_<code>v4si etc. patterns, and the neon_move_lo_quad_<mode> and neon_move_hi_quad_<mode> patterns, which seem to have the same problem? Ah, yeah, thanks for the pointer. Tested on arm-linux-gnueabi, and on benchmarks which should (and did) benefit from it. OK to install? Richard gcc/ * config/arm/neon.md (neon_move_lo_quad_<mode>): Delete. (neon_move_hi_quad_<mode>): Likewise. (move_hi_quad_<mode>, move_lo_quad_<mode>): Use subreg moves.
Index: gcc/config/arm/neon.md === --- gcc/config/arm/neon.md 2011-09-27 09:27:18.000000000 +0100 +++ gcc/config/arm/neon.md 2011-09-27 09:45:26.008275971 +0100 @@ -1251,66 +1251,14 @@ (define_insn "quad_halves_<code>v16qi" (const_string "neon_int_1") (const_string "neon_int_5")))] ) -; FIXME: We wouldn't need the following insns if we could write subregs of -; vector registers. Make an attempt at removing unnecessary moves, though -; we're really at the mercy of the register allocator. - -(define_insn "neon_move_lo_quad_<mode>" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") -(vec_concat:ANY128 - (match_operand:V_HALF 1 "s_register_operand" "w") - (vec_select:V_HALF - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_high" ""))))] - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) -return "vmov\t%e0, %P1"; - else -return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - -(define_insn "neon_move_hi_quad_<mode>" - [(set (match_operand:ANY128 0 "s_register_operand" "+w") -(vec_concat:ANY128 - (vec_select:V_HALF - (match_dup 0) - (match_operand:ANY128 2 "vect_par_constant_low" "")) - (match_operand:V_HALF 1 "s_register_operand" "w")))] - - "TARGET_NEON" -{ - int dest = REGNO (operands[0]); - int src = REGNO (operands[1]); - - if (dest != src) -return "vmov\t%f0, %P1"; - else -return ""; -} - [(set_attr "neon_type" "neon_bp_simple")] -) - (define_expand "move_hi_quad_<mode>" [(match_operand:ANY128 0 "s_register_operand" "") (match_operand:V_HALF 1 "s_register_operand" "")] "TARGET_NEON" { - rtvec v = rtvec_alloc (<V_mode_nunits>/2); - rtx t1; - int i; - - for (i=0; i < (<V_mode_nunits>/2); i++) - RTVEC_ELT (v, i) = GEN_INT (i); - - t1 = gen_rtx_PARALLEL (<MODE>mode, v); - emit_insn (gen_neon_move_hi_quad_<mode> (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0], <MODE>mode, + GET_MODE_SIZE (<V_HALF>mode)), + operands[1]); DONE; }) @@ -1319,16 +1267,9 @@ (define_expand "move_lo_quad_<mode>" (match_operand:V_HALF 1 "s_register_operand" "")]
"TARGET_NEON" { - rtvec v = rtvec_alloc (<V_mode_nunits>/2); - rtx t1; - int i; - - for (i=0; i < (<V_mode_nunits>/2); i++) - RTVEC_ELT (v, i) = GEN_INT ((<V_mode_nunits>/2) + i); - - t1 = gen_rtx_PARALLEL (<MODE>mode, v); - emit_insn (gen_neon_move_lo_quad_<mode> (operands[0], operands[1], t1)); - + emit_move_insn (simplify_gen_subreg (<V_HALF>mode, operands[0], + <MODE>mode, 0), + operands[1]); DONE; })
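In plain C terms (a sketch of the semantics, not of the generated code), vget_high of a V16QI is just bytes 8..15 and vget_low is bytes 0..7 on a little-endian target — which is exactly why a subreg at byte offset 8 (or 0) expresses it, and why no vmov is needed at all when the register allocator places both halves well. Function names here are illustrative.

```c
#include <stdint.h>
#include <string.h>

/* Model of neon_vget_highv16qi: the high half of a 16-byte vector
   is its upper 8 bytes, i.e. a subreg at byte offset 8.  */
void
vget_high_v16qi (uint8_t dest[8], const uint8_t src[16])
{
  memcpy (dest, src + 8, 8);
}

/* Model of neon_vget_lowv16qi: the low half is byte offset 0.  */
void
vget_low_v16qi (uint8_t dest[8], const uint8_t src[16])
{
  memcpy (dest, src, 8);
}
```

On big-endian the message notes the treatment is preserved: the low half stays in the lower-numbered registers (subreg byte offset 0).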
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 8:42 AM, Jiangning Liu jiangning@arm.com wrote: Just realized ChangeLog needs to be changed as well. ChangeLog: * config/i386/i386.c (ix86_using_red_zone): Remove inline. (TARGET_STACK_USING_RED_ZONE): New. * config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New. (TARGET_STACK_USING_RED_ZONE): New. (offset_below_red_zone_p): Change to use new hook TARGET_STACK_USING_RED_ZONE. * doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New. * doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New. * sched-deps.c (sched_analyze_1): If the stack pointer is being modified and stack red zone is not supported for ports, flush out all memory references as they may become invalid if moved across the stack adjustment. * target.def (stack_using_red_zone): New. * testsuite/gcc.target/arm/stack-red-zone.c: New. OK for x86, but you need approval from middle-end maintainer for the patch. Thanks, Uros.
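The hazard that the sched-deps.c change guards against can be shown with a toy model (nothing below is GCC code; names and numbers are illustrative): a store addressed relative to the stack pointer writes a different absolute slot depending on whether the scheduler moved it across a stack-pointer decrement, and on a target with no red zone the memory below the old stack pointer may be clobbered by a signal at any moment.

```c
/* Toy model of the PR38644 hazard.  A store to sp-relative memory
   targets slot sp-1.  If the scheduler moves that store across a
   "sp -= 4" stack adjustment, the slot it actually writes changes,
   which is only safe when the ABI guarantees a red zone.  */
int
slot_written (int sp, int adjust_first)
{
  if (adjust_first)
    sp -= 4;          /* the stack adjustment was scheduled first */
  return sp - 1;      /* slot targeted by the sp-relative store */
}
```

With sp = 16 the two orders write slots 15 and 11; since the results differ, a middle end that has no red-zone guarantee must treat the stack adjustment as a scheduling barrier, which is what the proposed hook makes explicit.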
Re: Intrinsics for N2965: Type traits and base classes
On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl and has loads of experience that shows it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: [PATCH 1/2] Change random seeds to 64bit and drop re-crcing
On Tue, Sep 27, 2011 at 11:30 PM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com I had some trouble with random build failures in a large LTO project and it turned out to be random seed collisions in a highly parallel build (thanks to Honza for suggesting that) There were multiple problems: - The way to generate the random seed is pure (milliseconds plus pid) - It's only 32bit - Several users take the existing ascii seed and re-CRC32 it again, which doesn't exactly improve it. This patch changes that to: - Always use 64bit seeds as numbers (no re-crcing) - Change all users to use HOST_WIDE_INT - When the user specifies a random seed it's still crc32ed, but only in this case. Passes bootstrap + testsuite on x86_64-linux. Ok to commit? Ok if there are no objections from other folks within 24h. Thanks, Richard. -Andi gcc/cp: 2011-09-26 Andi Kleen a...@linux.intel.com * repo.c (finish_repo): Use HOST_WIDE_INT_PRINT_HEX_PURE. gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * hwint.h (HOST_WIDE_INT_PRINT_HEX_PURE): Add. * lto-streamer.c (lto_get_section_name): Remove crc32_string. Handle numerical random seed. * lto-streamer.h (lto_file_decl_data): Change id to unsigned HOST_WIDE_INT. * toplev.c (random_seed): Add. (init_random_seed): Change for numerical random seed. (get_random_seed): Return as HOST_WIDE_INT. (set_random_seed): Crc32 existing string. * toplev.h (get_random_seed): Change to numerical return. * tree.c (get_file_function_name): Remove CRC. Handle numerical random seed. gcc/lto/: 2011-09-26 Andi Kleen a...@linux.intel.com * lto.c (lto_resolution_read): Remove id dumping. (lto_section_with_id): Turn id HOST_WIDE_INT. (create_subid_section_table): Ditto.
--- gcc/cp/repo.c | 3 ++- gcc/hwint.h | 1 + gcc/lto-streamer.c | 8 gcc/lto-streamer.h | 2 +- gcc/lto/lto.c | 11 --- gcc/toplev.c | 23 --- gcc/toplev.h | 2 +- gcc/tree.c | 6 +++--- 8 files changed, 28 insertions(+), 28 deletions(-) diff --git a/gcc/cp/repo.c b/gcc/cp/repo.c index 16a192e..ca971b6 100644 --- a/gcc/cp/repo.c +++ b/gcc/cp/repo.c @@ -263,7 +263,8 @@ finish_repo (void) anonymous namespaces will get the same mangling when this file is recompiled. */ if (!strstr (args, "'-frandom-seed=")) - fprintf (repo_file, " '-frandom-seed=%s'", get_random_seed (false)); + fprintf (repo_file, " '-frandom-seed=" HOST_WIDE_INT_PRINT_HEX_PURE "'", + get_random_seed (false)); fprintf (repo_file, "\n"); } diff --git a/gcc/hwint.h b/gcc/hwint.h index 2643aee..f5e0bee 100644 --- a/gcc/hwint.h +++ b/gcc/hwint.h @@ -102,6 +102,7 @@ extern char sizeof_long_long_must_be_8[sizeof(long long) == 8 ? 1 : -1]; #define HOST_WIDE_INT_PRINT_DEC_C HOST_WIDE_INT_PRINT_DEC HOST_WIDE_INT_PRINT_C #define HOST_WIDE_INT_PRINT_UNSIGNED "%" HOST_WIDE_INT_PRINT "u" #define HOST_WIDE_INT_PRINT_HEX "%#" HOST_WIDE_INT_PRINT "x" +#define HOST_WIDE_INT_PRINT_HEX_PURE "%" HOST_WIDE_INT_PRINT "x" /* Set HOST_WIDEST_INT. This is a 64-bit type unless the compiler in use has no 64-bit type at all; in that case it's 32 bits. */ diff --git a/gcc/lto-streamer.c b/gcc/lto-streamer.c index 633c3ce..e3ccb79 100644 --- a/gcc/lto-streamer.c +++ b/gcc/lto-streamer.c @@ -166,13 +166,13 @@ lto_get_section_name (int section_type, const char *name, struct lto_file_decl_d doesn't confuse the reader with merged sections. For options don't add an ID, the option reader cannot deal with them - and merging should be ok here. - - XXX: use crc64 to minimize collisions? */ + and merging should be ok here. */ if (section_type == LTO_section_opts) strcpy (post, ""); + else if (f != NULL) + sprintf (post, "." HOST_WIDE_INT_PRINT_HEX_PURE, f->id); + else - sprintf (post, ".%x", f ? f->id : crc32_string (0, get_random_seed (false))); + sprintf (post, "."
HOST_WIDE_INT_PRINT_HEX_PURE, get_random_seed (false)); return concat (LTO_SECTION_NAME_PREFIX, sep, add, post, NULL); } diff --git a/gcc/lto-streamer.h b/gcc/lto-streamer.h index 190d6a3..2564bd2 100644 --- a/gcc/lto-streamer.h +++ b/gcc/lto-streamer.h @@ -552,7 +552,7 @@ struct GTY(()) lto_file_decl_data struct lto_file_decl_data *next; /* Sub ID for merged objects. */ - unsigned id; + unsigned HOST_WIDE_INT id; /* Symbol resolutions for this file */ VEC(ld_plugin_symbol_resolution_t,heap) * GTY((skip)) resolutions; diff --git a/gcc/lto/lto.c b/gcc/lto/lto.c index 0b1dcb9..77eb1a1 100644 --- a/gcc/lto/lto.c +++ b/gcc/lto/lto.c @@ -976,9 +976,6 @@ lto_resolution_read (splay_tree file_ids, FILE *resolution, lto_file *file) } file_data = (struct lto_file_decl_data *)nd->value; - if
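The new `HOST_WIDE_INT_PRINT_HEX_PURE` macro is hex with no leading `0x`, suitable for splicing into format strings as the patch does for `-frandom-seed=` and LTO section-name suffixes. A stand-alone equivalent, assuming a 64-bit HOST_WIDE_INT and standard `<inttypes.h>` macros (the `format_seed` helper is made up for illustration):

```c
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Stand-alone analogue of the patch's macro: hex, no "0x" prefix,
   assuming HOST_WIDE_INT is 64 bits wide.  */
#define HOST_WIDE_INT_PRINT_HEX_PURE "%" PRIx64

/* Illustrative helper: format a seed the way finish_repo now does,
   by string-concatenating the macro into the format.  */
void
format_seed (char *buf, size_t n, uint64_t seed)
{
  snprintf (buf, n, "-frandom-seed=" HOST_WIDE_INT_PRINT_HEX_PURE, seed);
}
```

The point of the macro being "pure" is exactly this splicing: adjacent string literals concatenate at compile time, so the prefix and quoting stay under the caller's control.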
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using a barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not a usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. Sure it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, Thanks for your explanation. It's explicit to the back-end, while it's implicit to the scheduler in the middle end, because a barrier can decide dependence in the scheduler but a barrier can be generated from several different scenarios. It's unsafe and prone to introducing bugs if any one of the scenarios requiring generating barriers is missed in the back-end. Between middle-end and back-end, we should have interfaces that are easy for the back-end to implement. After all, the middle-end by itself doesn't constitute a compiler, and vice versa. The back-end needs the middle-end's help to make sure the back-end is easy to implement and to reduce the possibility of introducing bugs.
Without an explicit hook as I'm proposing, back-end implementers have to know all barrier-generating scenarios very clearly, because there aren't any code tips or comments in the middle end telling the back-end the list of all scenarios on generating barriers. Yes, a barrier is a perfect interface for the scheduler in theory. But from an engineering point of view, I think it's better to explicitly define an interface to describe stack red zone and inform the back-end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back-ends made similar mistakes. GCC is really a perfect platform for building new ports, and I saw a lot of new back-ends. The current middle end is unsafe if a port doesn't support a stack red zone and the back-end doesn't generate a barrier for it. Why can't we explicitly clarify this in compiler code between middle end and back end? What if any other back-end (new or old) NOT supporting stack red zone exposes a similar bug again? There are a gazillion things you have to explicitly get right in your backends, so I don't see why exposing proper scheduling barriers should be special, and there, why red-zones should be special (as opposed to other occasions where you need to emit barriers from the backend for the scheduler). Richard. Thanks, -Jiangning Richard. No matter what the solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared for all ports. My proposal avoids problems in the future. Any new ports and new back-end implementations needn't explicitly define this hook, in a very safe way. But if one port wants to unsafely introduce a red zone, we can explicitly define this hook in the back-end. This way, we would catch the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report bugs and then waste time getting them fixed again?
The fact I see is that over the years different ports reported similar issues around this, and somebody tried to fix them. Actually, this kind of problem should be fixed at the design level. If the first person to solve this bug had given a solution in the middle end, we would not need to waste time filing those bugs in Bugzilla and fixing them one by one. Thanks, -Jiangning
[patch, testsuite, ARM] Skip architecture option in pr42575.c
2011-09-28 Joey Ye joey...@arm.com * gcc.target/arm/pr42575.c: Remove architecture option. Index: gcc/testsuite/gcc.target/arm/pr42575.c === --- gcc/testsuite/gcc.target/arm/pr42575.c (revision 179308) +++ gcc/testsuite/gcc.target/arm/pr42575.c (working copy) @@ -1,4 +1,4 @@ -/* { dg-options "-O2 -march=armv7-a" } */ +/* { dg-options "-O2" } */ /* Make sure RA does good job allocating registers and avoids unnecessary moves. */ /* { dg-final { scan-assembler-not "mov" } } */
[PATCH, PR50485, committed] Initialize variable in sse4_1-blendps{,-2}.c with random floats
Committed as obvious. Thanks, - Tom 2011-09-28 Tom de Vries t...@codesourcery.com PR testsuite/50485 * gcc.target/i386/sse4_1-blendps.c: Include stdlib.h. (TEST): Initialize src3 with random floats. * gcc.target/i386/sse4_1-blendps-2.c (sse4_1_test): Remove field i from union src3. Initialize src3 with random floats. Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps.c === --- gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (revision 179043) +++ gcc/testsuite/gcc.target/i386/sse4_1-blendps.c (working copy) @@ -14,6 +14,7 @@ #include <smmintrin.h> #include <string.h> +#include <stdlib.h> #define NUM 20 @@ -66,6 +67,9 @@ TEST (void) init_blendps (src1.f, src2.f); + for (i = 0; i < 4; i++) +src3.f[i] = (int) random (); + /* Check blendps imm8, m128, xmm */ for (i = 0; i < NUM; i++) { Index: gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c === --- gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (revision 179043) +++ gcc/testsuite/gcc.target/i386/sse4_1-blendps-2.c (working copy) @@ -53,14 +53,13 @@ sse4_1_test (void) { __m128 x; float f[4]; - int i[4]; } src3; int i; init_blendps (src1.f, src2.f); for (i = 0; i < 4; i++) -src3.i[i] = (int) random (); +src3.f[i] = (int) random (); /* Check blendps imm8, m128, xmm */ for (i = 0; i < NUM; i++)
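The bug being fixed is worth spelling out: src3 was filled through the union's int member but consumed through the float member, so the float lanes held reinterpreted bit patterns (often denormals) rather than the intended values. A minimal reproduction of the type-punning effect, with the union layout taken from the test and values chosen for illustration:

```c
/* Same union shape as src3 in sse4_1-blendps-2.c (minus __m128).
   Writing through .i and reading through .f reinterprets the bits:
   the int 1 becomes the smallest positive denormal, not 1.0f.  */
union vec
{
  float f[4];
  int i[4];
};

float
read_as_float (int value)
{
  union vec u;
  u.i[0] = value;
  return u.f[0];
}
```

Union type punning is well-defined in C, so the original test compiled cleanly; it just exercised blendps on meaningless float lanes, which is why the fix assigns through `src3.f` instead.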
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 4:39 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not an usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. Sure it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, Thanks for your explanation. It's explicit to back-end, while it's implicit to scheduler in middle end, because barrier can decide dependence in scheduler but barrier can be generated from several different scenarios. It's unsafe and prone to introduce bug if any one of the scenarios requiring generating barriers is being missed in back-end. Between middle-end and back-end, we should have interfaces that is easy to be implemented by back-end. 
After all, middle-end itself can't consist of a compiler, and vice versa. Back-end needs middle-end's help to make sure back-end is easy to be implemented and reduce the possibility of introducing bugs. Without an explicit hook as I'm proposing, back-end implementers have to clearly know all scenarios of generating barrier very clearly, because there isn't any code tips and comments in middle end telling back-end the list of all scenarios on generating barriers. Yes, barrier is a perfect interface for scheduler in theory. But from engineering point of view, I think it's better to explicitly define an interface to describe stack red zone and inform back-end, or vice versa. Not like computer, people is easy to make mistake if you don't tell them. On this bug, the fact is over the years different back-ends made similar bugs. GCC is really a perfect platform on building new ports, and I saw a lot of new back-ends. The current middle end is unsafe, if port doesn't support stack red zone and back-ends doesn't generate barrier for it. Why can't we explicitly clarify this in compiler code between middle end and back end? What if any other back-end (new or old) NOT supporting stack red zone exposing the similar bug again? There are gazillion things you have to explicitly get right in your backends, so I don't see why exposing proper scheduling barriers should be special, and there, why red-zones should be special (as opposed to other occasions where you need to emit barriers from the backend for the scheduler). Richard, This is because, 1) Current scheduler is unsafe if back-end doesn't generate barrier for a port which doesn't support stack red zone at all. 2) Implementing barrier in back-end is a burden to new back-end implementation for ports not supporting stack red zone at all. 3) There are many other ports not reporting bugs around this. I doubt there isn't bug for them. 4) There are over 300 TARGET HOOKS being defined in target.def. 
I don't think adding this interface is hurting GCC. BTW, really appreciate your close attention to this specific issue. Thanks, -Jiangning Richard. No matter what solution itself is, the problem itself is quite a common one at the ISA level, so we should solve it in the middle-end, because the middle-end is shared by all ports. My proposal avoids problems in the future. Any new ports and new back-end implementations needn't explicitly define this hook in a very safe way. But if one port wants to unsafely introduce a red zone, we can explicitly define this hook in the back-end. This way, we would avoid the bug at the earliest time. Do you really want to hide this problem in the back-end silently? And wait for others to report the bug and then waste time getting it fixed again? The facts I see is over the years,
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu jiangning@arm.com wrote: [quoted thread snipped] I don't argue that your solution is not acceptable, just your reasoning is bogus IMHO. 1) is a target bug, 2) huh, I doubt that this is the biggest issue one faces when implementing a new target, 3) I'm sure there are very many latent backend bugs not related to red-zone. The middle-end isn't safe by default either if you have bogus instruction patterns in your .md file, or you generate bogus IL from the va-arg gimplification hook. A target bug is a target bug, the middle-end can't and should not do anything to try to workaround bugs in targets. Richard.
[PATCH, PR50527] Don't assume alignment of vla-related allocas.
Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas from being set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Thanks, - Tom

2011-09-27 Tom de Vries t...@codesourcery.com

	* tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related
	allocas.
	* gcc.dg/pr50527.c: New test.

Index: gcc/tree-ssa-ccp.c
===================================================================
--- gcc/tree-ssa-ccp.c (revision 179210)
+++ gcc/tree-ssa-ccp.c (working copy)
@@ -1632,6 +1632,8 @@ evaluate_stmt (gimple stmt)
 	  break;

 	case BUILT_IN_ALLOCA:
+	  if (gimple_call_alloca_for_var_p (stmt))
+	    break;
 	  val.lattice_val = CONSTANT;
 	  val.value = build_int_cst (TREE_TYPE (gimple_get_lhs (stmt)), 0);
 	  val.mask = shwi_to_double_int

Index: gcc/testsuite/gcc.dg/pr50527.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.dg/pr50527.c (revision 0)
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-Os --param large-stack-frame=30" } */
+
+extern void abort (void);
+
+void __attribute__((noinline))
+bar (char *a)
+{
+}
+
+void __attribute__((noinline))
+foo (char *a, int b)
+{
+}
+
+void __attribute__((noinline))
+test_align (char *p, int aligned, unsigned int mask)
+{
+  int p_aligned = ((unsigned long int)p & mask) == 0;
+  if (aligned != p_aligned)
+    abort ();
+}
+
+int
+main ()
+{
+  const int kIterations = 4;
+  char results[kIterations];
+  int i;
+  unsigned int mask;
+
+  mask = 0xf;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x7;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x3;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+  mask = 0x1;
+  test_align (results, ((unsigned long int)results & mask) == 0, mask);
+
+  bar (results);
+  for (i = 0; i < kIterations; i++)
+    foo ("%d ", results[i]);
+
+  return 0;
+}
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:20 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) [quoted thread snipped] I don't argue that your solution is not acceptable, just your reasoning is bogus IMHO. 1) is a target bug, Why should back-ends handle a thing that doesn't exist at all for them? I don't see clear logic here. 2) huh, I doubt that this is the biggest issue one faces when implementing a new target, I never say it is a
Re: [patch, testsuite, ARM] Skip architecture option in pr42575.c
On 28 September 2011 09:48, Joey Ye joey...@arm.com wrote: 2011-09-28 Joey Ye joey...@arm.com * gcc.target/arm/pr42575.c: Remove architecture option. What happens if this test is run with a multilib of march=armv5te ? Ramana

Index: gcc/testsuite/gcc.target/arm/pr42575.c
===================================================================
--- gcc/testsuite/gcc.target/arm/pr42575.c (revision 179308)
+++ gcc/testsuite/gcc.target/arm/pr42575.c (working copy)
@@ -1,4 +1,4 @@
-/* { dg-options "-O2 -march=armv7-a" } */
+/* { dg-options "-O2" } */
 /* Make sure RA does good job allocating registers and avoids
    unnecessary moves. */
 /* { dg-final { scan-assembler-not "mov" } } */
Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.
On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries tom_devr...@mentor.com wrote: Richard, I got a patch for PR50527. The patch prevents the alignment of vla-related allocas to be set to BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after folding the alloca. Bootstrapped and regtested on x86_64. OK for trunk? Hmm. As gfortran with -fstack-arrays uses VLAs it's probably bad that the vectorizer then will no longer see that the arrays are properly aligned. I'm not sure what the best thing to do is here, other than trying to record the alignment requirement of the VLA somewhere. Forcing the alignment of the alloca replacement decl to BIGGEST_ALIGNMENT has the issue that it will force stack-realignment which isn't free (and the point was to make the decl cheaper than the alloca). But that might possibly be the better choice. Any other thoughts? Thanks, Richard. Thanks, - Tom 2011-09-27 Tom de Vries t...@codesourcery.com * tree-ssa-ccp.c (evaluate_stmt): Don't assume alignment for vla-related allocas. * gcc.dg/pr50527.c: New test.
Re: [PATCH] Fix stack red zone bug (PR38644)
On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu jiangning@arm.com wrote: [quoted thread snipped] Why should back-ends handle a thing that doesn't exist at all for them? I don't see clear logic here. I don't think this is the case here. You need
Re: Go patch committed: Update libgo to Go release r60
Ian Lance Taylor i...@google.com writes: Rainer Orth r...@cebitec.uni-bielefeld.de writes: Solaris 8 and 9 suffer from the same problem. The following patch allowed the bootstrap to complete. An IRIX bootstrap is currently running, but will take some time to complete. Rainer 2011-09-23 Rainer Orth r...@cebitec.uni-bielefeld.de * mksysinfo.sh: Provide TIOCSCTTY if missing. Thanks. I committed this patch, though I moved the new lines farther down in the script next to the handling of the ioctl constants. Thanks, I'd missed that. It turned out that IRIX 6 needs one more change to return to bootstrap land: sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff (cf. configure.ac). I've cheated and used sys/ttold.h instead, which doesn't have this check. With this patch, a Go-only IRIX 6.5 bootstrap completed successfully. Rainer 2011-09-28 Rainer Orth r...@cebitec.uni-bielefeld.de * mksysinfo.sh [__sgi__]: Include sys/ttold.h.

# HG changeset patch
# Parent 4530aeaf12a2b1576a7bf67de9cf5569719107c6
Provide TIOCNOTTY on IRIX 6

diff --git a/libgo/mksysinfo.sh b/libgo/mksysinfo.sh
--- a/libgo/mksysinfo.sh
+++ b/libgo/mksysinfo.sh
@@ -36,9 +36,12 @@ cat > sysinfo.c <<EOF
 #include <netinet/in.h>
 /* netinet/tcp.h needs u_char/u_short, but sys/bsd_types is only
    included by netinet/in.h if _SGIAPI (i.e. _SGI_SOURCE
-   && !_XOPEN_SOURCE.  */
+   && !_XOPEN_SOURCE.
+   sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, while
+   sys/ttold.h does so unconditionally.  */
 #ifdef __sgi__
 #include <sys/bsd_types.h>
+#include <sys/ttold.h>
 #endif
 #include <netinet/tcp.h>
 #include <signal.h>

-- Rainer Orth, Center for Biotechnology, Bielefeld University
Re: [RFC] Context sensitive inline analysis
Jan Hubicka hubi...@ucw.cz writes: the problem is sign overflow in time computation. Time should be capped by MAX_TIME and we compute MAX_TIME * INLINE_SIZE_SCALE * 2. This happens to be between 2^31 and 2^32, so we overflow here because of the use of signed arithmetic.

Index: ipa-inline-analysis.c
===================================================================
--- ipa-inline-analysis.c (revision 179266)
+++ ipa-inline-analysis.c (working copy)
@@ -92,7 +92,7 @@ along with GCC; see the file COPYING3.
 /* Estimate runtime of function can easilly run into huge numbers with many
    nested loops.  Be sure we can compute time * INLINE_SIZE_SCALE in integer.
    For anything larger we use gcov_type.  */
-#define MAX_TIME 100
+#define MAX_TIME 50
 /* Number of bits in integer, but we really want to be stable across different
    hosts.  */

Could you update the comment too? (time * INLINE_SIZE_SCALE * 2) Richard
Re: Use of vector instructions in memmov/memset expanding
Attached is part 1 of the patch that enables the use of vector instructions in memset and memcopy (the middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version the algorithm of move-mode selection was changed – now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector and non-vector move modes. Build and 'make check' were tested; in 'make check' there is one failure, which will be cured when the complete patch is applied. On 27 September 2011 18:44, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: I divided the patch into three smaller ones: 1) Patch with target-independent changes (see attached file memfunc-mid.patch). The main part of the changes is in functions move_by_pieces/set_by_pieces. In the new version the algorithm of move-mode selection was changed – now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector and non-vector move modes. 2) Patch with target-dependent changes (memfunc-be.patch). The main part of the changes is in functions ix86_expand_setmem/ix86_expand_movmem. The other changes are only needed to support it. The changes mostly touched the unrolled_loop strategy – now vector move modes can be used here. That resulted in large epilogues and prologues, so their generation also was modified. This patch contains some changes in the middle end (to make the build possible), but all these changes are present in the first patch, so there is no need to review them here. 3) Patch with all new tests (memfunc-tests.patch). This patch contains a lot of small tests for different memset and memcopy cases. Separately from each other, these patches won't give a performance gain. The positive effect will be noticeable only if they are applied together (I attach the complete patch also - see file memfunc-complete.patch). If you have any questions regarding these changes, please don't hesitate to ask them. 
On 18 July 2011 15:00, Michael Zolotukhin michael.v.zolotuk...@gmail.com wrote: Here is a summary - probably, it doesn't cover every single piece in the patch, but I tried to describe the major changes. I hope this will help you a bit - and of course I'll answer your further questions if they appear. The changes could be logically divided into two parts (though, these parts have something in common). The first part is changes in target-independent part, in functions move_by_pieces() and store_by_pieces() - mostly located in expr.c. The second part touches ix86_expand_movmem() and ix86_expand_setmem() - mostly located in config/i386/i386.c. Changes in i386.c (target-dependent part): 1) Strategies for cases with known and unknown alignment are separated from each other. When alignment is known at compile time, we could generate optimized code without libcalls. When it's unknown, we sometimes could create runtime-checks to reach desired alignment, but not always. Strategies for atom and generic_32, generic_64 were chosen according to set of experiments, strategies in other cost models are unchanged (strategies for unknown alignment are copied from existing strategies). 2) unrolled_loop algorithm was modified - now it uses SSE move-modes, if they're available. 3) As size of data, moved in one iteration, greatly increased, and epilogues became bigger - so some changes were needed in epilogue generation. In some cases a special loop (not unrolled) is generated in epilogue to avoid slow copying by bytes (changes in expand_set_or_movmem_via_loop() and introducing of expand_set_or_movmem_via_loop_with_iter() is made for these cases). 4) As bigger alignment might be needed than previously, prologue generation was also modified. Changes in expr.c (target-independent part): There are two possible strategies now: use of aligned and unaligned moves. For each of them a cost model was implemented and the choice is made according to the cost of each option. 
Move-mode choice is made by functions widest_mode_for_unaligned_mov() and widest_mode_for_aligned_mov(). Cost estimation is implemented in functions compute_aligned_cost() and compute_unaligned_cost(). Choice between these two strategies and the generation of moves themselves are in function move_by_pieces(). Function store_by_pieces() calls set_by_pieces_1() instead of store_by_pieces_1(), if this is memset-case (I needed to introduce set_by_pieces_1 to separate memset-case from others - store_by_pieces_1 is sometimes called for strcpy and some other functions, not only for memset). Set_by_pieces_1() estimates costs of aligned and unaligned strategies (as in move_by_pieces() ) and generates moves for memset. Single move is generated via generate_move_with_mode(). If it's called first time, a promoted value (register, filled with one-byte value of memset argument) is generated - later calls reuse this value. Changes in
Re: [ARM] Optimise handling of neon_vget_high/low
Tested on arm-linux-gnueabi, and on benchmarks which should (and did) benefit from it. OK to install? OK. Ramana
Re: Use of vector instructions in memmov/memset expanding
Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? -Andi -- a...@linux.intel.com -- Speaking for myself only
Re: Intrinsics for N2965: Type traits and base classes
Don't worry, I'm not suggesting including boost::mpl at all, just leaving the return type of the bases trait unspecified. IMO, your example illustrates my point that without performance tuning, compiling metaprograms can be prohibitively expensive, so I want to avoid running the tuple metaprogram that creates the fields when we never need to instantiate the type. Benchmarks soon. Mike On 9/28/2011 2:53 AM, Jonathan Wakely wrote: On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl and has loads of experience that shows it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs and it is very unlikely beneficial to inline memcpy/memset if they aren't really short or have unknown number of iterations. Jakub
Support for V2 plugin API
Hi, this patch adds support for the V2 plugin API (thanks, Cary) that adds LDPR_PREVAILING_DEF_IRONLY_EXP. The resolution is like LDPR_PREVAILING_DEF_IRONLY but the symbol is exported out of the DSO. It is up to the compiler to optimize it out or keep it based on the knowledge whether the symbol can be optimized out at all (i.e. most COMDATs can, other types can't). This solves quite a few problems with building C++ apps, see the PR log. I was originally wrong about the gold implementation being buggy. The problem turned out to be a subtle lto-symtab bug that was mostly latent because of the COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the plugin decision when it is available, but it doesn't when the resolution of the very first entry in the list is UNKNOWN. This can happen because we add into the symtab also declarations that are not in varpool (i.e. they are neither defined nor used by the object file), but they are used otherwise, i.e. referred to from stuff used for debug info or TB devirtualization. To ensure backward compatibility I am keeping the COMDAT hack in place. It won't help letting the compiler know the plugin API version, since we decide on that at the time we output object files and thus we are not called from the plugin. I suppose we could keep the hack in place for the next release and remove it afterwards, penalizing builds with old binutils? Or perhaps even in GCC 4.7 if GNU LD gets updated in time. Bootstrapped/regtested x86_64-linux, built Mozilla, and lto-bootstrap in progress. OK if it passes? Honza PR lto/47247 * lto-plugin.c (get_symbols_v2): New variable. (write_resolution): Use V2 API when available. (onload): Handle LDPT_GET_SYMBOLS_V2. * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve when resolution is already available from plugin. (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP. * cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp. * lto-cgraph.c (LDPR_NUM_KNOWN): Update. 
* ipa.c (varpool_externally_visible_p): IRONLY variables are never externally visible. * varasm.c (resolution_to_local_definition_p): Add LDPR_PREVAILING_DEF_IRONLY_EXP. (resolution_local_p): Likewise. * common.c (lto_resolution_str): Add new resolution. * common.h (lto_resolution_str): Likewise. Index: lto-plugin/lto-plugin.c === *** lto-plugin/lto-plugin.c (revision 179274) --- lto-plugin/lto-plugin.c (working copy) *** enum symbol_style *** 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; --- 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols, get_symbols_v2; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; *** write_resolution (void) *** 441,447 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); --- 441,452 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! /* Version 2 of API supports IRONLY_EXP resolution that is ! accepted by GCC-4.7 and newer. */ ! if (get_symbols_v2) ! get_symbols_v2 (info->handle, symtab->nsyms, syms); ! else ! 
get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); *** onload (struct ld_plugin_tv *tv) *** 986,991 --- 991,999 case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK: register_all_symbols_read = p->tv_u.tv_register_all_symbols_read; break; + case LDPT_GET_SYMBOLS_V2: + get_symbols_v2 = p->tv_u.tv_get_symbols; + break; case LDPT_GET_SYMBOLS: get_symbols = p->tv_u.tv_get_symbols; break; Index: gcc/lto-symtab.c === *** gcc/lto-symtab.c (revision 179274) --- gcc/lto-symtab.c (working copy) *** lto_symtab_resolve_symbols (void **slot) *** 441,452 e->node = cgraph_get_node (e->decl); else if (TREE_CODE (e->decl) == VAR_DECL) e->vnode = varpool_get_node (e->decl); } - e =
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs, and it is very unlikely to be beneficial to inline memcpy/memset if they aren't really short or have an unknown number of iterations. I guess we should update the expansion tables so we produce function calls more often. I will look at how things behave on my setup. Do you know the glibc version numbers when the optimized string functions were introduced? Concerning inline SSE, I think it makes a lot of sense when we know the size and alignment, so we can output just a few SSE moves instead of more integer moves. We definitely need some numbers for the loop variants. Honza
Re: Use of vector instructions in memmov/memset expanding
This expanding only works on relatively small sizes (up to 4k), where the overhead of a library call can be quite significant. In some cases the new implementation gives 5x acceleration (especially on small sizes, less than ~256 bytes). On almost all sizes from 16 to 4096 bytes there is some gain; on average it's 20-30% on 64 bits and 40-50% on 32 bits (on Atom). This inlining implementation isn't intended to replace glibc; it's intended to replace the old implementation, which sometimes is quite slow. If glibc calls turn out to be faster than this expanding, a libcall is generated (special experiments were carried out to find threshold values in the cost models). If the size is unknown at all, this inlining doesn't work (i.e. glibc is called). On 28 September 2011 15:55, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 28, 2011 at 04:41:47AM -0700, Andi Kleen wrote: Michael Zolotukhin michael.v.zolotuk...@gmail.com writes: Build and 'make check' was tested. Could you expand a bit on the performance benefits? Where does it help? Especially when glibc these days has very well optimized implementations tuned for various CPUs and it is very unlikely beneficial to inline memcpy/memset if they aren't really short or have unknown number of iterations. Jakub -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Support for V2 plugin API
On Wed, 28 Sep 2011, Jan Hubicka wrote: Hi, this patch adds support for the V2 plugin API (thanks, Cary) that adds LDPR_PREVAILING_DEF_IRONLY_EXP. The resolution is like LDPR_PREVAILING_DEF_IRONLY, but the symbol is exported out of the DSO. It is up to the compiler to optimize it out or keep it, based on the knowledge of whether the symbol can be optimized out at all (i.e. most COMDATs can, other types can't). This solves quite a few problems with building C++ apps, see the PR log. I was originally wrong about the gold implementation being buggy. The problem turned out to be a subtle lto-symtab bug that was mostly latent because of the COMDAT hack we use. lto_symtab_resolve_symbols is supposed to honor the plugin decision when it is available, but it doesn't when the resolution of the very first entry in the list is UNKNOWN. This can happen because we also add into the symtab declarations that are not in varpool (i.e. they are neither defined nor used by the object file), but are used otherwise, i.e. referred to from stuff used for debug info or TB devirtualization. To ensure backward compatibility I am keeping the COMDAT hack in place. It won't help to let the compiler know the plugin API version, since we decide on that at the time we output object files and thus we are not called from the plugin. I suppose we could keep the hack in place for the next release and remove it afterwards, penalizing builds with old binutils? Or perhaps even in GCC 4.7 if GNU LD gets updated in time. Bootstrapped/regtested x86_64-linux, built Mozilla and lto-bootstrap in progress. OK if it passes? Ok. Any idea when GNU ld will catch up? Thanks, Richard. Honza PR lto/47247 * lto-plugin.c (get_symbols_v2): New variable. (write_resolution): Use V2 API when available. (onload): Handle LDPT_GET_SYMBOLS_V2. * lto-symtab.c (lto_symtab_resolve_symbols): Do not resolve when resolution is already available from plugin. (lto_symtab_merge_decls_1): Handle LDPR_PREVAILING_DEF_IRONLY_EXP. 
* cgraph.c (ld_plugin_symbol_resolution): Add prevailing_def_ironly_exp. * lto-cgraph.c (LDPR_NUM_KNOWN): Update. * ipa.c (varpool_externally_visible_p): IRONLY variables are never externally visible. * varasm.c (resolution_to_local_definition_p): Add LDPR_PREVAILING_DEF_IRONLY_EXP. (resolution_local_p): Likewise. * common.c (lto_resolution_str): Add new resolution. * common.h (lto_resolution_str): Likewise. Index: lto-plugin/lto-plugin.c === *** lto-plugin/lto-plugin.c (revision 179274) --- lto-plugin/lto-plugin.c (working copy) *** enum symbol_style *** 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; --- 129,135 static char *arguments_file_name; static ld_plugin_register_claim_file register_claim_file; static ld_plugin_register_all_symbols_read register_all_symbols_read; ! static ld_plugin_get_symbols get_symbols, get_symbols_v2; static ld_plugin_register_cleanup register_cleanup; static ld_plugin_add_input_file add_input_file; static ld_plugin_add_input_library add_input_library; *** write_resolution (void) *** 441,447 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); --- 441,452 struct plugin_symtab *symtab = info->symtab; struct ld_plugin_symbol *syms = symtab->syms; ! /* Version 2 of API supports IRONLY_EXP resolution that is ! accepted by GCC-4.7 and newer. */ ! if (get_symbols_v2) ! get_symbols_v2 (info->handle, symtab->nsyms, syms); ! else ! 
get_symbols (info->handle, symtab->nsyms, syms); finish_conflict_resolution (symtab, info->conflicts); *** onload (struct ld_plugin_tv *tv) *** 986,991 --- 991,999 case LDPT_REGISTER_ALL_SYMBOLS_READ_HOOK: register_all_symbols_read = p->tv_u.tv_register_all_symbols_read; break; + case LDPT_GET_SYMBOLS_V2: + get_symbols_v2 = p->tv_u.tv_get_symbols; + break; case LDPT_GET_SYMBOLS: get_symbols = p->tv_u.tv_get_symbols; break; Index: gcc/lto-symtab.c === *** gcc/lto-symtab.c (revision 179274) --- gcc/lto-symtab.c (working copy) *** lto_symtab_resolve_symbols (void **slot) ***
Updated random seed patchkit
I addressed all review comments, clarified some code. The random seed generation in gcc.c is now fixed too and toplev.c detects this case and doesn't rerun the CRC. Repasses bootstrap and testsuite on x86_64-linux. Since the previous version was approved I will commit in 24h, unless there are new objections. -Andi
[PATCH 2/3] Use urandom to get random seed
From: Andi Kleen a...@linux.intel.com When available, use /dev/urandom to get the random seed. This will lower the probability of collisions. On other systems it will fall back to the old methods. Passes bootstrap + testsuite on x86_64. Ok? gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * toplev.c (init_local_tick): Try reading random seed from /dev/urandom. --- gcc/toplev.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/gcc/toplev.c b/gcc/toplev.c index 78583fc..ab6b5a4 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -262,7 +262,17 @@ init_local_tick (void) { if (!flag_random_seed) { - /* Get some more or less random data. */ + /* Try urandom first. Time of day is too likely to collide. + In case of any error we just use the local tick. */ + + int fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &random_seed, sizeof (random_seed)); + close (fd); + } + + /* Now get the tick anyways */ #ifdef HAVE_GETTIMEOFDAY { struct timeval tv; -- 1.7.5.4
[PATCH 3/3] Use urandom in gcc.c too
From: Andi Kleen a...@linux.intel.com gcc also generates a random number in some special circumstances, so teach it about /dev/urandom too. gcc/: 2011-09-27 Andi Kleen a...@linux.intel.com * gcc.c (get_local_tick): Rename to get_random_number. Read from /dev/urandom. Add getpid call. (compare_debug_dump_opt_spec_function): Drop getpid call. --- gcc/gcc.c | 22 -- 1 files changed, 16 insertions(+), 6 deletions(-) diff --git a/gcc/gcc.c b/gcc/gcc.c index ddec8db..3bfdf77 100644 --- a/gcc/gcc.c +++ b/gcc/gcc.c @@ -8062,12 +8062,22 @@ print_asm_header_spec_function (int arg ATTRIBUTE_UNUSED, return NULL; } -/* Compute a timestamp to initialize flag_random_seed. */ +/* Get a random number for -frandom-seed */ -static unsigned -get_local_tick (void) +static unsigned HOST_WIDE_INT +get_random_number (void) { - unsigned ret = 0; + unsigned HOST_WIDE_INT ret = 0; + int fd; + + fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &ret, sizeof (HOST_WIDE_INT)); + close (fd); + if (ret) + return ret; + } /* Get some more or less random data. */ #ifdef HAVE_GETTIMEOFDAY @@ -8086,7 +8096,7 @@ get_local_tick (void) } #endif - return ret; + return ret ^ getpid(); } /* %:compare-debug-dump-opt spec function. Save the last argument, @@ -8145,7 +8155,7 @@ compare_debug_dump_opt_spec_function (int arg, if (!which) { - unsigned HOST_WIDE_INT value = get_local_tick () ^ getpid (); + unsigned HOST_WIDE_INT value = get_random_number (); sprintf (random_seed, HOST_WIDE_INT_PRINT_HEX, value); } -- 1.7.5.4
Re: Use of vector instructions in memmov/memset expanding
Do you know glibc version numbers when the optimized string functions was introduced? Afaik, it's 2.13. I also compared my implementation to 2.13.
Re: Use of vector instructions in memmov/memset expanding
Do you know the glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say yes, and thus would suggest to tame down inlining that is no longer profitable on newer glibcs, with a note in changes.html... (I worry about the tables in i386.c deciding what strategy to use for a block of given size. This is more or less unrelated to the actual patch.) Honza
Re: [PATCH 2/3] Use urandom to get random seed
On Wed, Sep 28, 2011 at 2:49 PM, Andi Kleen a...@firstfloor.org wrote: From: Andi Kleen a...@linux.intel.com When available, use /dev/urandom to get the random seed. This will lower the probability of collisions. On other systems it will fall back to the old methods. Passes bootstrap + testsuite on x86_64. Ok? gcc/: 2011-09-26 Andi Kleen a...@linux.intel.com * toplev.c (init_local_tick): Try reading random seed from /dev/urandom. --- gcc/toplev.c | 12 +++- 1 files changed, 11 insertions(+), 1 deletions(-) diff --git a/gcc/toplev.c b/gcc/toplev.c index 78583fc..ab6b5a4 100644 --- a/gcc/toplev.c +++ b/gcc/toplev.c @@ -262,7 +262,17 @@ init_local_tick (void) { if (!flag_random_seed) { - /* Get some more or less random data. */ + /* Try urandom first. Time of day is too likely to collide. + In case of any error we just use the local tick. */ + + int fd = open ("/dev/urandom", O_RDONLY); + if (fd >= 0) + { + read (fd, &random_seed, sizeof (random_seed)); I suppose we might get interrupted before anything is read, and read can return -1 (I suppose partial reads are quite unlikely, though)? Thus, don't we need the usual EINTR loop? If not, the patch is ok. Thanks, Richard. + close (fd); + } + + /* Now get the tick anyways */ #ifdef HAVE_GETTIMEOFDAY { struct timeval tv; -- 1.7.5.4
Re: Vector shuffling
On Thu, Sep 15, 2011 at 8:05 PM, Richard Henderson r...@redhat.com wrote: +The elements of the input vectors are numbered from left to right across +one or both of the vectors. Each element in the mask specifies a number +of element from the input vector(s). Consider the following example. It would be more preferable to talk about the memory ordering of the elements rather than left and right, which are ambiguous at best. + if (TREE_CODE (mask) == VECTOR_CST) + { + tree m_type, call; + tree fn = targetm.vectorize.builtin_vec_perm (TREE_TYPE (v0), &m_type); + /*rtx t;*/ Leftover crap. Fixed. + + if (!fn) + goto vshuffle; + + if (m_type != TREE_TYPE (TREE_TYPE (mask))) + { + int units = TYPE_VECTOR_SUBPARTS (TREE_TYPE (mask)); + tree cvt = build_vector_type (m_type, units); + mask = fold_convert (cvt, mask); + } + + call = fold_build1 (ADDR_EXPR, build_pointer_type (TREE_TYPE (fn)), fn); + call = build_call_nary (type /* ? */, call, 3, v0, v1, mask); + + return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL); + } + +vshuffle: + gcc_assert (operand_equal_p (v0, v1, 0)); Why can't a non-constant shuffle have different V0 and V1? That seems like a direct violation of the documentation, and any sort of usefulness. Ok, I agree. The reason why this assert is here is that no one in the middle end generates code that does not meet this assert. In principle we definitely want to support it in the upcoming patches, but it would be nice to start with a simple thing. Also, while I'm ok with the use of builtin_vec_perm here in the short term, I think that in the long term we should simply force the named pattern to handle constants. Then the vectorizer can simply use the predicate and the tree code and we can drop the large redundancy of builtins with different argument types. Indeed, once this patch is applied, I think that ought to be the very next task in this domain. +/* Vector shuffle expression. A = VEC_SHUFFLE_EXPR <v0, v1, maks> Typo in mask. 
+ foreach i in length (mask): + A = mask[i] < length (v0) ? v0[mask[i]] : v1[mask[i]] Surely it's v1[mask[i] - length]. + if (TREE_CODE (vect) == VECTOR_CST) + { + unsigned i; Indentation is off all through this function. Fixed. + mask = gen_rtx_AND (maskmode, mask, mm); + + /* Convert mask to vector of chars. */ + mask = simplify_gen_subreg (V16QImode, mask, maskmode, 0); + mask = force_reg (V16QImode, mask); Why are you using force_reg to do all the dirty work? Seems to me this should be using expand_normal. All throughout this function. That would also avoid the need for all of the extra force_reg stuff that ought not be there for -O0. I don't really understand this. As far as I know, expand_normal converts a tree to an rtx. All my computations are happening at the level of rtx, and force_reg is needed just to bring an rtx expression into a register of the correct mode. If I am missing something, could you give an example of how I can use expand_normal instead of force_reg in this particular code? I also see that you're not even attempting to use xop_pperm. As I said, I am happy to experiment with the cases where v0 != v1 in the upcoming patches. Let's just start with a simple thing and see what kind of issues/problems it brings. Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1? My personal feeling is that with v0 != v1 it may be more efficient to perform piecewise shuffling rather than bitwise dances around the masks. And give the vshuffle named pattern the wrong number of arguments? Ok, if I make vshuffle accept only two arguments -- vector and mask -- would that be ok? It's certainly possible to handle it, though it takes a few more steps, and might well be more efficient as a libgcc function rather than inline. I don't really understand why it could be more efficient. I thought that inlining gives more chances to the final RTL optimisation. r~ Thanks, Artem.
Re: Use of vector instructions in memmov/memset expanding
(I worry about the tables in i386.c deciding what strategy to use for block of given size. This is more or less unrelated to the actual patch) Yep, the threshold values I mentioned above are the values in these tables. Even with fast glibcs there are some cases where inlining is profitable (e.g. if the alignment is known at compile time). On 28 September 2011 16:54, Jan Hubicka hubi...@ucw.cz wrote: Do you know glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most of GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say that yes and thus would suggest to tame down inlining that is no longer profitable on newer glibcs with a note in changes.html... (I worry about the tables in i386.c deciding what strategy to use for block of given size. This is more or less unrelated to the actual patch) Honza -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Go patch committed: Update libgo to Go release r60
Rainer Orth r...@cebitec.uni-bielefeld.de writes: Thanks, I'd missed that. It turned out that IRIX 6 needs one more change to return to bootstrap land: sys/termios.h only defines TIOCNOTTY if !_XOPEN_SOURCE, which we need for other stuff (cf. configure.ac). I've cheated and use sys/ttold.h instead, which doesn't have this check. With this patch, a Go-only IRIX 6.5 bootstrap completed successfully. Thanks. Committed to mainline. Ian
Re: Intrinsics for N2965: Type traits and base classes
OK. Here are some simple benchmarks. I simulated heavy use of reflection with 1000 classes that each had about a thousand base classes. I also created a super-simple typelist class: template<typename... T> struct typelist {}; // Variadic templates rock If bases returns a typelist, the program takes about 4 sec. If bases returns a tuple, the program takes about 4 min. If I make the program any bigger, the tuple case fails to compile with spurious error messages, while the typelist version stays quick. Given that metaprograms typically create large class hierarchies (look at Alexandrescu's CreateScatterHierarchy that he uses to implement factory in the Modern C++ Design book) and that compile times are an enormous obstacle to metaprogramming, I don't think these tests are at all ridiculous. I think this shows we need to return a typelist instead of a tuple. As I mentioned earlier, I could just return the typelist, or hide it by returning an unspecified type (which would actually be a typelist) that you would apply a first and a rest template to walk through. This would give us more flexibility for the future (e.g., if a standard typelist type is adopted. Likewise, we would be covered if we wanted to change the bases implementation in the future to return an associative container. For example, if using size<grep<A, bases<E>::type>>::value to count the number of occurrences of A as a base class of E turns out to be useful). Thanks, Mike On 9/28/2011 6:54 AM, Mike Spertus wrote: Don't worry, I'm not suggesting including boost::mpl at all, just leaving the return type of the bases trait unspecified. IMO, your example illustrates my point that without performance tuning, compiling metaprograms can be prohibitively expensive, so I want to avoid running the tuple metaprogram that creates the fields when we never need to instantiate the type. Benchmarks soon. 
Mike On 9/28/2011 2:53 AM, Jonathan Wakely wrote: On 28 September 2011 04:22, Michael Spertus wrote: Benjamin, I think tuple is wrong both for performance reasons (I believe these are likely to be serious enough to depress use due to inordinately long compiles) and because it prematurely locks us into a rigid choice of how our typelists are implemented. My inclination is to make it type-independent by returning an unspecified type that can have a sequence of types extracted from it (this is the approach taken by boost::mpl, and there is lots of experience showing it is a good approach to metaprogramming). In other words, first<bases<A>>::type would be the first base of A, etc. Citing Boost MPL as a good way to avoid inordinately long compiles ... interesting! Have you ever tried to reduce a GCC bug report from 20k lines to 20, because most Boost libs include every MPL header?! I hope we can get a simple typelist _without_ needing everything else in MPL, such as the apply and lambda metafunctions (and maybe a lot of that could be massively simplified using variadic templates anyway.)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 02:56:30PM +0400, Michael Zolotukhin wrote: Attached is part 1 of the patch that enables use of vector instructions in memset and memcpy (middle-end part). The main part of the changes is in the functions move_by_pieces/set_by_pieces. In the new version, the algorithm of move-mode selection was changed: now it checks whether the alignment is known at compile time and uses cost models to choose between aligned and unaligned, vector or non-vector move modes. Michael, It appears that part 1 of the patch wasn't really attached. Jack -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: [PATCH, ARM] Unaligned accesses for builtin memcpy [2/2]
On 6 May 2011 14:13, Julian Brown jul...@codesourcery.com wrote: Hi, This is the second of two patches to add unaligned-access support to the ARM backend. It builds on the first patch to provide support for unaligned accesses when expanding block moves (i.e. for builtin memcpy operations). It makes some effort to use load/store multiple instructions where appropriate (when accessing sufficiently-aligned source or destination addresses), and also makes some effort to generate fast code (for -O1/2/3) or small code (for -Os), though some of the heuristics may still need tweaking. Sorry it's taken me a while to get around to this one. Do you know what difference this makes to performance on some standard benchmarks on, let's say, an A9 and an M4, as I see that this gets triggered only when we have less than 64 bytes to copy? Please add a few testcases from the examples that you've shown here to be sure that ldm's are being generated in the right cases. cheers Ramana
Re: Use of vector instructions in memmov/memset expanding
It appears that part 1 of the patch wasn't really attached. Thanks, resending. memfunc-mid.patch Description: Binary data
[PATCH] Fix PR50460
This extends try_move_mult_to_index folding to be less picky about the outermost array reference and handles a component-ref of an array the same as if it were wrapped in an array-ref with the minimum index. This consolidates array-ref reconstruction at the frontend-closest position we have right now (eventually that function should move to some c-family helper, I guess). Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2011-09-28 Richard Guenther rguent...@suse.de PR middle-end/50460 * fold-const.c (try_move_mult_to_index): Handle &a.array the same as &a.array[0]. Index: gcc/fold-const.c === *** gcc/fold-const.c (revision 179308) --- gcc/fold-const.c (working copy) *** try_move_mult_to_index (location_t loc, *** 6870,6875 --- 6870,6929 break; } + else if (TREE_CODE (ref) == COMPONENT_REF + && TREE_CODE (TREE_TYPE (ref)) == ARRAY_TYPE) + { + tree domain; + + /* Remember if this was a multi-dimensional array. */ + if (TREE_CODE (TREE_OPERAND (ref, 0)) == ARRAY_REF) + mdim = true; + + domain = TYPE_DOMAIN (TREE_TYPE (ref)); + if (! domain) + continue; + itype = TREE_TYPE (domain); + + step = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (ref))); + if (TREE_CODE (step) != INTEGER_CST) + continue; + + if (s) + { + if (! tree_int_cst_equal (step, s)) + continue; + } + else + { + /* Try if delta is a multiple of step. */ + tree tmp = div_if_zero_remainder (EXACT_DIV_EXPR, op1, step); + if (! tmp) + continue; + delta = tmp; + } + + /* Only fold here if we can verify we do not overflow one + dimension of a multi-dimensional array. 
*/ + if (mdim) + { + tree tmp; + + if (!TYPE_MIN_VALUE (domain) + || !TYPE_MAX_VALUE (domain) + || TREE_CODE (TYPE_MAX_VALUE (domain)) != INTEGER_CST) + continue; + + tmp = fold_binary_loc (loc, PLUS_EXPR, itype, + fold_convert_loc (loc, itype, + TYPE_MIN_VALUE (domain)), + fold_convert_loc (loc, itype, delta)); + if (TREE_CODE (tmp) != INTEGER_CST + || tree_int_cst_lt (TYPE_MAX_VALUE (domain), tmp)) + continue; + } + + break; + } else mdim = false; *** try_move_mult_to_index (location_t loc, *** 6892,6903 pos = TREE_OPERAND (pos, 0); } ! TREE_OPERAND (pos, 1) = fold_build2_loc (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, ! TREE_OPERAND (pos, 1)), ! fold_convert_loc (loc, itype, delta)); ! ! return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret); } --- 6946,6974 pos = TREE_OPERAND (pos, 0); } ! if (TREE_CODE (ref) == ARRAY_REF) ! { ! TREE_OPERAND (pos, 1) ! = fold_build2_loc (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, TREE_OPERAND (pos, 1)), ! fold_convert_loc (loc, itype, delta)); ! return fold_build1_loc (loc, ADDR_EXPR, TREE_TYPE (addr), ret); ! } ! else if (TREE_CODE (ref) == COMPONENT_REF) ! { ! gcc_assert (ret == pos); ! ret = build4_loc (loc, ARRAY_REF, TREE_TYPE (TREE_TYPE (ref)), ret, ! fold_build2_loc ! (loc, PLUS_EXPR, itype, ! fold_convert_loc (loc, itype, ! TYPE_MIN_VALUE (TYPE_DOMAIN (TREE_TYPE (ref)))), ! fold_convert_loc (loc, itype, delta)), ! NULL_TREE, NULL_TREE); ! return build_fold_addr_expr_loc (loc, ret); ! } ! else ! gcc_unreachable (); }
Re: Scalar vector binary operation
On Wed, Aug 10, 2011 at 7:44 AM, Richard Guenther richard.guent...@gmail.com wrote: On Tue, Aug 9, 2011 at 10:23 PM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Sorry, I didn't attach the patch itself. Here we go, in the attachment. I have committed the patch after re-bootstrapping and testing it on x86_64-unknown-linux-gnu with {,-m32}. This is new functionality for gcc's vector extension, and I think it should be mentioned at http://gcc.gnu.org/gcc-4.7/changes.html. Does somebody want to take care of that? Ian
Commit: RX: Fix problems building libgcc
Hi Guys, I am applying the patch below to fix a couple of problems building libgcc for the RX target. The first is that when 32-bit doubles are enabled we need to make sure that we never try to construct a 64-bit double type. This is done in rx-lib.h, but it was only being enabled when constructing the float-type functions. The patch enables the fix when double-type functions are being constructed as well. The second fix is to prevent an extraneous renaming of the floatsisf and floatunsisf functions when building libgcc with 32-bit doubles enabled. Cheers Nick libgcc/ChangeLog 2011-09-28 Nick Clifton ni...@redhat.com * config/rx/rx-lib.h: Always restrict doubles to the SF type when 64-bit doubles are not enabled. * config/rx/rx-abi.h: Fix extraneous renaming of the floatsisf and floatunsisf functions. Index: libgcc/config/rx/rx-lib.h === --- libgcc/config/rx/rx-lib.h (revision 179307) +++ libgcc/config/rx/rx-lib.h (working copy) @@ -1,6 +1,5 @@ -#ifdef FLOAT #ifndef __RX_64BIT_DOUBLES__ #define DF SF #define FLOAT_ONLY #endif -#endif + Index: libgcc/config/rx/rx-abi.h === --- libgcc/config/rx/rx-abi.h (revision 179307) +++ libgcc/config/rx/rx-abi.h (working copy) @@ -80,16 +80,7 @@ #endif -#ifdef L_si_to_sf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf) -#endif -#ifdef L_usi_to_sf -#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf) -#endif - - - #ifdef __RX_64BIT_DOUBLES__ /* Float (32-bit) aliases... */ @@ -176,6 +167,14 @@ #define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (negdf2, NEGd) #endif +#ifdef L_si_to_sf +#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatsisf, CONV32sf) +#endif + +#ifdef L_usi_to_sf +#define DECLARE_LIBRARY_RENAMES RENAME_LIBRARY (floatunsisf, CONV32uf) +#endif + /* The 64-bit comparison functions do not have aliases because libgcc2 does not provide them. Instead they have to be supplied in rx-abi-functions.c. */
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
Hi Tom What's the behavior of your patch to the following case typedef int int_unaligned __attribute__((aligned(1))); int foo (int_unaligned *p) { return *p; } thanks Carrot On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? Thanks, - Tom 2011-09-20 Tom de Vries t...@codesourcery.com PR target/43814 * tree-ssa-ccp.c (get_align_value): New function, factored out of get_value_from_alignment. (get_value_from_alignment): Use get_align_value. 
(get_value_for_expr): Use get_align_value to handle alignment of function argument pointers. * common.opt (faligned-pointer-argument): New option. * doc/invoke.texi (Optimization Options): Add -faligned-pointer-argument. (-faligned-pointer-argument): New item. * gcc/testsuite/gcc.dg/pr43814.c: New test. * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
Re: Vector Comparison patch
On Mon, Sep 26, 2011 at 5:43 PM, Richard Guenther richard.guent...@gmail.com wrote: On Mon, Sep 26, 2011 at 4:25 PM, Richard Guenther richard.guent...@gmail.com wrote: On Wed, Sep 7, 2011 at 5:06 PM, Joseph S. Myers jos...@codesourcery.com wrote: This looks like it has the same issue with maybe needing to use TYPE_MAIN_VARIANT in type comparisons as the shuffle patch. I don't think so; we move qualifiers to the vector type from the element type in make_vector_type, and the tests only look at the component type. I am re-testing the patch currently and will commit it if that succeeds. Unfortunately gcc.c-torture/execute/vector-compare-1.c fails with -m32 for vector (2, double) d0; vector (2, double) d1; vector (2, long) idres; d0 = (vector (2, double)){(double)argc, 10.}; d1 = (vector (2, double)){0., (double)-23}; idres = (d0 > d1); as apparently the type we chose to assign to (d0 > d1) is different from that of idres: /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long int' from type '__vector(2) long long int'^M Adjusting it to vector (2, long long) otoh yields, for -m64: /space/rguenther/src/svn/trunk/gcc/testsuite/gcc.c-torture/execute/vector-compare-1.c:118:5: error: incompatible types when assigning to type '__vector(2) long long int' from type '__vector(2) long int' But those two types are at least compatible from their modes. Joseph, should we accept mode-compatible types in assignments or maybe transparently convert them? Looks like we have a more suitable solution for these automatically generated vector types - mark them with TYPE_VECTOR_OPAQUE. I'm testing the following incremental patch. Richard. Index: gcc/c-typeck.c === --- gcc/c-typeck.c.orig 2011-09-28 16:22:10.0 +0200 +++ gcc/c-typeck.c 2011-09-28 16:18:39.0 +0200 @@ -9928,8 +9928,10 @@ build_binary_op (location_t location, en } /* Always construct signed integer vector type. 
*/ - intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0); - result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + intt = c_common_type_for_size (GET_MODE_BITSIZE + (TYPE_MODE (TREE_TYPE (type0))), 0); + result_type = build_opaque_vector_type (intt, + TYPE_VECTOR_SUBPARTS (type0)); converted = 1; break; } @@ -10063,8 +10065,10 @@ build_binary_op (location_t location, en } /* Always construct signed integer vector type. */ - intt = c_common_type_for_size (TYPE_PRECISION (TREE_TYPE (type0)), 0); - result_type = build_vector_type (intt, TYPE_VECTOR_SUBPARTS (type0)); + intt = c_common_type_for_size (GET_MODE_BITSIZE + (TYPE_MODE (TREE_TYPE (type0))), 0); + result_type = build_opaque_vector_type (intt, + TYPE_VECTOR_SUBPARTS (type0)); converted = 1; break; }
[Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING
This patch makes the GCC extension __float128 (_Complex) available in the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX. Additionally, I have improved the diagnostic for explicitly use associating -std= versioned symbols. And I have finally added the iso*.def files to the makefile dependencies. As usual, with -std=f2008, the GNU extensions are not loaded. I have also updated the documentation. OK for the trunk? Tobias PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for C's __float128, please speak up before gfortran - and other compilers implement it. (At least one vendor is implementing __float128 support and plans to modify ISO_C_BINDING.) The proper name would be C___FLOAT128, but that looks awkward! 2011-09-28 Tobias Burnus bur...@net-b.de * Make-lang.in (F95_PARSER_OBJS, GFORTRAN_TRANS_DEPS): Add dependency on iso-c-binding.def and iso-fortran-env.def. * module.c (import_iso_c_binding_module): Add error when explicitly importing a nonstandard symbol; extend standard- depending loading. * iso-c-binding.def: Add c_float128 and c_float128_complex integer parameters (for -std=gnu). * intrinsic.texi (ISO_C_Binding): Document them. * symbol.c (generate_isocbinding_symbol): Change macros to ignore GFC_STD_* data. * trans-types.c (gfc_init_c_interop_kinds): Ditto; make nonstatic and renamed from init_c_interop_kinds. (gfc_init_kinds): Don't call it * trans-types.h (gfc_init_c_interop_kinds): Add prototype. * f95-lang.c (gfc_init_decl_processing): Call it. 2011-09-28 Tobias Burnus bur...@net-b.de * gfortran.dg/iso_c_binding_param_1.f90: New. * gfortran.dg/iso_c_binding_param_2.f90: New. * gfortran.dg/c_sizeof_2.f90: Update dg-error. 
diff --git a/gcc/fortran/Make-lang.in b/gcc/fortran/Make-lang.in index 0816458..b766da6 100644 --- a/gcc/fortran/Make-lang.in +++ b/gcc/fortran/Make-lang.in @@ -329,14 +329,16 @@ $(F95_PARSER_OBJS): fortran/gfortran.h fortran/libgfortran.h \ fortran/parse.h fortran/arith.h fortran/target-memory.h \ $(CONFIG_H) $(SYSTEM_H) $(TM_H) $(TM_P_H) coretypes.h \ $(RTL_H) $(TREE_H) $(TREE_DUMP_H) $(GGC_H) $(EXPR_H) \ - $(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) + $(FLAGS_H) output.h $(DIAGNOSTIC_H) errors.h $(FUNCTION_H) \ + fortran/iso-c-binding.def fortran/iso-fortran-env.def fortran/openmp.o: pointer-set.h $(TARGET_H) toplev.h GFORTRAN_TRANS_DEPS = fortran/gfortran.h fortran/libgfortran.h \ fortran/intrinsic.h fortran/trans-array.h \ fortran/trans-const.h fortran/trans-const.h fortran/trans.h \ fortran/trans-stmt.h fortran/trans-types.h \ -$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H) +$(CONFIG_H) $(SYSTEM_H) $(TREE_H) $(TM_H) coretypes.h $(GGC_H) \ +fortran/iso-c-binding.def fortran/iso-fortran-env.def fortran/f95-lang.o: $(GFORTRAN_TRANS_DEPS) fortran/mathbuiltins.def \ gt-fortran-f95-lang.h gtype-fortran.h $(CGRAPH_H) $(TARGET_H) fortran/cpp.h \ diff --git a/gcc/fortran/f95-lang.c b/gcc/fortran/f95-lang.c index 648831f..8f8dd7d 100644 --- a/gcc/fortran/f95-lang.c +++ b/gcc/fortran/f95-lang.c @@ -595,6 +595,7 @@ gfc_init_decl_processing (void) /* Set up F95 type nodes. 
*/ gfc_init_kinds (); gfc_init_types (); + gfc_init_c_interop_kinds (); } diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h index 54e0b20..1bd5ec3 100644 --- a/gcc/fortran/gfortran.h +++ b/gcc/fortran/gfortran.h @@ -610,8 +610,8 @@ iso_fortran_env_symbol; #undef NAMED_DERIVED_TYPE #define NAMED_INTCST(a,b,c,d) a, -#define NAMED_REALCST(a,b,c) a, -#define NAMED_CMPXCST(a,b,c) a, +#define NAMED_REALCST(a,b,c,d) a, +#define NAMED_CMPXCST(a,b,c,d) a, #define NAMED_LOGCST(a,b,c) a, #define NAMED_CHARKNDCST(a,b,c) a, #define NAMED_CHARCST(a,b,c) a, diff --git a/gcc/fortran/intrinsic.texi b/gcc/fortran/intrinsic.texi index 9adeeab..a093bec 100644 --- a/gcc/fortran/intrinsic.texi +++ b/gcc/fortran/intrinsic.texi @@ -13006,7 +13006,9 @@ type default integer, which can be used as KIND type parameters. In addition to the integer named constants required by the Fortran 2003 standard, GNU Fortran provides as an extension named constants for the 128-bit integer types supported by the C compiler: @code{C_INT128_T, -C_INT_LEAST128_T, C_INT_FAST128_T}. +C_INT_LEAST128_T, C_INT_FAST128_T}. Furthermore, if @code{__float} is +supported in C, the named constants @code{C_FLOAT128, C_FLOAT128_COMPLEX} +are defined. @multitable @columnfractions .15 .35 .35 .35 @item Fortran Type @tab Named constant @tab C type@tab Extension @@ -13036,9 +13038,11 @@ C_INT_LEAST128_T, C_INT_FAST128_T}. @item @code{REAL} @tab @code{C_FLOAT} @tab @code{float} @item @code{REAL} @tab @code{C_DOUBLE}@tab @code{double} @item @code{REAL} @tab @code{C_LONG_DOUBLE} @tab @code{long double} +@item @code{REAL} @tab @code{C_FLOAT128} @tab @code{__float128}@tab Ext. @item @code{COMPLEX}@tab
Commit: RX: Add support for MIN and MAX instructions in QI and HI modes
Hi Guys, I am going to apply the patch below to the RX backend to add support for generating MIN and MAX instructions for HI and QI modes. Cheers Nick gcc/ChangeLog 2011-09-28 Nick Clifton ni...@redhat.com * config/rx/predicates.md (rx_minmax_operand): New predicate. Accepts immediates and a restricted subset of MEMs. * config/rx/rx.md (int_modes): New iterator. (smaxsi3, sminsi3): Delete and replace with... (smaxint_mode3, sminint_mode3): New patterns. (umax3_u, umax3_ur, umax3, umin3): New patterns. Index: gcc/config/rx/predicates.md === --- gcc/config/rx/predicates.md (revision 179307) +++ gcc/config/rx/predicates.md (working copy) @@ -72,6 +72,16 @@ (match_operand 0 rx_restricted_mem_operand)) ) +;; Check that the operand is suitable as the source operand +;; for a min/max instruction. This is the same as +;; rx_source_operand except that CONST_INTs are allowed but +;; REGs and SUBREGs are not. + +(define_predicate rx_minmaxex_operand + (ior (match_operand 0 immediate_operand) + (match_operand 0 rx_restricted_mem_operand)) +) + ;; Return true if OP is a store multiple operation. This looks like: ;; ;; [(set (SP) (MINUS (SP) (INT))) Index: gcc/config/rx/rx.md === --- gcc/config/rx/rx.md (revision 179307) +++ gcc/config/rx/rx.md (working copy) @@ -22,6 +22,9 @@ ;; This code iterator is used for sign- and zero- extensions. (define_mode_iterator small_int_modes [(HI ) (QI )]) +;; This code iterator is used for max and min operations. 
+(define_mode_iterator int_modes [(SI ) (HI ) (QI )]) + ;; We do not handle DFmode here because it is either ;; the same as SFmode, or if -m64bit-doubles is active ;; then all operations on doubles have to be handled by @@ -1160,28 +1163,109 @@ (set_attr timings 22,44)] ) -(define_insn smaxsi3 - [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) - (smax:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) -(match_operand:SI 2 rx_source_operand - r,Sint08,Sint16,Sint24,i,Q)))] +(define_insn smaxint_modes:mode3 + [(set (match_operand:int_modes 0 register_operand =r,r,r,r,r,r) + (smax:int_modes (match_operand:int_modes 1 register_operand %0,0,0,0,0,0) + (match_operand:int_modes 2 rx_source_operand + r,Sint08,Sint16,Sint24,i,Q)))] max\t%Q2, %0 [(set_attr length 3,4,5,6,7,6) (set_attr timings 11,11,11,11,11,33)] ) -(define_insn sminsi3 - [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) - (smin:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) -(match_operand:SI 2 rx_source_operand - r,Sint08,Sint16,Sint24,i,Q)))] +(define_insn sminint_modes:mode3 + [(set (match_operand:int_modes 0 register_operand =r,r,r,r,r,r) + (smin:int_modes (match_operand:int_modes 1 register_operand %0,0,0,0,0,0) +(match_operand:int_modes2 rx_source_operand + r,Sint08,Sint16,Sint24,i,Q)))] min\t%Q2, %0 [(set_attr length 3,4,5,6,7,6) (set_attr timings 11,11,11,11,11,33)] ) +(define_insn umaxsmall_int_modes:mode3_u + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smax:SI (match_operand:SI 1 register_operand %0,0,0,0,0,0) +(zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q] + + max\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_insn uminsmall_int_modes:mode3_ur + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smin:SI (zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q)) +(match_operand:SI 1 register_operand 
%0,0,0,0,0,0)))] + + min\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_insn umaxsmall_int_modes:mode3_ur + [(set (match_operand:SI 0 register_operand =r,r,r,r,r,r) + (smax:SI (zero_extend:SI (match_operand:small_int_modes 2 rx_minmaxex_operand + r,Sint08,Sint16,Sint24,i,Q)) +(match_operand:SI 1 register_operand %0,0,0,0,0,0)))] + + max\t%R2, %0 + [(set_attr length 3,4,5,6,7,6) + (set_attr timings 11,11,11,11,11,33)] +) + +(define_expand umaxsmall_int_modes:mode3 + [(set (match_dup 4) + (zero_extend:SI (match_operand:small_int_modes 1 register_operand %0,0,0,0,0,0))) + (set (match_dup
Re: Vector shuffling
On 09/28/2011 05:59 AM, Artem Shinkarov wrote: I don't really understand this. As far as I know, expand_normal converts tree to rtx. All my computations are happening at the level of rtx and force_reg is needed just to bring an rtx expression to the register of the correct mode. If I am missing something, could you give an example how can I use expand_normal instead of force_reg in this particular code. Sorry, I meant expand_(simple_)binop. Is ssse3_pshufb why you do the wrong thing in the expander for v0 != v1? My personal feeling is that it may be the case with v0 != v1, that it would be more efficient to perform piecewise shuffling rather than bitwise dances around the masks. Maybe for V2DI and V2DFmode, but probably not otherwise. We can perform the double-word shuffle in 12 insns; 10 for SSE 4.1. Example assembly attached. It's certainly possible to handle it, though it takes a few more steps, and might well be more efficient as a libgcc function rather than inline. I don't really understand why it could be more efficient. I thought that inline gives more chances to the final RTL optimisation. We'll not be able to optimize this at the rtl level. There are too many UNSPEC instructions in the way. In any case, even if that weren't so we'd only be able to do useful optimization for a constant permutation. And we should have been able to prove that at the gimple level. r~ .data .align 16 vec3: .long 3,3,3,3 vec4: .long 4,4,4,4 dup4: .byte 0,0,0,0, 4,4,4,4, 8,8,8,8, 12,12,12,12 ofs4: .byte 0,1,2,3, 0,1,2,3, 0,1,2,3, 0,1,2,3 .text shuffle2: // Convert the low bits of the mask to a shuffle movdqa %xmm2, %xmm3 pandvec3, %xmm3 pmulld vec4, %xmm3 pshufb dup4, %xmm3 paddb ofs4, %xmm3 // Shuffle both inputs pshufb %xmm3, %xmm0 pshufb %xmm3, %xmm1 // Select and merge the inputs // Use ix86_expand_int_vcond for use of pblendvb for SSE4_1. pandvec4, %xmm2 pcmpeqd vec4, %xmm2 pand%xmm2, %xmm1 pandn %xmm2, %xmm0 por %xmm1, %xmm0 ret
Re: Scalar vector binary operation
Artem Shinkarov artyom.shinkar...@gmail.com writes: I can try to put a description in the document. I am not sure that I have rights to commit to the svn, but at least I can try to write the text. There are also pending patches for vector comparison (almost submitted) and vector shuffling (still under discussion), but I hope to finish both of them before the new release. Could you point out where changes.html is located in svn? It's actually still in CVS, at gcc.gnu.org:/cvs/gcc. Ian
Re: [4/4] Make SMS schedule register moves
Ayal Zaks ayal.z...@gmail.com writes: + /* The cyclic lifetime of move->new_reg starts and ends at move->def + (the instruction that defines move->old_reg). So instruction I_REG_MOVE (new_reg = reg) must be scheduled before the next I_MUST_FOLLOW move/original-def (due to anti-dependence: it overwrites reg), but after the previous instance of I_MUST_FOLLOW (due to true dependence; i.e. account for latency also). Why do moves, except for the one closest to move->def (which is directly dependent upon it, i.e. for whom move->def == I_MUST_FOLLOW), have to worry about move->def at all? (Or have their cyclic lifetimes start and end there?) Because the uses of new_reg belong to the same move->def based cycle. The cycle (overloaded term; rather iteration in this context) to which the uses belong is inferred from the cycle (absolute schedule time) in which they are scheduled, regardless of move->def. Just to prove your point about cycle being an overloaded term: I wasn't actually meaning it in the sense of (loop) iteration. I meant a circular window based on move->def. So (I think this is the uncontroversial bit): [M1] must be scheduled cyclically before [B] and cyclically after [C], with the cycle based at [B]:

row 3 after [B]: empty
row 4: [C]
row 5: [D]
row 0: empty
row 1: empty
row 2: [A]
row 3 before [B]: empty

[M1] could therefore go in row 1. This part is OK. Here's how I see it: [M1] feeds [C] which is scheduled at cycle 10, so it must be scheduled before cycle 10-M_latency and after cycle 10-ii. [M1] uses the result of [B] which is scheduled at cycle 3, so must be scheduled after cycle 3+B_latency and before cycle 3+ii. Taking all latencies to be 1 and ii=6, this yields a scheduling window of cycles [4,9]\cap[4,9]=[4,9]; if scheduled at cycle 4 it must_follow [C], if scheduled at cycle 9 it must_precede [B]. 
This is identical to the logic behind the sched_window of any instruction, based on its dependencies (as you've updated not too long ago..), if we do not allow reg_moves (and arguably, one should not allow reg_moves when scheduling reg_moves...). To address the potential erroneous scenario of Loop 2, suppose [A] is scheduled as in the beginning in cycle 20, and that [M1] is scheduled in cycle 7 (\in[4,9]). Then [M2] feeds [D] and [A] which are scheduled at cycles 17 and 20, so it must be scheduled before cycle 17-1 and after cycle 20-6. [M2] uses the result of [M1], so must be scheduled after cycle 7+1 and before cycle 7+6. This yields the desired [14,16]\cap[8,13]=\emptyset. I agree it's natural to schedule moves for intra-iteration dependencies in the normal get_sched_window way. But suppose we have a dependency: A --(T,N,1)-- B that requires two moves M1 and M2. If we think in terms of cycles (in the SCHED_TIME sense), then this effectively becomes: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B because it is now M1 that is fed by both the loop and the incoming edge. But if there is a second dependency: A --(T,M,0)-- C that also requires two moves, we end up with: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B --(T,M3,-1)-- B and dependence distances of -1 feel a bit weird. :-) Of course, what we really have are two parallel dependencies: A --(T,N1,1)-- M1 --(T,N2,0)-- M2 --(T,N3,0)-- B A --(T,M1,0)-- M1' --(T,M2,0)-- M2' --(T,N3,0)-- B where M1' and M2' occupy the same position as M1 and M2 in the schedule, but are one stage further along. But we only schedule them once, so if we take the cycle/SCHED_TIME route, we have to introduce dependencies of distance -1. 
Another reason why it seemed a little confusing to think in terms of cycles (in the SCHED_TIME sense) was that, by this stage, we were already thinking in terms of rows and columns to some extent: /* If dest precedes src in the schedule of the kernel, then dest will read before src writes and we can save one reg_copy. */ if (SCHED_ROW (e->dest->cuid) == SCHED_ROW (e->src->cuid) && SCHED_COLUMN (e->dest->cuid) < SCHED_COLUMN (e->src->cuid)) nreg_moves4e--; However... Also note that if moves are assigned absolute cycles, it becomes clear to which stage each belongs (just like any other instruction), regulating their generation in prolog and epilog. ...I have to concede that it makes the stage calculation easier, and that that tips the balance. (Although again, a move can belong to two stages simultaneously.) Anyway, here's an updated patch that uses cycle times. I've also dropped the code that tried to allow windows to start and end on the same row (and therefore schedule either side of that row). I thought it might help with some VLIWy DFAs, but it was always going to be of minor benefit, and we don't try that elsewhere, so let's avoid the complication.
Re: [3/4] SMS: Record moves in the partial schedule
Ayal Zaks ayal.z...@gmail.com writes: Only request is to document that the register moves are placed/assigned-id's in a specific order. I suppose this is the downside of splitting the patches up, sorry, but the ids are only ordered for the throw-away loop: FOR_EACH_VEC_ELT_REVERSE (ps_reg_move_info, ps-reg_moves, i, move) add_insn_before (move-insn, ps_first_note (ps, move-def), NULL); and for the prologue/epilogue code. Both are replaced in patch 4 with code that doesn't rely on the ordering of ids. Ok then, very well. I was mostly referring to the following in prologue/epiloque code, which merely relies on assigning all regmoves of a node consecutive id's: FWIW, the 4/4 that I just posted did finally get rid of the first_reg_move nreg_moves fields. Here's a slightly updated patch in line with your 4/4 comments. The move-def is now always the id of the predecessor, rather than the id of the original ddg node producer. I've adapted the throw-away loop accordingly, but there are no other changes. Tested on powerpc64-linux-gnu and with ARM benchmarks. Richard gcc/ * modulo-sched.c (ps_insn): Adjust comment. (ps_reg_move_info): New structure. (partial_schedule): Add reg_moves field. (SCHED_PARAMS): Use node_sched_param_vec instead of node_sched_params. (node_sched_params): Turn first_reg_move into an identifier. (ps_reg_move): New function. (ps_rtl_insn): Cope with register moves. (ps_first_note): Adjust comment and assert that the instruction isn't a register move. (node_sched_params): Replace with... (node_sched_param_vec): ...this vector. (set_node_sched_params): Adjust accordingly. (print_node_sched_params): Take a partial schedule instead of a ddg. Use ps_rtl_insn and ps_reg_move. (generate_reg_moves): Rename to... (schedule_reg_moves): ...this. Remove rescan parameter. Record each move in the partial schedule, but don't emit it here. Don't perform register substitutions here either. (apply_reg_moves): New function. 
(duplicate_insns_of_cycles): Use register indices directly, rather than finding instructions using PREV_INSN. Use ps_reg_move. (sms_schedule): Call schedule_reg_moves before committing to a partial schedule. Try the next ii if the schedule fails. Use apply_reg_moves instead of generate_reg_moves. Adjust call to print_node_sched_params. Free node_sched_param_vec instead of node_sched_params. (create_partial_schedule): Initialize reg_moves. (free_partial_schedule): Free reg_moves. Index: gcc/modulo-sched.c === --- gcc/modulo-sched.c 2011-09-28 09:03:10.334301485 +0100 +++ gcc/modulo-sched.c 2011-09-28 11:24:26.530300781 +0100 @@ -124,7 +124,9 @@ #define PS_STAGE_COUNT(ps) (((partial_sc /* A single instruction in the partial schedule. */ struct ps_insn { - /* The number of the ddg node whose instruction is being scheduled. */ + /* Identifies the instruction to be scheduled. Values smaller than + the ddg's num_nodes refer directly to ddg nodes. A value of + X - num_nodes refers to register move X. */ int id; /* The (absolute) cycle in which the PS instruction is scheduled. @@ -137,6 +139,30 @@ struct ps_insn }; +/* Information about a register move that has been added to a partial + schedule. */ +struct ps_reg_move_info +{ + /* The source of the move is defined by the ps_insn with id DEF. + The destination is used by the ps_insns with the ids in USES. */ + int def; + sbitmap uses; + + /* The original form of USES' instructions used OLD_REG, but they + should now use NEW_REG. */ + rtx old_reg; + rtx new_reg; + + /* An instruction that sets NEW_REG to the correct value. The first + move associated with DEF will have an rhs of OLD_REG; later moves + use the result of the previous move. */ + rtx insn; +}; + +typedef struct ps_reg_move_info ps_reg_move_info; +DEF_VEC_O (ps_reg_move_info); +DEF_VEC_ALLOC_O (ps_reg_move_info, heap); + /* Holds the partial schedule as an array of II rows. 
Each entry of the array points to a linked list of PS_INSNs, which represents the instructions that are scheduled for that row. */ @@ -148,6 +174,10 @@ struct partial_schedule /* rows[i] points to linked list of insns scheduled in row i (0 <= i < ii). */ ps_insn_ptr *rows; + /* All the moves added for this partial schedule. Index X has + a ps_insn id of X + g->num_nodes. */ + VEC (ps_reg_move_info, heap) *reg_moves; + /* rows_length[i] holds the number of instructions in the row. It is used only (as an optimization) to back off quickly from trying to schedule a node in a full row; that is, to avoid running @@ -201,7 +231,7 @@ static void remove_node_from_ps (partial #define NODE_ASAP(node)
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 02:54:34PM +0200, Jan Hubicka wrote: Do you know the glibc version numbers when the optimized string functions were introduced? Afaik, it's 2.13. I also compared my implementation to 2.13. I wonder if we can assume that most GCC 4.7 based systems will be glibc 2.13 based, too. I would tend to say yes, and thus would suggest to tame down inlining that is no longer profitable on newer glibcs, with a note in changes.html... You could add a check to configure and generate based on that? BTW I know that the tables need tuning for Nehalem and Sandy Bridge too. Michael, are you planning to do that? -Andi
Re: [PATCH 2/3] Use urandom to get random seed
I suppose we might get interrupted before anything is read and read can return with -1 (I suppose partial reads are quite unlikely though)? Thus, don't we need the usual EINTR loop? When we get interrupted gcc will die. I don't think gcc handles any asynchronous signals, right? -Andi -- a...@linux.intel.com -- Speaking for myself only.
[Patch,AVR]: Better log output with -mdeb
This is a tentative patch for better support of logging information for avr BE developers. There are situations where it is more convenient to let the compiler produce information than to debug into the compiler. One example are -da dumps. This patch proposes a better support to print information by means of a printf-like function via %-codes. The current debug output with avr-gcc's option -mdeb produces bulk of information that is very hard to read because - there is much output - there is no information on current_function_decl - there is no information on current pass name/number - there is no print-like function so the trees, rtxes so that it is tedious to get formatted output. For example, the following call to avr_edump static int avr_OS_main_function_p (tree func) { avr_edump (%?: %t\n, func); return avr_lookup_function_attribute1 (func, OS_main); } prints additional information in a convenient way (foo is function to be compiled): avr_OS_main_function_p[foo:pro_and_epilogue(202)]: function_decl # foo type function_type # type void_type # void VOID ... Wondering that similar functionality is not provided by GCC itself, I wrote this patch. GCC's diagnostic seems to be overwhelmingly complicated and not intended for the purpose mentioned above. The code is dead code at the moment. No function in the BE uses the new functions. This patch just adds them. Moreover; I don't know if avr port maintainers or global reviewers like such stuff in the compiler... Therefore it's just tentative draft. Supported %-codes are: r: rtx t: tree T: tree (brief) C: enum rtx_code m: enum machine_mode R: enum reg_class L: insn list H: location_t -- no args -- A: call abort() f: current_function_name() F: caller (via __FUNCTION__) P: Pass name and number ?: Print caller, current function and pass info -- same as printf -- %: % c: char s: string d: int (decimal) x: int (hex) These codes cover great deal of BE developers needs and if something is missing it can be added easily. 
The calling itself is not as straight forward as it could be in the presence of variadic macros (to get __FUNCTION__), but that should not matter here. Johann * config/avr/avr-protos.h (avr_edump, avr_fdump): New macros. (avr__set_caller_e, avr__set_caller_f): New prototypes. * config/avr/avr.c: Include tree-pass.h. (avr__caller, avr__stream): New static variables. (avr__fdump_e, avr__fdump_f, avr__vadump): New static functions. (avr__set_caller_e, avr__set_caller_f): New functions. Index: config/avr/avr-protos.h === --- config/avr/avr-protos.h (revision 179257) +++ config/avr/avr-protos.h (working copy) @@ -111,3 +111,9 @@ extern rtx avr_incoming_return_addr_rtx #ifdef REAL_VALUE_TYPE extern void asm_output_float (FILE *file, REAL_VALUE_TYPE n); #endif + +#define avr_edump (avr__set_caller_e (__FUNCTION__)) +#define avr_fdump (avr__set_caller_f (__FUNCTION__)) + +extern int (*avr__set_caller_e (const char*))(const char*, ...); +extern int (*avr__set_caller_f (const char*))(FILE*, const char*, ...); Index: config/avr/avr.c === --- config/avr/avr.c (revision 179257) +++ config/avr/avr.c (working copy) @@ -7810,5 +7810,233 @@ avr_expand_builtin (tree exp, rtx target gcc_unreachable (); } + +/*** + ** Logging, for BE developers only + ***/ + +#include tree-pass.h + +static const char* avr__caller = ?; +static FILE* avr__stream; + +static void avr__vadump (FILE*, const char*, va_list); + +static int +avr__fdump_f (FILE *stream, const char* fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + if (stream) +avr__vadump (stream, fmt, ap); + va_end (ap); + + return 1; +} + +int (* +avr__set_caller_f (const char* caller) + )(FILE*, const char*, ...) +{ + avr__caller = caller; + + return avr__fdump_f; +} + +static int +avr__fdump_e (const char* fmt, ...) +{ + va_list ap; + + va_start (ap, fmt); + avr__vadump (stderr, fmt, ap); + va_end (ap); + + return 1; +} + +int (* +avr__set_caller_e (const char* caller) + )(const char*, ...) 
+{ + avr__caller = caller; + avr__stream = stderr; + + return avr__fdump_e; +} + +/* + -- known %-codes -- + + r: rtx + t: tree + T: tree (brief) + C: enum rtx_code + m: enum machine_mode + R: enum reg_class + L: insn list + H: location_t + + -- no args -- + A: call abort() + f: current_function_name() + F: caller (via __FUNCTION__) + P: Pass name and number + ?: Print caller, current function and pass info + + -- same as printf -- + %: % + c: char + s: string + d: int (decimal) + x: int (hex) +*/ + +static void +avr__vadump (FILE *file, const char *fmt, va_list ap)
Re: Use of vector instructions in memmov/memset expanding
You could add a check to configure and generate based on that? Do you mean check if glibc is newer than 2.13? I think that when new glibc version is released, the tables should be re-examined anyway - we shouldn't just stop inlining, or stop generating libcalls. BTW I know that the tables need tuning for Nehalem and Sandy Bridge too. Michael are you planning to do that? There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it turns out to be slower than SSE-moves or libcalls (in my version the fastest from these options is used).
Re: PowerPC shrink-wrap support 3 of 3
This supercedes http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01004.html and http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01593.html, fixing the two regressions introduced by those patches. The first patch is unchanged except to leave all the out-of-line restore functions using return rather than simple_return. We don't want these being confused with a plain simple_return and perhaps used by the shrink- wrapping to return from pre-prologue code. The second of these two patches was way too simplistic. It was a real pain getting the cfa_restores correct. A lot were missing, or emitted at the wrong place (due to bug in rs6000_emit_stack_reset). I also had the real restore insn move past the cfa_restores (mtlr 0 insn scheduled over loads from stack). * config/rs6000/rs6000.c (rs6000_make_savres_rtx): Delete unneeded declaration. (rs6000_emit_stack_reset): Only return insn emitted when it adjusts sp. (rs6000_make_savres_rtx): Rename to rs6000_emit_savres_rtx. Use simple_return in pattern, emit instruction, and set jump_label. (rs6000_emit_prologue): Update for rs6000_emit_savres_rtx. Use simple_return rather than return. (emit_cfa_restores): New function. (rs6000_emit_epilogue): Emit cfa_restores when flag_shrink_wrap. Add missing cfa_restores for SAVE_WORLD. Add missing LR cfa_restore when using out-of-line gpr restore. Add missing LR and FP regs cfa_restores for out-of-line fpr restore. Consolidate code setting up cfa_restores. Formatting. Use LR_REGNO define. (rs6000_output_mi_thunk): Use simple_return rather than return. * config/rs6000/rs6000.md (sibcall*, sibcall_value*): Likewise. (return_internal*): Likewise. (any_return, return_pred, return_str): New iterators. (return, conditional return insns): Provide both return and simple_return variants. * gcc/config/rs6000/rs6000.h (EARLY_R12, LATE_R12): Define. (REG_ALLOC_ORDER): Move r12 before call-saved regs when FIXED_R13. Move r11 and r0 later to suit shrink-wrapping. 
Index: gcc/config/rs6000/rs6000.c === --- gcc/config/rs6000/rs6000.c (revision 178876) +++ gcc/config/rs6000/rs6000.c (working copy) @@ -899,8 +900,6 @@ static const char *rs6000_mangle_type (c static void rs6000_set_default_type_attributes (tree); static rtx rs6000_savres_routine_sym (rs6000_stack_t *, bool, bool, bool); static rtx rs6000_emit_stack_reset (rs6000_stack_t *, rtx, rtx, int, bool); -static rtx rs6000_make_savres_rtx (rs6000_stack_t *, rtx, int, - enum machine_mode, bool, bool, bool); static bool rs6000_reg_live_or_pic_offset_p (int); static tree rs6000_builtin_vectorized_libmass (tree, tree, tree); static tree rs6000_builtin_vectorized_function (tree, tree, tree); @@ -19704,8 +19728,10 @@ rs6000_emit_stack_reset (rs6000_stack_t if (sp_offset != 0) { rtx dest_reg = savres ? gen_rtx_REG (Pmode, 11) : sp_reg_rtx; - return emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, - GEN_INT (sp_offset))); + rtx insn = emit_insn (gen_add3_insn (dest_reg, frame_reg_rtx, + GEN_INT (sp_offset))); + if (!savres) + return insn; } else if (!savres) return emit_move_insn (sp_reg_rtx, frame_reg_rtx); @@ -19729,10 +19755,11 @@ rs6000_emit_stack_reset (rs6000_stack_t } /* Construct a parallel rtx describing the effect of a call to an - out-of-line register save/restore routine. */ + out-of-line register save/restore routine, and emit the insn + or jump_insn as appropriate. 
*/ static rtx -rs6000_make_savres_rtx (rs6000_stack_t *info, +rs6000_emit_savres_rtx (rs6000_stack_t *info, rtx frame_reg_rtx, int save_area_offset, enum machine_mode reg_mode, bool savep, bool gpr, bool lr) @@ -19742,6 +19769,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * int reg_size = GET_MODE_SIZE (reg_mode); rtx sym; rtvec p; + rtx par, insn; offset = 0; start_reg = (gpr @@ -19755,7 +19783,7 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, offset++) = ret_rtx; RTVEC_ELT (p, offset++) -= gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, 65)); += gen_rtx_CLOBBER (VOIDmode, gen_rtx_REG (Pmode, LR_REGNO)); sym = rs6000_savres_routine_sym (info, savep, gpr, lr); RTVEC_ELT (p, offset++) = gen_rtx_USE (VOIDmode, sym); @@ -19788,7 +19816,16 @@ rs6000_make_savres_rtx (rs6000_stack_t * RTVEC_ELT (p, i + offset) = gen_rtx_SET (VOIDmode, mem, reg); } - return gen_rtx_PARALLEL (VOIDmode, p); + par = gen_rtx_PARALLEL (VOIDmode, p); + + if (!savep && lr) +{ + insn = emit_jump_insn (par); + JUMP_LABEL (insn) = ret_rtx; +
[RFA/ARM][Patch 01/02]: Thumb2 epilogue in RTL
Hi! This patch generates Thumb2 epilogues in RTL form. The work involves defining new functions, predicates and patterns along with few changes in existing code: * The load_multiple_operation predicate was found to be too restrictive for integer loads as it required consecutive destination regs, so this restriction was lifted. * Variations of load_multiple_operation were required to handle cases - where SP must be the base register - where FP values were being loaded (which do require consecutive destination registers) - where PC can be in register-list (which requires return pattern along with register loads). Hence, the common code was factored out into a new function in arm.c and parameterised to show - whether consecutive destination regs are needed - the data type being loaded - whether the base register has to be SP - whether PC is in register-list The patch is tested with arm-eabi with no regressions. ChangeLog: 2011-09-28 Ian Bolton ian.bol...@arm.com Sameera Deshpande sameera.deshpa...@arm.com * config/arm/arm-protos.h (load_multiple_operation_p): New declaration. (thumb2_expand_epilogue): Likewise. (thumb2_output_return): Likewise (thumb2_expand_return): Likewise. (thumb_unexpanded_epilogue): Rename to... (thumb1_unexpanded_epilogue): ...this * config/arm/arm.c (load_multiple_operation_p): New function. (thumb2_emit_multi_reg_pop): Likewise. (thumb2_emit_vfp_multi_reg_pop): Likewise. (thumb2_expand_return): Likewise. (thumb2_expand_epilogue): Likewise. (thumb2_output_return): Likewise (thumb_unexpanded_epilogue): Rename to... ( thumb1_unexpanded_epilogue): ...this * config/arm/arm.md (pop_multiple_with_stack_update): New pattern. (pop_multiple_with_stack_update_and_return): Likewise. (thumb2_ldr_with_return): Likewise. (floating_point_pop_multiple_with_stack_update): Likewise. (return): Update condition and code for pattern. (arm_return): Likewise. (epilogue_insns): Likewise. * config/arm/predicates.md (load_multiple_operation): Update predicate. 
(load_multiple_operation_stack_and_return): New predicate. (load_multiple_operation_stack): Likewise. (load_multiple_operation_stack_fp): Likewise. * config/arm/thumb2.md (thumb2_return): Remove. (thumb2_rtl_epilogue_return): New pattern. - Thanks and regards, Sameera D. thumb2_rtl_epilogue_complete-27Sept.patch Description: Binary data
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 09/28/2011 03:57 PM, Carrot Wei wrote: Hi Tom What's the behavior of your patch to the following case typedef int int_unaligned __attribute__((aligned(1))); int foo (int_unaligned *p) { return *p; } I modified the example slightly: test.c: ... typedef int __attribute__((aligned(2))) int_unaligned; int foo (int_unaligned *p) { return *(p+1); } ... test.c.023t.ccp1: ... # PT = anything # ALIGN = 2, MISALIGN = 0 D.2723_2 = pD.1604_1(D) + 4; ... Thanks, - Tom thanks Carrot On Tue, Sep 20, 2011 at 7:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. 
Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? Thanks, - Tom 2011-09-20 Tom de Vries t...@codesourcery.com PR target/43814 * tree-ssa-ccp.c (get_align_value): New function, factored out of get_value_from_alignment. (get_value_from_alignment): Use get_align_value. (get_value_for_expr): Use get_align_value to handle alignment of function argument pointers. * common.opt (faligned-pointer-argument): New option. * doc/invoke.texi (Optimization Options): Add -faligned-pointer-argument. (-faligned-pointer-argument): New item. * gcc/testsuite/gcc.dg/pr43814.c: New test. * gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.
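Independent of whether the compiler may assume that pointer arguments are aligned, the portable idiom for reading through a possibly-misaligned pointer is a small memcpy, which GCC expands to a plain load on targets that allow unaligned access and keeps safe on strict-alignment targets. A minimal sketch, not part of Tom's patch; the helper name is made up:

```c
#include <assert.h>
#include <string.h>

/* Read an int from a possibly-misaligned address.  memcpy is the
   portable idiom: the compiler may turn it into an unaligned load
   where that is legal, and it never traps on strict-alignment
   targets.  */
static int
read_int_unaligned (const void *p)
{
  int v;
  memcpy (&v, p, sizeof v);
  return v;
}
```

This is exactly the kind of access pattern that stays correct whether or not an optimization such as the one proposed here is enabled.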
Re: Use of vector instructions in memmov/memset expanding
There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics have changed in Nehalem and Sandy-Bridge over earlier Intel CPUs. -Andi -- a...@linux.intel.com -- Speaking for myself only.
Re: Use of vector instructions in memmov/memset expanding
On Wed, Sep 28, 2011 at 06:27:11PM +0200, Andi Kleen wrote: There is no separate cost-table for Nehalem or SandyBridge - however, I tuned generic32 and generic64 tables, that should improve performance on modern processors. In old version REP-MOV was used - it The recommended heuristics have changed in Nehalem and Sandy-Bridge over earlier Intel CPUs. Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was slower than generic. Adding new tables shouldn't be very difficult, even if they are the same as generic. -Andi
Re: [PATCH 2/3] Use urandom to get random seed
On Wed, 28 Sep 2011 16:56:32 +0200 Andi Kleen a...@firstfloor.org wrote: I suppose we might get interrupted before anything is read and read can return with -1 (I suppose partial reads are quite unlikely though)? Thus, don't we need the usual EINTR loop? When we get interrupted gcc will die. I don't think gcc handles any asynchronous signals, right? That depends upon the signal. If we got SIGCHLD or SIGWINCH, the call to read probably gets EINTR, but the signal is ignored unless explicitly handled. So I would suggest testing for EINTR. Besides, perhaps some plugins could install their signal handlers Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basileatstarynkevitchdotnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
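The usual EINTR loop under discussion can be sketched as follows. This is a hypothetical helper, not code from Andi's patch, shown only to illustrate retry-on-EINTR plus short-read handling:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Read exactly COUNT bytes from FD, retrying on EINTR and on short
   reads.  Returns 0 on success, -1 on error or unexpected EOF.  */
static int
read_full (int fd, void *buf, size_t count)
{
  char *p = (char *) buf;
  while (count > 0)
    {
      ssize_t n = read (fd, p, count);
      if (n < 0)
        {
          if (errno == EINTR)
            continue;           /* interrupted by a signal: retry */
          return -1;            /* real error */
        }
      if (n == 0)
        return -1;              /* unexpected EOF */
      p += n;
      count -= (size_t) n;
    }
  return 0;
}
```

Even if gcc itself installs no handlers, a plugin might, so the retry costs nothing and removes the failure mode Basile describes.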
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 24/09/2011, at 9:40 PM, Jakub Jelinek wrote: On Sat, Sep 24, 2011 at 11:31:25AM +0200, Richard Guenther wrote: In the end I'd probably say the patch is ok without the option (thus turned on by default), but if LC_GLOBAL_LOCALE is part of the glibc ABI then we clearly can't do this. Yes, LC_GLOBAL_LOCALE is part of glibc ABI. I guess we could only assume the alignment if the pointer is actually dereferenced on the statement that checks the ABI or in some stmt that dominates the spot where you want to check the alignment. It is IMHO quite common to pass arbitrary values in pointer types, then cast them back or just compare. I can relate to camps of both GCC developers and GLIBC developers, and I understand the benefits and liabilities of Tom's optimization. Ultimately, the problem we need to solve is to find a way to manage non-conforming code with a compiler that tries to squeeze out as much performance from valid code as possible. I think Tom's suggestion to have an option (either enabled or disabled by default) is a very good solution given the circumstances. I think we should document the option as a transitional measure designed to give time to GLIBC and others to catch up, and obsolete it in the next release. GLIBC patch to fix locale_t definition is attached. In this submission Tom is being punished for his thoroughness in testing the optimization. Let this be a lesson to all of us to steer clear of GLIBC. [Kidding.] We had similar discussions several times already, and it seems the guiding principle of whether enabling new optimization that may break undefined code patterns is to balance expected performance benefits against how frequently the offending code pattern is used. Returning to the case at hand: Is there code comparing a pointer to an integer? Yes. Is it common? Comparing to a zero, absolutely; to a non-zero errr. Probably not that much. The performance benefits from the optimization are quite significant. 
Pointer alignment has tremendous effect on expanding __builtin_{mem,str}* functions. Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics fsf-glibc-locale_t-align.patch Description: Binary data
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:40:55 +1300 On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. I personally don't think this is acceptable, critical path or not.
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 29/09/2011, at 7:41 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:40:55 +1300 On 29/09/2011, at 7:35 AM, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:29:12 +1300 GLIBC patch to fix locale_t definition is attached. Isn't this going to result in byte loads being used to dereference all locale_t pointers on targets like sparc and mips? Yes, that's the price for binary compatibility for comparing a pointer to -1. I just hope that no common program has locale functions on its critical path. I personally don't think this is acceptable, critical path or not. OK. Do you have an alternative suggestion that would fix non-portable use of locale_t? Thank you, -- Maxim Kuvyrkov CodeSourcery / Mentor Graphics
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:45:17 +1300 OK. Do you have an alternative suggestion that would fix non-portable use of locale_t? Don't optimize something that is invalidated by a quite common practice? What about people who encode invalid pointers with 0xdeadbeef, do we need to audit every source tree that does this too? This invalidates the optimization's preconditions as well. You're not going to eradicate all the code in the world which uses unaligned constants to encode pointers to make them have special meanings in certain situations. We use the -1 thing in the Linux kernel too I believe. I'd go so far as to say this kind of thing is pervasive.
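The LC_GLOBAL_LOCALE pattern David describes — an out-of-band pointer value that is only ever compared, never dereferenced — looks like the sketch below. The type and names are hypothetical, modeled on glibc's locale_t; the point is that an optimizer assuming every pointer argument is aligned could fold the comparison to false, since `(T *) -1L` can never satisfy the alignment of a real object:

```c
#include <assert.h>
#include <stddef.h>

/* Opaque object type, standing in for glibc's struct __locale_struct.  */
typedef struct locale_data locale_t_like;

/* Sentinel in the style of LC_GLOBAL_LOCALE: a pointer value that is
   never dereferenced, only compared.  Its bit pattern is not aligned,
   so the compiler must not conclude the comparison below is always
   false merely because valid objects would be word-aligned.  */
#define GLOBAL_SENTINEL ((locale_t_like *) -1L)

static int
is_global (locale_t_like *p)
{
  return p == GLOBAL_SENTINEL;
}
```

Under Tom's original (flag-enabled) optimization, exactly this comparison was miscompiled in glibc's uselocale.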
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits.
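The "always clear bits" paradigm can be sketched like this — hypothetical helpers, assuming the pointee is at least 4-byte aligned (as malloc guarantees). A compiler that assumed the low bits of any pointer argument are zero would fold `tag_of` to a constant 0 and break the scheme:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack a 2-bit tag into the low bits of a pointer known to be at
   least 4-byte aligned -- the "state bits" paradigm.  */
static void *
tag_ptr (void *p, unsigned tag)         /* tag in 0..3 */
{
  return (void *) ((uintptr_t) p | (tag & 3u));
}

/* Recover the tag; depends on the low bits surviving unchanged.  */
static unsigned
tag_of (void *p)
{
  return (unsigned) ((uintptr_t) p & 3u);
}

/* Strip the tag to get back the real pointer.  */
static void *
untag_ptr (void *p)
{
  return (void *) ((uintptr_t) p & ~(uintptr_t) 3u);
}
```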
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits. Yeah, I think it is a bad optimization that is going to break lots of code and the glibc patch is definitely unacceptable (and not doing what you think it is doing, your change didn't do anything at all, as the aligned attribute applies to the pointer type and not to the pointed type). The alignment of the argument pointer can be IMHO only hard assumed if there is a load or store using that type dominating the spot where you are checking it, and the target is strict alignment target. Then we can both assume that alignment, optimize away tests on the low bits etc. Otherwise, what you could do is just use the pointed type's alignment as a hint, likely alignment, e.g. on non-strict alignment target you could expand code optimistically assuming that it is very likely aligned, but still shouldn't optimize away low bits tests. Jakub
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 9:13 PM, Jakub Jelinek ja...@redhat.com wrote: On Wed, Sep 28, 2011 at 03:00:09PM -0400, David Miller wrote: From: Maxim Kuvyrkov ma...@codesourcery.com Date: Thu, 29 Sep 2011 07:58:01 +1300 To summarize, your opinion seems to be to not enable the optimization by default, correct? Yes, and I don't think we ever could do so. BTW, another common paradigm is using the always clear bits of a pointer to encode state bits. Yeah, I think it is a bad optimization that is going to break lots of code and the glibc patch is definitely unacceptable (and not doing what you think it is doing, your change didn't do anything at all, as the aligned attribute applies to the pointer type and not to the pointed type). The alignment of the argument pointer can be IMHO only hard assumed if there is a load or store using that type dominating the spot where you are checking it, and the target is strict alignment target. Then we can both assume that alignment, optimize away tests on the low bits etc. Otherwise, what you could do is just use the pointed type's alignment as a hint, likely alignment, e.g. on non-strict alignment target you could expand code optimistically assuming that it is very likely aligned, but still shouldn't optimize away low bits tests. There is nothing like very likely aligned ;) Note that what is new is that we now no longer assume alignment by default (we did in the past) and that we derive properties about the pointer _value_ from alignment. I think we can derive pointer values when we see dereferences, the code wouldn't be portable to strict-alignment targets otherwise. We can offer a -fno-strict-alignment option to disable deriving alignment from types. Richard. Jakub
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 09:56:27PM +0200, Richard Guenther wrote: There is nothing like very likely aligned ;) Note that what is new is On non-strict aligned targets there is no reason not to have something like very likely aligned. You would expand stores/loads as if it was aligned in that case, and if it isn't, all you'd need to ensure from that is that you don't derive properties about the pointer value from the likely aligned info, only from alignment. Very likely aligned is interesting to the vectorizer too, if it is very likely something is sufficiently aligned, the vectorizer could decide to assume the alignment in the vectorized loop and add the check for the alignment to the loop guards. In the likely case the vectorized loop would be used (if other guards were true too), in the unlikely case it is unaligned it would just use a slower loop. that we now no longer assume alignment by default (we did in the past) and that we derive properties about the pointer _value_ from alignment. I think we can derive pointer values when we see dereferences, the code wouldn't be portable to strict-alignment targets otherwise. We But any references? If you have int foo (int *p) { memcpy (p, a, 1); return ((uintptr_t) p & 3) == 0; } then even if p isn't aligned, this could work even on strict aligned targets. Anyway, the arbitrary value in a pointer thing is much more important than the rest, so having the dominating dereference test is very important. Jakub
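Jakub's foo example can be made self-contained to see both halves of his point: the 1-byte memcpy is legal even for a misaligned p (so a dereference does not prove full alignment), and the low-bits test observes the real pointer value, so folding it to true would change behavior. The array `a` and the surrounding scaffolding are filled in here as an illustration:

```c
#include <stdint.h>
#include <string.h>

static const char a[1] = { 42 };

/* Jakub's example: a byte-granular access through an int *, followed
   by a test of the pointer's low bits.  The memcpy is fine even when
   p is misaligned, so the compiler must not use it to conclude that
   (p & 3) == 0.  */
static int
foo (int *p)
{
  memcpy (p, a, 1);
  return ((uintptr_t) p & 3) == 0;
}
```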
Re: [PATCH 2/7] Generate virtual locations for tokens
On 09/28/2011 03:15 PM, Dodji Seketeli wrote: +set_arg_token (macro_arg *arg, const cpp_token *token, + source_location location, size_t index, + enum macro_arg_token_kind kind, + bool track_macro_exp_p) +{ + const cpp_token **token_ptr; + source_location *loc = NULL; + + token_ptr = + arg_token_ptr_at (arg, index, kind, + track_macro_exp_p ? &loc : NULL); ... + if (virt_location) + { + if (kind == MACRO_ARG_TOKEN_NORMAL) + *virt_location = &arg->virt_locs[index]; + else if (kind == MACRO_ARG_TOKEN_EXPANDED) + *virt_location = &arg->expanded_virt_locs[index]; + else if (kind == MACRO_ARG_TOKEN_STRINGIFIED) + *virt_location = + (source_location *) &tokens_ptr[index]->src_loc; + } If we make this block conditional on arg->virt_locs being set, then we can pass &loc in unconditionally and don't need the track_macro_exp_p flag to set_arg_token. Note that the gotos were put there also because we needed to get out of the for (;;) loop, similarly to what the previous return statements were doing; so doing this doesn't get rid of the gotos. Can't you use break instead? Jason
[C++ Patch] PR 45278
Hi, here submitter remarks that, inconsistently with the documentation, with -Wextra the C++ front-end doesn't warn for ordered comparison of pointer with integer zero. Thus I simply adapted the warning in the C front-end. Is the patch Ok? Tested x86_64-linux. Paolo. /
Re: [C++ Patch] PR 45278
On 09/28/2011 11:02 PM, Paolo Carlini wrote: Hi, here submitter remarks that, inconsistently with the documentation, with -Wextra the C++ front-end doesn't warn for ordered comparison of pointer with integer zero. Thus I simply adapted the warning in the C front-end. Is the patch Ok? Better attaching the patch ;) Paolo.

/cp
2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/45278
	* typeck.c (cp_build_binary_op): With -Wextra, warn for ordered
	comparison of pointer with zero.

/testsuite
2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/45278
	* g++.dg/warn/Wextra-3.C: New.

Index: testsuite/g++.dg/warn/Wextra-3.C
===================================================================
--- testsuite/g++.dg/warn/Wextra-3.C	(revision 0)
+++ testsuite/g++.dg/warn/Wextra-3.C	(revision 0)
@@ -0,0 +1,9 @@
+// PR c++/45278
+// { dg-options "-Wextra" }
+
+extern void* p;
+
+int f1() { return ( p > 0 ? 1 : 0 ); }   // { dg-warning "ordered comparison" }
+int f2() { return ( p >= 0 ? 1 : 0 ); }  // { dg-warning "ordered comparison" }
+int f3() { return ( p < 0 ? 1 : 0 ); }   // { dg-warning "ordered comparison" }
+int f4() { return ( p <= 0 ? 1 : 0 ); }  // { dg-warning "ordered comparison" }
Index: cp/typeck.c
===================================================================
--- cp/typeck.c	(revision 179319)
+++ cp/typeck.c	(working copy)
@@ -4189,9 +4189,19 @@ cp_build_binary_op (location_t location,
 	result_type = composite_pointer_type (type0, type1, op0, op1,
 					      CPO_COMPARISON, complain);
       else if (code0 == POINTER_TYPE && null_ptr_cst_p (op1))
-	result_type = type0;
+	{
+	  result_type = type0;
+	  if (extra_warnings && (complain & tf_warning))
+	    warning (OPT_Wextra,
+		     "ordered comparison of pointer with integer zero");
+	}
       else if (code1 == POINTER_TYPE && null_ptr_cst_p (op0))
-	result_type = type1;
+	{
+	  result_type = type1;
+	  if (extra_warnings && (complain & tf_warning))
+	    warning (OPT_Wextra,
+		     "ordered comparison of pointer with integer zero");
+	}
       else if (null_ptr_cst_p (op0) && null_ptr_cst_p (op1))
 	/* One of the operands must be of nullptr_t type.  */
 	result_type = TREE_TYPE (nullptr_node);
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
More comments to come on [3/3], for now just a single comment below on this specific patch: diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 0bd4d64..b267833 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream) if (marker == PPH_RECORD_END) return NULL; else if (pph_is_reference_marker (marker)) - return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker); + { + pph_cache *cache = pph_cache_select (stream, marker, image_ix); + return (cxx_binding *) pph_cache_get (cache, ix); + } Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? Gab
Re: [pph] Prepare for mutation detection [3/3] (issue5139052)
Very nice! I really like where this is heading :)! I think it would be great to instrument this to know how many times we need to use a PPH_RECORD_MREF, to avoid trashing the cache (i.e. potentially there are specific cases where only a small field's value changes and pickling the entire tree again is sub-optimal; if instead we could detect those cases and simply output the required change which could then be applied on the reader side it would potentially be better... it is possible that we haven't been affected by these specific cases up until now, but that they would all massively result in PPH_RECORD_MREFs now (the instrumentation would also potentially help us find some of those tricky mutation, if there are any caused by other things than overloads, which we aren't aware of yet...just a thought). (or maybe even assert that those mutation are indeed overloads if we think that's the only time it occurs??) Cheers, Gab On Tue, Sep 27, 2011 at 1:03 PM, Diego Novillo dnovi...@google.com wrote: This finishes removing constants and builtins out of the cache. This time in a slightly more elegant way. The patch introduces a new version of pph_out_start_record exclusively for trees (pph_out_start_tree_record). If the tree is not cacheable then it emits a PPH_RECORD_START_NO_CACHE record to indicate that the reader should not bother adding the tree to its cache. It also changes the semantics of pph_out_start_record. It now returns true when the caller should do nothing else. We are now signing every DECL and TYPE tree we see, but doing nothing with this signature. In the final patch this will change so we only sign DECLs/TYPEs after reading, if we are also generating a PPH image (pure readers do not need to care about mutations). Tested on x86_64. Committed to branch. * pph-streamer-in.c (pph_read_namespace_tree): Do not insert constants in the cache. * pph-streamer-out.c (pph_cache_should_handle): New. (pph_out_start_ref_record): Factor out of ... 
(pph_out_start_record): ... here. Change return value meaning; true means 'all done'; false means 'caller should write data out'. Update all callers. (pph_out_start_tree_record): New. (pph_write_builtin): Remove. Update all users. (pph_write_namespace_tree): Call pph_out_start_tree_record. * pph-streamer.c (pph_cache_insert_at): Assert that we are not trying to insert the same data more than once. * pph-streamer.h (enum pph_record_marker): Add new value PPH_RECORD_START_NO_CACHE. (pph_in_record_marker): Handle it. diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index b267833..8e7c772 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -2137,7 +2137,6 @@ pph_read_namespace_tree (pph_stream *stream, tree enclosing_namespace) /* For integer constants we only need the type and its hi/low words. */ expr = streamer_read_integer_cst (ib, data_in); - pph_cache_insert_at (stream-cache, expr, ix); } else { diff --git a/gcc/cp/pph-streamer-out.c b/gcc/cp/pph-streamer-out.c index 7157275..2f9fcae 100644 --- a/gcc/cp/pph-streamer-out.c +++ b/gcc/cp/pph-streamer-out.c @@ -182,14 +182,56 @@ pph_flush_buffers (pph_stream *stream) } -/* Start a new record in STREAM for data in DATA. If DATA is NULL - write an end-of-record marker and return false. If DATA is not NULL - and did not exist in the pickle cache, add it, write a - start-of-record marker and return true. If DATA existed in the - cache, write a shared-record marker and return false. */ +/* Return true if tree node T should be added to the pickle cache. */ static inline bool -pph_out_start_record (pph_stream *stream, void *data) +pph_cache_should_handle (tree t) +{ + if (t) + { + if (TREE_CODE (t) == INTEGER_CST) + { + /* With the exception of some special constants that are + pointer-compared everywhere (e.g., integer_zero_node), we + do not want constants in the cache. 
Their physical + representation is smaller than a cache reference record + and they can also trick the cache in similar ways to + builtins (read below). */ + return false; + } + else if (streamer_handle_as_builtin_p (t)) + { + /* We do not want builtins in the cache for two reasons: + + - They never need pickling. Much like pre-loaded tree + nodes, builtins are pre-built by the compiler and + accessible with their class and code. + + - They can trick the cache. When possible, user-provided + functions are generally replaced by their builtin + counterparts (e.g., strcmp, malloc, etc). When this + happens, the writer cache will store in its cache two +
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
On Wed, Sep 28, 2011 at 17:23, Gabriel Charette gcharet...@gmail.com wrote: More comments to come on [3/3], for now just a single comment below on this specific patch: diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c index 0bd4d64..b267833 100644 --- a/gcc/cp/pph-streamer-in.c +++ b/gcc/cp/pph-streamer-in.c @@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream) if (marker == PPH_RECORD_END) return NULL; else if (pph_is_reference_marker (marker)) - return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker); + { + pph_cache *cache = pph_cache_select (stream, marker, image_ix); + return (cxx_binding *) pph_cache_get (cache, ix); + } Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? I call them separately. Or do you mean a single call that combines them for the common case?
Re: [SH] PR 49468 - Integer SI abs
Oleg Endo oleg.e...@t-online.de wrote: The attached patch and ChangeLog below should fix it. I have also added a test case for SI mode abs. Thanks! I've applied your patch as revision 179320. Regards, kaz
[PING] Re: [PATCH] Fix PR50183
Hi there, Ping. I'm seeking approval for this fix on trunk and 4_6-branch. Thanks! Bill On Tue, 2011-09-13 at 17:55 -0500, William J. Schmidt wrote: Greetings, The code to build scops (static control parts) for graphite first rewrites loops into canonical loop-closed SSA form. PR50183 identifies a scenario where the results do not fulfill all required invariants of this form. In particular, a value defined inside a loop and used outside that loop must reach exactly one definition, which must be a single-argument PHI node called a close-phi. When nested loops exist, it is possible that, following the rewrite, a definition may reach two close-phis. This patch corrects that problem. The problem arises because loops are processed from outside in. While processing a loop, duplicate close-phis are eliminated. However, eliminating duplicate close-phis for an inner loop can sometimes create duplicate close-phis for an already-processed outer loop. This patch detects when this may have occurred and repeats the removal of duplicate close-phis as necessary. The problem was noted on ibm/4_6-branch and 4_6-branch; it is apparently latent on trunk. The same patch can be applied to all three branches. Bootstrapped and regression-tested on powerpc64-linux. OK to commit to these three branches? Thanks, Bill 2011-09-13 Bill Schmidt wschm...@linux.vnet.ibm.com * graphite-scop-detection.c (make_close_phi_nodes_unique): New forward declaration. (remove_duplicate_close_phi): Detect and repair creation of duplicate close-phis for a containing loop. Index: gcc/graphite-scop-detection.c === --- gcc/graphite-scop-detection.c (revision 178829) +++ gcc/graphite-scop-detection.c (working copy) @@ -30,6 +30,9 @@ along with GCC; see the file COPYING3. If not see #include tree-pass.h #include sese.h +/* Forward declarations. 
*/ +static void make_close_phi_nodes_unique (basic_block); + #ifdef HAVE_cloog #include "ppl_c.h" #include "graphite-ppl.h" @@ -1231,6 +1234,13 @@ remove_duplicate_close_phi (gimple phi, gimple_stm SET_USE (use_p, res); update_stmt (use_stmt); + + /* It is possible that we just created a duplicate close-phi + for an already-processed containing loop. Check for this + case and clean it up. */ + if (gimple_code (use_stmt) == GIMPLE_PHI + && gimple_phi_num_args (use_stmt) == 1) + make_close_phi_nodes_unique (gimple_bb (use_stmt)); } remove_phi_node (gsi, true);
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Sep 20, 2011, at 4:13 AM, Tom de Vries wrote: I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I'm not a huge fan of an option that is very hard to use. I think this option would be hard to use. I'd rather have a port just assert that no code will be compiled that is weird in this way, then the front end can check for weird values on int to pointer conversions with constants and complain about the code, if they tried it. If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. For systems that are clean, and new systems, we can recommend they set the option on. For legacy systems that don't want to change or just want to compile legacy software, well, they can opt out.
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On 09/24/2011 11:31 AM, Richard Guenther wrote: On Tue, Sep 20, 2011 at 1:13 PM, Tom de Vries vr...@codesourcery.com wrote: Hi Richard, I have a patch for PR43814. It introduces an option that assumes that function arguments of pointer type are aligned, and uses that information in tree-ssa-ccp. This enables the memcpy in pr43814-2.c to be inlined. I tested the patch successfully on-by-default on x86_64 and i686 (both gcc only builds). I also tested the patch on-by-default for ARM (gcc/glibc build). The patch generated wrong code for uselocale.c: ... glibc/locale/locale.h: ... /* This value can be passed to `uselocale' and may be returned by it. Passing this value to any other function has undefined behavior. */ # define LC_GLOBAL_LOCALE ((__locale_t) -1L) ... glibc/locale/uselocale.c: ... locale_t __uselocale (locale_t newloc) { locale_t oldloc = _NL_CURRENT_LOCALE; if (newloc != NULL) { const locale_t locobj = newloc == LC_GLOBAL_LOCALE ? _nl_global_locale : newloc; ... The assumption that function arguments of pointer type are aligned, allowed the test 'newloc == LC_GLOBAL_LOCALE' to evaluate to false. But the usage of ((__locale_t) -1L) as function argument in uselocale violates that assumption. Fixing the definition of LC_GLOBAL_LOCALE allowed the gcc tests to run without regressions for ARM. Furthermore, the patch fixes ipa-sra-2.c and ipa-sra-6.c regressions on ARM, discussed here: - http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00930.html - http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00459.html But, since glibc uses this construct currently, the option is off-by-default for now. OK for trunk? No, I don't like to have an option to control this. And given the issue you spotted it doesn't look like the best idea either. This area in GCC is simply too fragile right now :/ It would be nice if we could extend IPA-CP to propagate alignment information though. 
And at some point devise a reliable way for frontends to communicate alignment constraints the middle-end can rely on (well, yes, you could argue that's what TYPE_ALIGN is about, and I thought that maybe we can unconditionally rely on TYPE_ALIGN for pointer PARM_DECLs at least - your example shows we can't). In the end I'd probably say the patch is ok without the option (thus turned on by default), but if LC_GLOBAL_LOCALE is part of the glibc ABI then we clearly can't do this. I removed the flag, and enabled the optimization now with the aligned attribute. Bootstrapped and tested on x86_64 (gcc-only build), and built and reg-tested on ARM (gcc + glibc build); no issues found. OK for trunk? I intend to send a follow-up patch that introduces a target hook function_pointers_aligned, that is false by default, and on by default for -mandroid. I asked Google to test their Android codebase with the optimization on by default. Would such a target hook be acceptable? Richard. Thanks, - Tom

2011-09-29  Tom de Vries  t...@codesourcery.com

	PR target/43814
	* tree-ssa-ccp.c (get_align_value): New function, factored out of
	get_value_from_alignment.
	(get_value_from_alignment): Use get_align_value.
	(get_value_for_expr): Use get_align_value to handle alignment of
	function argument pointers with TYPE_USER_ALIGN.
	* gcc/testsuite/gcc.dg/pr43814.c: New test.
	* gcc/testsuite/gcc.target/arm/pr43814-2.c: New test.

Index: gcc/tree-cfgcleanup.c
===
--- gcc/tree-cfgcleanup.c (revision 173703)
+++ gcc/tree-cfgcleanup.c (working copy)
@@ -641,6 +641,552 @@ cleanup_omp_return (basic_block bb)
   return true;
 }
 
+/* Returns true if S contains (I1, I2).  */
+
+static bool
+int_int_splay_lookup (splay_tree s, unsigned int i1, unsigned int i2)
+{
+  splay_tree_node node;
+
+  if (s == NULL)
+    return false;
+
+  node = splay_tree_lookup (s, i1);
+  return node && node->value == i2;
+}
+
+/* Attempts to insert (I1, I2) into *S.  Returns true if successful.
+   Allocates *S if necessary.  */
+
+static bool
+int_int_splay_insert (splay_tree *s, unsigned int i1, unsigned int i2)
+{
+  if (*s != NULL)
+    {
+      /* Check for existing element, which would otherwise be silently
+	 overwritten by splay_tree_insert.  */
+      if (splay_tree_lookup (*s, i1))
+	return false;
+    }
+  else
+    *s = splay_tree_new (splay_tree_compare_ints, 0, 0);
+
+  splay_tree_insert (*s, i1, i2);
+  return true;
+}
+
+/* Returns 0 if (NODE->value, NODE->key) is an element of S.  Otherwise,
+   returns 1.  */
+
+static int
+int_int_splay_node_contained_in (splay_tree_node node, void *s)
+{
+  splay_tree_node snode = splay_tree_lookup ((splay_tree) s, node->key);
+  return (!snode || node->value != snode->value) ? 1 : 0;
+}
+
+/* Returns true if all elements of S1 are also in S2.  */
+
+static bool
+int_int_splay_contained_in (splay_tree s1, splay_tree s2)
+{
+
Re: [PATCH] Add explicit VIS intrinsics for addition and subtraction.
[Vlad, if you have a few minutes, would you mind having a look at the couple of questions at the end of the message? Thanks in advance]. No problem. Here are the results of the investigation.

Pseudo 116 needs to be assigned a hard register. It is used mostly in vector instructions, so we would like it to be assigned an FP reg, but it is initialized in insn 2:

(insn 2 5 3 2 (set (reg/v:V4HI 116 [ a ])
        (reg:V4HI 24 %i0 [ a ])) combined-1.c:7 93 {*movdf_insn_sp32_v9}
     (expr_list:REG_DEAD (reg:V4HI 24 %i0 [ a ])
        (nil)))

so it ends up being assigned the (integer) argument register %i0 instead. It used to be assigned an FP reg as expected with the GCC 4.6.x series. The register class preference discovery is OK:

  r116: preferred EXTRA_FP_REGS, alternative GENERAL_OR_EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS
  a2 (r116,l0) best EXTRA_FP_REGS, allocno GENERAL_OR_EXTRA_FP_REGS

i.e. EXTRA_FP_REGS is preferred/best. Then it seems that this preference is dropped and only the class of the allocno, GENERAL_OR_EXTRA_FP_REGS, is handed down to the coloring stage. By contrast, in the GCC 4.6 series, the cover_class of the allocno is EXTRA_FP_REGS. The initial cost for %i0 is twice as high (24000) as the cost of FP regs. But then it is reduced by 12000 when process_bb_node_for_hard_reg_moves sees insn 2 above, and then again by 12000 when process_regs_for_copy sees the same insn. So, in the end, %i0 is given cost 0 and thus beats every other register. This doesn't happen in the GCC 4.6 series because %i0 isn't in the cover_class. This is at -O1.
At -O2, there is an extra pass at the discovery stage and it sets the class of the allocno to EXTRA_FP_REGS, like with the GCC 4.6 series, so a simple workaround is:

Index: gcc.target/sparc/combined-1.c
===
--- gcc.target/sparc/combined-1.c (revision 179316)
+++ gcc.target/sparc/combined-1.c (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O -mcpu=ultrasparc -mvis" } */
+/* { dg-options "-O2 -mcpu=ultrasparc -mvis" } */
 typedef short vec16 __attribute__((vector_size(8)));
 typedef int vec32 __attribute__((vector_size(8)));

Finally, the couple of questions:

1. Is it expected that the register class preference be dropped at -O1?
2. Is it expected that a single insn be processed by 2 different mechanisms that independently halve the initial cost of a hard register?

-- Eric Botcazou
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
From: Mike Stump mikest...@comcast.net Date: Wed, 28 Sep 2011 15:19:10 -0700 If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. They'd have to then know to turn this option off when building the kernel, which does use such constructs extensively. I think this whole idea has too many downsides to be considered seriously. People write programs like this, lots of people. It's a pervasive technique to encode boolean state into the low bits of a pointer or to represent special token pointers using integers such as -1.
Re: [pph] Prepare for mutation detection [2/3] (issue5142049)
On Wed, Sep 28, 2011 at 5:31 PM, Diego Novillo dnovi...@google.com wrote: On Wed, Sep 28, 2011 at 17:23, Gabriel Charette gcharet...@gmail.com wrote: More comments to come on [3/3]; for now just a single comment below on this specific patch:

diff --git a/gcc/cp/pph-streamer-in.c b/gcc/cp/pph-streamer-in.c
index 0bd4d64..b267833 100644
--- a/gcc/cp/pph-streamer-in.c
+++ b/gcc/cp/pph-streamer-in.c
@@ -439,7 +439,10 @@ pph_in_cxx_binding_1 (pph_stream *stream)
   if (marker == PPH_RECORD_END)
     return NULL;
   else if (pph_is_reference_marker (marker))
-    return (cxx_binding *) pph_cache_get (stream->cache, image_ix, ix, marker);
+    {
+      pph_cache *cache = pph_cache_select (stream, marker, image_ix);
+      return (cxx_binding *) pph_cache_get (cache, ix);
+    }

Seems like you replaced the pph_cache_get one-liners with these two-liners. Wouldn't a simple inline function be nicer for this? I call them separately. Or do you mean a single call that combines them for the common case? Yes, that's what I mean: a single call that combines both, since that's the common usage, and I feel there should be as little cache code as possible at the top of every pph_* function (in particular, every time a new pph streaming function is added all that code needs to be duplicated, so the less code the better imo).
Re: [PATCH, PR43814] Assume function arguments of pointer type are aligned.
On Wed, Sep 28, 2011 at 06:43:04PM -0400, David Miller wrote: From: Mike Stump mikest...@comcast.net Date: Wed, 28 Sep 2011 15:19:10 -0700 If Android is safe in this respect, then, they can just turn it on, and then force anyone porting software to their platform to `fix' their code. They'd have to then know to turn this option off when building the kernel, which does use such constructs extensively. I think this whole idea has too many downsides to be considered seriously. People write programs like this, lots of people. It's a pervasive technique to encode boolean state into the low bits of a pointer or to represent special token pointers using integers such as -1. Or 1. Just do

  grep '\*)[[:blank:]]*1' *.[chS]

to see how often an integer value is stored into a pointer. And it is not just void * pointers, it is struct cgraph_node * or struct varpool_node * too, and the pointed-to types there certainly have higher alignment than 1. Now repeat the same with Google Code Search. Jakub
[C++ testcase, committed as obvious] PR 40145
Hi, tested x86_64-linux, committed. Paolo.

2011-09-28  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/40145
	* g++.dg/ext/visibility/warn5.C: New.

Index: g++.dg/ext/visibility/warn5.C
===
--- g++.dg/ext/visibility/warn5.C (revision 0)
+++ g++.dg/ext/visibility/warn5.C (revision 0)
@@ -0,0 +1,11 @@
+// PR c++/40145
+// { dg-do compile }
+// { dg-require-visibility "" }
+// { dg-options "-fvisibility=hidden" }
+
+struct EditorInternalCommand { };
+
+static void createCommandMap()
+{
+  struct CommandEntry { EditorInternalCommand command; };
+}
Re: [PATCH 3/7] Emit macro expansion related diagnostics
On 09/27/2011 01:52 PM, Dodji Seketeli wrote: + Remember we are at the expansion point of MACRO. Each xI is the + location of the Ith token of the replacement-list. Now it gets + confusing. The xI is the location of the Ith token of the + replacement-list at the macro *definition* point. Not at the + macro replacement point. Okay, let's try to explain this below. This should be yI. + The token '+' has two locations, so to speak. One in the context + of the macro *expansion* of PLUS in #2 and one in the context of + the macro *definition* of PLUS in #1. These two locations are + encoded in the latter context, somehow in the xI we are + talking about. The location of '+' in #2 is not encoded in xI or yI; we reach it through the expansion point location of the macro. The location in #1 is yI (and xI, in this case). + xI is roughly the index of the token inside the replacement-list + at the expansion point. So for '+', its index would then be 1 + [The index of token '1' would be 0 and the index of token '2' would + be 1]. So if '+' is our current xI, it is actually an x1. Are we still talking about #1 here? It looks to me like the index of 1 would be 2, the index of + would be 4, and the index of token 2 would be 6. I bet PLUS used to just be A + B, and this section of comment didn't get updated when it changed. Keep changing xI to yI. + Now what's the y1 then? Remember, we said macro_locations is an + array of pairs (xI,yI). We now know what the xI is, now let's + look at the yI. xI allows us to find where the token was actually written. If the current macro context is also the spelling location of the token (e.g. #1 for +), then xI is the same as yI, i.e. the source location of that token. If the current macro context is not the spelling location of the token (e.g. #0 or #1 for 1), then xI is the location outside the current macro context. For 1 in #0, this is the location of 1 in #1, which is a virtual location.
For 1 in #1, this is the location of 1 in #2, which is a source location. + * If LRK is set to LRK_MACRO_EXPANSION_POINT + --- + + The virtual location is resolved to the location to the locus of + the expansion point of the macro. The first macro expansion point that led to this macro expansion. + * If LRK is set to LRK_MACRO_DEFINITION_LOCATION + -- The virtual location is resolved to the locus of the token in the context of the macro definition. + If LOC is the locus of a token that is an argument of a + function-like macro [replacing a parameter in the replacement list + of the macro] the virtual location is resolved to the locus of the + parameter that is replaced, in the context of the definition of the + macro. + + If LOC is the locus of a token that is not an argument of a + function-like macro, then the function behaves as if LRK was set to + LRK_SPELLING_LOCATION. (and then keep these two paragraphs) + Finally, if SPELLING_LOC is not NULL, *RESULTING_LOC is set to the + location to which LOC was resolved SPELLING_LOC doesn't exist anymore. + ORIG_LOC is the original location of the token at the definition + point of the macro. If you read the extensive comments of struct + line_map_macro in line-map.h, this is the xI. + + If the token is part of a macro argument, ORIG_PARM_REPLACEMENT_LOC + is the location of the point at which the token (the argument) + replaces the macro parameter in the context of the relevant macro + definition. If you read the comments of struct line_map_macro in + line-map.h, this is the yI. */ + +source_location +linemap_add_macro_token (const struct line_map *map, ORIG_LOC is the location of the token outside this macro expansion. If the token comes originally from the macro definition, it is the locus in the macro definition; otherwise it is a location in the caller of this macro expansion (which is a virtual location or a source location if the caller is itself a macro expansion or not).
MACRO_DEFINITION_LOC is the location in the macro definition, either of the token itself or of a macro parameter that it replaces. +/* If LOCATION is the locus of a token that is an argument of a + function-like macro M and appears in the expansion of M, return the + locus of that argument in the context of the caller of M. Note + that the caller of M is necessarily another macro. Why is the caller of M necessarily another macro? In the PLUS example above, if we have the location for 1 in #1, won't it give us the location of 1 in #2? The context of M is a macro definition. What does this mean? +/* Return the source line number corresponding to source location + LOCATION. SET is the line map set LOCATION comes from. If + LOCATION is the source location of a token that is part of the +
[arm-embedded][PR38644] Fix stack red zone.
Committed to ARM/embedded-4_6-branch.

2011-09-28  Jiangning Liu  jiangning@arm.com

	PR rtl-optimization/38644
	* config/i386/i386.c (ix86_stack_using_red_zone): Change inline
	to be extern.
	(TARGET_STACK_USING_RED_ZONE): New.
	* config/rs6000/rs6000.c (rs6000_stack_using_red_zone): New.
	(TARGET_STACK_USING_RED_ZONE): New.
	(offset_below_red_zone_p): Change to use new hook
	TARGET_STACK_USING_RED_ZONE.
	* doc/tm.texi (TARGET_STACK_USING_RED_ZONE): New.
	* doc/tm.texi.in (TARGET_STACK_USING_RED_ZONE): New.
	* sched-deps.c (sched_analyze_1): If the stack pointer is being
	modified and stack red zone is not supported for ports, flush out
	all memory references as they may become invalid if moved across
	the stack adjustment.
	* target.def (stack_using_red_zone): New.
	* testsuite/gcc.target/arm/stack-red-zone.c: New.
RE: [PATCH] Fix stack red zone bug (PR38644)
-Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:56 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 11:40 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 5:20 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 11:10 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Wednesday, September 28, 2011 4:39 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Wed, Sep 28, 2011 at 3:49 AM, Jiangning Liu jiangning@arm.com wrote: -Original Message- From: Richard Guenther [mailto:richard.guent...@gmail.com] Sent: Tuesday, September 27, 2011 3:41 PM To: Jiangning Liu Cc: Andrew Pinski; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Fix stack red zone bug (PR38644) On Tue, Sep 27, 2011 at 5:32 AM, Jiangning Liu jiangning@arm.com wrote: Think of it this way. What the IR says is there is no barrier between those moves. You either have an implicit barrier (which is what you are proposing) or you have it explicitly. I think we all rather have more things explicit rather than implicit in the IR. And that has been the overall feeling for a few years now. Sorry, I'm afraid I can't agree with you. Instead, I think using barrier to describe this kind of dependence is a kind of implicit method. Please note that this is not an usual data dependence issue. The stack pointer change doesn't have any dependence with memory access at all. It is similar to atomic instructions that require being an optimization/memory barrier. 
Sure, it is not a usual data dependence (otherwise we would handle it already), so the targets have to explicitly express the dependence somehow, for which we only have barriers right now. Richard, thanks for your explanation. It's explicit to the back end, while it's implicit to the scheduler in the middle end: a barrier settles the dependence for the scheduler, but barriers can be generated from several different scenarios. It's unsafe and prone to introduce bugs if any one of the scenarios requiring a barrier is missed in the back end. Between the middle end and the back end we should have interfaces that are easy for the back end to implement. After all, the middle end by itself doesn't constitute a compiler, and vice versa. The back end needs the middle end's help to make sure it is easy to implement and to reduce the possibility of introducing bugs. Without an explicit hook like the one I'm proposing, back-end implementers have to know all the barrier-generating scenarios very clearly, because there are no code tips or comments in the middle end telling the back end the list of all scenarios that require barriers. Yes, in theory a barrier is a perfect interface for the scheduler. But from an engineering point of view, I think it's better to explicitly define an interface that describes the stack red zone and informs the back end, or vice versa. Unlike computers, people easily make mistakes if you don't tell them. On this bug, the fact is that over the years different back ends made similar bugs. GCC is really a perfect platform for building new ports, and I have seen a lot of new back ends. The current middle end is unsafe if a port doesn't support a stack red zone and its back end doesn't generate a barrier for it. Why can't we make this explicit in the compiler code between middle end and back end? What if another back end (new or old) not supporting a stack red zone exposes a similar bug again?
There are a gazillion things you have to explicitly get right in your back ends, so I don't see why exposing proper scheduling barriers should be special, and, there, why red zones should be special (as opposed to other occasions where you need to emit barriers from the back end for the scheduler). Richard, this is because: 1) The current scheduler is unsafe if the back end doesn't generate a barrier for a port that doesn't support a stack red zone at all. 2) Implementing the barrier in the back end is a burden for every new back-end implementation for a port that doesn't support a stack red zone. 3) There are many other ports that may have the same latent bug and simply haven't reported it yet.
[PATCH] Respin sparc pixel-compare patterns using iterators.
This is heavily inspired by a proof-of-concept patch David Bremner sent to me the other week. Committed to trunk.

gcc/
	* config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
	UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
	(UNSPEC_FCMP): New unspec.
	(gcond): New code iterator.
	(gcond_name): New code attr.
	(GCM): New mode iterator.
	(gcm_name): New mode attr.
	(fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179329 138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog             | 11 +
 gcc/config/sparc/sparc.md | 90 +
 2 files changed, 21 insertions(+), 80 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2aae5aa..e853b15 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,14 @@
+2011-09-28  David S. Miller  da...@davemloft.net
+
+	* config/sparc/sparc.md (UNSPEC_FCMPLE, UNSPEC_FCMPNE,
+	UNSPEC_FCMPGT, UNSPEC_FCMPEQ): Delete and reduce to...
+	(UNSPEC_FCMP): New unspec.
+	(gcond): New code iterator.
+	(gcond_name): New code attr.
+	(GCM): New mode iterator.
+	(gcm_name): New mode attr.
+	(fcmp{le,ne,gt,eq}{16,32}_vis): Reimplement using iterators.
+
 2011-09-28  Oleg Endo  oleg.e...@t-online.de

	PR target/49486

diff --git a/gcc/config/sparc/sparc.md b/gcc/config/sparc/sparc.md
index 6e38298..dfc5559 100644
--- a/gcc/config/sparc/sparc.md
+++ b/gcc/config/sparc/sparc.md
@@ -58,7 +58,7 @@
    (UNSPEC_MUL8UL	46)
    (UNSPEC_MULDUL	47)
    (UNSPEC_ALIGNDATA	48)
-
+   (UNSPEC_FCMP	49)
    (UNSPEC_PDIST	50)
    (UNSPEC_EDGE8	51)
    (UNSPEC_EDGE8L	52)
@@ -69,11 +69,6 @@
    (UNSPEC_SP_SET	60)
    (UNSPEC_SP_TEST	61)
-
-   (UNSPEC_FCMPLE	70)
-   (UNSPEC_FCMPNE	71)
-   (UNSPEC_FCMPGT	72)
-   (UNSPEC_FCMPEQ	73)
  ])

 (define_constants
@@ -8149,83 +8144,18 @@
   "edge32l\t%r1, %r2, %0"
   [(set_attr "type" "edge")])

-(define_insn "fcmple16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmple32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPLE))]
-  "TARGET_VIS"
-  "fcmple32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpne32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPNE))]
-  "TARGET_VIS"
-  "fcmpne32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpgt32<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPGT))]
-  "TARGET_VIS"
-  "fcmpgt32\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
-
-(define_insn "fcmpeq16<P:mode>_vis"
-  [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V4HI 1 "register_operand" "e")
-		   (match_operand:V4HI 2 "register_operand" "e")]
-		  UNSPEC_FCMPEQ))]
-  "TARGET_VIS"
-  "fcmpeq16\t%1, %2, %0"
-  [(set_attr "type" "fpmul")
-   (set_attr "fptype" "double")])
+(define_code_iterator gcond [le ne gt eq])
+(define_code_attr gcond_name [(le "le") (ne "ne") (gt "gt") (eq "eq")])
+(define_mode_iterator GCM [V4HI V2SI])
+(define_mode_attr gcm_name [(V4HI "16") (V2SI "32")])

-(define_insn "fcmpeq32<P:mode>_vis"
+(define_insn "fcmp<gcond_name><gcm_name><P:mode>_vis"
   [(set (match_operand:P 0 "register_operand" "=r")
-	(unspec:P [(match_operand:V2SI 1 "register_operand" "e")
-		   (match_operand:V2SI 2 "register_operand" "e")]
-		  UNSPEC_FCMPEQ))]
+	(unspec:P [(gcond:GCM
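For readers unfamiliar with machine-description iterators: the code iterator contributes four values and the mode iterator two, so the single combined define_insn above stands in for all eight deleted patterns. A sketch of the instance corresponding to the old fcmple16 pattern (the exact pattern names the gen* tools produce also depend on the P mode iterator, so treat this as illustrative rather than verbatim generator output):

```lisp
;; One of the 4 (codes) x 2 (modes) instances of the combined pattern,
;; equivalent to the deleted fcmple16 define_insn: the code iterator
;; substitutes 'le' and the mode iterator substitutes V4HI.
(define_insn "fcmple16<P:mode>_vis"
  [(set (match_operand:P 0 "register_operand" "=r")
        (unspec:P [(le:V4HI (match_operand:V4HI 1 "register_operand" "e")
                            (match_operand:V4HI 2 "register_operand" "e"))]
                  UNSPEC_FCMP))]
  "TARGET_VIS"
  "fcmple16\t%1, %2, %0"
  [(set_attr "type" "fpmul")
   (set_attr "fptype" "double")])
```

Note that the comparison code now appears structurally inside the unspec, which is why the four separate UNSPEC_FCMP{LE,NE,GT,EQ} constants could collapse into the single UNSPEC_FCMP.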
Re: Use of vector instructions in memmov/memset expanding
Michael, Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap failure... I checked bootstrap, specs and 'make check' with the complete patch. Separate patches for ME and BE were only tested for build (no bootstrap) and 'make check'. I think it's better to apply the complete patch, but review the separate patches (to make it easier). ps There also seems to be common sections in the memfunc-mid.patch and memfunc-be.patch patches. That's true, some new routines from middle-end are used in back-end changes - I couldn't separate the patches in other way without significant changes in them. On 29 September 2011 01:51, Jack Howarth howa...@bromo.med.uc.edu wrote: On Wed, Sep 28, 2011 at 05:33:23PM +0400, Michael Zolotukhin wrote: It appears that part 1 of the patch wasn't really attached. Thanks, resending. Michael, Did you bootstrap with --enable-checking=yes? I am seeing the bootstrap failure... /sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/g++ -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/./prev-gcc/ -B/sw/lib/gcc4.7/x86_64-apple-darwin11.2.0/bin/ -nostdinc++ -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs -B/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include/x86_64-apple-darwin11.2.0 -I/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/include -I/sw/src/fink.build/gcc47-4.7.0-1/gcc-4.7-20110927/libstdc++-v3/libsupc++ -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/src/.libs -L/sw/src/fink.build/gcc47-4.7.0-1/darwin_objdir/prev-x86_64-apple-darwin11.2.0/libstdc++-v3/libsupc++/.libs -c -g -O2 -mdynamic-no-pic -gtoggle -DIN_GCC -W -Wall -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -pedantic -Wno-long-long -Wno-variadic-macros 
-Wno-overlength-strings -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc-4.7-20110927/gcc -I../../gcc-4.7-20110927/gcc/. -I../../gcc-4.7-20110927/gcc/../include -I../../gcc-4.7-20110927/gcc/../libcpp/include -I/sw/include -I/sw/include -I../../gcc-4.7-20110927/gcc/../libdecnumber -I../../gcc-4.7-20110927/gcc/../libdecnumber/dpd -I../libdecnumber -I/sw/include -I/sw/include -DCLOOG_INT_GMP -DCLOOG_ORG -I/sw/include ../../gcc-4.7-20110927/gcc/emit-rtl.c -o emit-rtl.o ../../gcc-4.7-20110927/gcc/emit-rtl.c: In function ‘rtx_def* adjust_address_1(rtx, machine_mode, long int, int, int)’: ../../gcc-4.7-20110927/gcc/emit-rtl.c:2060:26: error: unused variable ‘max_align’ [-Werror=unused-variable] cc1plus: all warnings being treated as errors on x86_64-apple-darwin11 with your patches. Jack ps There also seems to be common sections in the memfunc-mid.patch and memfunc-be.patch patches. -- --- Best regards, Michael V. Zolotukhin, Software Engineer Intel Corporation.
Re: Use of vector instructions in memmov/memset expanding
Sorry what I meant is that it would be bad if -mtune=corei7(-avx)? was slower than generic. For now, -mtune=corei7 is triggering use of generic cost-table (I'm not sure about corei7-avx, but assume the same) - so it won't be slower. Adding new tables shouldn't be very difficult, even if they are the same as generic. Yes, that should be quite easy, but this isn't 'must-have' for the current patch (and it's already very huge). This could be done independently from the work on memfunctions inlining.