Re: Apple's Objective-C 2.0 extensions
On Mar 7, 2007, at 9:13 AM, Eric Christopher wrote:

> Hi Michael,
>
>> Two questions about Apple's Objective-C 2.0 work:
>>
>> 1) Does anyone know when the syntax extensions will be available working in the gcc compiler?

It is a work in progress. For current status, you can check out Apple's 4.0 branch. We will push this to FSF mainline when the features are frozen and have been adopted by Apple's internal developers. Note that the features rely heavily on Leopard frameworks, so you may not get very far using the new features on Tiger, etc.

>> 2) Will their garbage collection & accelerated message dispatch mechanisms also be supported?

Yes. In Leopard. If you want more information on the features, please ask and, after management approval, I can forward it to you.

- Fariborz

> Fariborz is working on them, I imagine that it won't be until they're done in Leopard, but I'll let him give more information.
>
> -eric
Re: Apple's Objective-C 2.0 extensions
On Mar 7, 2007, at 11:16 AM, Mike Stump wrote:

> Does -fobjc-gc work for you now? It's been on mainline for a while now. As for accelerated message dispatch, I'm not exactly certain which feature you're

The option may be recognized, but it depends entirely on the Leopard runtime for support.

- Fariborz
Use of compound_literal_expr in c vs target_expr in c++ for compound literals
gcc generates two different trees for compound literals in C and C++, as in this test case:

    struct S {
      int i, j;
    };
    void foo (struct S);

    int main ()
    {
      foo ((struct S){1, 1});
    }

In C it generates a compound_literal_expr and in C++ it generates a target_expr. The gimplifier treats the two differently in the following areas:

1) Routine mostly_copy_tree_r does not copy target_expr, but it does copy compound_literal_expr. I see the following comment there:

    /* Similar to copy_tree_r() but do not copy SAVE_EXPR or TARGET_EXPR
       nodes.  These nodes model computations that should only be done once.
       If we were to unshare something like SAVE_EXPR(i++), the gimplification
       process would create wrong code.  */

Shouldn't compound_literal_expr be treated the same as target_expr here?

2) gimplify_target_expr can be called more than once on the same target_expr node, because the first time around its TARGET_EXPR_INITIAL is set to NULL. This works as a guard and prevents its temporary from being added to the temporary list more than once (when the call is made to gimple_add_tmp_var). No such guard exists for a compound_literal_expr, so when gimple_add_tmp_var is called again for the same decl, it asserts. So I added a check for !DECL_SEEN_IN_BIND_EXPR_P (decl) in gimplify_compound_literal_expr before the call to gimple_add_tmp_var, as in the following diff:

    % svn diff c-gimplify.c
    Index: c-gimplify.c
    ===================================================================
    --- c-gimplify.c    (revision 116462)
    +++ c-gimplify.c    (working copy)
    @@ -538,7 +538,7 @@
       /* This decl isn't mentioned in the enclosing block, so add it to the
          list of temps.  FIXME it seems a bit of a kludge to say that
          anonymous artificial vars aren't pushed, but everything else is.  */
    -  if (DECL_NAME (decl) == NULL_TREE)
    +  if (DECL_NAME (decl) == NULL_TREE && !DECL_SEEN_IN_BIND_EXPR_P (decl))
         gimple_add_tmp_var (decl);

This fixes the problem I am encountering. Is this the right approach in situations where a compound_literal_expr is used to represent a compound literal in C and the expression is referenced in multiple places (by hanging off a save_expr call_expr tree)?

- Thanks, Fariborz ([EMAIL PROTECTED])
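The guard in point 2 can be illustrated outside of GCC. The following is only a minimal sketch in plain C, not GCC code: `struct decl`, its `seen_in_bind_expr` flag, `add_tmp_var` and `gimplify_literal` are hypothetical stand-ins for a VAR_DECL, DECL_SEEN_IN_BIND_EXPR_P, gimple_add_tmp_var and the patched call site. It assumes that registering a temporary also marks it as seen, which is what lets the flag work as a guard.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VAR_DECL node. */
struct decl {
    const char *name;        /* NULL models an anonymous temporary */
    bool seen_in_bind_expr;  /* models DECL_SEEN_IN_BIND_EXPR_P */
    struct decl *next;       /* chain in the list of temporaries */
};

static struct decl *tmp_list = NULL;

/* Models gimple_add_tmp_var: registering the same decl twice is a bug,
   so the function asserts instead of silently ignoring it. */
static void add_tmp_var(struct decl *d)
{
    assert(!d->seen_in_bind_expr);
    d->seen_in_bind_expr = true;
    d->next = tmp_list;
    tmp_list = d;
}

/* Models the patched call site: safe to run more than once per decl. */
static void gimplify_literal(struct decl *d)
{
    if (d->name == NULL && !d->seen_in_bind_expr)
        add_tmp_var(d);
}

/* Counts the entries currently in the list of temporaries. */
static int tmp_list_length(void)
{
    int n = 0;
    for (struct decl *d = tmp_list; d != NULL; d = d->next)
        n++;
    return n;
}
```

Calling `gimplify_literal` twice on the same anonymous decl leaves exactly one entry in the list; without the extra flag test, the second call would trip the assertion in `add_tmp_var`.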
Re: Use of compound_literal_expr in c vs target_expr in c++ for compound literals
On Jul 24, 2006, at 3:07 PM, Andrew Pinski wrote:

>> gcc generates two separate trees for compound literals in c and c++. As in this test case:
>>
>> struct S { int i,j; };
>> void foo (struct S);
>> int main () { foo((struct S){1,1}); }
>>
>> On the other hand, such a guard does not exist for a compound_literal_expr and when gimple_add_tmp_var is called, it asserts. So, I added check for !DECL_SEEN_IN_BIND_EXPR_P (decl) in gimplify_compound_literal_expr before call to gimple_add_tmp_var is made. As in the following diff:
>
> I think you are trying to fix PR 28418 which is an ICE in gimple_add_tmp_var with compound literals in C.

Yes, it looks similar to my problem.

- Thanks, Fariborz

> Thanks,
> Andrew Pinski
Re: Use of compound_literal_expr in c vs target_expr in c++ for compound literals
On Jul 24, 2006, at 3:07 PM, Andrew Pinski wrote:

>> gcc generates two separate trees for compound literals in c and c++. As in this test case:
>>
>> struct S { int i,j; };
>> void foo (struct S);
>> int main () { foo((struct S){1,1}); }
>>
>> On the other hand, such a guard does not exist for a compound_literal_expr and when gimple_add_tmp_var is called, it asserts. So, I added check for !DECL_SEEN_IN_BIND_EXPR_P (decl) in gimplify_compound_literal_expr before call to gimple_add_tmp_var is made. As in the following diff:
>
> I think you are trying to fix PR 28418 which is an ICE in gimple_add_tmp_var with compound literals in C.

My patch fixes the test case in PR 28418 as well. There are really two issues here:

1) Should we gimplify a compound_literal_expr twice?
2) Regardless of the first issue, how do we avoid calling gimple_add_tmp_var twice on the same variable?

My patch addresses the latter.

- Fariborz

> Thanks,
> Andrew Pinski
Re: [RFC] patch to fix an ICE involving sign-extract of mmx expression
On Sep 23, 2005, at 12:41 PM, Richard Henderson wrote:

> On Thu, Sep 22, 2005 at 01:21:06PM -0700, Fariborz Jahanian wrote:
>>       /* Avoid creating invalid subregs, for example when
>>          simplifying (x>>32)&255.  */
>> !     if (final_word >= GET_MODE_SIZE (inner_mode)
>> !         || (final_word % GET_MODE_SIZE (tmode)) != 0)
>>         return NULL_RTX;
>
> I think you should just call validate_subreg. Ok with that change.

This is the patch I am checking in.

- fariborz ([EMAIL PROTECTED])

ChangeLog:

2005-09-26  Fariborz Jahanian  [EMAIL PROTECTED]

    * combine.c (make_extraction): Check for valid use of subreg.

    Index: combine.c
    ===================================================================
    RCS file: /cvs/gcc/gcc/gcc/combine.c,v
    retrieving revision 1.503
    diff -c -p -r1.503 combine.c
    *** combine.c   26 Aug 2005 21:52:23 -0000   1.503
    --- combine.c   26 Sep 2005 16:01:23 -0000
    *************** make_extraction (enum machine_mode mode,
    *** 6314,6320 ****
            /* Avoid creating invalid subregs, for example when
               simplifying (x>>32)&255.  */
    !       if (final_word >= GET_MODE_SIZE (inner_mode))
              return NULL_RTX;
            new = gen_rtx_SUBREG (tmode, inner, final_word);
    --- 6314,6320 ----
            /* Avoid creating invalid subregs, for example when
               simplifying (x>>32)&255.  */
    !       if (!validate_subreg (tmode, inner_mode, inner, final_word))
              return NULL_RTX;
            new = gen_rtx_SUBREG (tmode, inner, final_word);

> r~
Re: [RFC] patch to fix an ICE involving sign-extract of mmx expression
On Sep 23, 2005, at 12:41 PM, Richard Henderson wrote:

> On Thu, Sep 22, 2005 at 01:21:06PM -0700, Fariborz Jahanian wrote:
>>       /* Avoid creating invalid subregs, for example when
>>          simplifying (x>>32)&255.  */
>> !     if (final_word >= GET_MODE_SIZE (inner_mode)
>> !         || (final_word % GET_MODE_SIZE (tmode)) != 0)
>>         return NULL_RTX;
>
> I think you should just call validate_subreg. Ok with that change.

Yes. Will do so.

- fj

> r~
[RFC] patch to fix an ICE involving sign-extract of mmx expression
In a given test case with 128 bit mmx intrinsics, routine make_compound_operation (in combine.c) attempts to do a sign-extract of the middle 64 bits of the 128 bit (TImode) register. The pattern we have is:

    (lshiftrt:TI (ashift:TI (subreg:TI (reg/v:V2DI 75 [ vu16YPrediction3 ]) 0)
                            (const_int 32 [0x20]))
                 (const_int 64 [0x40]))

And here is the code which attempts to do this:

        case LSHIFTRT:
          /* ... fall through ... */
        case ASHIFTRT:
          lhs = XEXP (x, 0);
          rhs = XEXP (x, 1);
          /* If we have (ashiftrt (ashift foo C1) C2) with C2 >= C1,
             this is a SIGN_EXTRACT.  */
    =>    if (GET_CODE (rhs) == CONST_INT
              && GET_CODE (lhs) == ASHIFT
              && GET_CODE (XEXP (lhs, 1)) == CONST_INT
              && INTVAL (rhs) >= INTVAL (XEXP (lhs, 1)))
            {
              new = make_compound_operation (XEXP (lhs, 0), next_code);
              new = make_extraction (mode, new,
                                     INTVAL (rhs) - INTVAL (XEXP (lhs, 1)),
                                     NULL_RTX, mode_width - INTVAL (rhs),
                                     code == LSHIFTRT, 0, in_code == COMPARE);

This results in gen_rtx_SUBREG asserting. We cannot do this extraction when the extraction mode (DImode in this case) is not properly aligned within its original mode. In other words, gen_rtx_SUBREG attempts to generate invalid rtl such as:

    (subreg:DI (reg/v:V2DI 75 [ vu16YPrediction3 ]) 4)

and asserts. The following patch avoids this problem. If this is OK, I will submit a patch when fsf mainline is unfrozen.

- fariborz ([EMAIL PROTECTED])

    Index: combine.c
    ===================================================================
    RCS file: /cvs/gcc/gcc/gcc/combine.c,v
    retrieving revision 1.475.2.5
    diff -c -p -r1.475.2.5 combine.c
    *** combine.c   26 Aug 2005 22:36:52 -0000   1.475.2.5
    --- combine.c   22 Sep 2005 19:52:02 -0000
    *************** make_extraction (enum machine_mode mode,
    *** 6197,6203 ****
            /* Avoid creating invalid subregs, for example when
               simplifying (x>>32)&255.  */
    !       if (final_word >= GET_MODE_SIZE (inner_mode))
              return NULL_RTX;
            new = gen_rtx_SUBREG (tmode, inner, final_word);
    --- 6197,6204 ----
            /* Avoid creating invalid subregs, for example when
               simplifying (x>>32)&255.  */
    !       if (final_word >= GET_MODE_SIZE (inner_mode)
    !           || (final_word % GET_MODE_SIZE (tmode)) != 0)
              return NULL_RTX;
            new = gen_rtx_SUBREG (tmode, inner, final_word);
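The two conditions tested by this patch can be tried in isolation. What follows is a simplified sketch under stated assumptions, not the GCC implementation: `subreg_offset_ok` is a hypothetical helper, sizes and offsets are plain byte counts rather than machine modes, and GCC's own subreg validation checks considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of the patched test: may a value of outer_size bytes be
   taken at byte_offset inside a value of inner_size bytes?
   outer_size is assumed to be nonzero. */
bool subreg_offset_ok(unsigned inner_size, unsigned outer_size,
                      unsigned byte_offset)
{
    if (byte_offset >= inner_size)      /* starts past the inner value */
        return false;
    if (byte_offset % outer_size != 0)  /* not aligned to the outer size */
        return false;
    return true;
}
```

For the case in this report, the inner value is 16 bytes (V2DI), the extracted value is 8 bytes (DImode) and the requested offset is 4, so `subreg_offset_ok(16, 8, 4)` is false, while offsets 0 and 8 are accepted.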
Can we have a symbol_ref node of a declared symbol without having its flags set?
I ran into a problem when chasing down an -mfix-and-continue (an apple specialty :) code-gen problem. In a test case, ivopts creates a symbol_ref via a call to produce_memory_decl_rtl, as in:

      if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
        {
          const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
          x = gen_rtx_SYMBOL_REF (Pmode, name);
        }
      ...

But it does not set the flags for this symbol. This causes code-gen problems in certain cases, such as the apple-ppc-darwin PIC generation code, which relies on these flags. An obvious fix that comes to mind is to set the flags when the symbol_ref is created, as in the patch below. But the more general question is: should we always set the flags of a symbol_ref whenever such a node is created for a declared symbol?

    --- 2376,2404 ----
      static rtx
      produce_memory_decl_rtl (tree obj, int *regno)
      {
    !   rtx x, ret;
        if (!obj)
          abort ();
        if (TREE_STATIC (obj) || DECL_EXTERNAL (obj))
          {
            const char *name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (obj));
            x = gen_rtx_SYMBOL_REF (Pmode, name);
    +       ret = gen_rtx_MEM (DECL_MODE (obj), x);
    +       SET_DECL_RTL (obj, ret);
    +       targetm.encode_section_info (obj, DECL_RTL (obj), true);
          }
        else
    !     {
    !       x = gen_raw_REG (Pmode, (*regno)++);
    !       ret = gen_rtx_MEM (DECL_MODE (obj), x);
    !     }
    !   return ret;
      }

Thanks, fariborz ([EMAIL PROTECTED])
RFC - COST of const_double for x86 prevents constant copy propagation in cse
(Note! I am starting a new thread for an old thread, because the old thread's corruption prevented me from responding.)

The following test case:

    struct S {
      double d1, d2, d3;
    };

    struct S ms()
    {
      struct S s = {0,0,0};
      return s;
    }

compiled with -O1 -mdynamic-no-pic -march=pentium4 produces:

    pxor    %xmm0, %xmm0
    movsd   %xmm0, 16(%eax)
    movsd   %xmm0, 8(%eax)
    movsd   %xmm0, (%eax)

But the following code results in a 7% performance gain in eon, as reported by one of Apple's performance people:

    movl    $0, 16(%eax)
    movl    $0, 20(%eax)
    movl    $0, 8(%eax)
    movl    $0, 12(%eax)
    movl    $0, (%eax)
    movl    $0, 4(%eax)

This is because cse does not do the constant propagation in this rtl (note that cse is capable of grabbing a constant from REG_EQUAL):

    (insn 12 7 13 0 (set (reg:DF 59)
            (mem/u/i:DF (symbol_ref/u:SI ("*LC0") [flags 0x2]) [0 S8 A64])) 64 {*movdf_nointeger} (nil)
        (expr_list:REG_EQUAL (const_double:DF 0.0 [0x0.0p+0])
            (nil)))

    (insn 13 12 15 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                    (const_int 16 [0x10])) [0 result.d3+0 S8 A32])
            (reg:DF 59)) 64 {*movdf_nointeger} (nil)
        (nil))

    (insn 15 13 17 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 58 [ D.1470 ])
                    (const_int 8 [0x8])) [0 result.d2+0 S8 A32])
            (reg:DF 59)) 64 {*movdf_nointeger} (nil)
        (nil))

    (insn 17 15 20 0 (set (mem/s/j:DF (reg/f:SI 58 [ D.1470 ]) [0 result.d1+0 S8 A32])
            (reg:DF 59)) 64 {*movdf_nointeger} (nil)
        (nil))

The reason it is not doing so is the definition of the COST macro, which returns a higher cost for a const_double than for a constant that is available in a register. For the x86 platform, this cost is evaluated in a call to ix86_rtx_costs, which returns 1 or 2.

I had a lengthy conversation with Ian Lance Taylor. He suggested lowering the const_double cost to 0, and indeed this lowers the cost enough that the const_double constant wins. But the careful selection of this cost in ix86_rtx_costs makes me cautious that the change may hurt performance on some other flavors of the x86 architecture and/or on some other benchmarks.

Any comments from those familiar with this cost function (or on any other way for cse to do its job, such as a special new cost function) are appreciated.

- Thanks, fariborz ([EMAIL PROTECTED]).
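The two instruction sequences in this message store the same bytes, which is why the integer version is a valid replacement at all. Here is a small sketch of that equivalence in portable C. It assumes IEEE 754 doubles (where +0.0 is the all-zero bit pattern) and a `struct S` without padding, both of which hold on x86; the function names are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct S { double d1, d2, d3; };

/* Three 8-byte floating-point stores, as in the movsd sequence. */
void zero_with_doubles(struct S *s)
{
    s->d1 = 0.0;
    s->d2 = 0.0;
    s->d3 = 0.0;
}

/* Six 4-byte integer stores, as in the movl sequence.  memcpy is used so
   the example stays free of aliasing problems. */
void zero_with_words(struct S *s)
{
    const uint32_t zero = 0;
    unsigned char *p = (unsigned char *)s;
    for (size_t off = 0; off < sizeof *s; off += sizeof zero)
        memcpy(p + off, &zero, sizeof zero);
}

/* Returns 1 when both approaches leave identical bytes in memory. */
int same_bytes_after_zeroing(void)
{
    struct S a, b;
    memset(&a, 0xAA, sizeof a);  /* start from recognizable garbage */
    memset(&b, 0xAA, sizeof b);
    zero_with_doubles(&a);
    zero_with_words(&b);
    return memcmp(&a, &b, sizeof a) == 0;
}
```

The sketch only shows that the two forms are interchangeable in effect; which one is faster is the cost question raised above and is not something this code can answer.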
Re: RFC - COST of const_double for x86 prevents constant copy propagation in cse
Forgot to attach the patch:

    Index: i386.c
    ===================================================================
    RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
    retrieving revision 1.795.4.33
    diff -c -p -r1.795.4.33 i386.c
    *** i386.c   15 Aug 2005 23:36:10 -0000   1.795.4.33
    --- i386.c   25 Aug 2005 17:08:33 -0000
    *************** ix86_rtx_costs (rtx x, int code, int out
    *** 15730,15740 ****
            else
              switch (standard_80387_constant_p (x))
                {
    !           case 1: /* 0.0 */
    !             *total = 1;
    !             break;
    !           default: /* Other constants */
    !             *total = 2;
                  break;
                case 0:
                case -1:
    --- 15730,15737 ----
            else
              switch (standard_80387_constant_p (x))
                {
    !           default: /* All constants */
    !             *total = 0;
                  break;
                case 0:
                case -1:

On Aug 25, 2005, at 11:09 AM, Fariborz Jahanian wrote:

> (Note! I am starting a new thread of an old thread because of old thread's corruption which prevented me from responding).
> [...]
Re: RFC - COST of const_double for x86 prevents constant copy propagation in cse
On Aug 25, 2005, at 12:47 PM, H. J. Lu wrote:

> On Thu, Aug 25, 2005 at 12:37:32PM -0700, Ian Lance Taylor wrote:
>> Fariborz Jahanian [EMAIL PROTECTED] writes:
>>
>>> Forgot to attach the patch:
>>>
>>> Index: i386.c
>>> ===================================================================
>>> RCS file: /cvs/gcc/gcc/gcc/config/i386/i386.c,v
>>> retrieving revision 1.795.4.33
>>> diff -c -p -r1.795.4.33 i386.c
>>> *** i386.c   15 Aug 2005 23:36:10 -0000   1.795.4.33
>>> --- i386.c   25 Aug 2005 17:08:33 -0000
>>> *************** ix86_rtx_costs (rtx x, int code, int out
>>> *** 15730,15740 ****
>>>         else
>>>           switch (standard_80387_constant_p (x))
>>>             {
>>> !           case 1: /* 0.0 */
>>> !             *total = 1;
>>> !             break;
>>> !           default: /* Other constants */
>>> !             *total = 2;
>>>               break;
>>>             case 0:
>>>             case -1:
>>> --- 15730,15737 ----
>>>         else
>>>           switch (standard_80387_constant_p (x))
>>>             {
>>> !           default: /* All constants */
>>> !             *total = 0;
>>>               break;
>>>             case 0:
>>>             case -1:
>>
>> For what it's worth, as I told Fariborz, I suspect that returning 0 is correct for SFmode, but I'm somewhat doubtful for DFmode. And his test case is odd since the resulting code has more instructions and is larger. I know little about x86 instruction timings, but it seems surprising that the new sequence is faster. Maybe the problem is in using %xmm0 instead of one of the 80387 registers--or, since this is after all merely a constant--one of the general registers.
>>
>> And in any case this type of thing should be controlled by an entry in the i386 processor_costs structure.
>
> I think the problem may be somewhere else. I got the same xmm0 code sequence on Linux/ia32 with -msse3 -mfpmath=sse. However, I got
>
>     xorl    %eax, %eax
>     movq    %rax, 16(%rdi)
>     movq    %rax, 8(%rdi)
>     movq    %rax, (%rdi)

Can you try this with -march=pentium4?

- fariborz

> on Linux/x86-64.
>
> H.J.
bootstrap of gcc mainline on apple-x86-darwin is broken
Today's checkout and bootstrap on apple-x86-darwin resulted in the following:

    stage1/xgcc -Bstage1/ -B/usr/local/i686-apple-darwin8.1.0/bin/ -O2 -g -fomit-frame-pointer -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wold-style-definition -Werror -fno-common -DHAVE_CONFIG_H -DGENERATOR_FILE -o build/genattrtab \
      build/genattrtab.o build/genautomata.o \
      build/rtl.o build/read-rtl.o build/ggc-none.o build/min-insn-modes.o build/gensupport.o build/insn-conditions.o build/print-rtl.o build/errors.o \
      build/varray.o ../build-i686-apple-darwin8.1.0/libiberty/libiberty.a -lm
    build/genattrtab ../../gcc-mainline/gcc/config/i386/i386.md > tmp-attrtab.c
    make[2]: *** [s-attrtab] Error 139
    make[1]: *** [stage2_build] Error 2
    make: *** [bootstrap] Error 2

Is this known?

- Thanks, fariborz
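A note on reading "Error 139" in the log above: make prints the exit status of the failed command, and common shells report 128 + N when a command is killed by signal N. The helper below is a hypothetical sketch of that decoding, assuming the 128 + N convention applies to the shell that ran the genattrtab step.

```c
#include <assert.h>

/* Returns the signal number encoded in a shell-style exit status, or 0 when
   the status does not look like "killed by a signal".  Assumes the common
   128 + N convention. */
int signal_from_exit_status(int status)
{
    return (status > 128 && status < 256) ? status - 128 : 0;
}
```

Under that assumption, status 139 decodes to signal 11, which is SIGSEGV on both Darwin and Linux, so genattrtab most likely crashed with a segmentation fault rather than reporting an error of its own.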
x86 build is broken
Tried building fsf mainline on x86-darwin and got a syntax error compiling c-common.c. The preprocessed file shows the following:

    if ("__builtin_" "acosf" && 1)
      {
        tree decl;
        ((void)(!((!1 && !1) || !strncmp ("__builtin_" "acosf", "__builtin_", strlen ("__builtin_")))
                ? fancy_abort ("../../gcc-mainline/gcc/builtins.def", 162, __FUNCTION__), 0 : 0));
        if (!1)
          decl = lang_hooks.builtin_function ("__builtin_" "acosf",
                   builtin_types[BT_FN_FLOAT_FLOAT], BUILT_IN_ACOSF, BUILT_IN_NORMAL,
                   (1 ? ("__builtin_" "acosf" + strlen ("__builtin_")) : ((void *)0)),
                   built_in_attributes[(int) (flag_errno_math ? ATTR_NOTHROW_LIST
                     : (flag_unsafe_math_optimizations ? ATTR_CONST_NOTHROW_LIST
                        : ATTR_PURE_NOTHROW_NOVOPS_LIST))]);
        else
          decl = builtin_function_2 ("__builtin_" "acosf",
                   "__builtin_" "acosf" + strlen ("__builtin_"),
                   builtin_types[BT_FN_FLOAT_FLOAT], builtin_types[BT_FN_FLOAT_FLOAT],
                   BUILT_IN_ACOSF, BUILT_IN_NORMAL, 1, !flag_isoc99,
                   built_in_attributes[(int) (flag_errno_math ? ATTR_NOTHROW_LIST
                     : (flag_unsafe_math_optimizations ? ATTR_CONST_NOTHROW_LIST
                        : ATTR_PURE_NOTHROW_NOVOPS_LIST))]);
        built_in_decls[(int) BUILT_IN_ACOSF] = decl;
        if ()
            ^^
          implicit_built_in_decls[(int) BUILT_IN_ACOSF] = decl;
      }

This is the result of the macro expansion of:

    DEF_C99_C90RES_BUILTIN (BUILT_IN_ACOSF, "acosf", BT_FN_FLOAT_FLOAT, ATTR_MATHFN_FPROUNDING_ERRNO)

in builtins.def.

- fariborz
Re: x86 build is broken
On Jul 8, 2005, at 5:36 PM, Daniel Berlin wrote:

> On Fri, 2005-07-08 at 17:13 -0700, Fariborz Jahanian wrote:
>> Tried building fsf mainline on x86-darwin. Syntax error compiling c-common.c. The preprocessed file shows the following:
>
> as of when?
>
> I bootstrapped and tested x86_64-unknown-linux-gnu and x86-linux-gnu and powerpc-linux-gnu in the 2.5 hours before committing my patch, so i'm pretty sure it wasn't me :)

I did a fresh update just to be sure; it is still broken for x86-darwin. I don't think it is related to your change. It could be darwin specific.

- fariborz
Re: x86 build is broken
On Jul 8, 2005, at 5:41 PM, Andrew Pinski wrote:

> On Jul 8, 2005, at 8:13 PM, Fariborz Jahanian wrote:
>> Tried building fsf mainline on x86-darwin. Syntax error compiling c-common.c. The preprocessed file shows the following:
>
> This is a darwin specific bug and was introduced by Geoff K.'s patch today. I committed this as obvious to fix the bug.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
>     * config/darwin.h (TARGET_C99_FUNCTIONS): Define to 1.

Yes. This should fix it. Thanks.

- fariborz
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 30, 2005, at 11:23 AM, Jeffrey A Law wrote:

> On Thu, 2005-06-30 at 20:12 +0200, Bernd Schmidt wrote:
>> Jeffrey A Law wrote:
>>> I'd tend to agree. I'd rather see the option go away than linger on if the option is no longer useful.
>>
>> I wouldn't mind that, but I'd also like to point out that there are Makefiles out there which hard-code things like -fforce-mem. Do we want to keep the option as a stub to avoid breaking them?
>
> Excellent point. I believe in other cases we've kept the option around for a release, then killed it.

I would also like to keep this feature around for a while. It is possible that having this option set under -O2/-O3 has masked some optimization bugs, in which case adding -fforce-mem would be a temporary workaround.

- fariborz

> jeff
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 30, 2005, at 12:47 PM, Steven Bosscher wrote:

> Well, maybe so, but it would be a pretty lame workaround. Why are you so worried about bugs? This flag was always disabled at -O1, and we have never seen any bug reports that got fixed with -fforce-mem. And besides, it is better to fix bugs than to work around them.
>
> Making the option a nop, issuing a warning in 4.1 and removing the option completely for gcc 4.2 looks like a very reasonable approach to me.

OK. This seems to be the consensus, and I will prepare a patch based on that.

- Thanks, fariborz

> Gr.
> Steven
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 27, 2005, at 12:56 PM, Richard Henderson wrote:

> Hmm. I would suspect this is obsolete now. We'll have forced everything into registers (or something equivalent that we can work with) during tree optimization. Any CSEs that can be made should have been made.

I will do a sanity check followed by SPEC runs (x86 and ppc darwin) to see whether behavior changes when -fforce-mem is no longer enabled at -O2 (or higher).

- Thanks, fariborz

> r~
[RFH] - Less than optimal code compiling 252.eon -O2 for x86
A source file of 252.eon, mrSurfaceList.cc, produces less efficient code for initializing instance objects to 0 at -O2 than at -O1. The behavior is erratic: it does not happen on all x86 platforms, and making the test smaller makes the problem go away. But here is what I found to be the cause.

When the source is compiled with -O1 -march=pentium4, the 'cse' phase sees the following pattern initializing a 'double' with 0:

    (insn 18 13 19 0 (set (reg:SF 109)
            (mem/u/i:SF (symbol_ref/u:SI ("*LC11") [flags 0x2]) [0 S4 A32])) -1 (nil)
        (nil))

    (insn 19 18 20 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
            (float_extend:DF (reg:SF 109))) 86 {*extendsfdf2_sse} (nil)
        (nil))

The fold_rtx routine then converts it into its reduced form, resulting in optimal code:

    (insn 19 13 21 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
            (const_double:DF 0.0 [0x0.0p+0])) 64 {*movdf_nointeger} (nil)
        (nil))

But when the same source is compiled with -O2 -march=pentium4, the 'cse' phase sees a slightly different pattern (note that float_extend:DF has moved):

    (insn 18 13 19 0 (set (reg:DF 109)
            (float_extend:DF (mem/u/i:SF (symbol_ref/u:SI ("*LC13") [flags 0x2]) [0 S4 A32]))) -1 (nil)
        (nil))

    (insn 19 18 20 0 (set (mem/s/j:DF (plus:SI (reg/f:SI 20 frame)
                    (const_int -32 [0xffffffe0])) [0 objectBox.pmin.e+16 S8 A128])
            (reg:DF 109)) 64 {*movdf_nointeger} (nil)
        (nil))

This cannot be simplified by fold_rtx, resulting in less efficient code. The change in pattern is most likely caused by the additional tree optimization phases that run at -O2. If so, should cse be taught to simplify the new rtl pattern? Or does the tree optimizer phase responsible for the less than optimal tree need to be tweaked to generate the same tree as with -O1?

Thanks, fariborz
[RFC] Problem with altivec_vmrghb pattern in altivec.md
One of our internal apps fails due to a problem in the folding of vec_mergeh of unsigned char vectors of zeros and ones. It produces a new vector of zeros followed by ones. I traced the problem to the 3rd operand of the altivec_vmrghb pattern defined in altivec.md. It is 255 (0xff). I think it should be 21845 (0x5555) for unsigned chars. With this change, the customer code passes and the merged pattern looks OK. So far so good.

But I tried the following test case (with -O2) and, curiously enough, it works OK with or *without* my change! So I am wondering whether I approached this problem correctly. From the code in simplify-rtx.c where the value of a merge of two constants in a VEC_MERGE rtl is computed, it seems that the correct value for element selection should be 0x5555. But I am curious why changing this value did not make a difference in the following test case (compiled with -O2).

- Thanks, fariborz ([EMAIL PROTECTED])

    #include <stdio.h>

    int main (int argc, const char * argv[])
    {
      vector unsigned char v_zero;
      vector unsigned char v_c1;

      v_zero = (vector unsigned char)
        ('a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p');
      v_c1 = (vector unsigned char)
        ('1','2','3','4','5','6','7','8','1','2','3','4','5','6','7','8');

      vector unsigned char vResult = vec_mergeh(v_zero, v_c1);
      printf ("\t%vc\n", vResult);
      return 0;
    }
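The effect of the two mask values can be checked with a scalar model. This is a simplified sketch, not the code in simplify-rtx.c: `vec_merge16` is a made-up helper, and it assumes that bit i of the mask selects element i from the first operand when set and from the second operand when clear. The element permutations that the real altivec.md pattern applies to its inputs are not modeled.

```c
#include <assert.h>
#include <stdint.h>

#define NELTS 16

/* Scalar model of a 16-element merge: bit i of mask picks element i from a
   when set and from b when clear. */
void vec_merge16(const uint8_t a[NELTS], const uint8_t b[NELTS],
                 unsigned mask, uint8_t out[NELTS])
{
    for (int i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? a[i] : b[i];
}
```

With a vector of zeros as `a` and a vector of ones as `b`, mask 0x00ff yields eight zeros followed by eight ones, which matches the wrong result described above, while mask 0x5555 yields the alternating 0, 1, 0, 1, ... sequence expected from an interleaving merge.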
bootstrap fails for apple-ppc-darwin
Today, I tried bootstrapping gcc mainline on/for apple-ppc-darwin. It fails in stage1. Is this known? - Thanks, fariborz ./xgcc -B./ -B/usr/local/powerpc-apple-darwin8.0.0/bin/ -isystem /usr/local/powerpc-apple-darwin8.0.0/include -isystem /usr/local/powerpc-apple-darwin8.0.0/sys-include -L/Volumes/sandbox/gcc-mainline-bootstrap.obj/gcc/../ld -O2 -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -Wold-style-definition -isystem ./include -Wa,-force_cpusubtype_ALL -g -DHAVE_GTHR_DEFAULT -DIN_LIBGCC2 -D__GCC_FLOAT_NOT_NEEDED -dynamiclib -nodefaultlibs -Wl,-install_name,/usr/local/lib/libgcc_s.1.0.dylib -Wl,-flat_namespace -o ppc64/libgcc_s.1.0.dylib.tmp -Wl,-exported_symbols_list,libgcc/ppc64/libgcc.map -compatibility_version 1 -current_version 1.0 -m64 libgcc/ppc64/_muldi3_s.o libgcc/ppc64/_negdi2_s.o libgcc/ppc64/_lshrdi3_s.o libgcc/ppc64/_ashldi3_s.o libgcc/ppc64/_ashrdi3_s.o libgcc/ppc64/_cmpdi2_s.o libgcc/ppc64/_ucmpdi2_s.o libgcc/ppc64/_floatdidf_s.o libgcc/ppc64/_floatdisf_s.o libgcc/ppc64/_fixunsdfsi_s.o libgcc/ppc64/_fixunssfsi_s.o libgcc/ppc64/_fixunsdfdi_s.o libgcc/ppc64/_fixdfdi_s.o libgcc/ppc64/_fixunssfdi_s.o libgcc/ppc64/_fixsfdi_s.o libgcc/ppc64/_fixxfdi_s.o libgcc/ppc64/_fixunsxfdi_s.o libgcc/ppc64/_floatdixf_s.o libgcc/ppc64/_fixunsxfsi_s.o libgcc/ppc64/_fixtfdi_s.o libgcc/ppc64/_fixunstfdi_s.o libgcc/ppc64/_floatditf_s.o libgcc/ppc64/_clear_cache_s.o libgcc/ppc64/_enable_execute_stack_s.o libgcc/ppc64/_trampoline_s.o libgcc/ppc64/__main_s.o libgcc/ppc64/_absvsi2_s.o libgcc/ppc64/_absvdi2_s.o libgcc/ppc64/_addvsi3_s.o libgcc/ppc64/_addvdi3_s.o libgcc/ppc64/_subvsi3_s.o libgcc/ppc64/_subvdi3_s.o libgcc/ppc64/_mulvsi3_s.o libgcc/ppc64/_mulvdi3_s.o libgcc/ppc64/_negvsi2_s.o libgcc/ppc64/_negvdi2_s.o libgcc/ppc64/_ctors_s.o libgcc/ppc64/_ffssi2_s.o libgcc/ppc64/_ffsdi2_s.o libgcc/ppc64/_clz_s.o libgcc/ppc64/_clzsi2_s.o libgcc/ppc64/_clzdi2_s.o libgcc/ppc64/_ctzsi2_s.o libgcc/ppc64/_ctzdi2_s.o libgcc/ppc64/_popcount_tab_s.o 
libgcc/ppc64/_popcountsi2_s.o libgcc/ppc64/_popcountdi2_s.o libgcc/ppc64/_paritysi2_s.o libgcc/ppc64/_paritydi2_s.o libgcc/ppc64/_powisf2_s.o libgcc/ppc64/_powidf2_s.o libgcc/ppc64/_powixf2_s.o libgcc/ppc64/_powitf2_s.o libgcc/ppc64/_mulsc3_s.o libgcc/ppc64/_muldc3_s.o libgcc/ppc64/_mulxc3_s.o libgcc/ppc64/_multc3_s.o libgcc/ppc64/_divsc3_s.o libgcc/ppc64/_divdc3_s.o libgcc/ppc64/_divxc3_s.o libgcc/ppc64/_divtc3_s.o libgcc/ppc64/_divdi3_s.o libgcc/ppc64/_moddi3_s.o libgcc/ppc64/_udivdi3_s.o libgcc/ppc64/_umoddi3_s.o libgcc/ppc64/_udiv_w_sdiv_s.o libgcc/ppc64/_udivmoddi4_s.o libgcc/ppc64/darwin-tramp_s.o libgcc/ppc64/darwin-ldouble_s.o libgcc/ppc64/unwind-dw2_s.o libgcc/ppc64/unwind-dw2-fde-darwin_s.o libgcc/ppc64/unwind-sjlj_s.o libgcc/ppc64/unwind-c_s.o libgcc/ppc64/darwin-fallback_s.o -lc rm -f ppc64/libgcc_s.dylib if [ -f ppc64/libgcc_s.1.0.dylib ]; then mv -f ppc64/libgcc_s.1.0.dylib ppc64/libgcc_s.1.0.dylib.backup; else true; fi mv ppc64/libgcc_s.1.0.dylib.tmp ppc64/libgcc_s.1.0.dylib ln -s libgcc_s.1.0.dylib ppc64/libgcc_s.dylib /usr/bin/libtool: fatal error in ld64 make[3]: *** [ppc64/libgcc_s.dylib] Error 1 make[2]: *** [libgcc.a] Error 2 make[1]: *** [stage1_build] Error 2 make: *** [bootstrap] Error 2
C++ [RFC] taking address of a static const data member
Section 9.4.2 of the C++ standard, "Static data members", does not seem to address this issue directly. But there is a dejagnu C++ test case which explicitly disallows (by issuing a link-time error) taking the address of a static const data member. The test case is const2.C. This question has come up because g++-4.0 (ppc-darwin target) issues the same link error for the following test case (which requires taking the address of Foo::foo):

    #include <map>

    struct Foo {
      static const int foo = 0x3ab;
    };

    int main()
    {
      std::map<int, int> m;
      m[Foo::foo];
    }

And here is const2.C for easy reference:

    // { dg-do link }
    // This test should get a linker error for the reference to A<int>::i.
    // { dg-error "i" "" { target *-*-* } 0 }

    template <class T> struct B { static const int i = 3; };
    template <class T> struct A { static const int i = B<T>::i; };
    const int *p = &A<int>::i;

    int main(){}

So, is g++ correct in rejecting this seemingly good user code?

- Thanks, fariborz ([EMAIL PROTECTED])
Re: C++ [RFC] taking address of a static const data member
Thanks Andrew. Yes, the standard actually mentions this, and I missed it.

- fariborz

On Mar 11, 2005, at 11:25 AM, Andrew Pinski wrote:

> On Mar 11, 2005, at 2:16 PM, Fariborz Jahanian wrote:
>> So, is g++ correct in rejecting this seemingly good user code?
>
> Yes you need a place to store the data. So for an example in your original testcase, you need:
>
>     const int Foo::foo;
>
> Which fixes the problem and yes 9.4.2 explains this (I cannot find it right now but I know there has been multiple bugs about this in the past).
>
> -- Pinski
Re: patch [RFC] Simple loop runs out of stack at -O1
On Feb 25, 2005, at 1:16 PM, Joe Buck wrote:

> I duplicated this on a i686-pc-linux-gnu system: the compiler is built from last night's trunk.
>
>     % /usr/localdisk/gcc-cvs/trunk/bin/gcc -c -O1 bad.c
>     gcc: Internal error: Segmentation fault (program cc1)
>     Please submit a full bug report.
>     See <URL:http://gcc.gnu.org/bugs.html> for instructions.
>
> Could you please file a PR and attach the proposed patch?

I will shortly.

- fariborz