[Bug c/28418] [4.0/4.1 regression] ICE incrementing compound literal expression
--- Comment #8 from fjahanian at apple dot com 2006-08-25 21:36 ---
I was about to submit the patch. Thank you for this patch.
- Fariborz

> Subject: Bug 28418
>
> Author: jsm28
> Date: Fri Aug 25 21:14:24 2006
> New Revision: 116436
>
> URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=116436
> Log:
> 2006-08-25 Fariborz Jahanian <[EMAIL PROTECTED]>
>
> PR c/28418
> * c-gimplify.c (gimplify_compound_literal_expr): Don't add
> variable again if DECL_SEEN_IN_BIND_EXPR_P.
>
> 2006-08-25 Joseph S. Myers <[EMAIL PROTECTED]>
>
> * gcc.c-torture/compile/compound-literal-1.c: New test.
>
> Added:
> trunk/gcc/testsuite/gcc.c-torture/compile/compound-literal-1.c
> Modified:
> trunk/gcc/ChangeLog
> trunk/gcc/c-gimplify.c
> trunk/gcc/testsuite/ChangeLog
>

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28418
[Bug c++/28554] New: Use of __attribute__ ((constructor)) on functions issues confusing error
In this test case, g++ issues a confusing diagnostic. If using __attribute__ ((constructor)) on a function whose argument list is anything other than 'void' is illegal, the compiler should say so:

__attribute__ ((constructor))
static void Initialize(int argc, char *argv[], char *envp[])
{
}

% g++ -c ctor.C
ctor.C: In function '(static initializers for ctor.C)':
ctor.C:2: error: too few arguments to function 'void Initialize(int, char**, char**)'
ctor.C:3: error: at this point in file

--
Summary: Use of __attribute__ ((constructor)) on functions issues confusing error
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28554
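For comparison, here is a minimal sketch in C of the form that the diagnostic above implies is expected: a constructor declared with a (void) parameter list. The names `initialized` and `was_initialized` are invented for this illustration and do not come from the report. Whether a particular runtime also passes argc/argv/envp to constructors is platform-specific and is not assumed here.

```c
/* Flag set by the constructor so that later code can tell it ran.
   (Hypothetical name, for illustration only.)  */
static int initialized = 0;

/* The error in the report comes from a compiler-generated call that
   passes no arguments, so the constructor is declared as taking void.  */
__attribute__ ((constructor))
static void Initialize (void)
{
  initialized = 1;
}

/* Returns 1 if Initialize () has already run, 0 otherwise.  */
int was_initialized (void)
{
  return initialized;
}
```

With GCC or a compatible compiler, code reached from main can call was_initialized () to check that the constructor ran during startup.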
[Bug c/28418] [4.0/4.1/4.2 regression] ICE incrementing compound literal expression
--- Comment #3 from fjahanian at apple dot com 2006-07-24 23:16 ---
gcc generates two different trees for compound literals in C and C++, as in this test case:

struct S { int i,j; };
void foo (struct S);

int main ()
{
  foo((struct S){1,1});
}

In C it generates a compound_literal_expr and in C++ it generates a target_expr. But the gimplifier treats them differently in the following areas:

1) In routine mostly_copy_tree_r we don't copy target_expr but we do copy compound_literal_expr. I see the following comment there:

/* Similar to copy_tree_r() but do not copy SAVE_EXPR or TARGET_EXPR nodes.
   These nodes model computations that should only be done once. If we
   were to unshare something like SAVE_EXPR(i++), the gimplification
   process would create wrong code. */

Shouldn't compound_literal_expr be treated the same as target_expr here?

2) gimplify_target_expr can be called more than once on the same target_expr node because the first time around its TARGET_EXPR_INITIAL is set to NULL. This works as a guard and prevents its temporary from being added to the temporary list more than once (when the call is made to gimple_add_tmp_var). On the other hand, no such guard exists for a compound_literal_expr, and when gimple_add_tmp_var is called a second time, it asserts.

So, I added a check for !DECL_SEEN_IN_BIND_EXPR_P (decl) in gimplify_compound_literal_expr before the call to gimple_add_tmp_var is made, as in the following diff:

% svn diff c-gimplify.c
Index: c-gimplify.c
===================================================================
--- c-gimplify.c (revision 116462)
+++ c-gimplify.c (working copy)
@@ -538,7 +538,7 @@
   /* This decl isn't mentioned in the enclosing block, so add it to the
      list of temps. FIXME it seems a bit of a kludge to say that
      anonymous artificial vars aren't pushed, but everything else is. */
-  if (DECL_NAME (decl) == NULL_TREE)
+  if (DECL_NAME (decl) == NULL_TREE && !DECL_SEEN_IN_BIND_EXPR_P (decl))
     gimple_add_tmp_var (decl);

This fixes the problem I am encountering as well as the test case in this PR.
-- fjahanian at apple dot com changed: What|Removed |Added CC| |fjahanian at apple dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28418
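The guard described in comment #3 can be modelled outside the compiler. The following is a toy sketch, not GCC code: `struct decl`, `temps`, `n_temps` and `gimplify_literal_sketch` are all invented for illustration, and the `seen_in_bind_expr` field merely plays the role that DECL_SEEN_IN_BIND_EXPR_P plays in the patch. It assumes, as the comment implies, that registering a temporary marks it as seen.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tree decl node.  */
struct decl
{
  const char *name;       /* NULL for an anonymous temporary */
  int seen_in_bind_expr;  /* models DECL_SEEN_IN_BIND_EXPR_P */
};

#define MAX_TEMPS 16

/* Toy list of registered temporaries.  */
static struct decl *temps[MAX_TEMPS];
static size_t n_temps;

/* Registers D as a temporary at most once.  Returns 1 if D was added,
   0 if it was skipped (named, already registered, or list full).  */
int gimplify_literal_sketch (struct decl *d)
{
  if (d->name == NULL && !d->seen_in_bind_expr && n_temps < MAX_TEMPS)
    {
      temps[n_temps++] = d;
      d->seen_in_bind_expr = 1;  /* a second visit now becomes a no-op */
      return 1;
    }
  return 0;
}
```

Calling gimplify_literal_sketch twice on the same anonymous decl adds it once and skips it the second time, which is the behaviour the extra condition in the diff is meant to guarantee.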
[Bug c++/24260] [4.0/4.1 Regression] stdcall attribute is ignored at static member template functions
--- Comment #6 from fjahanian at apple dot com 2005-10-19 17:11 ---
(In reply to comment #5)
> And did fjahanian take a look at this already to see if he
> really is to blame for causing this bug?
>

I am miffed as to why my name was in ChangeLog-2004. PR/13989 and PR/9844 were fixed by Ziemwit Laski (no longer at Apple). Andrew Pinski may know more about this as he commented and pointed to Ziem's patch in that radar. Running annotate on ChangeLog-2004 did not reveal any useful info.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24260
[Bug target/14552] compiled trivial vector intrinsic code is ineffiencent
--- Additional Comments From fjahanian at apple dot com 2005-09-13 21:09 --- Hello, What is the status of Uros's patches in: http://gcc.gnu.org/ml/gcc-patches/2005-07/msg01128.html Looks like they did not make it to FSF mainline? Are there remaining issues with them? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552
[Bug target/22152] Poor loop optimization when using mmx builtins
--- Additional Comments From fjahanian at apple dot com 2005-09-13 00:52 --- Has there been any progress toward fixing the problems addressed by these PRs? - thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22152
[Bug middle-end/21894] [4.0/4.1 Regression] Invalid operand to binary operator with nested function
--- Additional Comments From fjahanian at apple dot com 2005-08-08 17:36 --- Thanks. Test case should say PR 21894. > Fixed. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21894
[Bug rtl-optimization/22152] New: Poor loop optimization when using sse2 builtins - regression from 3.3
In the following trivial test case, gcc-4.1 produces very inefficient code for the loop. gcc-3.3 produces much better code.

typedef int __m64 __attribute__ ((__vector_size__ (8)));

__m64 unsigned_add3( const __m64 *a, const __m64 *b, unsigned long count )
{
  __m64 sum;
  unsigned int i;

  for( i = 1; i < count; i++ )
    {
      sum = (__m64) __builtin_ia32_paddq ((long long)a[i], (long long)b[i]);
    }
  return sum;
}

1) Loop when compiled with gcc-4.1 -O2 -msse2 (note in particular the extra movq to memory):

L4:
        movl    12(%ebp), %esi
        movq    (%eax,%edx,8), %mm0
        paddq   (%esi,%edx,8), %mm0
        incl    %edx
        cmpl    %edx, %ecx
        movq    %mm0, -16(%ebp)
        movl    -16(%ebp), %esi
        movl    -12(%ebp), %edi
        jne     L4

2) Loop using gcc-3.3 compiled with -O2 -msse2:

L6:
        movq    (%esi,%edx,8), %mm0
        paddq   (%eax,%edx,8), %mm0
        addl    $1, %edx
        cmpl    %ecx, %edx
        jb      L6

AFAICT, the culprit is reload, which generates an extra store and load of %mm0:

(insn 62 30 63 2 (set (mem:V2SI (plus:SI (reg/f:SI 6 bp)
                (const_int -16 [0xfffffff0])) [0 S8 A8])
        (reg:V2SI 29 mm0)) 736 {*movv2si_internal} (nil)
    (nil))

(insn 63 62 32 2 (set (reg/v:V2SI 4 si [orig:61 sum ] [61])
        (mem:V2SI (plus:SI (reg/f:SI 6 bp)
                (const_int -16 [0xfffffff0])) [0 S8 A8])) 736 {*movv2si_internal} (nil)
    (nil))

Here is the larger test case from which the above test was extracted:

#include

__m64 unsigned_add3( const __m64 *a, const __m64 *b, __m64 *result, unsigned long count )
{
  __m64 carry, temp, sum, one, onesCarry, _a, _b;
  unsigned int i;

  if( count > 0 )
    {
      _a = a[0];
      _b = b[0];
      one = _mm_cmpeq_pi8( _a, _a ); //-1
      one = _mm_sub_si64( _mm_xor_si64( one, one ), one ); //1
      sum = _mm_add_si64( _a, _b );
      onesCarry = _mm_and_si64( _a, _b ); //the 1's bit is set only if the 1's bit add generates a carry
      onesCarry = _mm_and_si64( onesCarry, one ); //onesCarry &= 1

      //Trim off the one's bit on both vA and vB to make room for a carry bit at the top after the add
      _a = _mm_srli_si64( _a, 1 ); //vA >>= 1
      _b = _mm_srli_si64( _b, 1 ); //vB >>= 1

      //Add vA to vB and add the carry bit
      carry = _mm_add_si64( _a, _b );
      carry = _mm_add_si64( carry, onesCarry );

      //right shift by 63 bits to get the carry bit for the high 64 bit quantity
      carry = _mm_srli_si64( carry, 63 );

      for( i = 1; i < count; i++ )
        {
          result[i-1] = sum;
          _a = a[i];
          _b = b[i];
          onesCarry = _mm_and_si64( _a, _b );
          onesCarry = _mm_and_si64( onesCarry, one );
          sum = _mm_add_si64( _a, _b );
          _a = _mm_add_si64( _a, onesCarry );
          onesCarry = _mm_and_si64( carry, _a ); //find low bit carry
          sum = _mm_add_si64( sum, carry ); //add in carry bit to low word sum
          carry = _mm_add_si64( _a, onesCarry ); //add in low bit carry to high result
        }
      result[i-1] = sum;
    }
  return carry;
}

Again, gcc-3.3 produces much better code for this loop.

--
Summary: Poor loop optimization when using sse2 builtins - regression from 3.3
Product: gcc
Version: 4.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-x86-darwin
GCC host triplet: apple-x86-darwin
GCC target triplet: apple-x86-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22152
[Bug c++/22009] New: Friend declaration of a private member function produces error in g++-4.0
I think the following test case is correct, but g++-4.0 produces a diagnostic. We should be able to declare a private member function of one class as a friend of another class so that the member function can access private members of the befriending class.

class FriendTestTo;

class FriendTestFrom {
private:
  void reallySetIt (FriendTestTo* PF);
};

class FriendTestTo {
private:
  int i;
  friend void FriendTestFrom::reallySetIt (FriendTestTo*);
};

void FriendTestFrom::reallySetIt (FriendTestTo* PF){
  PF->i = 1;
};

% g++ -c test.cc
test.cc:6: error: 'void FriendTestFrom::reallySetIt(FriendTestTo*)' is private
test.cc:13: error: within this context

A workaround is to declare class FriendTestFrom as a friend of class FriendTestTo.

--
Summary: Friend declaration of a private member function produces error in g++-4.0
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: c++
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22009
[Bug tree-optimization/21894] New: gcc crashes with -O1 on a call to nested function
The following test ICEs with gcc mainline when compiled with -O1. The test was done on apple-ppc-darwin.

% gcc -c -O1 bad.c
bad.c: In function 'CheckFile':
bad.c:2: internal compiler error: Bus error
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.

/* TEST */
typedef unsigned char uchar;

static void CheckFile ()
{
  uchar *p;
  uchar tagname[10];
  uchar * a = tagname;

  void validate(uchar const * pp, uchar const * q){
    uchar const * p = pp;
    if (a == tagname+4) {
      uchar const * x = p;
    }
  }

  while(1){
    if(a == tagname)
      goto slip;
    if (*p == '\"') {
      uchar const * const q = ++p;
      validate(q, p++);
    }
  }
slip: ;
}

--
Summary: gcc crashes with -O1 on a call to nested function
Product: gcc
Version: tree-ssa
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21894
[Bug tree-optimization/20256] -ftree-loop-linear doesn't work right in small loop
--
What |Removed |Added
CC | |dberlin at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20256
[Bug tree-optimization/20256] -ftree-loop-linear doesn't work right in small loop
--
What |Removed |Added
CC | |dalej at apple dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20256
[Bug tree-optimization/20256] New: -ftree-loop-linear doesn't work right in small loop
This is a small extract from a benchmark. It shows that -O1 -ftree-loop-linear generates a couple of empty loops and produces incorrect behavior.

/* main.c */
#include
extern void init();
double mid_wts[8][35];
double in_pats[1][35];

double do_mid_forward(int patt)
{
  double sum;
  int neurode, i;

  for (neurode=0;neurode<8; neurode++) {
    sum = 0.0;
    for (i=0; i<35; i++) {
      sum += mid_wts[neurode][i]*in_pats[patt][i];
    }
    sum = 1.0/(1.0+sum);
  }
  return sum;
}

double value;

main()
{
  init();
  printf(" %e\n", do_mid_forward (0));
}

/* init.c */
extern double mid_wts[8][35];
extern double in_pats[1][35];
double value;

void init()
{
  int i;
  int neurode;

  value=(double)1.0 - (double) 0.5;
  for (neurode = 0; neurode<8; neurode++)
    for (i=0; i<35; i++)
      mid_wts[neurode][i] = value;
  for (i=0; i<35; i++)
    in_pats[0][i] = 1.234;
}

% cc -c -O0 init.c
% cc -O1 -ftree-loop-linear main.c init.o
% ./a.out
-2.384238e+11

The assembly file for ppc-darwin shows a couple of do-nothing empty loops. Remove -ftree-loop-linear and the program behaves correctly.

--
Summary: -ftree-loop-linear doesn't work right in small loop
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: powerpc-apple-darwin
GCC host triplet: powerpc-apple-darwin
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20256
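For reference, the value a correct build should print can be worked out from the test case itself: every weight is 0.5 and every input is 1.234, so each inner sum is 35 * 0.5 * 1.234 = 21.595 and the function returns 1.0/(1.0 + 21.595), roughly 4.4e-02. The sketch below folds main.c and init.c into a single function with local arrays; the name `do_mid_forward_reference` is made up for this illustration and is not part of the report.

```c
/* Single-function restatement of the test case in the report.
   Assumption: the same initial values as init () in the report
   (weights 0.5, inputs 1.234).  */
double do_mid_forward_reference (void)
{
  double mid_wts[8][35];
  double in_pats[1][35];
  double sum = 0.0;
  int neurode, i;

  /* Same initialisation as init () in the report.  */
  for (neurode = 0; neurode < 8; neurode++)
    for (i = 0; i < 35; i++)
      mid_wts[neurode][i] = 1.0 - 0.5;
  for (i = 0; i < 35; i++)
    in_pats[0][i] = 1.234;

  /* Same loop nest as do_mid_forward (0); only the last outer
     iteration's value survives.  */
  for (neurode = 0; neurode < 8; neurode++)
    {
      sum = 0.0;
      for (i = 0; i < 35; i++)
        sum += mid_wts[neurode][i] * in_pats[0][i];
      sum = 1.0 / (1.0 + sum);
    }
  return sum;
}
```

A small positive result of this size is what one would expect from the original program without -ftree-loop-linear, in contrast to the -2.384238e+11 shown above.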
[Bug tree-optimization/20216] [4.0/4.1 Regression] Simple loop runs out of stack at -O1
--- Additional Comments From fjahanian at apple dot com 2005-02-27 00:51 ---
(In reply to comment #6)
> The first part of the patch seems fine.
> We should make tree_fold_binomial non-recursive.

You meant tree_fold_factorial? tree_fold_binomial is not recursive as is.

> Note, however, that once you do that, the other part of the patch isn't
> actually doing anything (the change to chrec_apply).

I agree. Checking for 1024 is arbitrary and I did not propose it as a final solution. I think a better solution would be to compute the factorial of the array upper bound, as is currently done. If it cannot be evaluated, due to overflow, chrec_evaluate, which depends on the computation of tree_fold_binomial, returns chrec_dont_know. In other words, we do this optimization only when the factorial can be computed. This avoids setting an arbitrary limit and lets the implementation's limitations decide the feasibility of this optimization. What do you think of a patch along these lines?

> Then all the memory usage comes from fold (all 600 meg of memory usage, i
> mean) creating new trees.
> It also doesn't recurse int hat case.
>
> In any case, limiting the input to chrec_apply to <1024 is uh, wrong, as it's
> not really fixing anything.
>

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20216
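The idea in the reply above (compute the factorial iteratively and give up only when it genuinely overflows) can be sketched on plain integers. This is not the GCC implementation, which works on trees; `factorial_checked` is a hypothetical name, and a return value of 0 stands in for chrec_dont_know. The sketch assumes a 64-bit unsigned long long.

```c
#include <limits.h>

/* Computes n! without recursion.  Returns 1 and stores the value in
   *result on success; returns 0 and leaves *result untouched if the
   value does not fit (the analogue of "don't know").  */
int factorial_checked (unsigned int n, unsigned long long *result)
{
  unsigned long long acc = 1;
  unsigned int i;

  for (i = 2; i <= n; i++)
    {
      /* acc * i would exceed ULLONG_MAX: report failure instead of
         wrapping around silently.  */
      if (acc > ULLONG_MAX / i)
        return 0;
      acc *= i;
    }

  *result = acc;
  return 1;
}
```

Under that 64-bit assumption the function succeeds up to 20! and fails from 21! onward, so factorial (159) from the original report is rejected after about twenty iterations, with constant stack usage and no arbitrary 1024 cutoff.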
[Bug tree-optimization/20216] [4.0/4.1 Regression] Simple loop runs out of stack at -O1
--- Additional Comments From fjahanian at apple dot com 2005-02-25 21:32 --- Created an attachment (id=8286) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=8286&action=view) A proposed patch to fix this Note that patch I attached is against the apple-ppc-branch. So, it may not apply to the mainline as is. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20216
[Bug tree-optimization/20216] New: Simple loop runs out of stack at -O1
The following test case runs out of stack space when gcc tries to compute factorial (159). I have a patch which does two simple things: 1) it rewrites the tree_fold_factorial function into a non-recursive version, and 2) it sets a limit before deciding to call chrec_evaluate. This limit is arbitrary in this patch. The author of the algorithm may want to decide when to stop evaluating the feasibility of this optimization. Note that even with this limit, the computed factorial overflows. So an even smaller limit is needed if this value is significant.

/* bad.c */
static unsigned int *buffer;

void FUNC (void)
{
  unsigned int *base;
  int i, j;

  for (i = 0; i < 4; i++)
    for (j = 0; j < 160; j++)
      *base++ = buffer[j];
}

% mygccm5 -c -O1 bad.c
Out of stack space.
Try running 'limit stacksize unlimited' in the shell to raise its limit.

--
Summary: Simple loop runs out of stack at -O1
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: powerpc-apple-darwin
GCC host triplet: powerpc-apple-darwin
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20216
[Bug target/18118] bad code gen for -mcpu=G5 and unsigned long long to double
--- Additional Comments From fjahanian at apple dot com 2005-01-17 16:49 --- on apple-ppc-branch -mcpu=G5 is all you need to reproduce the problem. But I noticed that this bug is no longer reproducible with the FSF mainline. So, this bug has been fixed as far as I am concerned. Just need to investigate which patch fixed this in mainline. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18118
[Bug target/18916] [4.0 Regression] mis-aligned vector code with copy memory (-maltivec)
--- Additional Comments From fjahanian at apple dot com 2004-12-29 17:34 ---
(In reply to comment #8)
> Why can't we make sure that temporaries which should be aligned to 128 bits
> are actually aligned to 128 bits? Surely failing to do so will cause other
> problems.

Yes, this is the best way of fixing this problem, hoping not to break ABI conformance in some obscure way along the way. My last posted patch took the approach of setting the alignment of the stack temporaries to what they really were. This worked, but it also turned off the vector move insns for such temporaries. I will look at forcing the 128-bit alignment next year.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug target/18916] [4.0 Regression] mis-aligned vector code with copy memory (-maltivec)
--- Additional Comments From fjahanian at apple dot com 2004-12-21 01:25 ---
My last patch also had problems, in that it changed the alignment of local vector variables on the stack. This alignment cannot be changed because AltiVec intrinsics expect 128-bit alignment. So, I conclude that only temporaries with an expected alignment of 128 bits or more are not aligned properly. The safest fix would be to simply change the alignment at the rtl level when temporaries of 128-bit alignment need to be generated. This requires a change to the middle end. The following patch shows the concept and is not an FSF-ready patch (which would require a target hook or some such). This patch essentially says that if 128-bit alignment of local temporaries on the stack cannot be guaranteed (or changed), then set the alignment value in the rtl to what can be guaranteed. With this patch, emit_block_move will not generate lvx/stvx for these cases.

Index: expr.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/expr.c,v
retrieving revision 1.761
diff -c -p -r1.761 expr.c
*** expr.c  18 Dec 2004 14:38:31 -0000  1.761
--- expr.c  21 Dec 2004 01:23:27 -0000
*************** emit_push_insn (rtx x, enum machine_mode
*** 3457,3463 ****
               to record the alignment of the stack slot. */
            /* ALIGN may well be better aligned than TYPE, e.g. due to
               PARM_BOUNDARY. Assume the caller isn't lying. */
!           set_mem_align (target, align);

            emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
          }
--- 3457,3469 ----
               to record the alignment of the stack slot. */
            /* ALIGN may well be better aligned than TYPE, e.g. due to
               PARM_BOUNDARY. Assume the caller isn't lying. */
!           /* powerpc-darwin currently does not enforce 128 bit alignment of
!              temporaries on the stack. To do so, requires changes which will break
!              ABI compatibility. On the other hand, Leaving this unchanged generates
!              incorrect code in cases where block move is implemented using
!              AltiVec instructions whose src and dest must be 128 bit aligned
!              (expand_block_move implementation in rs6000.c). */
!           set_mem_align (target, align >= 128 ? PARM_BOUNDARY : align);

            emit_block_move (target, xinner, size, BLOCK_OP_CALL_PARM);
          }
*************** store_expr (tree exp, rtx target, int ca
*** 4206,4214 ****
          emit_group_load (target, temp, TREE_TYPE (exp),
                           int_size_in_bytes (TREE_TYPE (exp)));
        else if (GET_MODE (temp) == BLKmode)
!         emit_block_move (target, temp, expr_size (exp),
!                          (call_param_p
!                           ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
        else
          {
            temp = force_operand (temp, target);
--- 4212,4224 ----
          emit_group_load (target, temp, TREE_TYPE (exp),
                           int_size_in_bytes (TREE_TYPE (exp)));
        else if (GET_MODE (temp) == BLKmode)
!         {
!           /* See previous comment. */
!           set_mem_align (temp, MEM_ALIGN (temp) >= 128 ? PARM_BOUNDARY : MEM_ALIGN (temp));
!           emit_block_move (target, temp, expr_size (exp),
!                            (call_param_p
!                             ? BLOCK_OP_CALL_PARM : BLOCK_OP_NORMAL));
!         }
        else
          {
            temp = force_operand (temp, target);

(In reply to comment #6)
> And this is the patch that I had in mind. Can this break ABI compatibily? My
> limited testing shows
> that it does not.
>
> Index: rs6000.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
> retrieving revision 1.332.2.46.2.84
> diff -c -p -r1.332.2.46.2.84 rs6000.c
> *** rs6000.c  16 Dec 2004 03:23:30 -0000  1.332.2.46.2.84
> --- rs6000.c  18 Dec 2004 01:44:28 -0000
> *************** function_arg_boundary (enum machine_mode
> *** 5190,5195 ****
> --- 5190,5201 ----
>       || (type && TREE_CODE (type) == VECTOR_TYPE
>           && int_size_in_bytes (type) >= 16))
>     return 128;
> +   else if (DEFAULT_ABI == ABI_DARWIN && mode == BLKmode
> +            && TYPE_ALIGN (type) >= 128)
> +     {
> +       TYPE_ALIGN (type) = PARM_BOUNDARY;
> +       return PARM_BOUNDARY;
> +     }
>     else
>       return PARM_BOUNDARY;
>   }
>

--
What |Removed |Added
CC | |dalej at apple dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug target/18916] [4.0 Regression] mis-aligned vector code with copy memory (-maltivec)
--- Additional Comments From fjahanian at apple dot com 2004-12-18 01:46 ---
And this is the patch that I had in mind. Can this break ABI compatibility? My limited testing shows that it does not.

Index: rs6000.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.332.2.46.2.84
diff -c -p -r1.332.2.46.2.84 rs6000.c
*** rs6000.c  16 Dec 2004 03:23:30 -0000  1.332.2.46.2.84
--- rs6000.c  18 Dec 2004 01:44:28 -0000
*************** function_arg_boundary (enum machine_mode
*** 5190,5195 ****
--- 5190,5201 ----
      || (type && TREE_CODE (type) == VECTOR_TYPE
          && int_size_in_bytes (type) >= 16))
    return 128;
+   else if (DEFAULT_ABI == ABI_DARWIN && mode == BLKmode
+            && TYPE_ALIGN (type) >= 128)
+     {
+       TYPE_ALIGN (type) = PARM_BOUNDARY;
+       return PARM_BOUNDARY;
+     }
    else
      return PARM_BOUNDARY;
  }

(In reply to comment #5)
> Followin patch fixes the alignment problem. But it cannot be applied because
> it breaks ABI
> compatibilty.
>
> A possible solution is to relax alignment of the type in question (with
> alignment of 128) to that of the
> PARM_BOUNDARY (32). This will not (should not ?) break the ABI compatibility
> (because it is currently
> on PARM_BOUNDARY). But it will prevent vector code to be generated (which is
> cause of the abort).
> Comments are most welcome.
>
>
> Index: rs6000.c
> ===================================================================
> RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
> retrieving revision 1.332.2.46.2.84
> diff -c -p -r1.332.2.46.2.84 rs6000.c
> *** rs6000.c  16 Dec 2004 03:23:30 -0000  1.332.2.46.2.84
> --- rs6000.c  18 Dec 2004 00:20:54 -0000
> *************** function_arg_boundary (enum machine_mode
> *** 5190,5195 ****
> --- 5190,5197 ----
>       || (type && TREE_CODE (type) == VECTOR_TYPE
>           && int_size_in_bytes (type) >= 16))
>     return 128;
> +   else if (DEFAULT_ABI == ABI_DARWIN && mode == BLKmode)
> +     return MAX (TYPE_ALIGN (type), PARM_BOUNDARY);
>     else
>       return PARM_BOUNDARY;
>   }

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug target/18916] [4.0 Regression] mis-aligned vector code with copy memory (-maltivec)
--- Additional Comments From fjahanian at apple dot com 2004-12-18 00:43 ---
The following patch fixes the alignment problem. But it cannot be applied because it breaks ABI compatibility.

A possible solution is to relax the alignment of the type in question (with alignment of 128) to that of PARM_BOUNDARY (32). This will not (should not?) break ABI compatibility (because it is currently on PARM_BOUNDARY). But it will prevent vector code from being generated (which is the cause of the abort). Comments are most welcome.

Index: rs6000.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/rs6000.c,v
retrieving revision 1.332.2.46.2.84
diff -c -p -r1.332.2.46.2.84 rs6000.c
*** rs6000.c  16 Dec 2004 03:23:30 -0000  1.332.2.46.2.84
--- rs6000.c  18 Dec 2004 00:20:54 -0000
*************** function_arg_boundary (enum machine_mode
*** 5190,5195 ****
--- 5190,5197 ----
      || (type && TREE_CODE (type) == VECTOR_TYPE
          && int_size_in_bytes (type) >= 16))
    return 128;
+   else if (DEFAULT_ABI == ABI_DARWIN && mode == BLKmode)
+     return MAX (TYPE_ALIGN (type), PARM_BOUNDARY);
    else
      return PARM_BOUNDARY;
  }

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug tree-optimization/18792] ICE with -O1 -ftree-loop-linear on small test case
--- Additional Comments From fjahanian at apple dot com 2004-12-17 19:40 --- Why hasn't there been a resolution of this PR? It seems that all issues, including elimination of loop numbers, etc., have been taken care of. Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18792
[Bug target/18916] vector code is generated to copy data to mis-aligned memory (-mcpu=G5)
--- Additional Comments From fjahanian at apple dot com 2004-12-10 01:42 --- AFAICT, the gcc middle-end cannot force correct parameter alignment when the alignment is stricter than PARM_BOUNDARY. There is no code to do so (I am looking at store_one_arg, which is the routine responsible for determining the alignment). It does set the MEM_ALIGN field to 128 in this case, but there is no extra padding to move the target address to the next 128-bit boundary. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
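To make the "extra padding" point concrete, here is a small sketch of what moving an address to the next 128-bit (16-byte) boundary involves. It is ordinary C arithmetic, not code from the GCC middle end; `align_up` and `is_aligned` are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds ADDR up to the next multiple of ALIGN_BYTES, which must be a
   power of two (16 for a 128-bit boundary).  */
uintptr_t align_up (uintptr_t addr, uintptr_t align_bytes)
{
  assert (align_bytes != 0 && (align_bytes & (align_bytes - 1)) == 0);
  return (addr + align_bytes - 1) & ~(align_bytes - 1);
}

/* Returns nonzero if ADDR is already a multiple of ALIGN_BYTES.  */
int is_aligned (uintptr_t addr, uintptr_t align_bytes)
{
  return (addr & (align_bytes - 1)) == 0;
}
```

A slot at 0x1004 that is merely recorded as 128-bit aligned is still at 0x1004; only padding it up to align_up (0x1004, 16), that is 0x1010, would make a 16-byte aligned vector store safe.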
[Bug target/18916] vector code is generated to copy data to mis-aligned memory (-mcpu=G5)
--
What |Removed |Added
CC | |dje at watson dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug target/18916] New: vector code is generated to copy data to mis-aligned memory (-mcpu=G5)
The following test case, compiled with -mcpu=G5, aborts. It aborts because, in passing the 32-byte argument (g1sScld1) to the testvaScld1 routine, gcc allocates a temporary on the stack for the purpose of storing g1sScld1 and then loading it into GPRs. Recently, routine expand_block_move in rs6000.c was modified to use lvx/stvx when the alignment of both source and destination is 128 bits. But in the case of temporaries allocated on the stack, the target alignment is not correct. It is true that we set the MEM_ALIGN of the target temporary to 128 bits, but that value comes from the alignment of the source, which is a user variable and does have 128-bit alignment. So, in the given test case, routine expand_block_move generates a stvx to a temporary stack location which is misaligned, and bad things happen.

extern void abort (void);
typedef __builtin_va_list __gnuc_va_list;
typedef __gnuc_va_list va_list;

typedef struct { _Complex long double a; } Scld1;

void testvaScld1 (int n, ...)
{
  va_list ap;
  __builtin_va_start(ap,n);
  Scld1 t = __builtin_va_arg(ap,Scld1);
  if (t.a != (_Complex long double)1)
    abort();
  __builtin_va_end(ap);
}

int main ()
{
  Scld1 g1sScld1;
  g1sScld1.a = (_Complex long double)1;
  testvaScld1 (1, g1sScld1);
  return 0;
}

--
Summary: vector code is generated to copy data to mis-aligned memory (-mcpu=G5)
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18916
[Bug tree-optimization/18792] ICE with -O1 -ftree-loop-linear on small test case
--- Additional Comments From fjahanian at apple dot com 2004-12-07 23:04 ---
I agree that the bug is before the linear loop transform. Make a slight, non-CFG change to the test case and the loop_nbr values come out different (and sequential in the nesting). Somehow, changing the first loop condition makes a big difference!

void put_atoms_in_triclinic_unitcell(int i, float x[1][3])
{
  int d;

  while (i < 0)
    for (d=0; d<=3; d++)
      x[i][d] = 0;

  while (x[i][3] >= 0)
    for (d=0; d<=3; d++)
      x[i][d] = 0;
}

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18792
[Bug tree-optimization/18792] ICE with -O1 -ftree-loop-linear on small test case
--- Additional Comments From fjahanian at apple dot com 2004-12-07 22:37 --- Zdenek, Could you take a look at this? -- What|Removed |Added CC||rakdver at atrey dot karlin ||dot mff dot cuni dot cz http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18792
[Bug middle-end/18641] [4.0 Regression] Another ICE caused by reload of a psuedo reg into f0 for a DImode expr
--- Additional Comments From fjahanian at apple dot com 2004-12-06 23:32 ---
David's patch (including the darwin.h patch attached here) successfully bootstrapped and was dejagnu tested on apple-ppc-darwin. Please apply the patch to mainline.

Index: darwin.h
===================================================================
RCS file: /cvs/gcc/gcc/gcc/config/rs6000/darwin.h,v
retrieving revision 1.72
diff -c -p -r1.72 darwin.h
*** darwin.h  27 Nov 2004 22:45:22 -0000  1.72
--- darwin.h  6 Dec 2004 17:56:34 -0000
*************** do { \
*** 344,351 ****

  #undef PREFERRED_RELOAD_CLASS
  #define PREFERRED_RELOAD_CLASS(X,CLASS) \
!   ((GET_CODE (X) == CONST_DOUBLE \
!     && GET_MODE_CLASS (GET_MODE (X)) == MODE_FLOAT) \
     ? NO_REGS \
     : ((GET_CODE (X) == SYMBOL_REF || GET_CODE (X) == HIGH) \
        && reg_class_subset_p (BASE_REGS, (CLASS))) \
--- 344,351 ----

  #undef PREFERRED_RELOAD_CLASS
  #define PREFERRED_RELOAD_CLASS(X,CLASS) \
!   ((CONSTANT_P (X) \
!     && reg_classes_intersect_p ((CLASS), FLOAT_REGS)) \
     ? NO_REGS \
     : ((GET_CODE (X) == SYMBOL_REF || GET_CODE (X) == HIGH) \
        && reg_class_subset_p (BASE_REGS, (CLASS))) \

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18641
[Bug middle-end/18641] [4.0 Regression] Another ICE caused by reload of a psuedo reg into f0 for a DImode expr
--- Additional Comments From fjahanian at apple dot com 2004-12-06 17:55 --- I applied the patch to fsf-mainline (including darwin.h) and it worked for me. I will do the bootstrap, dejagnu testing and let you know how it went. - Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18641
[Bug tree-optimization/18792] New: ICE with -O1 -ftree-loop-linear on small test case
The following small test case, extracted from a SPEC2004 benchmark, ICEs when compiled with gcc-4.0 and -O1 -ftree-loop-linear.

/* test */
void put_atoms_in_triclinic_unitcell(float x[][3])
{
  int i=0,d;

  while (x[i][3] < 0)
    for (d=0; d<=3; d++)
      x[i][d] = 0;

  while (x[i][3] >= 0)
    for (d=0; d<=3; d++)
      x[i][d] = 0;
}

% gcc-4.0 -c bad.c -O1 -ftree-loop-linear
bad.c: In function 'put_atoms_in_triclinic_unitcell':
bad.c:2: internal compiler error: in build_classic_dist_vector, at tree-data-ref.c:1871
Please submit a full bug report, with preprocessed source if appropriate.
See <URL:http://gcc.gnu.org/bugs.html> for instructions.

--
Summary: ICE with -O1 -ftree-loop-linear on small test case
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18792
[Bug middle-end/18641] [4.0 Regression] Another ICE caused by reload of a psuedo reg into f0 for a DImode expr
--- Additional Comments From fjahanian at apple dot com 2004-12-01 22:07 ---
Regardless of how we fix this specific problem (by reverting Ulrich's patch to find_reloads_address, by making the small change he proposed in find_reloads, or by something else), there remains the problem that arises each time a 64-bit integer constant is loaded into an FPR. This is a chronic problem which we need to address. Now is as good a time as ever. Being a newbie in this area, please bear with me. I see a couple of solutions:

1) Do not use an FPR for 64-bit constant integers. This is indeed what happens when reverting Ulrich's patch. What benefit do we gain by allowing use of an FPR for these cases? Don't we always need to eventually load the constant into a pair of GPRs, by going through memory first? What are the cases where using an FPR is beneficial? (Reducing register pressure is one answer, but then we still need to go to memory and back to GPRs for any useful operation.)

2) Handle this special case in the splitter which is used. But this requires going to memory. Can this be done in the splitter? This seems to be the better solution if 1) cannot be adopted.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18641
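As background for option 1, the sketch below shows what "a 64-bit constant in a pair of GPRs" amounts to on a 32-bit target: the value is simply split into a high and a low 32-bit word, and no floating-point register is involved. This is plain C for illustration, not reload or splitter code; `struct gpr_pair`, `split_di` and `join_di` are hypothetical names, and which register of a pair holds which half on a given target is not asserted here.

```c
#include <stdint.h>

/* Hypothetical model of two 32-bit general-purpose registers holding
   one 64-bit (DImode-sized) value.  */
struct gpr_pair
{
  uint32_t hi;  /* most significant 32 bits */
  uint32_t lo;  /* least significant 32 bits */
};

/* Splits a 64-bit integer constant into its two 32-bit halves.  */
struct gpr_pair split_di (int64_t value)
{
  struct gpr_pair p;
  uint64_t bits = (uint64_t) value;

  p.hi = (uint32_t) (bits >> 32);
  p.lo = (uint32_t) (bits & 0xffffffffu);
  return p;
}

/* Reassembles the 64-bit value from the pair.  */
int64_t join_di (struct gpr_pair p)
{
  return (int64_t) (((uint64_t) p.hi << 32) | p.lo);
}
```

For the constant in the related report, 4294967295, the high word is 0 and the low word is 0xffffffff, so two 32-bit immediate loads would suffice.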
[Bug target/18118] bad code gen for -mcpu=G5 and unsigned long long to double
--- Additional Comments From fjahanian at apple dot com 2004-11-29 17:15 --- This patch doesn't fix the problem I reported on apple-ppc-darwin. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18118
[Bug target/18641] New: Another ICE caused by reload of a psuedo reg into f0 for a DImode expr
This is similar to PR/152866. For the test case below, compiled with -O0, gcc-4.0 produces the following pattern in the reload phase:

(insn 68 47 67 7 (set (reg:DI 32 f0)
        (const_int 4294967295 [0xffffffff])) 354 {*movdi_internal32} (nil)
    (nil))

This pattern causes an ICE in gen_reg_rtx. This is the usual problem: reload decides to use a float register for a 'long long' expression (a constant in this case) because this is legitimate for powerpc, but the ppc patterns cannot handle it.

/* Test case */
void crc()
{
  int toread;
  long long nleft;
  unsigned char buf[(128 * 1024)];
  nleft = 0;
  while (toread = (nleft < (2147483647 * 2U + 1U)) ? nleft: (2147483647 * 2U + 1U) )
    ;
}

--
Summary: Another ICE caused by reload of a psuedo reg into f0 for a DImode expr
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: uweigand at de dot ibm dot com
ReportedBy: fjahanian at apple dot com
CC: dje at gcc dot gnu dot org,gcc-bugs at gcc dot gnu dot org
GCC build triplet: powerpc-apple-darwin7.0.0
GCC host triplet: powerpc-apple-darwin7.0.0
GCC target triplet: powerpc-apple-darwin7.0.0

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18641
[Bug middle-end/16266] [4.0 regression] gcc.dg/c99-intconst-1.c compilation is very slow
--- Additional Comments From fjahanian at apple dot com 2004-11-17 18:02 ---
The following patch has broken many dejagnu tests on apple-ppc-darwin with -mcpu=G5:

http://gcc.gnu.org/cgi-bin/cvsweb.cgi/gcc/gcc/function.c.diff?cvsroot=gcc&r1=1.581&r2=1.582

FAIL: gcc.c-torture/execute/20041011-1.c compilation, -O1
FAIL: gcc.c-torture/execute/20041011-1.c compilation, -O2
FAIL: gcc.c-torture/execute/950612-1.c compilation, -O1
FAIL: gcc.c-torture/execute/950612-1.c compilation, -O2
FAIL: gcc.c-torture/execute/950612-1.c compilation, -Os
FAIL: gcc.c-torture/execute/ashldi-1.c compilation, -O1
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O1
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O2
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O3 -fomit-frame-pointer
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O3 -fomit-frame-pointer -funroll-loops
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O3 -fomit-frame-pointer -funroll-all-loops -finline-functions
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -O3 -g
FAIL: gcc.c-torture/execute/ashrdi-1.c compilation, -Os
FAIL: gcc.c-torture/execute/lshrdi-1.c compilation, -O1

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=16266
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-26 15:17 --- I tested the patch on apple-ppc-darwin: bootstrapped and dejagnu tested (with and without -mcpu=G5). There were no regressions. This is an important bug for us. We have had four separate reports of this bug. It also happens in SPEC2004. - Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-25 23:58 ---
I tried the last patch, and for the following statements built with -O2 -mcpu=G5 (Apple's mixed mode) I get the following instruction sequences. They look OK to me, but David's case might be different from what I am looking at:

clock_start=(((double)clock())/((double)(100)));

bl L_clock$stub
rldicl r3,r3,0,32
lha r0,232(r29)
addis r2,r31,ha16(LC40-"L008$pb")
std r3,456(r1)
lfd f12,lo16(LC40-"L008$pb")(r2)
cmpwi cr7,r0,0
nop
lfd f13,456(r1)
fcfid f0,f13
fdiv f0,f0,f12
fctidz f0,f0
stfd f0,528(r1)
nop
nop
nop
ld r19,528(r1)
ble cr7,L147
...

ti+=((((double)clock())/((double)(100)))-clock_start);

L186:
bl L_clock$stub
rldicl r3,r3,0,32
rldicl r2,r19,0,32
std r3,464(r1)
std r2,472(r1)
addis r2,r31,ha16(LC40-"L008$pb")
lfd f0,464(r1)
lfd f13,472(r1)
lwz r0,816(r30)
cmpwi cr7,r0,0
fcfid f12,f0
lfd f0,lo16(LC40-"L008$pb")(r2)
fcfid f11,f13
addis r2,r31,ha16(LC39-"L008$pb")
fdiv f12,f12,f0
lfd f0,lo16(LC39-"L008$pb")(r2)
fsub f12,f12,f11
...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-25 21:14 --- By mistake, I applied the test for !reload_completed to your earlier patch (which was wrong). In any case, after correcting the patch and with your latest patch, all my test cases passed. Now I need to do a complete bootstrap with -mcpu=G5 on apple-ppc-darwin and will let you know how it goes. Thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-25 20:58 --- You need to replace GET_MODE_SIZE (x) with GET_MODE_SIZE (GET_MODE (x)), etc. for a clean compile. But as I mentioned in my last comment, I still get the ICE with or without this patch (along with the previous patch) in all the test cases that I tried. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-25 19:12 --- You referred to them as 'both patches' in comment #21. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/15286] ICE cause by reload
--- Additional Comments From fjahanian at apple dot com 2004-10-25 18:39 ---
I applied the last two patches, but it didn't help:

% mygccf -O2 -mcpu=G5 -c loader_obj.i
loader_obj.c: In function 'load_obj':
loader_obj.c:92: error: unrecognizable insn:
(insn 1395 601 1396 50 (set (subreg:DI (mem:SI (plus:SI (reg/f:SI 1 r1)
                    (const_int 716 [0x2cc])) [0 allocednf+0 S4 A8]) 0)
        (reg:DI 32 f0)) -1 (nil)
    (nil))
loader_obj.c:92: internal compiler error: in extract_insn, at recog.c:2034
Please submit a full bug report,
with preprocessed source if appropriate.
See <URL:http://developer.apple.com/bugreporter> for instructions.

Just to be clear, this is the patch I applied.

Index: simplify-rtx.c
===================================================================
RCS file: /cvs/gcc/gcc/gcc/simplify-rtx.c,v
retrieving revision 1.107.2.31.2.9
diff -c -p -r1.107.2.31.2.9 simplify-rtx.c
*** simplify-rtx.c 16 Oct 2004 00:06:42 -0000 1.107.2.31.2.9
--- simplify-rtx.c 25 Oct 2004 18:38:20 -0000
***************
*** simplify_gen_subreg (enum machine_mode o
*** 3800,3806 ****
    if (newx)
      return newx;
  
!   if (GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode)
      return NULL_RTX;
  
    return gen_rtx_SUBREG (outermode, op, byte);
--- 3800,3808 ----
    if (newx)
      return newx;
  
!   if ((GET_CODE (op) == SUBREG || GET_MODE (op) == VOIDmode
!        || (REG_P (op) && REGNO (op) < FIRST_PSEUDO_REGISTER))
!       && !reload_completed)
      return NULL_RTX;
  
    return gen_rtx_SUBREG (outermode, op, byte);

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15286
[Bug target/18118] New: bad code gen for -mcpu=G5
The following test case, extracted from rbug.c in the dejagnu testsuite, fails on apple-ppc-darwin when -mcpu=G5 is specified.

double s (unsigned long long k)
{
  return (float)k;
}

extern void abort();

main ()
{
  unsigned long long int k;
  double x;
  k = 0x82345081ULL;
  x = s (k);
  k = (unsigned long long) x;
  if (k != 0x82345100ULL)
    abort();
  return 0;
}

--
Summary: bad code gen for -mcpu=G5
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P2
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18118
[Bug tree-optimization/17892] [4.0 Regression] gcc-4.0 should not reassociate floating point add or multiplication
--- Additional Comments From fjahanian at apple dot com 2004-10-12 20:57 --- tree-outof-ssa.c is not part of this patch. I accidentally checked it in. I have since backed it out. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17892
[Bug tree-optimization/17955] New: Perform associative optimization when it is safe
PR/17892 was filed because gcc-4.0 performs an unsafe optimization of (X*C)*C into X*(C*C). The fix for that PR also prevents certain safe transformations, such as X*2.0*2.0 -> X*4.0, from taking place. This PR tracks that enhancement.

--
Summary: Perform associative optimization when it is safe
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: tree-optimization
AssignedTo: roger at eyesopen dot com
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org
GCC build triplet: apple-ppc-darwin
GCC host triplet: apple-ppc-darwin
GCC target triplet: apple-ppc-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17955
[Bug tree-optimization/17884] [4.0 Regression] asm 'volatile' is not honored as documented
--- Additional Comments From fjahanian at apple dot com 2004-10-08 16:23 --- But this is a regression from gcc-3.3. Also, without this patch, no other place checks for the volatility of an 'asm' statement. Then why not just say in the documentation that 'volatile' has no effect on an 'asm'? BTW, thanks for preparing the patch for me. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17884
[Bug tree-optimization/17892] New: gcc-4.0 should not reassociate floating point add or multiplication
In the following code the repeated multiplication is folded into a single operation (multiplication by Infinity). For different values of "x" this leads to undeserved or absent floating point exceptions, and breaks some of the elementary math functions in Libm. This occurs at optimization level -O1 and higher.

static const double C = 0x1.0p1023;

double foo(double x)
{
  return ( ( (x * C) * C ) * C );
}

--
Summary: gcc-4.0 should not reassociate floating point add or multiplication
Product: gcc
Version: 4.0.0
Status: UNCONFIRMED
Severity: normal
Priority: P1
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: fjahanian at apple dot com
CC: gcc-bugs at gcc dot gnu dot org,roger at eyesopen dot com
GCC build triplet: powerpc-apple-darwin
GCC host triplet: powerpc-apple-darwin
GCC target triplet: powerpc-apple-darwin

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=17892