[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
--- Comment #26 from davidxl at gcc dot gnu dot org 2010-08-31 17:45 --- Good observation re. the number of IVs in the final set. This usually points to some problem/bug in the cost function. I briefly looked at this case -- it indeed exposes two more bugs in the cost model: 1) the computation costs of all the cost pairs in an assignment cannot simply be added together, because many rewrite expressions can be commoned. We now have a mechanism that accounts for common loop invariants in register pressure estimation, and this mechanism needs to be extended to computation cost. 2) the offset is not stripped when computing loop invariant expression ids -- this can cause register pressure to be overestimated. (The case arises more often with loop unrolling.) David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
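Point 1 above can be illustrated with a small sketch. This is not GCC's ivopts code: the `cost_pair` struct, its fields, `MAX_INV_IDS` and both functions are hypothetical names chosen for illustration. It only contrasts adding up every pair's cost with charging a shared invariant expression once.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_INV_IDS 64  /* hypothetical bound on tracked expression ids */

/* Hypothetical stand-in for a (use, candidate) cost pair. */
struct cost_pair {
    unsigned use_cost;  /* cost specific to rewriting this use */
    unsigned inv_cost;  /* cost of computing the invariant expression */
    int inv_id;         /* id of that expression, or -1 if there is none */
};

/* Naive model: every pair pays for its own invariant expression. */
unsigned total_cost_naive(const struct cost_pair *p, size_t n)
{
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i].use_cost;
        if (p[i].inv_id >= 0)
            total += p[i].inv_cost;
    }
    return total;
}

/* Commoning-aware model: an invariant expression is charged once,
   however many pairs reuse it.  Ids outside the tracked range are
   charged every time, which is the conservative choice. */
unsigned total_cost_commoned(const struct cost_pair *p, size_t n)
{
    bool seen[MAX_INV_IDS] = { false };
    unsigned total = 0;
    for (size_t i = 0; i < n; i++) {
        int id = p[i].inv_id;
        total += p[i].use_cost;
        if (id < 0)
            continue;
        if (id < MAX_INV_IDS) {
            if (seen[id])
                continue;       /* already paid for by an earlier pair */
            seen[id] = true;
        }
        total += p[i].inv_cost;
    }
    return total;
}
```

When two pairs share an expression id, the naive sum is larger than the commoned one, which is the kind of overestimate the comment describes.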
[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
--- Comment #25 from davidxl at gcc dot gnu dot org 2010-08-30 16:41 --- (In reply to comment #24) (In reply to comment #20) (In reply to comment #16) adjust summary according to the last timings I am surprised to see such big differences between trunk and previous releases. Compiling this test case with those options on my core2 box (2.4GHz) took only 56 seconds, which is comparable with the timing with a 4.4.3 compiler (with google local patches including ivopt improvements). Of course - because the ivopt improvement patches are the problem. It is just the total time diff from Joost's measure can be just explained by ivopt component. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
--- Comment #20 from davidxl at gcc dot gnu dot org 2010-08-30 03:10 --- (In reply to comment #16) adjust summary according to the last timings I am surprised to see such big differences between trunk and previous releases. Compiling this test case with those options on my core2 box (2.4GHz) took only 56 seconds, which is comparable with the timing with a 4.4.3 compiler (with google local patches including ivopt improvements). David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
[Bug middle-end/45422] [4.6 Regression] compile time increases 3x.
--- Comment #21 from davidxl at gcc dot gnu dot org 2010-08-30 03:19 --- (In reply to comment #17) tree iv optimization : 32.57 (20%) usr 0.10 ( 5%) sys 32.73 (20%) wall 322095 kB (18%) ggc 20% is still completely unreasonable for IV optimization. There was a patch in trunk that may double the time in ivopt -- i.e. find_optimal_iv_set_1 is done twice, one with the original iv set while the other with full set. This probably needs to be revisited. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
[Bug middle-end/45422] [4.6 Regression] compile time increases 8x.
--- Comment #10 from davidxl at gcc dot gnu dot org 2010-08-28 06:00 --- fixed in r163610. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
[Bug middle-end/45422] [4.6 Regression] compile time increases 8x.
--- Comment #9 from davidxl at gcc dot gnu dot org 2010-08-27 17:01 --- Will take a look -- davidxl at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |davidxl at gcc dot gnu dot |dot org |org Status|NEW |ASSIGNED Last reconfirmed|2010-08-27 10:23:21 |2010-08-27 17:01:01 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45422
[Bug middle-end/45098] Missed induction variable optimization
--- Comment #1 from davidxl at gcc dot gnu dot org 2010-07-30 17:23 --- Seems -Os specific -- also reproducible on x86. With -O2, the result is expected. David -- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||davidxl at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45098
[Bug c++/45121] [4.6 Regression] c-c++-common/uninit-17.c
--- Comment #3 from davidxl at gcc dot gnu dot org 2010-07-29 17:21 --- Fixed in r162687 -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
[Bug c++/45121] [4.6 Regression] c-c++-common/uninit-17.c
--- Comment #2 from davidxl at gcc dot gnu dot org 2010-07-29 05:51 --- The problem is that before the ivopt patch, the ivopt pass introduced an iv candidate that is unconditionally initialized with b: ivtmp_xxx = b (D); After the patch, this assignment no longer exists, and the use of b in the test is via a PHI def -- thus the warning becomes 'may be uninitialized'. Will fix the test case. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45121
[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails
--- Comment #4 from davidxl at gcc dot gnu dot org 2010-07-19 16:34 --- Fixed in r162310. David -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932
[Bug testsuite/44932] gcc.dg/uninit-pred-9_b.c fails
--- Comment #1 from davidxl at gcc dot gnu dot org 2010-07-14 04:12 --- This seems to be specific to powerpc. Could you attach the dump files with options: -O2 -Wuninitialized -fdump-tree-cddce2 -fdump-tree-uninit-details Thanks, David (In reply to comment #0) Subject testcase fails on powerpc64. FAIL: gcc.dg/uninit-pred-9_b.c bogus warning (test for bogus messages, line 24) Compiling standalone I see the following: pthaugen/work ~/install/gcc/trunk/bin/gcc -O2 -S -m32 -Wuninitialized ~/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c /home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In function 'foo': /home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:24:11: warning: 'v' may be used uninitialized in this function [-Wuninitialized] /home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c: In function 'foo_2': /home/pthaugen/src/gcc/trunk/gcc/gcc/testsuite/gcc.dg/uninit-pred-9_b.c:41:11: warning: 'v' may be used uninitialized in this function [-Wuninitialized] -- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||xinliangli at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44932
[Bug tree-optimization/43846] [4.5 Regression] array vs members, total scalarization issues
--- Comment #3 from davidxl at gcc dot gnu dot org 2010-04-22 17:04 --- (In reply to comment #2) (In reply to comment #1) so it doesn't consider the struct with the array for total scalarization for some reason. Martin? Well, that was a deliberate decision when fixing PR 42585 (see type_consists_of_records_p). The code is simpler because it does not have to know how to iterate over the array index domain. Of course, we can alleviate this restriction and learn how to iterate. However, all the accesses for the whole array are already created, that is not the issue. The problem basically is that when we see the sequence D.2035.m[0] = D.2044_20; D.2035.m[1] = D.2043_19; D.2035.m[2] = D.2042_18; *b_1(D) = D.2035; (and there are no other accesses to D.2035) the condition that tries to prevent us from creating unnecessary replacements kicks in and we decide not to scalarize. This code sequence looks like a good motivating factor for scalarizing/expansion. In fact, small arrays should be treated the same way as records if all accesses are through compile time constant indices. This is a common scenario after full unrolling. The intent of the current code (possibly among other reasons) was to avoid going through a replacement when the whole structure was then passed as an argument to a function and similar situations. If the temp aggregate is passed to call and the calling convention is not exposed at the IL level, then it is not a good sra candidate as no copy (both code and storage) elimination will be exposed. In this one, the temp aggregate is used as the RHS of an assignment, thus it is a good candidate to expand. So will be the reverse case: aggregate1 = aggregate2; .. ... = aggregate1.e1; ... = aggregate1.e2; David But it should not be very difficult to change the condition (in analyze_access_subtree) to handle both situations right. 
Doing this, rather than total scalarization for arrays (which should be only useful as a substitute for a copy propagation) should enable us to handle even huge arrays. I'll get to this right after dealing with PR 43835. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||xinliangli at gmail dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43846
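The access pattern discussed in this comment can be written down as a small C sketch. The type and function names (`vec3`, `store_via_temp`, `store_direct`) are made up for illustration; only the shape of the accesses, three constant-index stores into a temporary followed by one aggregate copy, is taken from the sequence quoted above. Whether a particular GCC version produces exactly that GIMPLE for this source is not claimed here.

```c
/* A record whose only member is a small array, as in the report. */
struct vec3 {
    int m[3];
};

/* Fills a temporary through compile-time constant indices and then
   copies the whole aggregate out.  This mirrors the sequence
   D.m[0] = ...; D.m[1] = ...; D.m[2] = ...; *b = D;  */
void store_via_temp(struct vec3 *b, int x, int y, int z)
{
    struct vec3 tmp;
    tmp.m[0] = x;
    tmp.m[1] = y;
    tmp.m[2] = z;
    *b = tmp;
}

/* What one would hope scalarization reduces it to: the temporary
   is gone and each element is stored directly. */
void store_direct(struct vec3 *b, int x, int y, int z)
{
    b->m[0] = x;
    b->m[1] = y;
    b->m[2] = z;
}
```

Both functions leave the same values in `*b`; the second form simply has no temporary aggregate left to copy.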
[Bug middle-end/36550] Wrong may be used uninitialized warning (conditional PHIs)
--- Comment #11 from davidxl at gcc dot gnu dot org 2010-04-20 23:55 --- (In reply to comment #2) (In reply to comment #1) check() can return 1 on the first call and 0 on the second and if *argv is NULL then bug will be used uninitialized. right, but this doesn't matter here. Better testcase:

    /* { dg-do compile } */
    /* { dg-options "-Os -Wuninitialized" } */
    void bail(void) __attribute__((noreturn));
    unsigned once(void);
    int pr(char**argv)
    {
      char *bug;
      unsigned check = once();
      if (check) {
        if (*argv)
          bug = *++argv;
      } else {
        bug = *argv++;
        if (!*argv)
          bail();
      }
      /* now bug is set except if (check && !*argv) */
      if (check) {
        if (!*argv)
          return 0;
      }
      /* if we ever get here then bug is set */
      return *bug != 'X';
    }

The example is a little tricky for the compiler to reason about because of the '++argv'. Predicate analysis (http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00706.html -- with an additional fix to the never-return handling) will catch the following case (while the trunk gcc does not):

    void bail(void) __attribute__((noreturn));
    int foo(void);
    unsigned once(void);
    int pr(char**argv)
    {
      char *bug;
      unsigned check = once();
      char * a = *argv;
      if (check) {
        if (a)
          bug = *++argv;
      } else {
        bug = *argv++;
        if (!*argv)
          bail();
      }
      if (foo ())
        once();
      /* now bug is set except if (check && !*argv) */
      if (check) {
        if (!a || !*argv)
          return 0;
      }
      /* if we ever get here then bug is set */
      return *bug != 'X';
    }

-- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||davidxl at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36550
[Bug middle-end/20968] spurious may be used uninitialized warning (conditional PHIs)
--- Comment #8 from davidxl at gcc dot gnu dot org 2010-04-21 00:27 --- (In reply to comment #2) Note this is not fully a regression but really a progression. What is happening now is that only partial optimization happens before the warning is issued. I was unable to reduce the test case further without making the warning disappear. In particular, removing the increment of v1->count makes the warning disappear. This is because we would then jump thread the jump. Again this is because we are emitting the warning too soon, I might be able to come up with a testcase which shows that this is not really a regression but a progression in that we have warned in 3.4 and 4.0:

    struct {int count;} *v1;
    int c;
    int k;
    extern void baz(int);
    void foo(void)
    {
      int i;
      int r;
      if (k == 4) {
        i = 1;
        r = 1;
      } else
        r = 0;
      if (!r) {
        if (!c)
          return;
        v1->count++;
      }
      if (!c) {
        baz(i);
      }
    }

There is no difference between the case above and the functions you gave below. There has been some talk about moving where we warn about uninitialized variables but I feel that you can get around this in your code. To reproduce the problem -- -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-ccp are needed. This -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20968
[Bug c/42643] may be used uninitialized compiled with -Wall -O
--- Comment #1 from davidxl at gcc dot gnu dot org 2010-04-21 00:29 --- (In reply to comment #0) When compiling the source with -Wall -O, gcc gives the following warning:

    % gcc -c -Wall -O gcc_test.c
    gcc_test.c: In function 'functionLeon':
    gcc_test.c:11: warning: 'reference' may be used uninitialized in this function

    % cat gcc_test.c
    #include <stdio.h>
    typedef struct { int yb; } TCRData;
    void functionLeon (TCRData *pParent, int pBool);
    void functionLeon (TCRData *pParent, int pBool)
    {
      int isRootCell;
      TCRData *reference;
      isRootCell = (pParent == NULL);
      if (!isRootCell)
        reference = pParent;
      if (pBool) {
        if(!isRootCell)
          reference->yb++;
      }
    }

    % gcc -v
    Using built-in specs.
    Target: x86_64-redhat-linux
    Configured with: ../../src/gcc-4.4.0/configure --prefix=/remote/depotsrc/depotsrc/amd64-2.4/local_install/gcc-4.4.0 --enable-bootstrap --enable-shared --enable-threads=posix --disable-checking -with-gmp=/remote/depotsrc/depotsrc/amd64-2.4/local_install/gmp-4.3.1 --with-mpfr=/remote/depotsrc/depotsrc/amd64-2.4/local_install/mpfr-2.4.1 --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,fortran --with-cpu=generic --build=x86_64-redhat-linux
    Thread model: posix
    gcc version 4.4.0 (GCC)

This is a common case handled by the patch in http://gcc.gnu.org/ml/gcc-patches/2010-04/msg00706.html

-- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||davidxl at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42643
[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.
--- Comment #6 from davidxl at gcc dot gnu dot org 2010-02-03 18:30 --- See discussions in http://gcc.gnu.org/ml/gcc-patches/2010-02/msg00138.html about changing dynamic types using placement new -- it is basically not allowed -- so the optimization is valid. David -- davidxl at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |davidxl at gcc dot gnu dot |dot org |org Status|NEW |ASSIGNED Last reconfirmed|2008-11-29 22:42:58 |2010-02-03 18:30:00 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560
[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.
--- Comment #8 from davidxl at gcc dot gnu dot org 2010-02-03 21:44 --- (In reply to comment #7) It is valid to use placement new to construct a more or less derived type which would change the vtable pointer. Thus I think this bug is still invalid. How did you reach this conclusion from reading p7 of 3.8 in the standard? The original object was a most derived object of type T and the new object is a most derived object of type T. The following is allowed:

    class B { virtual ... };
    class D : public B { ... };
    B* bp = new D ();
    ...
    new (bp) D();

but vptr does not change. Set aside the standard -- this optimization is useful regardless. Some of the developers are so desperate that they manually do LICM of vptr and vtbl access for vcalls in the loop. The worst case is to use an option to guard it (which I think should be on by default). David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560
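The manual LICM of vptr and vtbl accesses mentioned above can be sketched in plain C by modelling a polymorphic object as a struct that carries a pointer to a table of function pointers. Everything here (`shape`, `shape_vtbl`, `sum_area`, `sum_area_hoisted`) is a hypothetical illustration, not code from the bug report, and it only shows the hand-written hoisting, not the compiler analysis.

```c
#include <stddef.h>

struct shape;

/* Hand-rolled "vtbl": one slot, read-only once built. */
struct shape_vtbl {
    int (*area)(const struct shape *self);
};

/* Hand-rolled polymorphic object: the first field plays the role of vptr. */
struct shape {
    const struct shape_vtbl *vptr;
    int w;
    int h;
};

static int rect_area(const struct shape *s)
{
    return s->w * s->h;
}

static const struct shape_vtbl rect_vtbl = { rect_area };

void shape_init_rect(struct shape *s, int w, int h)
{
    s->vptr = &rect_vtbl;
    s->w = w;
    s->h = h;
}

/* Straightforward loop: vptr and the vtbl slot are re-read on every
   iteration unless the compiler can prove the call leaves them alone. */
int sum_area(const struct shape *s, size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += s->vptr->area(s);
    return total;
}

/* Hand-hoisted version: both loads are done once, before the loop. */
int sum_area_hoisted(const struct shape *s, size_t n)
{
    int (*area)(const struct shape *) = s->vptr->area;
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += area(s);
    return total;
}
```

The two loops compute the same result as long as the callee does not change `vptr`, which is exactly the property the proposed optimization has to establish.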
[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.
--- Comment #11 from davidxl at gcc dot gnu dot org 2010-02-03 21:55 --- (In reply to comment #9) Ah, Set aside the standard. Another user who wants to make up his own semantics for a standardized language. No, no, and damn no. Of course, things like this can be brought up to the language committee as long as it is 1) not ambiguous 2) and generally useful. (In terms of optimization related semantics (type aliasing, restrict etc), I am not sure how standard it actually is given the ambiguity here and there.) David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560
[Bug middle-end/35560] Missing CSE/PRE for memory operations involved in virtual call.
--- Comment #13 from davidxl at gcc dot gnu dot org 2010-02-03 22:05 --- (In reply to comment #12) Btw, a destructor call also changes the vtbl pointer. ctors, dtors, wrapper function calls etc. are all handled. A detailed write-up will be available at some point. To put it simply, it is done via live-across analysis: if a polymorphic object is referenced both before and after a call (an access to any field of it counts), i.e. the reference is both available at and anticipated from the call, then the object is live across the call and its vptr field won't be modified by the call. The partially anticipated case is also handled. Once vptr is handled, vtbl access follows automatically, as vtbls are read-only. vptr assignment is treated conservatively. I implemented this in the 4.4 line using special shadow symbols and VUSE/VDEFs. It works as expected except that SCCVN time went to hell. A simple fix to collapse varying defs in the DFS walk helps a lot, but it is still slow. This needs to be done using the alias oracle. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35560
[Bug target/40956] GCSE opportunity in if statement
--- Comment #3 from davidxl at gcc dot gnu dot org 2009-12-23 19:37 --- This bug is specific to ARM (Thumb mode). On x86, the hoisting is unnecessary as the move instruction supports the immediate form. The issue here is more in the GIMPLE canonicalization (target specific). In this case, the IR should be in the following form to expose the hoisting.

    if (...) {
      temp = 0;
      *p = temp;
    } else {
      temp = 0;
      *(p+1) = temp;
    }

-- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|RESOLVED|UNCONFIRMED Resolution|DUPLICATE | http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40956
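As a source-level picture of the hoisting being asked for, the sketch below places the original shape next to the form in which the constant is set up once before the branch. The function names are invented for illustration, and the sketch says nothing about which instructions a given target actually emits.

```c
/* Original shape: each arm stores its own literal zero. */
void store_zero(int *p, int cond)
{
    if (cond)
        *p = 0;
    else
        *(p + 1) = 0;
}

/* Hoisted shape: the constant is materialized once, ahead of the
   branch, and both arms store the same temporary. */
void store_zero_hoisted(int *p, int cond)
{
    int temp = 0;
    if (cond)
        *p = temp;
    else
        *(p + 1) = temp;
}
```

On a target whose store instruction cannot take an immediate operand, the second shape needs only one load of the constant into a register.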
[Bug tree-optimization/42337] GCC ICE in compute_antic, at tree-ssa-pre.c:2534
--- Comment #2 from davidxl at gcc dot gnu dot org 2009-12-09 18:07 --- Fixed in r155111. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42337
[Bug tree-optimization/39557] Invalid PDOM lead to infinite loop to be generated
--- Comment #2 from davidxl at gcc dot gnu dot org 2009-03-27 18:25 --- See SVN revision 145121 for the fix. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39557
[Bug tree-optimization/39548] gcc ICE compiling code with option -fprofile-generate
--- Comment #8 from davidxl at gcc dot gnu dot org 2009-03-27 18:28 --- See r145118 for the fix. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39548
[Bug tree-optimization/39557] New: Invalid PDOM lead to infinite loop to be generated
Compiling the attached source with the following options -Wall -fno-exceptions -O2 -fprofile-use=/blah -fno-rtti will result in code with an infinite loop. In DCE, special code is added to handle dead loops conservatively. However, this requires PDOM information (control dependence info) to be valid. The PDOM is created in the uninitialized variable warning pass, but gets invalidated before the cddce pass (the incremental update does not work well). With the wrong CD info, the DCE pass tries to eliminate the loop, but the exit edge fixup code ends up linking the predecessor not to its post-dom bb, but to itself -- leading to an infinite loop. A proposed patch will be posted to gcc-patches. David -- Summary: Invalid PDOM lead to infinite loop to be generated Product: gcc Version: 4.4.0 Status: UNCONFIRMED Keywords: wrong-code Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: davidxl at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39557
[Bug tree-optimization/39557] Invalid PDOM lead to infinite loop to be generated
--- Comment #1 from davidxl at gcc dot gnu dot org 2009-03-25 23:10 --- Created an attachment (id=17542) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17542&action=view) test case -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39557
[Bug tree-optimization/39548] gcc ICE compiling code with option -fprofile-generate
--- Comment #1 from davidxl at gcc dot gnu dot org 2009-03-24 17:50 --- Created an attachment (id=17538) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17538&action=view) Test case -- davidxl at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |davidxl at gcc dot gnu dot |dot org |org Status|UNCONFIRMED |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39548
[Bug tree-optimization/39548] gcc ICE compiling code with option -fprofile-generate
--- Comment #2 from davidxl at gcc dot gnu dot org 2009-03-24 17:51 --- Created an attachment (id=17539) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17539&action=view) patch file -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39548
[Bug tree-optimization/39548] gcc ICE compiling code with option -fprofile-generate
--- Comment #5 from davidxl at gcc dot gnu dot org 2009-03-24 21:25 --- (In reply to comment #3) It might be better to place the check after the loop (and put an assert in set_copy_of_val that triggers the copy may not happen). This sounds good. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39548
[Bug tree-optimization/39548] gcc ICE compiling code with option -fprofile-generate
--- Comment #6 from davidxl at gcc dot gnu dot org 2009-03-24 21:33 --- (In reply to comment #4) Btw, it shouldn't really happen that we are not allowed to copyprop PHI arguments. It hints at some inconsistency in the IL instead. Yes, I suspect that too, but this is an independent issue. As long as the check is done in replace_uses_in (tree-ssa-propagate), it should be done in the copy chain computation -- at least it should be done in line 742 of tree-ssa-copy.c (copy_prop_visit_cond_stmt), which was my original fix. By the way, the check that fails in may_propagate_copy is -- which looks hairy. If you think it is ok, I can file a different bug to track this.

    else if (!MTAG_P (SSA_NAME_VAR (dest)) && !MTAG_P (SSA_NAME_VAR (orig)) && (DECL_NO_TBAA_P (SSA_NAME_VAR (dest)) != DECL_NO_TBAA_P (SSA_NAME_VAR (orig

Thanks, David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39548
[Bug middle-end/38204] PRE for post dominating expressions
--- Comment #3 from davidxl at gcc dot gnu dot org 2008-11-22 00:35 --- (In reply to comment #2) (In reply to comment #0) For this function:

    int test (int a, int b, int c, int g)
    {
      int d, e;
      if (a)
        d = b * c;
      else
        d = b - c;
      e = b * c + g;
      return d + e;
    }

the multiply expression is moved to both branches of the if, it would be better to move it before the if. Intel's compiler does that. Moving it before the if is a code size optimization that also happens to extend the lifetime of the multiply. So better is a relative term. As a side note: PRE is made aware of the impact of code size bloat and is -Os friendly. For instance, if multiple insertions are needed, the PRE won't happen with -Os.

    if (..)
      expr
    else if (..)
      ...
    else if (..)
      ...
    else
      ...
    expr

While this is good, if the hoisting opportunities exposed by PRE are materialized, this PRE should still be allowed under -Os. (-Os in gcc is not well tuned -- many optimizations are simply turned off for fear of code bloat without analysis -- the end result is often lost opportunities for code cleanup, ending up with a slower and BIGGER binary.) The hoisting increases the temporary's lifetime slightly, but it also adds more scheduling freedom as a good effect. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38204
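For reference, the transformation requested in the report can be written out by hand. `test` is the function from the report; `test_hoisted` is a hypothetical name for the version with the multiply moved before the if, so that `b * c` is computed once and reused on the taken branch and in `e`.

```c
/* Function from the report: b * c appears in one arm and again after the if. */
int test(int a, int b, int c, int g)
{
    int d, e;
    if (a)
        d = b * c;
    else
        d = b - c;
    e = b * c + g;
    return d + e;
}

/* Hand-hoisted form: the multiply is evaluated once, ahead of the branch.
   Its value stays live a little longer, which is the lifetime cost
   mentioned in the discussion. */
int test_hoisted(int a, int b, int c, int g)
{
    int prod = b * c;
    int d, e;
    if (a)
        d = prod;
    else
        d = b - c;
    e = prod + g;
    return d + e;
}
```

For the same arguments both versions return the same value; only the number of multiplies in the code and the lifetime of the product differ.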
[Bug rtl-optimization/36438] New: gcc ICE compiling code with mmx builtin
Compiling the following code with the latest compiler gives an ICE:

    f.i: In function 'void foo(int __vector__*, int)':
    f.i:33: internal compiler error: in trunc_int_for_mode, at explow.c:55
    Please submit a full bug report, with preprocessed source if appropriate.
    See http://gcc.gnu.org/bugs.html for instructions.

    // f.i:
    typedef unsigned short int16;
    typedef int __m64 __attribute__ ((__vector_size__ (8), __may_alias__));
    typedef long long __v1di __attribute__ ((__vector_size__ (8)));

    extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_slli_si64 (__m64 __m, int __count)
    {
      return (__m64) __builtin_ia32_psllqi ((__v1di)__m, __count);
    }

    extern __inline __m64 __attribute__((__gnu_inline__, __always_inline__, __artificial__))
    _mm_set_pi16 (short __w3, short __w2, short __w1, short __w0)
    {
      return (__m64) __builtin_ia32_vec_init_v4hi (__w0, __w1, __w2, __w3);
    }

    inline __m64 __attribute__((__always_inline__))
    SetS16(int16 a, int16 b, int16 c, int16 d)
    {
      return _mm_set_pi16(d, c, b, a);
    }

    void foo(__m64* dest, int n)
    {
      __m64 mask = SetS16(0x00FF, 0xFF00, 0x, 0x00FF);
      for ( int i = 0 ; i < n; ++i ) {
        mask = _mm_slli_si64(mask, 8);
        mask = _mm_slli_si64(mask, 8);
        *dest = mask;
        ++dest;
      }
      __builtin_ia32_emms ();
    }

-- Summary: gcc ICE compiling code with mmx builtin Product: gcc Version: 4.4.0 Status: UNCONFIRMED Keywords: ice-on-valid-code Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: davidxl at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36438
[Bug rtl-optimization/36438] gcc ICE compiling code with mmx builtin
--- Comment #1 from davidxl at gcc dot gnu dot org 2008-06-05 06:41 --- cse1 (RTL) does some expression simplification on the fly such as t = x << 4; r = t << 4 ==> r = x << 8. However, for the mmx shift operation, the mode (V1DI) used for the constant folding is illegal -- resulting in an ICE. -- davidxl at gcc dot gnu dot org changed: What|Removed |Added CC||davidxl at gcc dot gnu dot ||org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36438
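The scalar version of this simplification is easy to check by hand. The sketch below takes the two operations to be left shifts, as the use of `_mm_slli_si64` in the test case suggests; the function names are hypothetical, and the code models the arithmetic identity on a 64-bit scalar only, not the RTL pass or the V1DI vector mode that triggers the ICE.

```c
#include <stdint.h>

/* Two consecutive shifts, as written in the source loop. */
uint64_t shift_two_steps(uint64_t x, unsigned c1, unsigned c2)
{
    uint64_t t = x << c1;   /* requires c1 < 64 */
    return t << c2;         /* requires c2 < 64 */
}

/* The folded form: one shift by the summed count.  A shift count of
   64 or more is undefined in C, so that case is handled explicitly;
   the two-step version yields 0 there because every bit is shifted out. */
uint64_t shift_folded(uint64_t x, unsigned c1, unsigned c2)
{
    unsigned total = c1 + c2;
    return total < 64 ? x << total : 0;
}
```

For example, shifting 0x00FF by 4 and then by 4 again gives 0xFF00, the same as a single shift by 8.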
[Bug rtl-optimization/36438] gcc ICE compiling code with mmx builtin
--- Comment #6 from davidxl at gcc dot gnu dot org 2008-06-05 17:37 --- (In reply to comment #5) Patch at http://gcc.gnu.org/ml/gcc-patches/2008-06/msg00268.html Thanks -- same as my local workaround. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36438
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
--- Comment #13 from davidxl at gcc dot gnu dot org 2008-06-04 16:48 --- (In reply to comment #12) Interesting things start to happen once you inline allocator functions as well. See PR29286 and PR33407 which we still don't handle 100% correct. I browsed through the two bugs -- it seems that compiler should get this right regardless -- local pointer analysis should detect the must aliasing and should overrule the type based aliasing decision when the placement new is inlined. If not inlined, compiler should know the exact semantics of placement new (return == arg), or treat it conservatively. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383
[Bug c++/23383] builtin array operator new is not marked with malloc attribute
--- Comment #15 from davidxl at gcc dot gnu dot org 2008-06-04 17:34 --- (In reply to comment #14) We do the exact opposite - type-based rules override points-to must-alias information (or really may-alias information). Also for the proposed scheme to work you need to guarantee that you always can compute correct points-to relations (I mean, if points-to information says pt_anything and if you then assume must-alias and thus a conflict then you simply disable TBAA completely). Right, in general, type alias rules should override field and flow insensitive pointer aliasing information as they really have very low confidence level (especially for pt_anything case which is just a baseless guess) -- but precise/trustworthy aliasing info should be checked before assertion based alias information and decide whether to proceed. For example:

    if (no_alias_according_to_conservative_pointer_info)
      return no_alias;
    if (no_alias_according_to_precise_pointer_info)
      return no_alias;
    if (must_alias or definitely_may_alias)
      return may/must_alias;   (1)
    // now proceed with type based rules, etc.

This is in theory. In practice, it can be tricky to tag the confidence level of aliasing info. David -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=23383
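The ordering proposed in the pseudocode above can be turned into a small compilable sketch. None of this is GCC's alias oracle: the `alias_facts` struct, its fields and `query_alias` are hypothetical, and each boolean simply stands for "this analysis reported that result" for one pair of references. The only point is the order in which the answers are consulted.

```c
#include <stdbool.h>

enum alias_result { NO_ALIAS, MAY_ALIAS, MUST_ALIAS };

/* Hypothetical summary of what each analysis says about one pair of
   memory references. */
struct alias_facts {
    bool conservative_pt_no_alias;  /* field/flow-insensitive points-to */
    bool precise_pt_no_alias;       /* trustworthy points-to: disjoint */
    bool precise_pt_must_alias;     /* trustworthy points-to: same object */
    bool precise_pt_may_alias;      /* trustworthy "definitely may alias" */
    bool tbaa_no_alias;             /* type-based rules say no conflict */
};

enum alias_result query_alias(const struct alias_facts *f)
{
    if (f->conservative_pt_no_alias)
        return NO_ALIAS;
    if (f->precise_pt_no_alias)
        return NO_ALIAS;
    /* Step (1): precise information is consulted before type-based rules. */
    if (f->precise_pt_must_alias)
        return MUST_ALIAS;
    if (f->precise_pt_may_alias)
        return MAY_ALIAS;
    /* Only now fall back to type-based rules. */
    if (f->tbaa_no_alias)
        return NO_ALIAS;
    return MAY_ALIAS;
}
```

With this ordering, an inlined placement new whose result is known to equal its argument reports a must-alias even when the two types would otherwise let type-based rules claim independence.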