[Bug libstdc++/77356] New: regex error for a ECMAScript syntax string
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77356

Bug ID: 77356
Summary: regex error for a ECMAScript syntax string
Product: gcc
Version: 7.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com
Target Milestone: ---

For the testcase 1.cxx:

#include <regex>

int main() {
  static const char* kNumericAnchor =
      "(\\$|usd)(usd|\\$|to|and|up to|[0-9,\\.\\-\\sk])+";
  const std::regex re(kNumericAnchor);
  return 0;
}

~/workarea/gcc-r239713/build/install/bin/g++ -std=c++11 -O0 1.cxx -Wl,-rpath=/usr/local/google/home/wmi/workarea/gcc-r239713/build/install/lib64
./a.out
terminate called after throwing an instance of 'std::regex_error'
  what(): Unexpected end of bracket expression.
Aborted (core dumped)

The testcase compiles and runs without problems using libc++. With libstdc++, the exception is thrown because the second dash '-' is not part of any range and is not at the start or end of the bracket expression. According to the comment in _M_expression_term in src/libstdc++-v3/include/bits/regex_compiler.tcc, this is not allowed in POSIX syntax but is allowed in ECMAScript syntax. Since the input uses ECMAScript syntax, shouldn't libstdc++ accept it without throwing?
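One way to probe which bracket expressions a given standard library accepts is to wrap the std::regex constructor in a try/catch. The sketch below is not part of the report: the helper name `compiles` is hypothetical, and the result for the reported pattern is deliberately not stated because it depends on the library and its version.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the bug report): returns true if `pattern`
// compiles with the default ECMAScript grammar, false if the constructor
// throws std::regex_error.
bool compiles(const std::string& pattern) {
  try {
    std::regex re(pattern);
    (void)re;  // only the construction matters here
    return true;
  } catch (const std::regex_error&) {
    return false;
  }
}
```

Calling `compiles("(\\$|usd)(usd|\\$|to|and|up to|[0-9,\\.\\-\\sk])+")` against libstdc++ and libc++ would show the difference described above. Moving the dash to the end of the bracket expression, as in `[0-9,\\.\\sk-]`, may be worth trying as a workaround, since a trailing dash is one of the positions the report says libstdc++ accepts.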
[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #13 from wmi at google dot com ---
Using the extracted testcase vogt contributed, here is some digging into why rtx_refs_may_alias_p returns no-alias for the load and store:

(gdb) c
Continuing.

Breakpoint 3, rtx_refs_may_alias_p (x=0x757fe768, mem=0x757fe708, tbaa_p=true) at ../../src/gcc/alias.c:385
385 if (!ao_ref_from_mem (&ref1, x)
**
(gdb) p print_rtl_single(stderr, x)
(mem/j:SI (reg/v/f:DI 1 %r1 [orig:64 ps ] [64]) [5 ps_8->f2+-1 S4 A32])
$1 = 1
(gdb) p print_rtl_single(stderr, mem)
(mem/j:QI (reg/v/f:DI 2 %r2 [orig:64 ps ] [64]) [5 ps_8->f1+0 S1 A32])
**
rtx_refs_may_alias_p(x, mem, true) returns no_alias for "x" and "mem".
From the RTL representation,
x's starting address is ps_8->f2+-1, size is 4     // see [5 ps_8->f2+-1 S4 A32]
mem's starting address is ps_8->f1+0, size is 1    // see [5 ps_8->f1+0 S1 A32]
So x and mem do alias each other.
**
(gdb) p ref1
$3 = {ref = 0x7568d9f0, base = 0x757d7c30, offset = 8, size = 24, max_size = 24, ref_alias_set = 5, base_alias_set = -1, volatile_p = false}
(gdb) p ref2
$4 = {ref = 0x7568d9c0, base = 0x757d7f50, offset = 0, size = 8, max_size = 8, ref_alias_set = 5, base_alias_set = -1, volatile_p = false}
(gdb) p debug_generic_expr(ref1.base)
*ps_8
$6 = void
(gdb) p debug_generic_expr(ref2.base)
*ps_8
$7 = void
**
rtx_refs_may_alias_p(x, mem, true) calls refs_may_alias_p_1(&ref1, &ref2, ...) as its helper function. ref1 and ref2 have the same base, *ps_8, but non-overlapping access ranges: ref1 covers bits 8 to 8+24 and ref2 covers bits 0 to 8, so ref1 is reported as not aliasing ref2.

The important difference is between ref1 and x. ref1 is initialized from MEM_EXPR(x), which is ps_8->f2, so ref1 gets offset 8 and size 24. However, x itself starts at ps_8->f2-1 and has a size of 32 bits. Usually ref1's offset and size would be adjusted according to MEM_SIZE(x) and MEM_OFFSET(x). However, because of the if (...) clause below, ao_ref_from_mem returns true without adjusting ref1->offset and ref1->size.

(gdb) p debug_generic_expr(MEM_EXPR(x))
ps_8->f2
(gdb) p MEM_OFFSET(x)
$11 = -1
(gdb) p MEM_SIZE(x)
$12 = 4

ao_ref_from_mem (ao_ref *ref, const_rtx mem)
{
  tree expr = MEM_EXPR (mem);
  ...
  ao_ref_init (ref, expr);
  base = ao_ref_base (ref);
  ...
  /* If the base decl is a parameter we can have negative MEM_OFFSET in
     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
     here.  */
  if (MEM_OFFSET (mem) < 0
      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size)
    return true;

  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;
  ...
}

I don't understand the code above well: why can we trust MEM_EXPR instead of relying on MEM_OFFSET and MEM_SIZE? That does not seem to be valid for the testcase here.
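The arithmetic in this comment can be replayed outside the compiler. The following is only a sketch with hypothetical names (`BitRange`, `early_return_taken`, `adjusted`, `ranges_overlap`); it models the offsets and sizes quoted from gdb and assumes 8-bit bytes for BITS_PER_UNIT, and it is not GCC's actual implementation.

```cpp
// Hypothetical model of an access range in bits: [offset, offset + size).
struct BitRange {
  long offset;
  long size;
};

const long kBitsPerUnit = 8;  // assumption: BITS_PER_UNIT == 8

// The condition guarding the early "return true" quoted from ao_ref_from_mem.
bool early_return_taken(BitRange from_expr, long mem_offset, long mem_size) {
  return mem_offset < 0 &&
         (mem_size + mem_offset) * kBitsPerUnit == from_expr.size;
}

// What the range would be if MEM_OFFSET / MEM_SIZE were applied instead.
BitRange adjusted(BitRange from_expr, long mem_offset, long mem_size) {
  BitRange r = {from_expr.offset + mem_offset * kBitsPerUnit,
                mem_size * kBitsPerUnit};
  return r;
}

// Two half-open ranges overlap if each starts before the other ends.
bool ranges_overlap(BitRange a, BitRange b) {
  return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}
```

With the values from the gdb session (ref1 = {8, 24}, ref2 = {0, 8}, MEM_OFFSET = -1, MEM_SIZE = 4), the early-return condition holds because (4 - 1) * 8 == 24, so the unadjusted ranges are compared and do not overlap. The adjusted range would be {0, 32}, which does overlap ref2.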
[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #12 from wmi at google dot com ---
Yes, I agree it is a problem that memrefs_conflict_p doesn't take effect. But I am still wondering: even if memrefs_conflict_p doesn't take effect, the alias oracle query in rtx_refs_may_alias_p should have returned may-alias for the load and store. Why did rtx_refs_may_alias_p fail to do that?
[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #6 from wmi at google dot com ---
(In reply to Dominik Vogt from comment #3)
> I think the Rtl in comment 1 is correct. Note that "i" is stored at
> 0x.xx00 and "j" is stored at 0x.00xx. That is the
> reason for the rather confusing mask in insn 9. Your test program compiles
> and runs fine for me.

I am not familiar with s390 assembly. Please correct me if I am wrong.

This is the assembly generated for my testcase:

        .globl _Z3fooP1A
        .type _Z3fooP1A, @function
_Z3fooP1A:
.LFB0:
        larl %r5,.L3
        mvi 0(%r2),3            // move 0x.0003 to 0(%r2)
        l %r1,.L4-.L3(%r5)      // load 0xff000000 to %r1
        n %r1,0(%r2)            // %r1 = %r1 & 0(%r2) = 0x.
        oill %r1,5              // %r1 = %r1 | 5 = 0x.0005
        st %r1,0(%r2)           // store 0x.0005 to 0(%r2)
        br %r14
        .section .rodata
        .align 8
.L3:
.L4:
        .long -16777216
        .align 2
        .previous

According to the asm sequence above, the result of a should be: 0x.0005, but the correct result should be 0x.0305, right?
[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #2 from wmi at google dot com ---
Another problem was found in true_dependence_1 in alias.c. The true_mem_addr and true_x_addr obtained by calling get_addr may be passed to memrefs_conflict_p. However, memrefs_conflict_p expects VALUE-type nodes as its inputs so that the memory addresses can be compared. Only find_base_term and base_alias_check should use true_mem_addr/true_x_addr in true_dependence_1.

This is not a correctness issue, but it may reduce the effectiveness of dse/postreload...
[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #1 from wmi at google dot com ---
It seems the patch exposes a problem. For the testcase 1.cxx below:

typedef struct A {
  unsigned i : 8;
  unsigned j : 24;
} A;

void foo(A *a) {
  a->i = 3;
  a->j = 5;
}

The rtl generated by s390x-ibm-linux-g++ seems wrong.

~/workarea/gcc-r227524/build/install/bin/s390x-ibm-linux-g++ -O2 -S 1.cxx -fdump-rtl-expand-details-blocks

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v/f:DI 60 [ a ]) (reg:DI 2 %r2 [ a ])) 4.cxx:6 -1 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 8 2 (set (mem/j:QI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->i+0 S1 A32]) (const_int 3 [0x3])) 4.cxx:7 -1 (nil))
(insn 8 6 9 2 (set (reg:SI 62) (mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32])) 4.cxx:8 -1 (nil))
(insn 9 8 10 2 (parallel [ (set (reg:SI 63) (and:SI (reg:SI 62) (const_int -16777216 [0xffffffffff000000]))) (clobber (reg:CC 33 %cc)) ]) 4.cxx:8 -1 (nil))
(insn 10 9 11 2 (parallel [ (set (reg:SI 64) (ior:SI (reg:SI 63) (const_int 5 [0x5]))) (clobber (reg:CC 33 %cc)) ]) 4.cxx:8 -1 (nil))
(insn 11 10 0 2 (set (mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32]) (reg:SI 64)) 4.cxx:8 -1 (nil))
;; succ: EXIT [100.0%] (FALLTHRU)
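To see what insns 6 through 11 compute, the sequence can be replayed on a 4-byte buffer. This is only a sketch under stated assumptions: a big-endian layout in which bit-field `i` occupies the first (most significant) byte of the word, and hypothetical helper names (`load_be32`, `store_be32`, `replay_foo`). The `keep_insn6` flag models what happens if the byte store in insn 6 is deleted, which is the situation named in the bug title.

```cpp
#include <cstdint>

// Read a 32-bit word from 4 bytes in big-endian order (assumed s390x layout).
std::uint32_t load_be32(const std::uint8_t* p) {
  return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
         (std::uint32_t(p[2]) << 8) | std::uint32_t(p[3]);
}

// Write a 32-bit word to 4 bytes in big-endian order.
void store_be32(std::uint8_t* p, std::uint32_t v) {
  p[0] = std::uint8_t(v >> 24);
  p[1] = std::uint8_t(v >> 16);
  p[2] = std::uint8_t(v >> 8);
  p[3] = std::uint8_t(v);
}

// Hypothetical replay of the RTL: returns the final word stored at *a.
std::uint32_t replay_foo(std::uint32_t initial, bool keep_insn6) {
  std::uint8_t a[4];
  store_be32(a, initial);
  if (keep_insn6) a[0] = 3;         // insn 6: QImode store of 3 (a->i)
  std::uint32_t w = load_be32(a);   // insn 8: SImode load of the whole word
  w &= 0xff000000u;                 // insn 9: keep only the byte holding i
  w |= 5u;                          // insn 10: set j to 5
  store_be32(a, w);                 // insn 11: SImode store
  return load_be32(a);
}
```

Under these assumptions the full sequence leaves 0x03000005 in memory, while dropping insn 6 leaves whatever was previously in the first byte, so the SImode load in insn 8 genuinely depends on the QImode store in insn 6.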
[Bug target/65474] sub-optimal code for __builtin_abs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474

--- Comment #3 from wmi at google dot com ---
Thanks. You are right. I wrote a microbenchmark (attached) and tested it on different Intel microarchitectures:

westmere:     1.gcc.out: 19.42    1.llvm.out: 19.32
sandybridge:  1.gcc.out: 18.61    1.llvm.out: 19.16
ivybridge:    1.gcc.out: 15.79    1.llvm.out: 15.87

On sandybridge, llvm's version was slower. On the other microarchitectures, the two were close to each other. So gcc's choice makes sense.
[Bug target/65474] sub-optimal code for __builtin_abs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474 --- Comment #2 from wmi at google dot com --- Created attachment 35069 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35069&action=edit microbench
[Bug middle-end/65474] New: sub-optimal code for __builtin_abs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474

Bug ID: 65474
Summary: sub-optimal code for __builtin_abs
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com

int foo(int x) {
  return __builtin_abs(x);
}

~/workarea/gcc-r221398/build/install/bin/gcc -O2 -S 1.c -o 1.gcc.s

        .cfi_startproc
        movl %edi, %edx
        movl %edi, %eax
        sarl $31, %edx
        xorl %edx, %eax
        subl %edx, %eax
        ret
        .cfi_endproc

~/workarea/llvm-r224097/build/bin/clang -O2 -S 1.c -o 1.llvm.s

        .cfi_startproc
        movl %edi, %eax
        negl %eax
        cmovll %edi, %eax
        retq
        .cfi_endproc
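The two instruction sequences correspond to two well-known ways of computing an absolute value. The sketch below restates them in C++ for comparison; the function names are hypothetical, a 32-bit `int` is assumed, and the first variant relies on right shift of a negative `int` being arithmetic (implementation-defined before C++20, though mainstream compilers behave this way). Neither variant is meaningful for INT_MIN.

```cpp
// Variant 1: sign-mask sequence, mirroring sarl/xorl/subl in the GCC output.
// mask is 0 for non-negative x and -1 (all ones) for negative x.
int abs_sign_mask(int x) {
  int mask = x >> 31;        // assumes arithmetic shift and 32-bit int
  return (x ^ mask) - mask;  // x if mask == 0, otherwise ~x + 1 == -x
}

// Variant 2: negate and select, mirroring negl/cmovl in the clang output.
// cmovl picks the original value when the negated value is negative.
int abs_negate_select(int x) {
  int neg = -x;
  return neg < 0 ? x : neg;
}
```

Both return the same result for ordinary inputs; which one is faster depends on the microarchitecture, as the measurements in comment #3 of this bug illustrate.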
[Bug rtl-optimization/64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557

--- Comment #1 from wmi at google dot com ---
The experimental patch calls get_addr on the VALUE of the base before adding the constant offset, when creating mem_addr for the dependence check and for store_info. Bootstrap and regression tests on x86_64-linux-gnu are OK.

Index: dse.c
===
--- dse.c (revision 219421)
+++ dse.c (working copy)
@@ -1564,6 +1564,7 @@ record_store (rtx body, bb_info_t bb_inf
     = rtx_group_vec[group_id];
   mem_addr = group->canon_base_addr;
 }
+      mem_addr = get_addr (mem_addr);
       if (offset)
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
     }
@@ -2177,6 +2178,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
     = rtx_group_vec[group_id];
   mem_addr = group->canon_base_addr;
 }
+      mem_addr = get_addr (mem_addr);
       if (offset)
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
     }
[Bug rtl-optimization/64557] New: get_addr in true_dependence_1 cannot handle VALUE inside an expr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557 Bug ID: 64557 Summary: get_addr in true_dependence_1 cannot handle VALUE inside an expr Product: gcc Version: 4.9.3 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com We saw a bug in dse2 after porting the patch https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01209.html from gcc-4_9 to google-4_9 branch. From the analysis below, I think the problem exists but is hidden in trunk and gcc-4_9 too. I cannot extract a small testcase to show it independently without turning on some optimization in google-4_9, so I just described it here: We have such IR in a case: The IR before dse2: (insn/f 67 4 68 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0 S8 A8]) (reg/f:DI 6 bp)) contentads/adx/mixer/auction/candidate.cc:14 -1 (nil)) (insn/f 68 67 69 2 (set (reg/f:DI 6 bp) (reg/f:DI 7 sp)) contentads/adx/mixer/auction/candidate.cc:14 -1 (nil)) (insn/f 70 69 71 2 (parallel [ (set (reg/f:DI 7 sp) (plus:DI (reg/f:DI 7 sp) (const_int -24 [0xffe8]))) (clobber (reg:CC 17 flags)) (clobber (mem:BLK (scratch) [0 A8])) ]) contentads/adx/mixer/auction/candidate.cc:14 -1 (nil)) (note 71 70 2 2 NOTE_INSN_PROLOGUE_END) (insn 7 3 9 2 (set (mem/c:SI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507754]+0 S4 A128]) (const_int 0 [0])) ./ads/base/money.h:67 90 {*movsi_internal} (nil)) (insn 9 7 10 2 (set (mem/c:HI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507754]+0 S2 A128]) (const_int 21333 [0x5355])) ./ads/base/money.h:68 92 {*movhi_internal} (nil)) (insn 10 9 11 2 (set (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 2 [0x2])) [0 MEM[(void *)&D.3507754]+2 S1 A16]) (const_int 68 [0x44])) ./ads/base/money.h:68 93 {*movqi_internal} (nil)) (insn 11 10 12 2 (set (reg:SI 0 ax [orig:87 D.3507754 ] [87]) (mem/c:SI (reg/f:DI 7 sp) [0 D.3507754+0 S4 A128])) ./ads/base/money.h:302 90 {*movsi_internal} (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 20 frame) (const_int -16 
[0xfff0])) [0 D.3507754+0 S4 A128]) (nil))) ... (insn 15 13 17 2 (set (mem/c:SI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507756]+0 S4 A128]) (const_int 0 [0])) ./ads/base/money.h:67 90 {*movsi_internal} (nil)) The IR after dse2: The store in insn 10 is deleted. The other part is the same as above. (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 2 [0x2])) in insn10 is regarded to have no alias with (mem/c:SI (reg/f:DI 7 sp) in insn11, which is wrong. This is because with the applied patch, get_addr is used to extract original addresses for x_addr and mem_addr before they are used to find_base_term and used in base_alias_check. See the description of x_addr and mem_addr below: x is (mem/c:SI (reg/f:DI 7 sp) x_addr before calling get_addr is: (value:DI 4:12939 @0x84355f8/0x84354a0) x_addr after calling get_addr is: (plus:DI (value:DI 3:8637 @0x84355e8/0x8435478) (const_int -24 [0xffe8])) x_addr_base is: (address:DI -4) mem is (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 2 [0x2])) mem_addr before calling get_addr is: (plus:DI (value:DI 4:12939 @0x84355f8/0x84354a0) (const_int 2 [0x2])) mem_addr after calling get_addr is: // Notice: get_addr cannot handle plus expr, so it returns the origin expr. (plus:DI (value:DI 4:12939 @0x84355f8/0x84354a0) (const_int 2 [0x2])) mem_addr_base is: (address:DI -1) // value:DI 4:12939 @0x84355f8/0x84354a0 corresponds to reg/f:DI 7 sp // value:DI 3:8637 @0x84355e8/0x8435478 corresponds to reg/f:DI 6 bp // address:DI -1 corresponds to reg/f:DI 7 sp // address:DI -4 corresponds to reg/f:DI 6 bp x_addr_base and mem_addr_base are different, and unique_base_value_p will return true for (address:DI -1) and (address:DI -4), so base_alias_check will return 0, which is wrong. I think the root cause of the problem is get_addr can only handle VALUE but cannot handle VALUE inside an expr, like:(plus:DI (value:DI 4:12939 @0x84355f8/0x84354a0) (const_int 2 [0x2])), so find_base_term cannot know x_addr and mem_addr actually have the same base.
[Bug tree-optimization/64072] New: wrong cgraph node profile count
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64072

Bug ID: 64072
Summary: wrong cgraph node profile count
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com
CC: davidxl at gcc dot gnu.org, hubicka at gcc dot gnu.org

We have a program like this:

A() {           // hot func
  ...
}

B() {
  A();          // very hot
  if (i) {
    A();        // very cold
  }
}

Both callsites of A will be inlined into B. In GCC's save_inline_function_body, during the inline_transform stage, A's first clone is chosen and materialized. In our case, the chosen clone node corresponds to the cold callsite of A. cgraph_rebuild_references in tree_function_versioning then resets the cgraph node count of the chosen clone to the entry bb count of func A (A is hot). So the cgraph node count of the chosen clone becomes hot while its inline edge count is still cold. This breaks the assumption described in https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:

for inline node, bb->count == edge->count == edge->callee->count

With the patch committed in the thread above (listed below), cg_edge->callee->count is used to update the profile of the inline instance, which produces a hot BB in func B that is actually very cold. The wrong profile information causes a performance regression in one of our internal benchmarks.

Our internal workaround is to change cg_edge->callee->count to MIN(cg_edge->callee->count, cg_edge->count).

Index: gcc/tree-inline.c
===
--- gcc/tree-inline.c (revision 210535)
+++ gcc/tree-inline.c (working copy)
@@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
      function in any way before this point, as this CALL_EXPR may be
      a self-referential call; if we're calling ourselves, we need to
      duplicate our body before altering anything.  */
-  copy_body (id, bb->count,
+  copy_body (id, cg_edge->callee->count,
         GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
         bb, return_block, NULL);
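The effect of the workaround described in this report can be shown with a toy model. This is a sketch only: `EdgeCounts` and `inline_body_count` are hypothetical names, and the numbers are made up for illustration rather than taken from any profile.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical stand-in for the two profile counts discussed in the report.
struct EdgeCounts {
  std::int64_t callee_count;  // models cg_edge->callee->count
  std::int64_t edge_count;    // models cg_edge->count
};

// Models MIN(cg_edge->callee->count, cg_edge->count): the count given to the
// inlined body never exceeds the count of the call edge being inlined.
std::int64_t inline_body_count(const EdgeCounts& c) {
  return std::min(c.callee_count, c.edge_count);
}
```

When the clone's node count has been reset to a hot value (say 100000) but the edge is cold (say 3), the clamp yields 3, so the inlined copy at the cold callsite stays cold. When the invariant from the referenced thread holds and both counts are equal, the clamp changes nothing.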
[Bug ipa/63970] [4.9/5 Regression] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970 --- Comment #6 from wmi at google dot com --- The patch was committed to trunk at r217973.
[Bug ipa/63970] [4.9/5 Regression] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970 --- Comment #4 from wmi at google dot com --- (In reply to Jan Hubicka from comment #3) > Created attachment 34047 [details] > Patch > > Something like this (untested) may work Thanks! I tested your patch after minor change. It passed bootstrap and regression. It also solved the performance regression we saw in internal benchmarks. + if (origin_node && !origin_node->used_as_abstract_origin) +{ + origin_node->used_as_abstract_origin = true; + enqueue_node (origin_node, &first, &reachable); // enqueue_node moved here + gcc_assert (!origin_node->prev_sibling_clone); + gcc_assert (!origin_node->next_sibling_clone); + for (origin_node = origin_node->clones; origin_node; + origin_node = origin_node->next_sibling_clone) +if (origin_node->decl == DECL_ABSTRACT_ORIGIN (node->decl)) + origin_node->used_as_abstract_origin = true; +} Wei.
[Bug tree-optimization/63970] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

--- Comment #1 from wmi at google dot com ---
> I think we need to keep the functions but do not need to account for them in
> the unit size if we otherwise could remove them
>
> Richard.

But there is this code in symbol_table::remove_unreachable_nodes:

  if (TREE_CODE (node->decl) == FUNCTION_DECL
      && DECL_ABSTRACT_ORIGIN (node->decl))
    {
      struct cgraph_node *origin_node
        = cgraph_node::get_create (DECL_ABSTRACT_ORIGIN (node->decl));
      origin_node->used_as_abstract_origin = true;
      enqueue_node (origin_node, &first, &reachable);
    }

If we remove the check in can_remove_node_now_p_1, the original node will be removed or reused as a clone node during ipa inline analysis, but it will be recreated in symbol_table::remove_unreachable_nodes after ipa inline analysis finishes if only its clone nodes are reachable. So can we just remove the original node during inline analysis and let symbol_table::remove_unreachable_nodes restore it after ipa inline analysis?

Thanks,
Wei.
[Bug tree-optimization/63970] New: gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

Bug ID: 63970
Summary: gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com
CC: davidxl at google dot com, dehao at google dot com, hubicka at gcc dot gnu.org, rguenth at gcc dot gnu.org, tejohnson at google dot com

We see the inline problem below, caused by r201408 (https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00027.html).

hoo() {
  foo();
  ...
}

foo() {
  goo();
  ...
}

foo is split by function splitting, so its body changes to

foo() {
  goo();
  ...
  foo.part();
}

and used_as_abstract_origin of foo's cgraph node is set to true after function splitting.

In ipa-inline, when inlining foo into hoo, the original node of foo is not reused as the clone node, because used_as_abstract_origin of foo's cgraph node is true and can_remove_node_now_p_1 returns false, so a new clone node of foo is created. This is the case in gcc-4_9. In gcc-4_8, the original node of foo is reused as the clone node.

gcc-4_8:
   foo
    |
   goo

gcc-4_9:
   foo   foo_clone
     \   /
      goo

Because of this difference in whether a new clone is created for foo, when inlining goo into foo, the overall growth of inlining all callsites of goo is smaller in gcc-4_8 than in gcc-4_9 (goo has two callsites in gcc-4_9 but only one in gcc-4_8). If there are many cases like this, gcc-4_8 effectively has more inline growth budget than gcc-4_9 and inlines more aggressively.

I don't understand the exact purpose of the node->used_as_abstract_origin check in can_remove_node_now_p_1, but I am puzzled by the following two points:

1. https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00027.html said the patch was to ensure that all abstract origin functions have nodes attached. However, even if the node of the origin function is reused as a clone node, a new node will be created by the following code in symbol_table::remove_unreachable_nodes if only the node that needs the abstract origin is reachable.

  if (TREE_CODE (node->decl) == FUNCTION_DECL
      && DECL_ABSTRACT_ORIGIN (node->decl))
    {
      struct cgraph_node *origin_node
        = cgraph_node::get_create (DECL_ABSTRACT_ORIGIN (node->decl));
      origin_node->used_as_abstract_origin = true;
      enqueue_node (origin_node, &first, &reachable);
    }

2. DECL_ABSTRACT_ORIGIN(decl) seems to be useful only for the debug info of clone nodes. But now the used_as_abstract_origin check affects inline decisions, which should be the same with or without keeping debug info.
[Bug rtl-optimization/63548] New: redundent reload in loop after removing regmove
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63548

Bug ID: 63548
Summary: redundent reload in loop after removing regmove
Product: gcc
Version: 5.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com

Created attachment 33730
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33730&action=edit
testcase 1.c

For a program with many insns like "a = b + c", where operands "b" and "c" are both dead immediately after the add insn, the hardreg preference heuristic does not seem ideal.

With the testcase 1.c, gcc after r204212 generates two redundant reload insns, caused by the imperfect hardreg preference heuristic in IRA.

~/workarea/gcc-r214579/build/install/bin/gcc -O2 -S 1.c

.L5:
        movl %ebx, %edi
        call goo
        leal 2(%rbx), %edi
        movl %eax, %r13d
        call goo
        leal 4(%rbx), %edi
        movl %eax, %r12d
        call goo
        leal 6(%rbx), %edi
        movl %eax, %ebp
        addl $1, %ebx
        call goo
        movl %eax, %edx         // redundant mov
        movl %r13d, %eax        // redundant mov
        imull %r12d, %eax
        imull %ebp, %eax
        imull %edx, %eax
        addl %eax, total(%rip)
        cmpl %ebx, M(%rip)
        jg .L5

Old gcc with regmove happens to do better than the hardreg preference heuristic and generates only one redundant reload.

~/workarea/gcc-r199418/build/install/bin/gcc -O2 -S 1.c

.L3:
        movl %ebx, %edi
        call goo
        leal 2(%rbx), %edi
        movl %eax, %r13d
        call goo
        leal 4(%rbx), %edi
        movl %eax, %r12d
        call goo
        leal 6(%rbx), %edi
        movl %eax, %ebp
        addl $1, %ebx
        call goo
        movl %r13d, %edx        // redundant mov
        imull %r12d, %edx
        imull %ebp, %edx
        imull %eax, %edx
        addl %edx, total(%rip)
        cmpl %ebx, M(%rip)
        jg .L3

llvm generates no redundant move insn.

clang-r217862 -O2 -S 1.c

.LBB0_2:
        movl %ebx, %edi
        callq goo
        movl %eax, %r14d
        leal 2(%rbx), %edi
        callq goo
        movl %eax, %ebp
        leal 4(%rbx), %edi
        callq goo
        movl %eax, %r15d
        leal 6(%rbx), %edi
        callq goo
        imull %r14d, %ebp
        imull %r15d, %ebp
        imull %eax, %ebp
        addl %ebp, total(%rip)
        incl %ebx
        cmpl M(%rip), %ebx
        jl .LBB0_2
[Bug rtl-optimization/63525] New: unnecessary reloads generated in loop
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63525 Bug ID: 63525 Summary: unnecessary reloads generated in loop Product: gcc Version: 5.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com CC: vmakarov at gcc dot gnu.org Created attachment 33700 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33700&action=edit testcase 1.cxx For the testcase 1.cxx attached, trunk (r214579) generates an addpd with mem operand and one extra reload insn in kernel loop. For g++ before r204274, it generate less insns in the kernel loop. ~/workarea/gcc-r214579/build/install/bin/g++ -O2 -S 1.cxx -o 1.s kernel loop: .L3: pxor%xmm0, %xmm0 cvtsi2sd%eax, %xmm0 addl$1, %eax cmpl%edx, %eax unpcklpd%xmm0, %xmm0 addpd -24(%rsp), %xmm0 ===> mem operand used movaps %xmm0, -24(%rsp) ===> reload jne .L3 ~/workarea/gcc-r199418/build/install/bin/g++ -O2 -S 1.cxx -o 2.s kernel loop: .L3: xorpd %xmm1, %xmm1 cvtsi2sd%eax, %xmm1 addl$1, %eax unpcklpd%xmm1, %xmm1 addpd %xmm1, %xmm0 cmpl%edx, %eax jne .L3 The reload insns in trunk are generated because of following steps: With r204274, the IR after expand like this: Loop: ... (insn 15 14 16 5 (set (reg/v:V2DF 83 [ v ]) (plus:V2DF (reg/v:V2DF 83 [ v ]) (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 -1 (nil)) ... end Loop. (insn 23 22 24 7 (set (reg/v:TI 90 [ tmp ]) (subreg:TI (reg/v:V2DF 83 [ v ]) 0)) /usr/local/google/home/wmi/workarea/gcc-r212442/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.10.0/include/emmintrin.h:157 -1 (nil)) (insn 24 23 25 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2] ) [2 x+0 S8 A64]) (subreg:DF (reg/v:TI 90 [ tmp ]) 0)) 1.cxx:17 -1 (nil)) (insn 25 24 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2] ) [2 y+0 S8 A64]) (subreg:DF (reg/v:TI 90 [ tmp ]) 8)) 1.cxx:18 -1 (nil)) forward propagation will propagate reg 90 from insn 23 to insn 24 and insn 25, and remove subreg:TI, so we get the IR before IRA like this: Loop: ... 
(insn 15 14 16 4 (set (reg/v:V2DF 83 [ v ]) (plus:V2DF (reg/v:V2DF 83 [ v ]) (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 1263 {*addv2df3} (expr_list:REG_DEAD (reg:V2DF 92 [ D.5005 ]) (nil))) ... end Loop. (insn 24 22 25 5 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2] ) [2 x+0 S8 A64]) (subreg:DF (reg/v:V2DF 83 [ v ]) 0)) 1.cxx:17 128 {*movdf_internal} (nil)) (insn 25 24 0 5 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2] ) [2 y+0 S8 A64]) (subreg:DF (reg/v:V2DF 83 [ v ]) 8)) 1.cxx:18 128 {*movdf_internal} (expr_list:REG_DEAD (reg/v:V2DF 83 [ v ]) (nil))) ix86_cannot_change_mode_class doesn't allow such subreg: "subreg:DF (reg/v:V2DF 83 [ v ]) 8)" in insn 25, so reg 83 will be added in invalid_mode_changes by record_subregs_of_mode and will be allocated NO_REGS regclass. reg 83 has NO_REGS regclass while plus:V2DF requires the target operand to be xmm register in insn 15, so reload insns are needed. The kernel loop has low register pressure and it doesn't form a separate IRA region, so live range splitting on region boarder doesn't kick in here. Without r204274, IR after expand is like this: Loop: ... (insn 15 14 16 5 (set (reg/v:V2DF 61 [ v ]) (plus:V2DF (reg/v:V2DF 61 [ v ]) (reg:V2DF 68 [ D.4966 ]))) 1.cxx:14 -1 (nil)) ... End Loop. (insn 25 24 26 7 (set (subreg:V2DF (reg/v:TI 66 [ tmp ]) 0) (reg/v:V2DF 61 [ v ])) /usr/local/google/home/wmi/workarea/gcc-r199418/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/include/emmintrin.h:147 -1 (nil)) (insn 26 25 27 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2] ) [2 x+0 S8 A64]) (subreg:DF (reg/v:TI 66 [ tmp ]) 0)) 1.cxx:17 -1 (nil)) (insn 27 26 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2] ) [2 y+0 S8 A64]) (subreg:DF (reg/v:TI 66 [ tmp ]) 8)) 1.cxx:18 -1 (nil)) Because the subreg is on the left handside of insn 25, it is impossible for forward propagation to merge insn 25 to insn 26 and insn 27. 
reg 61 will not have reference like this: "subreg:DF (reg/v:V2DF 61 [ v ]) 8)", so it gets SSE regclass and will not introduce extra reload insns in kernel loop. r204274 just enables more forward propagations and exposes the problem here.
[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

--- Comment #6 from wmi at google dot com ---
(In reply to davidxl from comment #5)
> (In reply to wmi from comment #4)
> > Can we move the pure/const resetting loop to an earlier place: inside
> > branch_prob , after instrument_edges and before gsi_commit_edge_inserts
> > (where stmt_ends_bb_p is checked), so that gsi_commit_edge_inserts() which
> > changes cfg could take reset const/pure flags into consideration?
>
> Sounds plausible. Have you tried it?
>
> David

I just tried, but found it was not very easy.

FOR_EACH_DEFINED_FUNCTION (node)
  {
    execute_fixup_cfg() and cleanup_tree_cfg()
    branch_prob()
  }

In the loop above, branch_prob is called once for each defined function in turn. Because a function could call any other function in the cgraph, we need to reset the const/pure flags for every defined function before any branch_prob() is called, so we cannot put the const/pure reset code inside branch_prob().

We also cannot move the const/pure reset loop before the branch_prob() loop, because execute_fixup_cfg uses the const/pure flags and would generate a different cfg. If we put the const/pure reset code before the branch_prob() loop, it would only be executed in the instrumentation phase, not in the annotation phase, so we might get a different cfg between instrumentation and annotation.

Wei.
[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776 --- Comment #4 from wmi at google dot com --- Can we move the pure/const resetting loop to an earlier place: inside branch_prob , after instrument_edges and before gsi_commit_edge_inserts (where stmt_ends_bb_p is checked), so that gsi_commit_edge_inserts() which changes cfg could take reset const/pure flags into consideration?
[Bug tree-optimization/61776] New: ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

Bug ID: 61776
Summary: ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate
Product: gcc
Version: 4.10.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: wmi at google dot com
CC: rguenth at gcc dot gnu.org
Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
Build: x86_64-linux-gnu

Created attachment 33106
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33106&action=edit
testcase 1.c

~/workarea/gcc-r211604/build/install/bin/gcc -O2 -S -fprofile-generate 1.c
1.c: In function ‘foo’:
1.c:24:1: error: control flow in the middle of basic block 3
 }
 ^
1.c:24:1: error: control flow in the middle of basic block 3
1.c:24:1: error: control flow in the middle of basic block 3
1.c:24:1: internal compiler error: verify_flow_info failed
0x71f8c2 verify_flow_info()
        ../../src/gcc/cfghooks.c:260
0xbdaf1b cleanup_tree_cfg_noloop
        ../../src/gcc/tree-cfgcleanup.c:737
0xbdafec cleanup_tree_cfg()
        ../../src/gcc/tree-cfgcleanup.c:786
0xc674b8 tree_profiling
        ../../src/gcc/tree-profile.c:652
0xc6754a execute
        ../../src/gcc/tree-profile.c:691

The cause of the problem is as follows.

Before edge profiling instrumentation, goo in 1.c is regarded as a const function since it has no side effects. During instrumentation, the call to goo is therefore not regarded as a statement that ends a basic block, and some instrumentation code is inserted after the call in the same BB, so the call to goo now sits in the middle of a BB.

After edge profiling instrumentation, the body of goo contains instrumentation code, and goo's const flag is reset to false because it now has side effects. From then on, the call to goo is regarded as a statement that ends a basic block, which is inconsistent with the fact that the call is in the middle of a BB. The subsequent verify_flow_info() fails and the error message "control flow in the middle of basic block" is reported.

Google ref b/15936428
[Bug tree-optimization/61493] [4.10 Regression] Bug exposed by speculative devirtualizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493

wmi at google dot com changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID

--- Comment #4 from wmi at google dot com ---
This is a problem in the source, so I am closing the bug.

void foo(FST *fst) {
  const PAIR &final_pair = fst->Final().getpair();
  if (final_pair == global_pair)
    __builtin_printf("equal\n");
  else
    __builtin_printf("not equal\n");
  return;
}

The lifetime of the temporary object generated by fst->Final() is not extended beyond the statement that creates it, according to the following rule:

  a temporary bound to a return value of a function in a return statement is not extended: it is destroyed immediately at the end of the return expression. Such function always returns a dangling reference.
  (http://en.cppreference.com/w/cpp/language/reference_initialization)

Here final_pair is bound to the reference returned by getpair(), not directly to the temporary, so the temporary is destroyed at the end of the full expression and final_pair dangles. It is therefore meaningless to access final_pair afterwards.
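A source-level fix for this kind of problem is to copy the value while the temporary is still alive instead of keeping a reference into it. The sketch below uses hypothetical stand-in types (`Pair`, `Weight`, `Fst`) because the real testcase is only available as an attachment; it assumes Final() returns an object by value and getpair() returns a reference to a member of that object.

```cpp
// Hypothetical stand-ins for the types in the attached testcase.
struct Pair {
  int first;
  int second;
};

bool operator==(const Pair& a, const Pair& b) {
  return a.first == b.first && a.second == b.second;
}

class Weight {
 public:
  explicit Weight(Pair p) : pair_(p) {}
  // Returns a reference into *this; it dangles once *this is destroyed.
  const Pair& getpair() const { return pair_; }

 private:
  Pair pair_;
};

class Fst {
 public:
  explicit Fst(Pair p) : final_(p) {}
  // Returns a temporary by value, as assumed for the original Final().
  Weight Final() const { return Weight(final_); }

 private:
  Pair final_;
};

// Safe variant: final_pair is a copy made before the temporary returned by
// Final() is destroyed at the end of the full expression.
bool final_equals(const Fst& fst, const Pair& expected) {
  const Pair final_pair = fst.Final().getpair();
  return final_pair == expected;
}
```

Declaring `final_pair` as `const Pair&` here would reproduce the dangling reference from the report, and the comparison would then have undefined behavior, which is consistent with the result changing between optimization levels.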
[Bug tree-optimization/61493] [4.10 Regression] Bug exposed by speculative devirtualizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493 --- Comment #3 from wmi at google dot com --- Fix a typo in the first post. $~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx $./a.out not equal $~/workarea/gcc-r211604/build/install/bin/g++ -O0 1.cxx $./a.out equal $~/workarea/gcc-r211604/build/install/bin/g++ -O2 -fno-devirtualize-speculatively 1.cxx $./a.out equal
[Bug tree-optimization/61493] Bug exposed by speculative devirtualizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493 --- Comment #1 from wmi at google dot com --- Created attachment 32931 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32931&action=edit testcase
[Bug tree-optimization/61493] New: Bug exposed by speculative devirtualizing
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493 Bug ID: 61493 Summary: Bug exposed by speculative devirtualizing Product: gcc Version: 4.10.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com 1.cxx is attached. $~/workarea/gcc-r211604/build/install/bin/g++ -v Using built-in specs. COLLECT_GCC=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install/bin/g++ COLLECT_LTO_WRAPPER=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install/libexec/gcc/x86_64-unknown-linux-gnu/4.10.0/lto-wrapper Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --enable-languages=c,c++ --disable-bootstrap --prefix=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install Thread model: posix gcc version 4.10.0 20140613 (experimental) (GCC) $~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx $./a.out not equal $~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx $./a.out equal $~/workarea/gcc-r211604/build/install/bin/g++ -O2 -fno-devirtualize-speculatively 1.cxx $./a.out equal Google ref b/15521306
[Bug rtl-optimization/60738] New: A missing opportunity about process_single_reg_class_operands
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60738 Bug ID: 60738 Summary: A missing opportunity about process_single_reg_class_operands Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com Testcase 1.c: int a, b, c, d, e, cond; void foo() { int r1, r2, r3; r1 = b; r2 = d; if (__builtin_expect(cond > 3, 0)) { e = e * 5; c = a << r1; } c = c << r2; __builtin_printf("r1 + r2 = %d\n", r1 + r2); } ~/workarea/gcc-r208410/build/install/bin/gcc -O2 -S 1.c foo: .LFB0: .cfi_startproc cmpl$3, cond(%rip) movlb(%rip), %esi movld(%rip), %eax jg .L2 movlc(%rip), %edx .L3: movl%eax, %ecx // r2 gets assigned %eax. This is reload for insn1. addl%eax, %esi movl$.LC0, %edi sall%cl, %edx // insn1. Its constraint requires r2 in %ecx xorl%eax, %eax movl%edx, c(%rip) jmp printf .p2align 4,,10 .p2align 3 .L2: movle(%rip), %edx movl%esi, %ecx // r1 gets assigned %esi. This is reload for insn2. leal(%rdx,%rdx,4), %edx movl%edx, e(%rip) movla(%rip), %edx sall%cl, %edx // insn2. Its constraint requires r1 in %ecx jmp .L3 .cfi_endproc Because the bb starting from L2 is relatively cold, it is better to generate the code below: foo: .LFB0: .cfi_startproc cmpl$3, cond(%rip) movlb(%rip), %esi movld(%rip), %ecx jg .L2 movlc(%rip), %eax .L3: sall%cl, %eax // r2 gets assigned %ecx. no reload is needed. addl%ecx, %esi movl$.LC0, %edi movl%eax, c(%rip) xorl%eax, %eax jmp printf .p2align 4,,10 .p2align 3 .L2: movle(%rip), %eax movl%ecx, %edx // r2's live range is splitted here. This is the start of the splitted live range. movl%esi, %ecx // r1 gets assigned %esi, this is reload for insn2. leal(%rax,%rax,4), %eax movl%eax, e(%rip) movla(%rip), %eax sall%cl, %eax// insn2. constraint of insn2 requires r1 in %ecx movl%edx, %ecx // r2's live range is splitted here. This is the end of the splitted live range. 
jmp .L3 .cfi_endproc Now there is less code in the hot path (the bb starting from .L3). r1 and r2, used in the sall insns, need the CX_REG class, which is a single_reg_operand_class in IRA. The existing logic in process_single_reg_class_operands in ira-lives.c doesn't allow %ecx to be assigned to r1 or r2. Does it need improvement here?
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #7 from wmi at google dot com --- After looking into the problem more, I found that IVOPT may not be the root cause. Even if IVOPT creates a memory operand using two registers, the problem will not happen as long as the following optimizations don't propagate the memory operand into an asm operand. So I created another small testcase, 2.c, for which gcc at the head of trunk reports the same error. -fno-ivopts does not help here. gcc -v Target: x86_64-unknown-linux-gnu Configured with: ../src/configure --prefix=/usr/local/google/home/wmi/workarea/gcc-r208410/build/install Thread model: posix gcc version 4.9.0 20140307 (experimental) (GCC) gcc -O2 -fno-omit-frame-pointer -m32 -S 2.c 2.c: In function ‘foo’: 2.c:25:1: error: ‘asm’ operand has impossible constraints __asm__ ( ^ The problem disappears when I use -fno-tree-ter and -fdisable-rtl-combine. These two passes can propagate a memory reference using a register into an asm operand with constraint "g", which increases the number of registers used in the asm stmt. For TER, TER of loads into input arguments is allowed. For combine, insn_invalid_p() only checks whether an asm operand satisfies its constraint. However, neither TER nor combine checks whether the propagation could make the registers needed by the asm stmt exceed the number of available registers.
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #6 from wmi at google dot com --- Created attachment 32328 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32328&action=edit 2.c
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #4 from wmi at google dot com --- > On Fri, 14 Feb 2014, pinskia at gcc dot gnu.org wrote: > > > I think the real issue __FP_FRAC_SUB_4 needs to be fixed not to use > > inline-asm > > but normal C code. The normal C code should be able to produce as good as > > the > > inline-asm code now too. > > Does GCC do a good job of detecting add-with-carry and > subtract-with-borrow patterns (i.e. detecting the comparison that > corresponds to the carry flag and its use in a subsequent operation)? I remember at least the expansion of builtin_strlen could generate sub-with-borrow and it works well, so I think rtl passes could handle add-with-carry/subtract-with-borrow.
[Bug tree-optimization/60206] IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 --- Comment #2 from wmi at google dot com --- This is a way to fix the problem. libgcc/soft-fp/op-4.h provides a C version of __FP_FRAC_SUB_4, but it is currently overridden by the inline asm version in config/i386/32/sfp-machine.h. But the inline asm looks legal, right? Isn't it the compiler's responsibility to keep the inline asm constraints always satisfiable?
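For readers unfamiliar with what a four-word subtract looks like without inline asm, here is a rough sketch of subtract-with-borrow written in plain C++. It is not the code from libgcc/soft-fp/op-4.h; the Limbs type, the limb order and the name sub4 are assumptions made for illustration only.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Four 32-bit limbs, least significant first. The layout is an assumption
// made for this sketch, not the soft-fp representation.
using Limbs = std::array<std::uint32_t, 4>;

// Computes x - y modulo 2^128, propagating the borrow limb by limb.
Limbs sub4(const Limbs &x, const Limbs &y) {
    Limbs r{};
    std::uint32_t borrow = 0;
    for (std::size_t i = 0; i < 4; ++i) {
        std::uint32_t d = x[i] - y[i];    // wraps modulo 2^32
        std::uint32_t b1 = x[i] < y[i];   // borrow out of x[i] - y[i]
        r[i] = d - borrow;
        std::uint32_t b2 = d < borrow;    // borrow out of subtracting the incoming borrow
        borrow = b1 | b2;                 // at most one of the two can be set
    }
    return r;
}
```

Each comparison here plays the role of the carry flag in the sub/sbb sequence. Whether GCC turns this particular form into sbb instructions depends on the version and target and is not claimed here.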
[Bug tree-optimization/60206] New: IVOPT has no idea of inline asm
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206 Bug ID: 60206 Summary: IVOPT has no idea of inline asm Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com CC: rguenth at gcc dot gnu.org, shenhan at google dot com Host: i386 Target: i386 Created attachment 32141 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32141&action=edit Testcase This bug is found in google branch but I think the same problem also exists on trunk (but not exposed). For the testcase 1.c attached (1.c is extracted from libgcc/soft-fp/divtf3.c), use trunk compiler gcc-r202164 (Target: x86_64-unknown-linux-gnu) + the patch r204497 could expose the problem. The command: gcc -v -O2 -fno-omit-frame-pointer -fpic -c -S -m32 1.c The error: ./1.c: In function ‘__divtf3’: ./1.c:64:1194: error: ‘asm’ operand has impossible constraints The inline asm in error message is as follow: do { __asm__ ( "sub{l} {%11,%3|%3,%11}\n\t" "sbb{l} {%9,%2|%2,%9}\n\t" "sbb{l} {%7,%1|%1,%7}\n\t" "sbb{l} {%5,%0|%0,%5}" : "=r" ((USItype) (A_f[3])), "=&r" ((USItype) (A_f[2])), "=&r" ((USItype) (A_f[1])), "=&r" ((USItype) (A_f[0])) : "0" ((USItype) (B_f[2])), "g" ((USItype) (A_f[2])), "1" ((USItype) (B_f[1])), "g" ((USItype) (A_f[1])), "2" ((USItype) (B_f[0])), "g" ((USItype) (A_f[0])), "3" ((USItype) (0)), "g" ((USItype) (_n_f[_i]))); } while () Because -fno-omit-frame-pointer is turned on and the command line uses -fpic, there are only 5 registers for register allocation. Before IVOPT, %0, %1, %2, %3 require 4 registers. The index variable i of _n_f[_i] requires another register. So 5 registers are used up here. After IVOPT, MEM reference _n_f[_i] is converted to MEM[base: _874, index: ivtmp.22_821, offset: 0B]. base and index require 2 registers, Now 6 registers are required, so LRA cannot find enough registers to allocate. trunk compiler doesn't expose the problem because of patch r202165. 
With patch r202165, IVOPT doesn't change _n_f[_i] in the inline asm above. But that just hid the problem. Should IVOPT take the constraints in inline asm into account and restrict its optimization in some cases?
[Bug regression/58985] [4.9 Regression]: gcc.dg/pr57518.c scan-rtl-dump-not ira REG_EQUIV...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58985 --- Comment #9 from wmi at google dot com --- Backported r200720 to gcc 4.8 branch at r204660.
[Bug regression/58985] [4.9 Regression]: gcc.dg/pr57518.c scan-rtl-dump-not ira REG_EQUIV...
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58985 --- Comment #4 from wmi at google dot com --- This is a testcase problem. For the cris-axis-elf target, gcc doesn't use a subreg of reg 31 for the above testcase, so it is OK to generate a REG_EQUIV note for reg 31. I will send out a patch for it soon. Thanks for pointing out the problem. Regards, Wei Mi.
[Bug rtl-optimization/57878] Incorrect code: live register clobbered in split2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57878 wmi at google dot com changed: What|Removed |Added CC||wmi at google dot com --- Comment #3 from wmi at google dot com --- Seems problem is at deciding the priority of assign hardreg for reload pseudos .i.e the func reload_pseudo_compare_func. This is the trace of 2nd iteration of reload pseudo assignments in r.ii.209r.reload: 2nd iter for reload pseudo assignments: Reload r196 assignment failure Reload r199 assignment failure Reload r204 assignment failure Reload r204 assignment failure Spill reload r194(hr=1, freq=1426) Spill reload r195(hr=5, freq=1426) Spill reload r197(hr=1, freq=1426) Spill reload r198(hr=5, freq=1426) Spill reload r202(hr=1, freq=1426) Spill reload r203(hr=5, freq=1426) Assigning to 194 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)... Assign 1 to reload r194 (freq=1426) Assigning to 197 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)... Assign 1 to reload r197 (freq=1426) Hard reg 1 is preferable by r222 with profit 3029 Hard reg 2 is preferable by r222 with profit 1425 Assigning to 202 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190, tfreq=4991)... Assign 1 to reload r202 (freq=1426) Assigning to 195 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)... Assign 5 to reload r195 (freq=1426) Assigning to 198 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)... Assign 5 to reload r198 (freq=1426) Assigning to 203 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191, tfreq=4278)... Assign 5 to reload r203 (freq=1426) Assigning to 196 (cl=GENERAL_REGS, orig=196, freq=1426, tfirst=196, tfreq=1426)... Trying 2: spill 225(freq=1426) Assigning to 199 (cl=GENERAL_REGS, orig=199, freq=1426, tfirst=199, tfreq=1426)... Trying 2: spill 225(freq=1426) Assigning to 204 (cl=GENERAL_REGS, orig=204, freq=1426, tfirst=204, tfreq=1426)... 
Trying 2: spill 216(freq=2139) Assign 0 to reload r196 (freq=1426) Assign 0 to reload r199 (freq=1426) Assign 0 to reload r204 (freq=1426) Reassigning non-reload pseudos Here is the dump after lra_constraints. These are the insns related with r194, r195, r196: (insn 200 120 201 6 (set (reg/f:SI 194 [orig:138 D.3281 ] [138]) (reg/f:SI 138 [ D.3281 ])) 1.ii:197 89 {*movsi_internal} (nil)) (insn 201 200 121 6 (set (reg/f:SI 195 [orig:140 D.3282 ] [140]) (reg/f:SI 140 [ D.3282 ])) 1.ii:197 89 {*movsi_internal} (nil)) (insn 121 201 202 6 (set (reg:DI 196) (mem:DI (plus:SI (plus:SI (reg/f:SI 99 [ D.3281 ]) (reg/f:SI 126 [ D.3282 ])) (const_int 8 [0x8])) [10 MEM[base: _1, index: _44, offset: 8]+0 S8 A64])) 1.ii:197 88 {*movdi_internal} (expr_list:REG_DEAD (reg:DI 131 [ D.3287 ]) (nil))) (insn 202 121 122 6 (set (mem:DI (plus:SI (plus:SI (reg/f:SI 194 [orig:138 D.3281 ] [138]) (reg/f:SI 195 [orig:140 D.3282 ] [140])) (const_int 8 [0x8])) [10 MEM[base: _75, index: _77, offset: 8B]+0 S8 A64]) (reg:DI 196)) 1.ii:197 88 {*movdi_internal} (nil)) >From trace, r194 r195 are assigned hardreg before r196. Usually reload pseudos will not conflict with each other except a special case: they are in the same insn. r194,r195 and r196 just belong to such case. They are all in the insn 202. In addition, r194, r195 and r196 are all reload pseudos, so once r194 and r195 are allocated, they will not be spilled for assigning hardreg for r196. In this case, r194 and r195 get hardreg assigned before r196. So after r194 and r195 are assigned hardreg, r196 cannot find available hardreg because it has bigger mode and require a consecutive hardreg pair. All pseudos which cannot find hardreg after two iterations will be given ax simply, and report error. Trunk report error but 4.8.1 doesn't report it because lra_assert is only enabled in trunk but not in 4.8.1. A possible fix is to give bigger mode pseudos higher priority in lra assignment. 
Index: lra-assigns.c === --- lra-assigns.c(revision 200944) +++ lra-assigns.c(working copy) @@ -194,15 +194,15 @@ reload_pseudo_compare_func (const void * if ((diff = (ira_class_hard_regs_num[cl1] - ira_class_hard_regs_num[cl2])) != 0) return diff; - if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq - - regno_assign_info[regno_assign_info[r1].first].freq)) != 0) -return diff; /* Allocate bigger pseudos first to avoid regis
[Bug rtl-optimization/57518] [4.9 Regression] Redundant insn generated in LRA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 wmi at google dot com changed: What|Removed |Added CC||rguenth at gcc dot gnu.org --- Comment #3 from wmi at google dot com --- Oh, sorry for the confusion: the 4.8.0 below is an experimental version (its date is 20120613, and at that time LRA had not been merged yet): Target: x86_64-linux-gnu gcc version 4.8.0 20120613 (experimental) (GCC) gcc -O2 -S 1.c .cfi_startproc movzbl ip+2(%rip), %eax andl $3, %eax movl %eax, total(%rip) ret .cfi_endproc I just verified with the 4.8.0 and 4.8.1 releases; the problem is present in both.
[Bug rtl-optimization/57518] [4.8/4.9 Regression] Redundant insn generated in LRA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 --- Comment #1 from wmi at google dot com --- post a candidate patch here: http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00748.html
[Bug rtl-optimization/57459] [4.8/4.9 Regression] LRA inheritance bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57459 --- Comment #6 from wmi at google dot com --- Continuing the analysis from the first post: for the small testcase 1.c, the IR after calling inherit_in_ebb in lra_inheritance for bb12 is: (insn 289 47 48 12 (set (reg:SI 116 [79]) (reg:SI 121 [79])) 1.c:16 85 {*movsi_internal} (nil)) (insn 48 289 290 12 (set (reg:SI 116 [79]) (if_then_else:SI (eq (reg:CCNO 17 flags) (const_int 0 [0])) (reg:SI 122 [83]) (reg:SI 116 [79]))) 1.c:16 923 {*movsicc_noc} (expr_list:REG_DEAD (reg:SI 122 [83]) (nil))) (insn 290 48 294 12 (set (reg:SI 120 [79]) (reg:SI 116 [79])) 1.c:16 85 {*movsi_internal} (nil)) (insn 294 290 49 12 (set (reg:SI 79) (reg:SI 120 [79])) 1.c:16 85 {*movsi_internal} (nil)) .. (insn 292 50 51 12 (set (reg:QI 118) (subreg:QI (reg:SI 120 [79]) 0)) 1.c:16 87 {*movqi_internal} (nil)) (insn 51 292 52 12 (parallel [ (set (reg:CC 17 flags) (unspec:CC [ (subreg:QI (reg:SI 79) 0) (reg:QI 118) ] UNSPEC_ADD_CARRY)) (set (subreg:QI (reg:SI 79) 0) (plus:QI (subreg:QI (reg:SI 79) 0) (reg:QI 118))) ]) 1.c:16 259 {addqi3_cc} (expr_list:REG_UNUSED (reg:SI 79) (nil))) The IR is still correct after this step. However, after update_ebb_live_info (called after inherit_in_ebb), insn 294 is removed. Then reg 79 never receives the updated value and no longer equals reg 118. The IR is wrong after this step. insn 294 is removed in update_ebb_live_info because the reg type of reg 79 is OP_INOUT, but update_ebb_live_info only marks OP_IN regs as live in live_regs. So the fix is: Index: gcc/lra-constraints.c === --- gcc/lra-constraints.c(revision 199752) +++ gcc/lra-constraints.c(working copy) @@ -4545,7 +4545,7 @@ update_ebb_live_info (rtx head, rtx tail bitmap_clear_bit (&live_regs, reg->regno); /* Mark each used value as live. 
*/ for (reg = curr_id->regs; reg != NULL; reg = reg->next) -if (reg->type == OP_IN +if ((reg->type == OP_IN || reg->type == OP_INOUT) && bitmap_bit_p (&check_only_regs, reg->regno)) bitmap_set_bit (&live_regs, reg->regno); /* It is quite important to remove dead move insns because it Bootstrapped and tested on x86_64-linux.
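To illustrate why an in-out operand has to count as a use, here is a simplified sketch of a backward liveness scan. It is not GCC code: OpType, Operand, Insn, dead_moves and sample_block are hypothetical names, and the two sample insns are only modelled on the shape of insn 294 and insn 51 from the dump.

```cpp
#include <set>
#include <vector>

// Hypothetical, simplified model of an insn operand; the names mirror
// LRA's OP_IN/OP_OUT/OP_INOUT but nothing here is GCC code.
enum class OpType { In, Out, InOut };

struct Operand {
    int regno;
    OpType type;
};

struct Insn {
    int id;
    std::vector<Operand> operands;
    bool is_plain_move;  // only plain moves are candidates for removal
};

// Walks the block backwards and returns the ids of moves whose destination
// is dead. When count_inout_as_use is false, the scan treats an in-out
// operand as a pure definition, which is the behaviour the report describes.
std::vector<int> dead_moves(const std::vector<Insn> &block,
                            bool count_inout_as_use) {
    std::set<int> live;
    std::vector<int> dead;
    for (auto it = block.rbegin(); it != block.rend(); ++it) {
        bool removable = it->is_plain_move;
        for (const Operand &op : it->operands)
            if (op.type != OpType::In && live.count(op.regno))
                removable = false;  // the value written here is still needed
        if (removable) {
            dead.push_back(it->id);
            continue;
        }
        // Kill every written register, then mark every read register live.
        for (const Operand &op : it->operands)
            if (op.type != OpType::In)
                live.erase(op.regno);
        for (const Operand &op : it->operands)
            if (op.type == OpType::In ||
                (count_inout_as_use && op.type == OpType::InOut))
                live.insert(op.regno);
    }
    return dead;
}

// Two insns modelled on insn 294 (move into reg 79) and insn 51
// (reg 79 both read and written).
std::vector<Insn> sample_block() {
    return {
        {294, {{79, OpType::Out}, {120, OpType::In}}, true},
        {51, {{79, OpType::InOut}, {118, OpType::In}}, false},
    };
}
```

With count_inout_as_use set to false the scan reports the move into reg 79 as dead, and with it set to true the move is kept, which matches the intent of the one-line patch as I read it.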
[Bug rtl-optimization/57518] New: Redundant insn generated in LRA
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518 Bug ID: 57518 Summary: Redundent insn generated in LRA Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com Testcase: char ip[10]; int total, total1; void foo() { int t; t = ip[2]; total = t & 0x3; } Target: x86_64-linux-gnu gcc version 4.9.0 20130529 (experimental) (GCC) ~/workarea/gcc-r199418/build/install/bin/gcc -O2 -S 1.c .cfi_startproc movzbl ip+2(%rip), %eax movb%al, -16(%rsp) ==> redundent movl-16(%rsp), %eax==> redundent andl$3, %eax movl%eax, total(%rip) ret .cfi_endproc Target: x86_64-linux-gnu gcc version 4.8.0 20120613 (experimental) (GCC) gcc -O2 -S 1.c .cfi_startproc movzblip+2(%rip), %eax andl$3, %eax movl%eax, total(%rip) ret .cfi_endproc IR before LRA: (insn 12 7 8 2 (set (reg:QI 64 [ ip+2 ]) (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") ) (const_int 2 [0x2]))) [0 ip+2 S1 A8])) 1.c:9 87 {*movqi_internal} (expr_list:REG_EQUIV (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") ) (const_int 2 [0x2]))) [0 ip+2 S1 A8]) (nil))) (insn 8 12 9 2 (parallel [ (set (reg:SI 65 [ D.1731 ]) (and:SI (subreg:SI (reg:QI 64 [ ip+2 ]) 0) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) 1.c:9 387 {*andsi_1} (expr_list:REG_DEAD (reg:QI 64 [ ip+2 ]) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUIV (mem/c:SI (symbol_ref:DI ("total") ) [2 total+0 S4 A32]) (nil) IR after LRA: (insn 12 7 14 2 (set (reg:QI 0 ax [orig:64 ip+2 ] [64]) (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") ) (const_int 2 [0x2]))) [0 ip+2 S1 A8])) 1.c:9 87 {*movqi_internal} (expr_list:REG_EQUIV (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") ) (const_int 2 [0x2]))) [0 ip+2 S1 A8]) (nil))) (insn 14 12 15 2 (set (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int -16 [0xfff0])) [3 %sfp+-16 S1 A64]) (reg:QI 0 ax [orig:64 ip+2 ] [64])) 1.c:9 87 {*movqi_internal} (expr_list:REG_DEAD (reg:QI 0 ax [orig:64 
ip+2 ] [64]) (nil))) (insn 15 14 8 2 (set (reg:SI 0 ax [orig:65 D.1731 ] [65]) (mem/c:SI (plus:DI (reg/f:DI 7 sp) (const_int -16 [0xfff0])) [3 %sfp+-16 S4 A64])) 1.c:9 85 {*movsi_internal} (nil)) (insn 8 15 16 2 (parallel [ (set (reg:SI 0 ax [orig:65 D.1731 ] [65]) (and:SI (reg:SI 0 ax [orig:65 D.1731 ] [65]) (const_int 3 [0x3]))) (clobber (reg:CC 17 flags)) ]) 1.c:9 387 {*andsi_1} (expr_list:REG_EQUIV (mem/c:SI (symbol_ref:DI ("total") ) [2 total+0 S4 A32]) (nil))) IRA Trace: Pass 0 for finding pseudo/allocno costs a0 (r65,l0) best GENERAL_REGS, allocno GENERAL_REGS a1 (r64,l0) best NO_REGS, allocno NO_REGS a1's rclasses are all NO_REGS because it has a REG_EQUIV note (equivalent to mem ip+2). Because reg 64 is marked as equivalent to mem ip+2, insn 12 is expected to be deleted and reg 64 in insn 8 replaced by mem ip+2. In LRA constraints, insn 12 is not deleted because of the subreg op in insn 8 (see lra-constraints.c:3662 r199418). In addition, reg 64's rclass is NO_REGS, so redundant spills are inserted. The mode size check (lra-constraints.c:3662 r199418) needs to be taken into account in update_equiv_regs in IRA, so that reg 64 is not marked as equivalent to mem ip+2 in this case.
[Bug rtl-optimization/57459] New: LRA inheritance bug
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57459 Bug ID: 57459 Summary: LRA inheritance bug Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: wmi at google dot com Created attachment 30218 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30218&action=edit small testcase To reproduce the bug on using 1.c attached: Target: x86_64-unknown-linux-gnu gcc version 4.9.0 20130529 (experimental) (GCC) $~/workarea/gcc-r199418/build/install/bin/gcc -fno-inline -O2 -minline-all-stringops -fno-omit-frame-pointer -m32 1.c $./a.out len = 9 $~/workarea/gcc-r199418/build/install/bin/gcc -O2 -m32 1.c $./a.out len = 8 The expanded __builtin_strlen is wrong: 80484c3: 8b 07 mov(%edi),%eax 80484c5: 83 c7 04add$0x4,%edi 80484c8: 8d 90 ff fe fe fe lea-0x1010101(%eax),%edx 80484ce: f7 d0 not%eax 80484d0: 21 c2 and%eax,%edx 80484d2: 81 e2 80 80 80 80 and$0x80808080,%edx 80484d8: 74 e9 je 80484c3 80484da: 89 d0 mov%edx,%eax 80484dc: 8b 55 ecmov-0x14(%ebp),%edx 80484df: 89 55 e8mov%edx,-0x18(%ebp) 80484e2: 89 c2 mov%eax,%edx 80484e4: c1 e8 10shr$0x10,%eax 80484e7: f7 c2 80 80 00 00 test $0x8080,%edx 80484ed: 89 45 ecmov%eax,-0x14(%ebp) 80484f0: 89 d0 mov%edx,%eax 80484f2: 8d 57 02lea0x2(%edi),%edx 80484f5: 0f 44 facmove %edx,%edi 80484f8: 8b 55 e8mov-0x18(%ebp),%edx 80484fb: 0f 44 45 ec cmove -0x14(%ebp),%eax 80484ff: 00 45 e4add%al,-0x1c(%ebp) > Wrong here, the correct insn is: add %al, %al. %al is either 0x80 or 0x0 here. The insn "add %al, %al" is used to check whether %al is 0x80, and it will produce carry bit for the following sbb. 
(The lowest 0x80 in %eax shows where the first '\0' is in the input string) 8048502: 83 df 03sbb$0x3,%edi 8048505: 8b 45 08mov0x8(%ebp),%eax 8048508: 2b 7d 08sub0x8(%ebp),%edi The IR after IRA and before LRA: (insn 51 50 52 12 (parallel [ (set (reg:CC 17 flags) (unspec:CC [ (subreg:QI (reg:SI 79) 0) (subreg:QI (re(insn 292 50 51 12 (set (reg:QI 118) (subreg:QI (reg:SI 79) 0)) 1.c:16 87 {*movqi_internal} (nil)) (insn 51 292 52 12 (parallel [ (set (reg:CC 17 flags) (unspec:CC [ (subreg:QI (reg:SI 79) 0) (reg:QI 118) ] UNSPEC_ADD_CARRY)) (set (subreg:QI (reg:SI 79) 0) (plus:QI (subreg:QI (reg:SI 79) 0) (reg:QI 118))) ]) 1.c:16 259 {addqi3_cc} (expr_list:REG_UNUSED (reg:SI 79) (nil)))g:SI 79) 0) ] UNSPEC_ADD_CARRY)) (set (subreg:QI (reg:SI 79) 0) (plus:QI (subreg:QI (reg:SI 79) 0) (subreg:QI (reg:SI 79) 0))) ]) 1.c:16 259 {addqi3_cc} (expr_list:REG_UNUSED (reg:SI 79) (nil))) The IR is correct till now. insn 51 will produce the problematic "add %al,-0x1c(%ebp)" finally. All the input and output operands of insn 51 are reg79. The reg79 gets no hardreg in IRA phase. The IR after lra_constraints: (insn 292 50 51 12 (set (reg:QI 118) (subreg:QI (reg:SI 79) 0)) 1.c:16 87 {*movqi_internal} (nil)) (insn 51 292 52 12 (parallel [ (set (reg:CC 17 flags) (unspec:CC [ (subreg:QI (reg:SI 79) 0) (reg:QI 118) ] UNSPEC_ADD_CARRY)) (set (subreg:QI (reg:SI 79) 0) (plus:QI (subreg:QI (reg:SI 79) 0) (reg:QI 118))) ]) 1.c:16 259 {addqi3_cc} (expr_list:REG_UNUSED (reg:SI 79) (nil))) The IR is still correct. The choosen constraints of insn 51 are "rm" "0" "rn". reg79 get no hardreg in IRA, so the output operand and the first input operand satisfy the constraint (staying in mem), but the second input operand should stay in register. That is why reg118 is introduced and insn 292 is inserted. The IR after lra_inheritance: (insn 289 47 48 12 (set (reg:SI 116 [79]) (reg:SI 121 [79])) 1.c:16 85 {*movsi_internal} (nil)) (insn 48 289 290 12 (set (reg:SI 116 [79]) (if_then_else:SI (eq (reg:CCNO 17
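As background for the disassembly in this report, the lea/not/and/and sequence is the usual word-at-a-time test for a zero byte. A small sketch in C++ follows; the function names are made up for illustration, and the loop in first_zero_byte is a plain scan, not the branch-free add/sbb sequence that GCC emits.

```cpp
#include <cstdint>

// Nonzero exactly when some byte of x is zero; this is the
// (x - 0x01010101) & ~x & 0x80808080 computation from the disassembly.
std::uint32_t zero_byte_mask(std::uint32_t x) {
    return (x - 0x01010101u) & ~x & 0x80808080u;
}

// Index (0..3, least significant byte first) of the first zero byte in x,
// or 4 if there is none. The lowest set 0x80 bit in the mask marks it.
int first_zero_byte(std::uint32_t x) {
    std::uint32_t mask = zero_byte_mask(x);
    for (int i = 0; i < 4; ++i)
        if (mask & (0x80u << (8 * i)))
            return i;
    return 4;
}
```

On a little-endian target, byte index 0 corresponds to the lowest address, so the returned index is the offset of the terminating '\0' within the loaded word.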
[Bug rtl-optimization/57130] New: Incorrect "and --> extract" conversion in combine
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57130 Bug #: 57130 Summary: Incorrect "and --> extract" conversion in combine Classification: Unclassified Product: gcc Version: 4.9.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassig...@gcc.gnu.org ReportedBy: w...@google.com Created attachment 29986 --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29986 Testcase For the smallcase 1.ii attached. ~/workarea/gcc-r198433/build/install/bin/g++ 1.ii -o a1.out ./a1.out --> correct output: 1 3 ~/workarea/gcc-r198433/build/install/bin/g++ -fno-inline -fno-omit-frame-pointer -O2 1.ii -o a2.out ./a2.out --> incorrect output: 1 -1 In 1.ii.196r.ud_dce: (insn 7 2 84 2 (set (reg:DI 64) (const_int -4294967296 [0x])) 1.ii:26 84 {*movdi_internal} (nil)) ... (insn 40 36 44 2 (parallel [ (set (reg:DI 88) (and:DI (reg:DI 80) (reg:DI 64))) (clobber (reg:CC 17 flags)) ]) 1.ii:32 386 {*anddi_1} (expr_list:REG_DEAD (reg:DI 80) (expr_list:REG_DEAD (reg:DI 64) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (and:DI (reg:DI 80) (const_int -4294967296 [0x])) (nil)) (insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ]) (zero_extend:DI (subreg:SI (reg:DI 88) 0))) 1.ii:33 127 {*zero_extendsidi2} (expr_list:REG_DEAD (reg:DI 88) (nil))) The value of r92 here should always be 0. After try_combine with params (i3==insn44, i2==insn40, i1==insn7), insn44 is transformed to: (insn 44 40 45 2 (parallel [ (set (reg:DI 92 [ mini_rect ]) (ashiftrt:DI (reg:DI 88) (const_int 63 [0x3f]))) (clobber (reg:CC 17 flags)) ]) 1.ii:33 528 {ashrdi3_cvt} (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_DEAD (reg:DI 88) (nil The value of r92 now equals either 0 or -1 which depends on the highest bit of r88. Try to understand what happen in try_combine: In try_combine, after subst(PATTERN (i3), i2dest, i2src, ...), insn 44 is transformed to the following form. This step is correct. 
(insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ]) (neg:DI (ne:DI (subreg:SI (and:DI (reg:DI 80) (reg:DI 64)) 0) (const_int 0 [0] 1.ii:33 127 {*zero_extendsidi2} (expr_list:REG_DEAD (reg:DI 88) (nil))) In subst(PATTERN (i3), i1dest, i1src, ...), insn 44 is firstly transformed to the following in simplify_logical, which is correct: (insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ]) (neg:DI (ne:DI (subreg:SI ((and:DI (reg:DI 80) (const_int 34359738368 [0x8]))) 0) (const_int 0 [0] 1.ii:33 127 {*zero_extendsidi2} (expr_list:REG_DEAD (reg:DI 88) (nil))) then it is transformed to the following in make_compound_operation, which is incorrect: (insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ]) (sign_extract:DI (reg:DI 80) (const_int 1 [0x1]) (const_int 35 [0x23]))) 1.ii:33 127 {*zero_extendsidi2} (expr_list:REG_DEAD (reg:DI 88) (nil))) make_compound_operation transforms (and:DI (reg:DI 80) (const_int 34359738368 [0x8])) to (zero_extract:DI (reg:DI 80) (const_int 1 [0x1]) (const_int 35 [0x23])) because it thinks the "and expr" here is in a compare. But actually the "and expr" is firstly the kid in a subreg: subreg:SI ((and:DI (reg:DI 80) (const_int 34359738368 [0x8])) ==> always 0 is not identical with subreg:SI ((zero_extract:DI (reg:DI 80) (const_int 1 [0x1]) (const_int 35 [0x23])) ==> 0 or 1 So it is the cause of the problem. The second actual of make_compound_operation (combine.c:7701, r198433) should not be in_code.
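The difference between the two forms can be checked with ordinary integer arithmetic. This is a sketch only: the helper names are hypothetical, and the C++ expressions are my reading of what subreg:SI of the and and zero_extract of one bit at position 35 compute on x86_64.

```cpp
#include <cstdint>

// Bit 35 is the only bit kept by the mask 34359738368 (1 << 35) in the dump.
constexpr std::uint64_t kMask = std::uint64_t{1} << 35;

// Models subreg:SI of (and:DI x mask): the low 32 bits of the masked value.
std::uint32_t low_word_of_and(std::uint64_t x) {
    return static_cast<std::uint32_t>(x & kMask);
}

// Models (zero_extract:DI x 1 35): one bit taken from position 35.
std::uint64_t extract_bit_35(std::uint64_t x) {
    return (x >> 35) & 1;
}
```

For any x with bit 35 set, low_word_of_and gives 0 while extract_bit_35 gives 1, so replacing the and under the subreg with an extract changes the result of the comparison against zero.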
[Bug other/55353] [asan] the flag for asan should match the one used in clang
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55353 --- Comment #2 from wmi at google dot com 2012-11-19 05:54:44 UTC --- Hi Kostya, Ok, I will extract the change from the tsan patch and send out a separate patch about it. Regards, Wei. On Sun, Nov 18, 2012 at 9:20 PM, konstantin.s.serebryany at gmail dot com wrote: > > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55353 > > Konstantin Serebryany changed: > >What|Removed |Added > > CC||hjl.tools at gmail dot com, >||wmi at gcc dot gnu.org > > --- Comment #1 from Konstantin Serebryany dot com> 2012-11-19 05:20:24 UTC --- > Wei, this needs to happen ASAP, otherwise there will be too many places with > the old flag.