[Bug libstdc++/77356] New: regex error for a ECMAScript syntax string

2016-08-23 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77356

Bug ID: 77356
   Summary: regex error for a ECMAScript syntax string
   Product: gcc
   Version: 7.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: libstdc++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
  Target Milestone: ---

For the testcase 1.cxx:

#include <regex>
int main() {
  static const char* kNumericAnchor =
      "(\\$|usd)(usd|\\$|to|and|up to|[0-9,\\.\\-\\sk])+";
  const std::regex re(kNumericAnchor);
  return 0;
}


~/workarea/gcc-r239713/build/install/bin/g++ -std=c++11 -O0 1.cxx
-Wl,-rpath=/usr/local/google/home/wmi/workarea/gcc-r239713/build/install/lib64
./a.out

terminate called after throwing an instance of 'std::regex_error'
  what():  Unexpected end of bracket expression.
Aborted (core dumped)

I have no problem compiling and running the testcase with libc++.

For libstdc++, the exception is thrown because the second dash '-' is not part
of any range and is not at the start or end of the bracket expression. According
to the comment in _M_expression_term in
src/libstdc++-v3/include/bits/regex_compiler.tcc, this is not allowed in POSIX
syntax but is allowed in ECMAScript syntax. Since the input uses ECMAScript
syntax, shouldn't libstdc++ accept it without throwing?
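
The behavior can be probed without aborting the process by catching the
exception. This is a minimal sketch that assumes only the standard <regex>
API; the helper name compiles_as_ecmascript is hypothetical, and whether the
reported pattern is accepted depends on the standard library build in use.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: returns true if `pattern` is accepted by std::regex
// with the ECMAScript grammar, false if construction throws std::regex_error.
bool compiles_as_ecmascript(const std::string& pattern) {
    try {
        const std::regex re(pattern, std::regex::ECMAScript);
        (void)re;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}

// The pattern from the testcase. The escaped dash sits in the middle of the
// bracket expression, which is the construct this report is about.
const char* const kNumericAnchor =
    "(\\$|usd)(usd|\\$|to|and|up to|[0-9,\\.\\-\\sk])+";
```

Calling compiles_as_ecmascript(kNumericAnchor) would be expected to return
false with the libstdc++ build from this report and true with libc++, going by
the observations above.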

[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction

2015-10-18 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #13 from wmi at google dot com ---
Using the extracted testcase vogt contributed, here is some digging into why
rtx_refs_may_alias_p returns noalias for the load and store:

(gdb) c
Continuing.

Breakpoint 3, rtx_refs_may_alias_p (x=0x757fe768, mem=0x757fe708,
tbaa_p=true)
at ../../src/gcc/alias.c:385
385   if (!ao_ref_from_mem (&ref1, x)

**
(gdb) p print_rtl_single(stderr, x)
(mem/j:SI (reg/v/f:DI 1 %r1 [orig:64 ps ] [64]) [5 ps_8->f2+-1 S4 A32])
$1 = 1
(gdb) p print_rtl_single(stderr, mem)
(mem/j:QI (reg/v/f:DI 2 %r2 [orig:64 ps ] [64]) [5 ps_8->f1+0 S1 A32])
**

rtx_refs_may_alias_p(x, mem, true) returns no_alias for "x" and "mem".

From the RTL representation, x's starting address is ps_8->f2+-1 and its size
is 4 (see [5 ps_8->f2+-1 S4 A32]).
mem's starting address is ps_8->f1+0 and its size is 1 (see [5 ps_8->f1+0 S1 A32]).
So x and mem alias each other.

**
(gdb) p ref1
$3 = {ref = 0x7568d9f0, base = 0x757d7c30, offset = 8, size = 24,
max_size = 24, ref_alias_set = 5,
  base_alias_set = -1, volatile_p = false}
(gdb) p ref2
$4 = {ref = 0x7568d9c0, base = 0x757d7f50, offset = 0, size = 8,
max_size = 8, ref_alias_set = 5,
  base_alias_set = -1, volatile_p = false}
(gdb) p debug_generic_expr(ref1.base)
*ps_8
$6 = void
(gdb) p debug_generic_expr(ref2.base)
*ps_8
$7 = void
**

rtx_refs_may_alias_p(x, mem, true) calls refs_may_alias_p_1(&ref1, &ref2, ...)
as its helper func. For ref1 and ref2, they have the same base -- *ps_8, but
they have non-overlapping accessing ranges -- ref1 from 8 to 8+24, ref2 from 0
to 8, so ref1 has no-alias with ref2. 

The major difference is between ref1 and x. ref1 is initialized from
MEM_EXPR(x), which is ps_8->f2, so ref1 has offset 8 and size 24. However, x
starts at ps_8->f2-1 and has a size of 32 bits. Usually ref1's offset and size
are adjusted according to MEM_OFFSET(x) and MEM_SIZE(x). However, because of
the if (...) clause below, ao_ref_from_mem returns true without adjusting
ref1->offset and ref1->size.

(gdb) p debug_generic_expr(MEM_EXPR(x))
ps_8->f2
(gdb) p MEM_OFFSET(x)
$11 = -1
(gdb) p MEM_SIZE(x)
$12 = 4

ao_ref_from_mem (ao_ref *ref, const_rtx mem)
{
  tree expr = MEM_EXPR (mem);
  ...
  ao_ref_init (ref, expr);
  base = ao_ref_base (ref);
  ...
  /* If the base decl is a parameter we can have negative MEM_OFFSET in
     case of promoted subregs on bigendian targets.  Trust the MEM_EXPR
     here.  */
  if (MEM_OFFSET (mem) < 0
      && (MEM_SIZE (mem) + MEM_OFFSET (mem)) * BITS_PER_UNIT == ref->size)
    return true;

  ref->offset += MEM_OFFSET (mem) * BITS_PER_UNIT;
  ref->size = MEM_SIZE (mem) * BITS_PER_UNIT;

  ...
}

I don't understand the code above well -- why can we trust MEM_EXPR instead of
relying on MEM_OFFSET and MEM_SIZE? It seems not the case for the testcase
here.
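
The arithmetic can be replayed with the values gdb printed above. This is only
a sketch: BitRange is a hypothetical stand-in for the offset/size pair of
ao_ref, and the helper names are made up for illustration, not taken from
alias.c.

```cpp
#include <cstdint>

// Toy stand-in for the offset/size pair of an ao_ref (both in bits).
struct BitRange {
    std::int64_t offset;
    std::int64_t size;
};

constexpr std::int64_t kBitsPerUnit = 8;

// Half-open ranges [offset, offset + size) overlap iff each one starts
// before the other one ends.
constexpr bool ranges_overlap(BitRange a, BitRange b) {
    return a.offset < b.offset + b.size && b.offset < a.offset + a.size;
}

// The early-return condition quoted from ao_ref_from_mem.
constexpr bool early_return_taken(BitRange ref, std::int64_t mem_offset,
                                  std::int64_t mem_size) {
    return mem_offset < 0 &&
           (mem_size + mem_offset) * kBitsPerUnit == ref.size;
}

// The adjustment that is skipped when the early return is taken.
constexpr BitRange adjust(BitRange ref, std::int64_t mem_offset,
                          std::int64_t mem_size) {
    return BitRange{ref.offset + mem_offset * kBitsPerUnit,
                    mem_size * kBitsPerUnit};
}

constexpr BitRange ref1{8, 24};  // from "p ref1": offset = 8, size = 24
constexpr BitRange ref2{0, 8};   // from "p ref2": offset = 0, size = 8

// MEM_OFFSET(x) = -1 and MEM_SIZE(x) = 4 satisfy the condition ...
static_assert(early_return_taken(ref1, -1, 4), "early return is taken");
// ... so the unadjusted ranges are compared and look disjoint ...
static_assert(!ranges_overlap(ref1, ref2), "unadjusted: no overlap");
// ... while the adjusted range [0, 32) would overlap [0, 8).
static_assert(ranges_overlap(adjust(ref1, -1, 4), ref2), "adjusted: overlap");
```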

[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction

2015-10-17 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #12 from wmi at google dot com ---
Yes, I agree it is a problem that memrefs_conflict_p doesn't take effect. But I
am still wondering: even if memrefs_conflict_p doesn't take effect, the alias
oracle query in rtx_refs_may_alias_p should have returned may-alias for the
load and store. Why did rtx_refs_may_alias_p fail to do that?

[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction

2015-10-14 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #6 from wmi at google dot com ---
(In reply to Dominik Vogt from comment #3)
> I think the Rtl in comment 1 ist correct.  Note that "i" is stored at
> 0x.xx00 and "j" is stored at 0x.00xx.  That is the
> reason for the rather confusing mask in insn 9.  Your test program compiles
> and runs fine for me.

I am not familiar with s390 assembly, so please correct me if I am wrong:

This is the assembly generated for my testcase:
        .globl  _Z3fooP1A
        .type   _Z3fooP1A, @function
_Z3fooP1A:
.LFB0:
        larl    %r5,.L3
        mvi     0(%r2),3                // move 0x.0003 to 0(%r2)
        l       %r1,.L4-.L3(%r5)        // load 0xff000000 to %r1
        n       %r1,0(%r2)              // %r1 = %r1 & 0(%r2) = 0x.
        oill    %r1,5                   // %r1 = %r1 | 5 = 0x.0005
        st      %r1,0(%r2)              // store 0x.0005 to 0(%r2)
        br      %r14
        .section        .rodata
        .align  8
.L3:
.L4:
        .long   -16777216
        .align  2
        .previous

According to the asm sequence above, the result of a should be:
0x.0005, but the correct result should be 0x.0305,
right?

[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction

2015-09-07 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #2 from wmi at google dot com ---
Another problem is found in true_dependence_1 in alias.c. The true_mem_addr or
true_x_addr obtained by calling get_addr may be used as inputs of
memrefs_conflict_p. However, memrefs_conflict_p expects VALUE nodes as its
inputs, so that the memory addresses can be compared. Only
find_base_term and base_alias_check should use true_mem_addr/true_x_addr in
true_dependence_1.

This problem is not a correctness issue, but it may affect the effectiveness of
dse/postreload...

[Bug rtl-optimization/67443] [5/6 regression] DSE removes required store instruction

2015-09-07 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67443

--- Comment #1 from wmi at google dot com ---
It seems the patch exposes a latent problem.

For the testcase 1.cxx below:

typedef struct A {
  unsigned i : 8;
  unsigned j : 24;
} A;

void foo(A *a) {
  a->i = 3;
  a->j = 5;
}

The rtl generated by s390x-ibm-linux-g++ seems wrong.

~/workarea/gcc-r227524/build/install/bin/s390x-ibm-linux-g++ -O2 -S 1.cxx
-fdump-rtl-expand-details-blocks

(note 4 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK)
(insn 2 4 3 2 (set (reg/v/f:DI 60 [ a ])
(reg:DI 2 %r2 [ a ])) 4.cxx:6 -1
 (nil))
(note 3 2 6 2 NOTE_INSN_FUNCTION_BEG)
(insn 6 3 8 2 (set (mem/j:QI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->i+0 S1 A32])
(const_int 3 [0x3])) 4.cxx:7 -1
 (nil))
(insn 8 6 9 2 (set (reg:SI 62)
(mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32])) 4.cxx:8 -1
 (nil))
(insn 9 8 10 2 (parallel [
(set (reg:SI 63)
(and:SI (reg:SI 62)
(const_int -16777216 [0xffffffffff000000])))
(clobber (reg:CC 33 %cc))
]) 4.cxx:8 -1
 (nil))
(insn 10 9 11 2 (parallel [
(set (reg:SI 64)
(ior:SI (reg:SI 63)
(const_int 5 [0x5])))
(clobber (reg:CC 33 %cc))
]) 4.cxx:8 -1
 (nil))
(insn 11 10 0 2 (set (mem/j:SI (reg/v/f:DI 60 [ a ]) [1 a_2(D)->j+-1 S4 A32])
(reg:SI 64)) 4.cxx:8 -1
 (nil))
;;  succ:   EXIT [100.0%]  (FALLTHRU)
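
For reference, insns 6 to 11 can be replayed on a host-independent model of a
big-endian word. This is a sketch under stated assumptions: the memory starts
out as zero, Word and run_foo are hypothetical names, and the keep_byte_store
switch merely models what happens if the QImode store of insn 6 were removed,
which is the kind of removal the bug title refers to.

```cpp
#include <array>
#include <cstdint>

// The four bytes of *a, lowest address first.
using Word = std::array<std::uint8_t, 4>;

// Big-endian 32-bit load: the byte at the lowest address is most significant.
std::uint32_t load_be32(const Word& m) {
    return (std::uint32_t{m[0]} << 24) | (std::uint32_t{m[1]} << 16) |
           (std::uint32_t{m[2]} << 8) | std::uint32_t{m[3]};
}

// Big-endian 32-bit store.
void store_be32(Word& m, std::uint32_t v) {
    m[0] = static_cast<std::uint8_t>(v >> 24);
    m[1] = static_cast<std::uint8_t>(v >> 16);
    m[2] = static_cast<std::uint8_t>(v >> 8);
    m[3] = static_cast<std::uint8_t>(v);
}

// Replays the RTL above and returns the final 32-bit contents of *a.
std::uint32_t run_foo(bool keep_byte_store) {
    Word mem{};                        // assumption: *a starts out as zero
    if (keep_byte_store) mem[0] = 3;   // insn 6: QImode store for a->i = 3
    std::uint32_t r = load_be32(mem);  // insn 8: SImode load of the word
    r &= 0xff000000u;                  // insn 9: and with -16777216
    r |= 5u;                           // insn 10: ior with 5
    store_be32(mem, r);                // insn 11: SImode store
    return load_be32(mem);
}
```

With the byte store kept, the model yields 0x03000005; without it the value of
i is lost and the word ends up as 0x00000005.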

[Bug target/65474] sub-optimal code for __builtin_abs

2015-03-19 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474

--- Comment #3 from wmi at google dot com ---
Thanks. You are right. I wrote a microbenchmark (attached) and tested it on
different Intel microarchitectures.

westmere:
1.gcc.out:    19.42
1.llvm.out:   19.32

sandybridge:
1.gcc.out:    18.61
1.llvm.out:   19.16

ivybridge:
1.gcc.out:    15.79
1.llvm.out:   15.87

On sandybridge, llvm's version was slower. On the other microarchitectures, the
two were close to each other. So gcc's choice makes sense.

[Bug target/65474] sub-optimal code for __builtin_abs

2015-03-19 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474

--- Comment #2 from wmi at google dot com ---
Created attachment 35069
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=35069&action=edit
microbench

[Bug middle-end/65474] New: sub-optimal code for __builtin_abs

2015-03-19 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65474

Bug ID: 65474
   Summary: sub-optimal code for __builtin_abs
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

int foo(int x) {
  return __builtin_abs(x);
}

~/workarea/gcc-r221398/build/install/bin/gcc -O2 -S 1.c -o 1.gcc.s
        .cfi_startproc
        movl    %edi, %edx
        movl    %edi, %eax
        sarl    $31, %edx
        xorl    %edx, %eax
        subl    %edx, %eax
        ret
        .cfi_endproc

~/workarea/llvm-r224097/build/bin/clang -O2 -S 1.c -o 1.llvm.s
        .cfi_startproc
        movl    %edi, %eax
        negl    %eax
        cmovll  %edi, %eax
        retq
        .cfi_endproc
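
The two instruction sequences correspond to two well-known ways of computing
an absolute value. The sketch below writes both in portable C++ for
comparison; the function names are hypothetical, and the arithmetic is done on
unsigned values so that INT_MIN does not trigger signed overflow.

```cpp
#include <cstdint>

// Shift/xor/subtract form, mirroring the sarl/xorl/subl sequence.
std::int32_t abs_shift_xor_sub(std::int32_t x) {
    // Arithmetic right shift: all ones for negative x, zero otherwise
    // (guaranteed since C++20, and what mainstream compilers did before).
    const std::uint32_t mask = static_cast<std::uint32_t>(x >> 31);
    const std::uint32_t ux = static_cast<std::uint32_t>(x);
    return static_cast<std::int32_t>((ux ^ mask) - mask);
}

// Negate-and-select form, mirroring the negl/cmovl sequence: keep the
// negated value unless it is negative, in which case take x itself.
std::int32_t abs_neg_select(std::int32_t x) {
    const std::int32_t neg =
        static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(x));
    return neg < 0 ? x : neg;
}
```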

[Bug rtl-optimization/64557] get_addr in true_dependence_1 cannot handle VALUE inside an expr

2015-01-10 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557

--- Comment #1 from wmi at google dot com ---
The experimental patch calls get_addr on the base VALUE before the constant
offset is added, both when creating mem_addr for the dependence check and for
store_info. Bootstrap and regression testing on x86_64-linux-gnu are OK.

Index: dse.c
===================================================================
--- dse.c(revision 219421)
+++ dse.c(working copy)
@@ -1564,6 +1564,7 @@ record_store (rtx body, bb_info_t bb_inf
 = rtx_group_vec[group_id];
   mem_addr = group->canon_base_addr;
 }
+  mem_addr = get_addr (mem_addr);
   if (offset)
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
 }
@@ -2177,6 +2178,7 @@ check_mem_read_rtx (rtx *loc, bb_info_t
 = rtx_group_vec[group_id];
   mem_addr = group->canon_base_addr;
 }
+  mem_addr = get_addr (mem_addr);
   if (offset)
 mem_addr = plus_constant (get_address_mode (mem), mem_addr, offset);
 }

[Bug rtl-optimization/64557] New: get_addr in true_dependence_1 cannot handle VALUE inside an expr

2015-01-10 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64557

Bug ID: 64557
   Summary: get_addr in true_dependence_1 cannot handle VALUE
inside an expr
   Product: gcc
   Version: 4.9.3
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

We saw a bug in dse2 after porting the patch
https://gcc.gnu.org/ml/gcc-patches/2014-10/msg01209.html from gcc-4_9 to
google-4_9 branch. From the analysis below, I think the problem exists but is
hidden in trunk and gcc-4_9 too. I cannot extract a small testcase to show it
independently without turning on some optimization in google-4_9, so I just
described it here:

We have such IR in a case:

The IR before dse2:
(insn/f 67 4 68 2 (set (mem:DI (pre_dec:DI (reg/f:DI 7 sp)) [0  S8 A8])
(reg/f:DI 6 bp)) contentads/adx/mixer/auction/candidate.cc:14 -1
 (nil))
(insn/f 68 67 69 2 (set (reg/f:DI 6 bp)
(reg/f:DI 7 sp)) contentads/adx/mixer/auction/candidate.cc:14 -1
 (nil))
(insn/f 70 69 71 2 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int -24 [0xffffffffffffffe8])))
(clobber (reg:CC 17 flags))
(clobber (mem:BLK (scratch) [0  A8]))
]) contentads/adx/mixer/auction/candidate.cc:14 -1
 (nil))
(note 71 70 2 2 NOTE_INSN_PROLOGUE_END)

(insn 7 3 9 2 (set (mem/c:SI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507754]+0 S4
A128])
(const_int 0 [0])) ./ads/base/money.h:67 90 {*movsi_internal}
 (nil))
(insn 9 7 10 2 (set (mem/c:HI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507754]+0 S2
A128])
(const_int 21333 [0x5355])) ./ads/base/money.h:68 92 {*movhi_internal}
 (nil))
(insn 10 9 11 2 (set (mem/c:QI (plus:DI (reg/f:DI 7 sp)
(const_int 2 [0x2])) [0 MEM[(void *)&D.3507754]+2 S1 A16])
(const_int 68 [0x44])) ./ads/base/money.h:68 93 {*movqi_internal}
 (nil))
(insn 11 10 12 2 (set (reg:SI 0 ax [orig:87 D.3507754 ] [87])
(mem/c:SI (reg/f:DI 7 sp) [0 D.3507754+0 S4 A128]))
./ads/base/money.h:302 90 {*movsi_internal}
 (expr_list:REG_EQUIV (mem/c:SI (plus:DI (reg/f:DI 20 frame)
(const_int -16 [0xfffffffffffffff0])) [0 D.3507754+0 S4 A128])
(nil)))
...
(insn 15 13 17 2 (set (mem/c:SI (reg/f:DI 7 sp) [0 MEM[(void *)&D.3507756]+0 S4
A128])
(const_int 0 [0])) ./ads/base/money.h:67 90 {*movsi_internal}
 (nil))

The IR after dse2:
The store in insn 10 is deleted. The other part is the same as above.

(mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 2 [0x2])) in insn10 is regarded
to have no alias with (mem/c:SI (reg/f:DI 7 sp) in insn11, which is wrong. This
is because with the applied patch, get_addr is used to extract original
addresses for x_addr and mem_addr before they are used to find_base_term and
used in base_alias_check. See the description of x_addr and mem_addr below:

x is (mem/c:SI (reg/f:DI 7 sp)
x_addr before calling get_addr is:
(value:DI 4:12939 @0x84355f8/0x84354a0)
x_addr after calling get_addr is:
(plus:DI (value:DI 3:8637 @0x84355e8/0x8435478)
(const_int -24 [0xffffffffffffffe8]))
x_addr_base is: (address:DI -4)

mem is (mem/c:QI (plus:DI (reg/f:DI 7 sp) (const_int 2 [0x2]))
mem_addr before calling get_addr is:
(plus:DI (value:DI 4:12939 @0x84355f8/0x84354a0)
(const_int 2 [0x2]))
mem_addr after calling get_addr is:  // Notice: get_addr cannot handle plus
expr, so it returns the origin expr.
(plus:DI (value:DI 4:12939 @0x84355f8/0x84354a0)
(const_int 2 [0x2]))
mem_addr_base is: (address:DI -1)

// value:DI 4:12939 @0x84355f8/0x84354a0 corresponds to reg/f:DI 7 sp
// value:DI 3:8637 @0x84355e8/0x8435478 corresponds to reg/f:DI 6 bp
// address:DI -1 corresponds to reg/f:DI 7 sp
// address:DI -4 corresponds to reg/f:DI 6 bp

x_addr_base and mem_addr_base are different, and unique_base_value_p will
return true for (address:DI -1) and (address:DI -4), so base_alias_check will
return 0, which is wrong.

I think the root cause of the problem is get_addr can only handle VALUE but
cannot handle VALUE inside an expr, like:(plus:DI (value:DI 4:12939
@0x84355f8/0x84354a0) (const_int 2 [0x2])), so find_base_term cannot know
x_addr and mem_addr actually have the same base.
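
The root cause can be illustrated with a toy model. Everything here is a
hypothetical simplification, not GCC code: an address is a VALUE id plus a
constant, the equivalence table only records that VALUE 4 (sp) equals VALUE 3
(bp) plus -24, and base_of stands in for find_base_term.

```cpp
#include <map>

// Toy address: VALUE id plus constant; is_plus marks a PLUS wrapper.
struct Addr {
    int value_id;
    long offset;
    bool is_plus;
};

// Hypothetical equivalence table: VALUE 4 (sp) == VALUE 3 (bp) + -24.
const std::map<int, Addr>& equivalences() {
    static const std::map<int, Addr> table = {{4, Addr{3, -24, true}}};
    return table;
}

// Resolves only a bare VALUE, like the behavior described in this report.
Addr get_addr_top_level(Addr a) {
    if (a.is_plus) return a;  // PLUS expression: returned unchanged
    const auto it = equivalences().find(a.value_id);
    return it == equivalences().end() ? a : it->second;
}

// Variant that also resolves the VALUE inside a PLUS and folds the constants.
Addr get_addr_inside_plus(Addr a) {
    const auto it = equivalences().find(a.value_id);
    if (it == equivalences().end()) return a;
    return Addr{it->second.value_id, it->second.offset + a.offset, true};
}

// Stand-in for find_base_term: the base is just the VALUE id.
int base_of(Addr a) { return a.value_id; }
```

With x_addr = {4, 0, false} and mem_addr = {4, 2, true}, the top-level variant
yields bases 3 and 4, which look unrelated, while the variant that looks
inside the PLUS yields base 3 for both.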

[Bug tree-optimization/64072] New: wrong cgraph node profile count

2014-11-25 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64072

Bug ID: 64072
   Summary: wrong cgraph node profile count
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
CC: davidxl at gcc dot gnu.org, hubicka at gcc dot gnu.org

We have a program like this:

A() {      // hot func
  ...
}

B() {
  A();     // very hot
  if (i) {
    A();   // very cold
  }
}

Both callsites of A will be inlined into B. In the gcc function
save_inline_function_body in the inline_transform stage, A's first clone
will be chosen and materialized. In our case, the clone
node chosen corresponds to the cold callsite of A.
cgraph_rebuild_references in tree_function_versioning will reset the
cgraph node count of the chosen clone to the entry bb count of func A
(A is hot). So the cgraph node count of the chosen clone becomes hot
while its inline edge count is still cold. This breaks the assumption
described here:
https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01366.html:
for inline node, bb->count == edge->count == edge->callee->count

With the patch committed in the thread above (it is listed below),
cg_edge->callee->count is used for the profile update of its inline
instance, which makes a BB in func B look hot although it is actually very
cold. The wrong profile information causes a performance regression in
one of our internal benchmarks. Our internal workaround is to change
cg_edge->callee->count to MIN(cg_edge->callee->count, cg_edge->count).

Index: gcc/tree-inline.c
===================================================================
--- gcc/tree-inline.c (revision 210535)
+++ gcc/tree-inline.c (working copy)
@@ -4355,7 +4355,7 @@ expand_call_inline (basic_block bb, gimple stmt, c
  function in any way before this point, as this CALL_EXPR may be
  a self-referential call; if we're calling ourselves, we need to
  duplicate our body before altering anything.  */
-  copy_body (id, bb->count,
+  copy_body (id, cg_edge->callee->count,
GCOV_COMPUTE_SCALE (cg_edge->frequency, CGRAPH_FREQ_BASE),
  bb, return_block, NULL);
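
The effect of the MIN workaround can be shown in isolation. This is a sketch
only: ProfileCount and inline_body_count are hypothetical names, and any
counts passed in are made-up numbers, not measurements.

```cpp
#include <algorithm>
#include <cstdint>

using ProfileCount = std::int64_t;

// Count handed to the inlined body under the workaround: never more than
// the count of the call edge being inlined.
ProfileCount inline_body_count(ProfileCount callee_count,
                               ProfileCount edge_count) {
    return std::min(callee_count, edge_count);
}
```

For a clone whose node count was reset to a hot value such as 1000000 while
its edge count is 3, the inlined body is then scaled to 3 instead of 1000000.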

[Bug ipa/63970] [4.9/5 Regression] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag

2014-11-24 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

--- Comment #6 from wmi at google dot com ---
The patch was committed to trunk at r217973.

[Bug ipa/63970] [4.9/5 Regression] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag

2014-11-21 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

--- Comment #4 from wmi at google dot com ---
(In reply to Jan Hubicka from comment #3)
> Created attachment 34047 [details]
> Patch
> 
> Something like this (untested) may work

Thanks! I tested your patch after a minor change. It passed bootstrap and
regression testing. It also fixed the performance regression we saw in internal
benchmarks.

+  if (origin_node && !origin_node->used_as_abstract_origin)
+    {
+      origin_node->used_as_abstract_origin = true;
+      enqueue_node (origin_node, &first, &reachable); // enqueue_node moved here
+      gcc_assert (!origin_node->prev_sibling_clone);
+      gcc_assert (!origin_node->next_sibling_clone);
+      for (origin_node = origin_node->clones; origin_node;
+           origin_node = origin_node->next_sibling_clone)
+        if (origin_node->decl == DECL_ABSTRACT_ORIGIN (node->decl))
+          origin_node->used_as_abstract_origin = true;
+    }

Wei.

[Bug tree-optimization/63970] gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag

2014-11-19 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

--- Comment #1 from wmi at google dot com ---
> I think we need to keep the functions but do not need to account for them in 
> the unit size if we otherwise could remove them
>
> Richard.

But there is code in symbol_table::remove_unreachable_nodes:

  if (TREE_CODE (node->decl) == FUNCTION_DECL
      && DECL_ABSTRACT_ORIGIN (node->decl))
    {
      struct cgraph_node *origin_node
        = cgraph_node::get_create (DECL_ABSTRACT_ORIGIN (node->decl));
      origin_node->used_as_abstract_origin = true;
      enqueue_node (origin_node, &first, &reachable);
    }

If we remove the check in can_remove_node_now_p_1, the original node will be
removed or reused as clone node in ipa inline analysis, but it will be
recreated in symbol_table::remove_unreachable_nodes after ipa inline analysis
finishes, if only its clone nodes are reachable.

So can we just remove the original node in inline analysis and let
symbol_table::remove_unreachable_nodes restore it after ipa inline analysis?

Thanks,
Wei.

[Bug tree-optimization/63970] New: gcc-4_9 inlines less funcs than gcc-4_8 because of used_as_abstract_origin flag

2014-11-19 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63970

Bug ID: 63970
   Summary: gcc-4_9 inlines less funcs than gcc-4_8 because of
used_as_abstract_origin flag
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
CC: davidxl at google dot com, dehao at google dot com,
hubicka at gcc dot gnu.org, rguenth at gcc dot gnu.org,
tejohnson at google dot com

We see an inline problem as below caused by r201408
(https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00027.html).

hoo() {
  foo();
  ...
}

foo {
  goo();
  ...
}

foo is split by function splitting, so its body changes to

foo {
  goo();
  ...
  foo.part();
}

and the used_as_abstract_origin of cgraph node of foo will be set to
true after func splitting.

In ipa-inline, when inlining foo into hoo, the original node of foo
will not be reused as clone node because used_as_abstract_origin of
cgraph node of foo is true and can_remove_node_now_p_1 will return
false, so that a new clone node of foo will be created. This is the
case in gcc-4_9.
In gcc-4_8, the original node of foo will be reused as clone node.

gcc-4_8
foo
  |
goo

gcc-4_9
foo   foo_clone
   \   /
    goo

Because of the difference of whether to create a new clone for foo,
when inlining goo to foo, the overall growth of inlining all callsites
of goo in gcc-4_8 will be less than gcc-4_9 (goo has two callsites in
gcc-4_9 but only one in gcc-4_8). If we have many cases like this,
gcc-4_8 will actually have more inline growth budget than gcc-4_9 and
will inline more aggressively than gcc-4_9.

I don't understand the exact purpose of the check of
node->used_as_abstract_origin in can_remove_node_now_p_1, and I am
puzzled by the following two points:

1. https://gcc.gnu.org/ml/gcc-patches/2013-08/msg00027.html said the
patch was to ensure all abstract origin functions do have nodes
attached. However, even if the node of origin function is reused as a
clone node, a new clone node will be created in following code in
symbol_table::remove_unreachable_nodes if only the node that needs
abstract origin is reachable.

  if (TREE_CODE (node->decl) == FUNCTION_DECL
      && DECL_ABSTRACT_ORIGIN (node->decl))
    {
      struct cgraph_node *origin_node
        = cgraph_node::get_create (DECL_ABSTRACT_ORIGIN (node->decl));
      origin_node->used_as_abstract_origin = true;
      enqueue_node (origin_node, &first, &reachable);
    }

2. DECL_ABSTRACT_ORIGIN(decl) seems only useful for the debug info of
clone nodes. But now the check of used_as_abstract_origin affects
inline decisions, which should be the same with or without keeping
debug info.

[Bug rtl-optimization/63548] New: redundent reload in loop after removing regmove

2014-10-15 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63548

Bug ID: 63548
   Summary: redundent reload in loop after removing regmove
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

Created attachment 33730
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33730&action=edit
testcase 1.c

For a program with many insns like "a = b + c", where operands "b" and "c" are
both dead immediately after the add insn, the hardreg preference heuristic
does not seem to be perfect.

Here is a testcase 1.c.

For gcc after r204212, two redundant reload insns are generated because of the
imperfect hardreg preference heuristic in IRA.

~/workarea/gcc-r214579/build/install/bin/gcc -O2 -S 1.c

.L5:
        movl    %ebx, %edi
        call    goo
        leal    2(%rbx), %edi
        movl    %eax, %r13d
        call    goo
        leal    4(%rbx), %edi
        movl    %eax, %r12d
        call    goo
        leal    6(%rbx), %edi
        movl    %eax, %ebp
        addl    $1, %ebx
        call    goo
        movl    %eax, %edx              // redundant mov
        movl    %r13d, %eax             // redundant mov
        imull   %r12d, %eax
        imull   %ebp, %eax
        imull   %edx, %eax
        addl    %eax, total(%rip)
        cmpl    %ebx, M(%rip)
        jg      .L5

The old gcc with regmove happens to do better than the hardreg preference
heuristic and generates only one redundant reload.

~/workarea/gcc-r199418/build/install/bin/gcc -O2 -S 1.c
.L3:
        movl    %ebx, %edi
        call    goo
        leal    2(%rbx), %edi
        movl    %eax, %r13d
        call    goo
        leal    4(%rbx), %edi
        movl    %eax, %r12d
        call    goo
        leal    6(%rbx), %edi
        movl    %eax, %ebp
        addl    $1, %ebx
        call    goo
        movl    %r13d, %edx             // redundant mov
        imull   %r12d, %edx
        imull   %ebp, %edx
        imull   %eax, %edx
        addl    %edx, total(%rip)
        cmpl    %ebx, M(%rip)
        jg      .L3

llvm generates no redundant move insn.

clang-r217862 -O2 -S 1.c
.LBB0_2:
        movl    %ebx, %edi
        callq   goo
        movl    %eax, %r14d
        leal    2(%rbx), %edi
        callq   goo
        movl    %eax, %ebp
        leal    4(%rbx), %edi
        callq   goo
        movl    %eax, %r15d
        leal    6(%rbx), %edi
        callq   goo
        imull   %r14d, %ebp
        imull   %r15d, %ebp
        imull   %eax, %ebp
        addl    %ebp, total(%rip)
        incl    %ebx
        cmpl    M(%rip), %ebx
        jl      .LBB0_2

[Bug rtl-optimization/63525] New: unnecessary reloads generated in loop

2014-10-13 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63525

Bug ID: 63525
   Summary: unnecessary reloads generated in loop
   Product: gcc
   Version: 5.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
CC: vmakarov at gcc dot gnu.org

Created attachment 33700
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33700&action=edit
testcase 1.cxx

For the testcase 1.cxx attached, trunk (r214579) generates an addpd with a mem
operand and one extra reload insn in the kernel loop. g++ before r204274
generates fewer insns in the kernel loop.

~/workarea/gcc-r214579/build/install/bin/g++ -O2 -S 1.cxx -o 1.s
kernel loop:
.L3:
   pxor        %xmm0, %xmm0
   cvtsi2sd    %eax, %xmm0
   addl        $1, %eax
   cmpl        %edx, %eax
   unpcklpd    %xmm0, %xmm0
   addpd       -24(%rsp), %xmm0   ===> mem operand used
   movaps      %xmm0, -24(%rsp)   ===> reload
   jne         .L3

~/workarea/gcc-r199418/build/install/bin/g++ -O2 -S 1.cxx -o 2.s
kernel loop:
.L3:
   xorpd       %xmm1, %xmm1
   cvtsi2sd    %eax, %xmm1
   addl        $1, %eax
   unpcklpd    %xmm1, %xmm1
   addpd       %xmm1, %xmm0
   cmpl        %edx, %eax
   jne         .L3


The reload insns in trunk are generated because of the following steps:

With r204274, the IR after expand is like this:
Loop:
...
(insn 15 14 16 5 (set (reg/v:V2DF 83 [ v ])
   (plus:V2DF (reg/v:V2DF 83 [ v ])
   (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 -1
(nil))
...
end Loop.
(insn 23 22 24 7 (set (reg/v:TI 90 [ tmp ])
   (subreg:TI (reg/v:V2DF 83 [ v ]) 0))
/usr/local/google/home/wmi/workarea/gcc-r212442/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.10.0/include/emmintrin.h:157
-1
(nil))
(insn 24 23 25 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  ) [2 x+0 S8 A64])
   (subreg:DF (reg/v:TI 90 [ tmp ]) 0)) 1.cxx:17 -1
(nil))
(insn 25 24 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  ) [2 y+0 S8 A64])
   (subreg:DF (reg/v:TI 90 [ tmp ]) 8)) 1.cxx:18 -1
(nil))

forward propagation will propagate reg 90 from insn 23 to insn 24 and insn 25,
and remove subreg:TI, so we get the IR before IRA like this:

Loop:
...
(insn 15 14 16 4 (set (reg/v:V2DF 83 [ v ])
   (plus:V2DF (reg/v:V2DF 83 [ v ])
   (reg:V2DF 92 [ D.5005 ]))) 1.cxx:14 1263 {*addv2df3}
(expr_list:REG_DEAD (reg:V2DF 92 [ D.5005 ])
   (nil)))
...
end Loop.
(insn 24 22 25 5 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  ) [2 x+0 S8 A64])
   (subreg:DF (reg/v:V2DF 83 [ v ]) 0)) 1.cxx:17 128 {*movdf_internal}
(nil))
(insn 25 24 0 5 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  ) [2 y+0 S8 A64])
   (subreg:DF (reg/v:V2DF 83 [ v ]) 8)) 1.cxx:18 128 {*movdf_internal}
(expr_list:REG_DEAD (reg/v:V2DF 83 [ v ])
   (nil)))

ix86_cannot_change_mode_class doesn't allow such subreg: "subreg:DF (reg/v:V2DF
83 [ v ]) 8)" in insn 25, so reg 83 will be added in invalid_mode_changes by
record_subregs_of_mode and will be allocated NO_REGS regclass.

reg 83 has NO_REGS regclass while plus:V2DF requires the target operand to be
an xmm register in insn 15, so reload insns are needed. The kernel loop has low
register pressure and does not form a separate IRA region, so live range
splitting on the region border doesn't kick in here.

Without r204274, IR after expand is like this:
Loop:
...
(insn 15 14 16 5 (set (reg/v:V2DF 61 [ v ])
   (plus:V2DF (reg/v:V2DF 61 [ v ])
   (reg:V2DF 68 [ D.4966 ]))) 1.cxx:14 -1
(nil))
...
End Loop.
(insn 25 24 26 7 (set (subreg:V2DF (reg/v:TI 66 [ tmp ]) 0)
   (reg/v:V2DF 61 [ v ]))
/usr/local/google/home/wmi/workarea/gcc-r199418/build/install/lib/gcc/x86_64-unknown-linux-gnu/4.9.0/include/emmintrin.h:147
-1
(nil))
(insn 26 25 27 7 (set (mem/c:DF (symbol_ref:DI ("x") [flags 0x2]  ) [2 x+0 S8 A64])
   (subreg:DF (reg/v:TI 66 [ tmp ]) 0)) 1.cxx:17 -1
(nil))
(insn 27 26 0 7 (set (mem/c:DF (symbol_ref:DI ("y") [flags 0x2]  ) [2 y+0 S8 A64])
   (subreg:DF (reg/v:TI 66 [ tmp ]) 8)) 1.cxx:18 -1
(nil))

Because the subreg is on the left-hand side of insn 25, it is impossible for
forward propagation to merge insn 25 into insn 26 and insn 27. reg 61 will not
have a reference like "subreg:DF (reg/v:V2DF 61 [ v ]) 8)", so it gets the SSE
regclass and does not introduce extra reload insns in the kernel loop.

r204274 just enables more forward propagation and exposes the problem here.

[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-17 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

--- Comment #6 from wmi at google dot com ---
(In reply to davidxl from comment #5)
> (In reply to wmi from comment #4)
> > Can we move the pure/const resetting loop to an earlier place: inside
> > branch_prob , after instrument_edges and before gsi_commit_edge_inserts
> > (where stmt_ends_bb_p  is checked), so that gsi_commit_edge_inserts() which
> > changes cfg could take reset const/pure flags into consideration?
> 
> Sounds plausible. Have you tried it?
> 
> David

I just tried but found it was not very easy.

FOR_EACH_DEFINED_FUNCTION (node) {
  execute_fixup_cfg() and cleanup_tree_cfg()
  branch_prob()
}

For the above loop, branch_prob is called one by one for each defined func.
Because a func could possibly call any other func in the cgraph, we need to
reset the const/pure flags of every defined func before any branch_prob() is
called, so we cannot put the const/pure reset code inside branch_prob().

We also cannot move the const/pure reset loop before the branch_prob() loop,
because execute_fixup_cfg uses the const/pure flags and may generate a
different cfg. If we put the const/pure reset code before the branch_prob()
loop, the const/pure reset code would only be executed in the instrumentation
phase, not in the annotation phase, so we might get different cfgs between
instrumentation and annotation.

Wei.

[Bug middle-end/61776] [4.9/4.10 Regression] ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-17 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

--- Comment #4 from wmi at google dot com ---
Can we move the pure/const resetting loop to an earlier place: inside
branch_prob , after instrument_edges and before gsi_commit_edge_inserts (where
stmt_ends_bb_p  is checked), so that gsi_commit_edge_inserts() which changes
cfg could take reset const/pure flags into consideration?

[Bug tree-optimization/61776] New: ICE: verify_flow_info failed: control flow in the middle of basic block with -fprofile-generate

2014-07-10 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61776

Bug ID: 61776
   Summary: ICE: verify_flow_info failed: control flow in the
middle of basic block with -fprofile-generate
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
CC: rguenth at gcc dot gnu.org
  Host: x86_64-linux-gnu
Target: x86_64-linux-gnu
 Build: x86_64-linux-gnu

Created attachment 33106
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=33106&action=edit
testcase 1.c

~/workarea/gcc-r211604/build/install/bin/gcc -O2 -S -fprofile-generate 1.c

1.c: In function ‘foo’:
1.c:24:1: error: control flow in the middle of basic block 3
 }
 ^
1.c:24:1: error: control flow in the middle of basic block 3
1.c:24:1: error: control flow in the middle of basic block 3
1.c:24:1: internal compiler error: verify_flow_info failed
0x71f8c2 verify_flow_info()
   ../../src/gcc/cfghooks.c:260
0xbdaf1b cleanup_tree_cfg_noloop
   ../../src/gcc/tree-cfgcleanup.c:737
0xbdafec cleanup_tree_cfg()
   ../../src/gcc/tree-cfgcleanup.c:786
0xc674b8 tree_profiling
   ../../src/gcc/tree-profile.c:652
0xc6754a execute
   ../../src/gcc/tree-profile.c:691

The cause of the problem is:

Before edge profiling instrumentation, goo in 1.c is regarded as a const
function since it has no side effects, so during instrumentation the call to
goo is not regarded as a bb-ending stmt, and some instrumentation code is
inserted after the call in the same BB. The call to goo is now in the middle of
a BB.

After edge profiling instrumentation, the body of goo contains instrumentation
code, and goo's const flag is reset to false because it now has side effects.
From then on the call to goo is regarded as a bb-ending stmt, which is
inconsistent with the fact that the call is in the middle of a BB. The next
verify_flow_info() fails and the error message "control flow in the middle of
basic block" is reported.
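
The inconsistency can be reproduced in a toy model. The sketch below is not GCC
code: Stmt, stmt_ends_bb and block_is_well_formed are hypothetical names, and
the rule "a call ends its block unless the callee is const" is a simplification
of the behaviour described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy statement: either a call (with the callee's const flag) or something else.
struct Stmt {
  bool is_call;
  bool callee_is_const;
};

// Simplified stand-in for the check described above: a call ends its basic
// block unless the callee is known to be const.
bool stmt_ends_bb(const Stmt& s) { return s.is_call && !s.callee_is_const; }

// A block is well formed only if no statement before the last one ends it.
bool block_is_well_formed(const std::vector<Stmt>& bb) {
  for (std::size_t i = 0; i + 1 < bb.size(); ++i)
    if (stmt_ends_bb(bb[i])) return false;
  return true;
}
```

A block holding "call goo" followed by a counter update passes the check while
goo is const and fails once the flag is cleared, even though the block itself
was never changed.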

Google ref b/15936428

[Bug tree-optimization/61493] [4.10 Regression] Bug exposed by speculative devirtualizing

2014-06-24 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493

wmi at google dot com changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from wmi at google dot com ---
This is a problem in the source, not in the compiler, so I am closing the bug.

void foo(FST *fst) {
  const PAIR &final_pair = fst->Final().getpair();
  if (final_pair == global_pair)
    __builtin_printf("equal\n");
  else
    __builtin_printf("not equal\n");

  return;
}

The lifetime of the temporary object generated by fst->Final() is not
extended beyond the statement that creates it, according to the following rule:

a temporary bound to a return value of a function in a return statement is not
extended: it is destroyed immediately at the end of the return expression. Such
function always returns a dangling reference.
(http://en.cppreference.com/w/cpp/language/reference_initialization)

So it is meaningless to access final_pair afterwards.
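
A minimal sketch of one way to avoid the dangling reference, assuming the
result can be copied. The types Pair, Weight and Fst are hypothetical stand-ins
for the ones in the attached testcase, which is not reproduced here.

```cpp
#include <cassert>

struct Pair {
  int first;
  int second;
};

bool operator==(const Pair& a, const Pair& b) {
  return a.first == b.first && a.second == b.second;
}

// Hypothetical stand-in for the object returned by value from Final().
struct Weight {
  Pair value;
  const Pair& getpair() const { return value; }
};

struct Fst {
  Weight Final() const { return Weight{{1, 2}}; }
};

bool final_equals(const Fst& fst, const Pair& expected) {
  // Copy the Pair while the temporary Weight is still alive. Binding a
  // reference here instead would leave it dangling after this statement.
  const Pair final_pair = fst.Final().getpair();
  return final_pair == expected;
}
```

With the copy, final_equals(Fst{}, Pair{1, 2}) gives the same answer at any
optimization level, because final_pair no longer refers to the destroyed
temporary.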

[Bug tree-optimization/61493] [4.10 Regression] Bug exposed by speculative devirtualizing

2014-06-13 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493

--- Comment #3 from wmi at google dot com ---
Fix a typo in the first post.

$~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx
$./a.out
not equal

$~/workarea/gcc-r211604/build/install/bin/g++ -O0 1.cxx
$./a.out
equal

$~/workarea/gcc-r211604/build/install/bin/g++ -O2
-fno-devirtualize-speculatively 1.cxx
$./a.out
equal

[Bug tree-optimization/61493] Bug exposed by speculative devirtualizing

2014-06-12 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493

--- Comment #1 from wmi at google dot com ---
Created attachment 32931
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=32931&action=edit
testcase

[Bug tree-optimization/61493] New: Bug exposed by speculative devirtualizing

2014-06-12 Thread wmi at google dot com

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61493

Bug ID: 61493
   Summary: Bug exposed by speculative devirtualizing
   Product: gcc
   Version: 4.10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

1.cxx is attached.

$~/workarea/gcc-r211604/build/install/bin/g++ -v
Using built-in specs.
COLLECT_GCC=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install/bin/g++
COLLECT_LTO_WRAPPER=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install/libexec/gcc/x86_64-unknown-linux-gnu/4.10.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure --enable-languages=c,c++ --disable-bootstrap
--prefix=/usr/local/google/home/wmi/workarea/gcc-r211604/build/install
Thread model: posix
gcc version 4.10.0 20140613 (experimental) (GCC)

$~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx
$./a.out
not equal

$~/workarea/gcc-r211604/build/install/bin/g++ -O2 1.cxx
$./a.out
equal

$~/workarea/gcc-r211604/build/install/bin/g++ -O2
-fno-devirtualize-speculatively 1.cxx
$./a.out
equal

Google ref b/15521306

[Bug rtl-optimization/60738] New: A missing opportunity about process_single_reg_class_operands

2014-04-01 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60738

Bug ID: 60738
   Summary: A missing opportunity about
process_single_reg_class_operands
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

Testcase 1.c:

int a, b, c, d, e, cond;
void foo() {
  int r1, r2, r3;
  r1 = b;
  r2 = d;
  if (__builtin_expect(cond > 3, 0)) {
    e = e * 5;
    c = a << r1;
  }
  c = c << r2;
  __builtin_printf("r1 + r2 = %d\n", r1 + r2);
}

~/workarea/gcc-r208410/build/install/bin/gcc -O2 -S 1.c

foo:
.LFB0:
        .cfi_startproc
        cmpl    $3, cond(%rip)
        movl    b(%rip), %esi
        movl    d(%rip), %eax
        jg      .L2
        movl    c(%rip), %edx
.L3:
        movl    %eax, %ecx      // r2 gets assigned %eax. This is reload for insn1.
        addl    %eax, %esi
        movl    $.LC0, %edi
        sall    %cl, %edx       // insn1. Its constraint requires r2 in %ecx
        xorl    %eax, %eax
        movl    %edx, c(%rip)
        jmp     printf
        .p2align 4,,10
        .p2align 3
.L2:
        movl    e(%rip), %edx
        movl    %esi, %ecx      // r1 gets assigned %esi. This is reload for insn2.
        leal    (%rdx,%rdx,4), %edx
        movl    %edx, e(%rip)
        movl    a(%rip), %edx
        sall    %cl, %edx       // insn2. Its constraint requires r1 in %ecx
        jmp     .L3
        .cfi_endproc

Because the bb starting from L2 is relatively cold, it is better to generate
the code below:

foo:
.LFB0:
        .cfi_startproc
        cmpl    $3, cond(%rip)
        movl    b(%rip), %esi
        movl    d(%rip), %ecx
        jg      .L2
        movl    c(%rip), %eax
.L3:
        sall    %cl, %eax       // r2 gets assigned %ecx. No reload is needed.
        addl    %ecx, %esi
        movl    $.LC0, %edi
        movl    %eax, c(%rip)
        xorl    %eax, %eax
        jmp     printf
        .p2align 4,,10
        .p2align 3
.L2:
        movl    e(%rip), %eax
        movl    %ecx, %edx      // r2's live range is split here. This is the start of the split live range.
        movl    %esi, %ecx      // r1 gets assigned %esi. This is reload for insn2.
        leal    (%rax,%rax,4), %eax
        movl    %eax, e(%rip)
        movl    a(%rip), %eax
        sall    %cl, %eax       // insn2. Its constraint requires r1 in %ecx
        movl    %edx, %ecx      // r2's live range is split here. This is the end of the split live range.
        jmp     .L3
        .cfi_endproc

Now there is less code in the hot path (the bb starting from .L3).

r1 and r2 used in the sall insns need the CX_REG class, which is a
single_reg_operand_class in IRA. The existing logic in
process_single_reg_class_operands in ira-lives.c doesn't allow %ecx to be
assigned to r1 or r2. Does it need improvement here?
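
As a rough back-of-the-envelope check, the trade-off can be put in numbers.
The sketch below uses instruction count as a crude proxy for cost, which is an
assumption and not how IRA computes costs. The counts are read off the two
listings above (directives excluded): 7 hot and 7 cold insns in the current
code, 6 hot and 9 cold in the proposed code. The names are hypothetical.

```cpp
#include <cassert>

// Expected insns executed per call, assuming the block at .L3 always runs and
// the block at .L2 runs with probability cold_prob.
double expected_insns(int hot_insns, int cold_insns, double cold_prob) {
  return hot_insns + cold_prob * cold_insns;
}

// Counts taken from the two assembly listings (assumption: every insn costs 1).
const int kCurrentHot = 7, kCurrentCold = 7;
const int kProposedHot = 6, kProposedCold = 9;

bool proposed_is_cheaper(double cold_prob) {
  return expected_insns(kProposedHot, kProposedCold, cold_prob) <
         expected_insns(kCurrentHot, kCurrentCold, cold_prob);
}
```

Under this model the proposed code wins whenever the cold block runs in fewer
than half of the calls (6 + 9p < 7 + 7p gives p < 0.5), which fits a branch
marked unlikely with __builtin_expect.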

[Bug tree-optimization/60206] IVOPT has no idea of inline asm

2014-03-10 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206

--- Comment #7 from wmi at google dot com ---
After looking into the problem more, I found that IVOPT may not be the root
cause. Even if IVOPT creates a memory operand using two registers, the problem
will not happen as long as the following optimizations don't propagate the
memory operand into an asm_operand.

So I created another small testcase, 2.c, for which gcc at the head of trunk
reports the same error. -fno-ivopts does not help here.

gcc -v
Target: x86_64-unknown-linux-gnu
Configured with: ../src/configure
--prefix=/usr/local/google/home/wmi/workarea/gcc-r208410/build/install
Thread model: posix
gcc version 4.9.0 20140307 (experimental) (GCC)

gcc -O2 -fno-omit-frame-pointer -m32 -S 2.c

2.c: In function ‘foo’:
2.c:25:1: error: ‘asm’ operand has impossible constraints
 __asm__ (
 ^

The problem disappears when I use -fno-tree-ter and -fdisable-rtl-combine.
These two phases can propagate a memory reference that uses a register into an
asm operand with constraint "g", which increases the number of registers used
in the asm stmt.

For TER, TER of loads into input arguments is allowed. For combine,
insn_invalid_p() only checks whether an asm operand satisfies its constraint.
However, neither TER nor combine checks whether the propagation could make the
registers needed by the asm stmt exceed the number of available registers.

[Bug tree-optimization/60206] IVOPT has no idea of inline asm

2014-03-10 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206

--- Comment #6 from wmi at google dot com ---
Created attachment 32328
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32328&action=edit
2.c

[Bug tree-optimization/60206] IVOPT has no idea of inline asm

2014-02-14 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206

--- Comment #4 from wmi at google dot com ---

> On Fri, 14 Feb 2014, pinskia at gcc dot gnu.org wrote:
> 
> > I think the real issue __FP_FRAC_SUB_4 needs to be fixed not to use 
> > inline-asm
> > but normal C code.  The normal C code should be able to produce as good as 
> > the
> > inline-asm code now too.
> 
> Does GCC do a good job of detecting add-with-carry and 
> subtract-with-borrow patterns (i.e. detecting the comparison that 
> corresponds to the carry flag and its use in a subsequent operation)?

I remember at least the expansion of builtin_strlen could generate
sub-with-borrow and it works well, so I think rtl passes could handle
add-with-carry/subtract-with-borrow.

[Bug tree-optimization/60206] IVOPT has no idea of inline asm

2014-02-14 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206

--- Comment #2 from wmi at google dot com ---
This is a way to fix the problem. libgcc/soft-fp/op-4.h provides a C
version of __FP_FRAC_SUB_4, but it is currently overridden by the inline asm
version in config/i386/32/sfp-machine.h.

But the inline asm looks legal, right? Isn't it the compiler's responsibility to
keep the inline asm constraints always satisfiable?

[Bug tree-optimization/60206] New: IVOPT has no idea of inline asm

2014-02-14 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60206

Bug ID: 60206
   Summary: IVOPT has no idea of inline asm
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com
CC: rguenth at gcc dot gnu.org, shenhan at google dot com
  Host: i386
Target: i386

Created attachment 32141
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=32141&action=edit
Testcase

This bug was found on the google branch, but I think the same problem also
exists on trunk (it is just not exposed).

For the attached testcase 1.c (extracted from libgcc/soft-fp/divtf3.c), the
trunk compiler gcc-r202164 (Target: x86_64-unknown-linux-gnu) plus the patch
r204497 exposes the problem.

The command:
gcc -v -O2 -fno-omit-frame-pointer -fpic -c -S -m32 1.c

The error:
./1.c: In function ‘__divtf3’:
./1.c:64:1194: error: ‘asm’ operand has impossible constraints

The inline asm in the error message is as follows:
do {
 __asm__ (
"sub{l} {%11,%3|%3,%11}\n\t"
"sbb{l} {%9,%2|%2,%9}\n\t"
"sbb{l} {%7,%1|%1,%7}\n\t"
"sbb{l} {%5,%0|%0,%5}"
: "=r" ((USItype) (A_f[3])), "=&r" ((USItype) (A_f[2])), "=&r" ((USItype)
(A_f[1])), "=&r" ((USItype) (A_f[0])) : "0" ((USItype) (B_f[2])), "g"
((USItype) (A_f[2])), "1" ((USItype) (B_f[1])), "g" ((USItype) (A_f[1])), "2"
((USItype) (B_f[0])), "g" ((USItype) (A_f[0])), "3" ((USItype) (0)), "g"
((USItype) (_n_f[_i])));
} while ()

Because -fno-omit-frame-pointer is turned on and the command line uses -fpic,
there are only 5 registers for register allocation.

Before IVOPT,
%0, %1, %2, %3 require 4 registers. The index variable i of _n_f[_i] requires
another register. So 5 registers are used up here.

After IVOPT, the MEM reference _n_f[_i] is converted to MEM[base: _874, index:
ivtmp.22_821, offset: 0B]. base and index require 2 registers, so 6 registers
are now required and LRA cannot find enough registers to allocate.
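
The register arithmetic above can be written down as a small check. This is
only a sketch of the counting argument, with hypothetical helper names. The
figure of 5 allocatable registers is my reading of the flags: i386 has 8
general registers, of which %esp is always reserved, %ebp is the frame pointer
under -fno-omit-frame-pointer, and %ebx holds the GOT pointer under -fpic.

```cpp
#include <cassert>

// Assumption: 8 general registers minus %esp, %ebp and %ebx.
const int kAllocatableRegs = 8 - 3;

// Registers the asm needs: one per register operand plus the registers used
// to address the "g" memory operand _n_f[_i].
int registers_needed(int register_operands, int address_registers) {
  return register_operands + address_registers;
}

bool constraints_satisfiable(int needed) { return needed <= kAllocatableRegs; }
```

Before IVOPT the memory operand is addressed through one register (4 + 1 = 5,
which fits). After IVOPT it uses base plus index (4 + 2 = 6, which does not).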

The trunk compiler doesn't expose the problem because of patch r202165. With
patch r202165, IVOPT doesn't change _n_f[_i] in the inline asm above, but that
only hides the problem.

Should IVOPT care about the constraints in inline asm and restrict its
optimization in some cases?

[Bug regression/58985] [4.9 Regression]: gcc.dg/pr57518.c scan-rtl-dump-not ira REG_EQUIV...

2013-11-10 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58985

--- Comment #9 from wmi at google dot com ---
Backported r200720 to gcc 4.8 branch at r204660.

[Bug regression/58985] [4.9 Regression]: gcc.dg/pr57518.c scan-rtl-dump-not ira REG_EQUIV...

2013-11-04 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58985

--- Comment #4 from wmi at google dot com ---
This is a problem in the testcase. For the cris-axis-elf target, gcc doesn't use
a subreg of reg 31 for the above testcase, so it is ok to generate a REG_EQUIV
note for reg 31.

I will send out a patch for it soon.

Thanks for pointing out the problem.

Regards,
Wei Mi.

[Bug rtl-optimization/57878] Incorrect code: live register clobbered in split2

2013-07-14 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57878

wmi at google dot com changed:

   What|Removed |Added

 CC||wmi at google dot com

--- Comment #3 from wmi at google dot com ---
The problem seems to be in how the priority for assigning hardregs to reload
pseudos is decided, i.e. in the func reload_pseudo_compare_func.

This is the trace of 2nd iteration of reload pseudo assignments in
r.ii.209r.reload:

  2nd iter for reload pseudo assignments:
 Reload r196 assignment failure
 Reload r199 assignment failure
 Reload r204 assignment failure
 Reload r204 assignment failure
  Spill reload r194(hr=1, freq=1426)
  Spill reload r195(hr=5, freq=1426)
  Spill reload r197(hr=1, freq=1426)
  Spill reload r198(hr=5, freq=1426)
  Spill reload r202(hr=1, freq=1426)
  Spill reload r203(hr=5, freq=1426)
 Assigning to 194 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190,
tfreq=4991)...
   Assign 1 to reload r194 (freq=1426)
 Assigning to 197 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190,
tfreq=4991)...
   Assign 1 to reload r197 (freq=1426)
Hard reg 1 is preferable by r222 with profit 3029
Hard reg 2 is preferable by r222 with profit 1425
 Assigning to 202 (cl=GENERAL_REGS, orig=138, freq=1426, tfirst=190,
tfreq=4991)...
   Assign 1 to reload r202 (freq=1426)
 Assigning to 195 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191,
tfreq=4278)...
   Assign 5 to reload r195 (freq=1426)
 Assigning to 198 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191,
tfreq=4278)...
   Assign 5 to reload r198 (freq=1426)
 Assigning to 203 (cl=INDEX_REGS, orig=140, freq=1426, tfirst=191,
tfreq=4278)...
   Assign 5 to reload r203 (freq=1426)
 Assigning to 196 (cl=GENERAL_REGS, orig=196, freq=1426, tfirst=196,
tfreq=1426)...
 Trying 2: spill 225(freq=1426)
 Assigning to 199 (cl=GENERAL_REGS, orig=199, freq=1426, tfirst=199,
tfreq=1426)...
 Trying 2: spill 225(freq=1426)
 Assigning to 204 (cl=GENERAL_REGS, orig=204, freq=1426, tfirst=204,
tfreq=1426)...
 Trying 2: spill 216(freq=2139)
   Assign 0 to reload r196 (freq=1426)
   Assign 0 to reload r199 (freq=1426)
   Assign 0 to reload r204 (freq=1426)
  Reassigning non-reload pseudos

Here is the dump after lra_constraints. These are the insns related with r194,
r195, r196:

(insn 200 120 201 6 (set (reg/f:SI 194 [orig:138 D.3281 ] [138])
(reg/f:SI 138 [ D.3281 ])) 1.ii:197 89 {*movsi_internal}
 (nil))
(insn 201 200 121 6 (set (reg/f:SI 195 [orig:140 D.3282 ] [140])
(reg/f:SI 140 [ D.3282 ])) 1.ii:197 89 {*movsi_internal}
 (nil))
(insn 121 201 202 6 (set (reg:DI 196)
(mem:DI (plus:SI (plus:SI (reg/f:SI 99 [ D.3281 ])
(reg/f:SI 126 [ D.3282 ]))
(const_int 8 [0x8])) [10 MEM[base: _1, index: _44, offset: 8]+0
S8 A64])) 1.ii:197 88 {*movdi_internal}
 (expr_list:REG_DEAD (reg:DI 131 [ D.3287 ])
(nil)))
(insn 202 121 122 6 (set (mem:DI (plus:SI (plus:SI (reg/f:SI 194 [orig:138
D.3281 ] [138])
(reg/f:SI 195 [orig:140 D.3282 ] [140]))
(const_int 8 [0x8])) [10 MEM[base: _75, index: _77, offset:
8B]+0 S8 A64])
(reg:DI 196)) 1.ii:197 88 {*movdi_internal}
 (nil))

From the trace, r194 and r195 are assigned hardregs before r196. Usually reload
pseudos do not conflict with each other, except in one special case: when they
are in the same insn. r194, r195 and r196 are such a case. They are all in
insn 202.

In addition, r194, r195 and r196 are all reload pseudos, so once r194 and r195
are allocated, they will not be spilled in order to assign a hardreg to r196.
Here r194 and r195 get their hardregs before r196, and afterwards r196 cannot
find an available hardreg because it has a bigger mode and requires a
consecutive hardreg pair. All pseudos which cannot find a hardreg after two
iterations are simply given ax, and an error is reported. Trunk reports the
error but 4.8.1 doesn't, because lra_assert is only enabled in trunk and not in
4.8.1.

A possible fix is to give bigger mode pseudos higher priority in lra
assignment.
Index: lra-assigns.c
===
--- lra-assigns.c(revision 200944)
+++ lra-assigns.c(working copy)
@@ -194,15 +194,15 @@ reload_pseudo_compare_func (const void *
   if ((diff = (ira_class_hard_regs_num[cl1]
- ira_class_hard_regs_num[cl2])) != 0)
 return diff;
-  if ((diff = (regno_assign_info[regno_assign_info[r2].first].freq
-   - regno_assign_info[regno_assign_info[r1].first].freq)) != 0)
-return diff;
   /* Allocate bigger pseudos first to avoid regis
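
The patch above is cut off in this archive. To illustrate the idea only, here
is a sketch of an ordering that puts pseudos needing more hard registers
first. It is not the patch itself: ReloadPseudo, allocate_before and
assignment_order are hypothetical names, and the tie-breaking by frequency and
regno is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ReloadPseudo {
  int regno;
  int nregs;  // number of consecutive hard registers its mode needs
  int freq;
};

// Bigger pseudos first, then higher frequency, then lower regno for stability.
bool allocate_before(const ReloadPseudo& a, const ReloadPseudo& b) {
  if (a.nregs != b.nregs) return a.nregs > b.nregs;
  if (a.freq != b.freq) return a.freq > b.freq;
  return a.regno < b.regno;
}

std::vector<int> assignment_order(std::vector<ReloadPseudo> pseudos) {
  std::sort(pseudos.begin(), pseudos.end(), allocate_before);
  std::vector<int> order;
  for (const ReloadPseudo& p : pseudos) order.push_back(p.regno);
  return order;
}
```

With r194 and r195 needing one register each and r196 (DImode on a 32-bit
target) needing a pair, this ordering handles r196 first, while a consecutive
pair is still free.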

[Bug rtl-optimization/57518] [4.9 Regression] Redundant insn generated in LRA

2013-06-20 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518

wmi at google dot com changed:

   What|Removed |Added

 CC||rguenth at gcc dot gnu.org

--- Comment #3 from wmi at google dot com ---
Oh, sorry for the confusion: the 4.8.0 below is an experimental version
(note its date is 20120613; at that time LRA had not been merged):

Target: x86_64-linux-gnu
gcc version 4.8.0 20120613 (experimental) (GCC)
gcc -O2 -S 1.c
        .cfi_startproc
        movzbl  ip+2(%rip), %eax
        andl    $3, %eax
        movl    %eax, total(%rip)
        ret
        .cfi_endproc

I just verified with the 4.8.0 and 4.8.1 releases; the problem is there in both.

[Bug rtl-optimization/57518] [4.8/4.9 Regression] Redundant insn generated in LRA

2013-06-12 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518

--- Comment #1 from wmi at google dot com ---
post a candidate patch here:

http://gcc.gnu.org/ml/gcc-patches/2013-06/msg00748.html

[Bug rtl-optimization/57459] [4.8/4.9 Regression] LRA inheritance bug

2013-06-06 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57459

--- Comment #6 from wmi at google dot com ---
Continuing the analysis from the first post: for the small testcase 1.c, the IR
after calling inherit_in_ebb in lra_inheritance for bb12 is:

(insn 289 47 48 12 (set (reg:SI 116 [79])
(reg:SI 121 [79])) 1.c:16 85 {*movsi_internal}
 (nil))
(insn 48 289 290 12 (set (reg:SI 116 [79])
(if_then_else:SI (eq (reg:CCNO 17 flags)
(const_int 0 [0]))
(reg:SI 122 [83])
(reg:SI 116 [79]))) 1.c:16 923 {*movsicc_noc}
 (expr_list:REG_DEAD (reg:SI 122 [83])
(nil)))
(insn 290 48 294 12 (set (reg:SI 120 [79])
(reg:SI 116 [79])) 1.c:16 85 {*movsi_internal}
 (nil))
(insn 294 290 49 12 (set (reg:SI 79)
(reg:SI 120 [79])) 1.c:16 85 {*movsi_internal}
 (nil))
..
(insn 292 50 51 12 (set (reg:QI 118)
(subreg:QI (reg:SI 120 [79]) 0)) 1.c:16 87 {*movqi_internal}
 (nil))
(insn 51 292 52 12 (parallel [
(set (reg:CC 17 flags)
(unspec:CC [
(subreg:QI (reg:SI 79) 0)
(reg:QI 118)
] UNSPEC_ADD_CARRY))
(set (subreg:QI (reg:SI 79) 0)
(plus:QI (subreg:QI (reg:SI 79) 0)
(reg:QI 118)))
]) 1.c:16 259 {addqi3_cc}
 (expr_list:REG_UNUSED (reg:SI 79)
(nil)))

The IR is still correct after this step.

However, after update_ebb_live_info (called after inherit_in_ebb), insn 294 is
removed. Then reg 79 does not get the updated value and no longer equals reg
118. The IR is wrong after this step.
insn 294 is removed in update_ebb_live_info because the reg type of reg 79 is
OP_INOUT, but update_ebb_live_info only marks OP_IN type regs as live_regs.

So the fix is:
Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c(revision 199752)
+++ gcc/lra-constraints.c(working copy)
@@ -4545,7 +4545,7 @@ update_ebb_live_info (rtx head, rtx tail
   bitmap_clear_bit (&live_regs, reg->regno);
   /* Mark each used value as live.  */
   for (reg = curr_id->regs; reg != NULL; reg = reg->next)
-if (reg->type == OP_IN
+if ((reg->type == OP_IN || reg->type == OP_INOUT)
 && bitmap_bit_p (&check_only_regs, reg->regno))
   bitmap_set_bit (&live_regs, reg->regno);
   /* It is quite important to remove dead move insns because it

Bootstrapped and tested on x86_64-linux.
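
The effect of the one-line change can be seen in a toy backward liveness scan.
This is a simplified model and not the code in lra-constraints.c: the types,
dead_moves and example_block are hypothetical, and the three insns only mimic
the operands of insns 294, 292 and 51 above.

```cpp
#include <cassert>
#include <set>
#include <vector>

enum class OpType { In, Out, InOut };
struct Operand { int regno; OpType type; };
struct Insn { int uid; bool is_move; std::vector<Operand> ops; };

// Scan the block backwards and return the uids of moves whose destination is
// not live. inout_is_use selects whether InOut operands count as uses.
std::vector<int> dead_moves(const std::vector<Insn>& insns, bool inout_is_use) {
  std::set<int> live;
  std::vector<int> dead;
  for (auto it = insns.rbegin(); it != insns.rend(); ++it) {
    if (it->is_move) {
      bool dest_live = false;
      for (const Operand& op : it->ops)
        if (op.type == OpType::Out && live.count(op.regno) != 0) dest_live = true;
      if (!dest_live) { dead.push_back(it->uid); continue; }
    }
    // Mark each defined value as dead.
    for (const Operand& op : it->ops)
      if (op.type == OpType::Out) live.erase(op.regno);
    // Mark each used value as live.
    for (const Operand& op : it->ops)
      if (op.type == OpType::In || (inout_is_use && op.type == OpType::InOut))
        live.insert(op.regno);
  }
  return dead;
}

// Toy versions of insn 294 (r79 = r120), insn 292 (r118 = r120) and insn 51
// (r79 is read and written, r118 is read, flags are written).
std::vector<Insn> example_block() {
  return {
      {294, true, {{79, OpType::Out}, {120, OpType::In}}},
      {292, true, {{118, OpType::Out}, {120, OpType::In}}},
      {51, false, {{17, OpType::Out}, {79, OpType::InOut}, {118, OpType::In}}},
  };
}
```

With inout_is_use set to false the scan reports insn 294 as dead, which matches
the wrong deletion described above. With it set to true, no move is reported.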

[Bug rtl-optimization/57518] New: Redundent insn generated in LRA

2013-06-03 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57518

Bug ID: 57518
   Summary: Redundent insn generated in LRA
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

Testcase:

char ip[10];
int total, total1;

void foo() {
  int t;

  t = ip[2];
  total = t & 0x3;
}

Target: x86_64-linux-gnu
gcc version 4.9.0 20130529 (experimental) (GCC) 
~/workarea/gcc-r199418/build/install/bin/gcc -O2 -S 1.c
        .cfi_startproc
        movzbl  ip+2(%rip), %eax
        movb    %al, -16(%rsp)      ==> redundant
        movl    -16(%rsp), %eax     ==> redundant
        andl    $3, %eax
        movl    %eax, total(%rip)
        ret
        .cfi_endproc


Target: x86_64-linux-gnu
gcc version 4.8.0 20120613 (experimental) (GCC)
gcc -O2 -S 1.c
        .cfi_startproc
        movzbl  ip+2(%rip), %eax
        andl    $3, %eax
        movl    %eax, total(%rip)
        ret
        .cfi_endproc



IR before LRA:

(insn 12 7 8 2 (set (reg:QI 64 [ ip+2 ])
(mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip")  )
(const_int 2 [0x2]))) [0 ip+2 S1 A8])) 1.c:9 87
{*movqi_internal}
 (expr_list:REG_EQUIV (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") 
)
(const_int 2 [0x2]))) [0 ip+2 S1 A8])
(nil)))
(insn 8 12 9 2 (parallel [
(set (reg:SI 65 [ D.1731 ])
(and:SI (subreg:SI (reg:QI 64 [ ip+2 ]) 0)
(const_int 3 [0x3])))
(clobber (reg:CC 17 flags))
]) 1.c:9 387 {*andsi_1}
 (expr_list:REG_DEAD (reg:QI 64 [ ip+2 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUIV (mem/c:SI (symbol_ref:DI ("total")  ) [2 total+0 S4 A32])
(nil)

IR after LRA:

(insn 12 7 14 2 (set (reg:QI 0 ax [orig:64 ip+2 ] [64])
(mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip")  )
(const_int 2 [0x2]))) [0 ip+2 S1 A8])) 1.c:9 87
{*movqi_internal}
 (expr_list:REG_EQUIV (mem/j/c:QI (const:DI (plus:DI (symbol_ref:DI ("ip") 
)
(const_int 2 [0x2]))) [0 ip+2 S1 A8])
(nil)))
(insn 14 12 15 2 (set (mem/c:QI (plus:DI (reg/f:DI 7 sp)
(const_int -16 [0xfff0])) [3 %sfp+-16 S1 A64])
(reg:QI 0 ax [orig:64 ip+2 ] [64])) 1.c:9 87 {*movqi_internal}
 (expr_list:REG_DEAD (reg:QI 0 ax [orig:64 ip+2 ] [64])
(nil)))
(insn 15 14 8 2 (set (reg:SI 0 ax [orig:65 D.1731 ] [65])
(mem/c:SI (plus:DI (reg/f:DI 7 sp)
(const_int -16 [0xfff0])) [3 %sfp+-16 S4 A64]))
1.c:9 85 {*movsi_internal}
 (nil))
(insn 8 15 16 2 (parallel [
(set (reg:SI 0 ax [orig:65 D.1731 ] [65])
(and:SI (reg:SI 0 ax [orig:65 D.1731 ] [65])
(const_int 3 [0x3])))
(clobber (reg:CC 17 flags))
]) 1.c:9 387 {*andsi_1}
 (expr_list:REG_EQUIV (mem/c:SI (symbol_ref:DI ("total")  ) [2 total+0 S4 A32])
(nil)))

IRA Trace:

Pass 0 for finding pseudo/allocno costs

a0 (r65,l0) best GENERAL_REGS, allocno GENERAL_REGS
a1 (r64,l0) best NO_REGS, allocno NO_REGS

a1's rclass is NO_REGS because it has a REG_EQUIV note (equivalent to mem
ip+2).

Because reg 64 is marked as equivalent to mem ip+2, insn 12 is expected to be
deleted and reg 64 in insn 8 replaced by mem ip+2. In LRA constraints, insn 12
is not deleted because of the subreg op in insn 8 (see lra-constraints.c:3662
r199418). In addition, reg 64's rclass is NO_REGS, so redundant spills are
inserted.

The mode size check (lra-constraints.c:3662 r199418) needs to be considered in
update_equiv_regs in IRA, so that reg 64 is not marked as equivalent to mem
ip+2 in this case.
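
One possible reading of that mode size check, written as a sketch. The actual
condition at lra-constraints.c:3662 is not quoted in this report, so the rule
below (a pseudo may be treated as equivalent to memory only if no use reads it
in a wider mode than its own) is an assumption, and the function names are
hypothetical.

```cpp
#include <cassert>
#include <vector>

// A use fits if it does not read more bytes than the pseudo itself holds.
bool use_fits_pseudo(int pseudo_bytes, int use_bytes) {
  return use_bytes <= pseudo_bytes;
}

// Assumed rule: only mark the pseudo as equivalent to memory when every use
// fits, i.e. the pseudo is never accessed through a wider subreg.
bool may_mark_equiv(int pseudo_bytes, const std::vector<int>& use_bytes) {
  for (int bytes : use_bytes)
    if (!use_fits_pseudo(pseudo_bytes, bytes)) return false;
  return true;
}
```

For the testcase, reg 64 is QImode (1 byte) but insn 8 reads it as
(subreg:SI ...), a 4-byte access, so this rule would refuse the equivalence.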

[Bug rtl-optimization/57459] New: LRA inheritance bug

2013-05-29 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57459

Bug ID: 57459
   Summary: LRA inheritance bug
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: wmi at google dot com

Created attachment 30218
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=30218&action=edit
small testcase

To reproduce the bug using the attached 1.c:

Target: x86_64-unknown-linux-gnu
gcc version 4.9.0 20130529 (experimental) (GCC)

$~/workarea/gcc-r199418/build/install/bin/gcc -fno-inline -O2
-minline-all-stringops -fno-omit-frame-pointer -m32 1.c
$./a.out
len = 9

$~/workarea/gcc-r199418/build/install/bin/gcc -O2 -m32 1.c
$./a.out
len = 8

The expanded __builtin_strlen is wrong:

 80484c3:   8b 07                   mov    (%edi),%eax
 80484c5:   83 c7 04                add    $0x4,%edi
 80484c8:   8d 90 ff fe fe fe       lea    -0x1010101(%eax),%edx
 80484ce:   f7 d0                   not    %eax
 80484d0:   21 c2                   and    %eax,%edx
 80484d2:   81 e2 80 80 80 80       and    $0x80808080,%edx
 80484d8:   74 e9                   je     80484c3
 80484da:   89 d0                   mov    %edx,%eax
 80484dc:   8b 55 ec                mov    -0x14(%ebp),%edx
 80484df:   89 55 e8                mov    %edx,-0x18(%ebp)
 80484e2:   89 c2                   mov    %eax,%edx
 80484e4:   c1 e8 10                shr    $0x10,%eax
 80484e7:   f7 c2 80 80 00 00       test   $0x8080,%edx
 80484ed:   89 45 ec                mov    %eax,-0x14(%ebp)
 80484f0:   89 d0                   mov    %edx,%eax
 80484f2:   8d 57 02                lea    0x2(%edi),%edx
 80484f5:   0f 44 fa                cmove  %edx,%edi
 80484f8:   8b 55 e8                mov    -0x18(%ebp),%edx
 80484fb:   0f 44 45 ec             cmove  -0x14(%ebp),%eax

 80484ff:   00 45 e4                add    %al,-0x1c(%ebp)   ==> Wrong
here, the correct insn is: add %al, %al. %al is either 0x80 or 0x0 here. The
insn "add  %al, %al" is used to check whether %al is 0x80, and it will produce
carry bit for the following sbb. (The lowest 0x80 in %eax shows where the first
'\0' is in the input string)

 8048502:   83 df 03                sbb    $0x3,%edi
 8048505:   8b 45 08                mov    0x8(%ebp),%eax
 8048508:   2b 7d 08                sub    0x8(%ebp),%edi
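
For readers unfamiliar with the expansion, the lea/not/and/and sequence at the
top of the listing is the usual word-at-a-time test for a zero byte. The sketch
below restates it in C++ for a 32-bit little-endian load. The function names
are hypothetical, and the loop that locates the byte is only illustrative. The
generated code uses the test/cmove/add/sbb sequence instead.

```cpp
#include <cassert>
#include <cstdint>

// Non-zero if and only if some byte of word is zero. The lowest set 0x80 bit
// marks the lowest-addressed zero byte.
std::uint32_t zero_byte_mask(std::uint32_t word) {
  return (word - 0x01010101u) & ~word & 0x80808080u;
}

// Index (0..3) of the first zero byte in memory order, or -1 if there is none.
int first_zero_byte(std::uint32_t word) {
  std::uint32_t mask = zero_byte_mask(word);
  if (mask == 0) return -1;
  int index = 0;
  while ((mask & 0x80u) == 0) {
    mask >>= 8;
    ++index;
  }
  return index;
}
```

Only the lowest marker is reliable: bytes above the first zero byte can be
flagged by the borrow, which is why the text above speaks of the lowest 0x80.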


The IR after IRA and before LRA:

(insn 51 50 52 12 (parallel [
(set (reg:CC 17 flags)
(unspec:CC [
(subreg:QI (reg:SI 79) 0)
(subreg:QI (reg:SI 79) 0)
] UNSPEC_ADD_CARRY))
(set (subreg:QI (reg:SI 79) 0)
(plus:QI (subreg:QI (reg:SI 79) 0)
(subreg:QI (reg:SI 79) 0)))
]) 1.c:16 259 {addqi3_cc}
 (expr_list:REG_UNUSED (reg:SI 79)
(nil)))

The IR is correct till now. insn 51 will produce the problematic "add
%al,-0x1c(%ebp)" finally. All the input and output operands of insn 51 are
reg79. The reg79 gets no hardreg in IRA phase.  

The IR after lra_constraints:

(insn 292 50 51 12 (set (reg:QI 118)
(subreg:QI (reg:SI 79) 0)) 1.c:16 87 {*movqi_internal}
 (nil))
(insn 51 292 52 12 (parallel [
(set (reg:CC 17 flags)
(unspec:CC [
(subreg:QI (reg:SI 79) 0)
(reg:QI 118)
] UNSPEC_ADD_CARRY))
(set (subreg:QI (reg:SI 79) 0)
(plus:QI (subreg:QI (reg:SI 79) 0)
(reg:QI 118)))
]) 1.c:16 259 {addqi3_cc}
 (expr_list:REG_UNUSED (reg:SI 79)
(nil)))

The IR is still correct. 
The chosen constraints of insn 51 are "rm" "0" "rn". reg79 gets no hardreg in
IRA, so the output operand and the first input operand satisfy the constraint
(staying in mem), but the second input operand must stay in a register. That is
why reg118 is introduced and insn 292 is inserted.

The IR after lra_inheritance:

(insn 289 47 48 12 (set (reg:SI 116 [79])
(reg:SI 121 [79])) 1.c:16 85 {*movsi_internal}
 (nil))
(insn 48 289 290 12 (set (reg:SI 116 [79])
(if_then_else:SI (eq (reg:CCNO 17

[Bug rtl-optimization/57130] New: Incorrect "and --> extract" conversion in combine

2013-04-30 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57130

 Bug #: 57130
   Summary: Incorrect "and --> extract" conversion in combine
Classification: Unclassified
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: w...@google.com

Created attachment 29986
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=29986
Testcase

For the smallcase 1.ii attached.

~/workarea/gcc-r198433/build/install/bin/g++ 1.ii -o a1.out
./a1.out  --> correct output:
1
3

~/workarea/gcc-r198433/build/install/bin/g++ -fno-inline
-fno-omit-frame-pointer -O2 1.ii -o a2.out
./a2.out  --> incorrect output:
1
-1

In 1.ii.196r.ud_dce:

(insn 7 2 84 2 (set (reg:DI 64)
(const_int -4294967296 [0xffffffff00000000])) 1.ii:26 84
{*movdi_internal}
 (nil))
...
(insn 40 36 44 2 (parallel [
(set (reg:DI 88)
(and:DI (reg:DI 80)
(reg:DI 64)))
(clobber (reg:CC 17 flags))
]) 1.ii:32 386 {*anddi_1}
 (expr_list:REG_DEAD (reg:DI 80)
(expr_list:REG_DEAD (reg:DI 64)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUAL (and:DI (reg:DI 80)
(const_int -4294967296 [0xffffffff00000000]))
(nil))
(insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ])
(zero_extend:DI (subreg:SI (reg:DI 88) 0))) 1.ii:33 127
{*zero_extendsidi2}
 (expr_list:REG_DEAD (reg:DI 88)
(nil)))

The value of r92 here should always be 0.



After try_combine with params (i3==insn44, i2==insn40, i1==insn7), insn44 is
transformed to:

(insn 44 40 45 2 (parallel [
(set (reg:DI 92 [ mini_rect ])
(ashiftrt:DI (reg:DI 88)
(const_int 63 [0x3f])))
(clobber (reg:CC 17 flags))
]) 1.ii:33 528 {ashrdi3_cvt}
 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_DEAD (reg:DI 88)
(nil

The value of r92 is now either 0 or -1, depending on the highest bit of r88.



Here is what happens in try_combine:
In try_combine, after subst(PATTERN (i3), i2dest, i2src, ...), insn 44 is
transformed to the following form. This step is correct.

(insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ])
(neg:DI (ne:DI (subreg:SI (and:DI (reg:DI 80)
(reg:DI 64)) 0)
(const_int 0 [0] 1.ii:33 127 {*zero_extendsidi2}
 (expr_list:REG_DEAD (reg:DI 88)
(nil)))

In subst(PATTERN (i3), i1dest, i1src, ...), insn 44 is first transformed to
the following in simplify_logical, which is correct:

(insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ])
(neg:DI (ne:DI (subreg:SI ((and:DI (reg:DI 80)
(const_int 34359738368 [0x800000000]))) 0)
(const_int 0 [0] 1.ii:33 127 {*zero_extendsidi2}
 (expr_list:REG_DEAD (reg:DI 88)
(nil)))

then it is transformed to the following in make_compound_operation, which is
incorrect:

(insn 44 40 45 2 (set (reg:DI 92 [ mini_rect ])
(sign_extract:DI (reg:DI 80)
(const_int 1 [0x1])
(const_int 35 [0x23]))) 1.ii:33 127 {*zero_extendsidi2}
 (expr_list:REG_DEAD (reg:DI 88)
(nil)))

make_compound_operation transforms

(and:DI (reg:DI 80)
(const_int 34359738368 [0x800000000]))

to

(zero_extract:DI (reg:DI 80)
(const_int 1 [0x1])
(const_int 35 [0x23]))

because it thinks the "and expr" here is in a compare. But actually the "and
expr" is first of all the child of a subreg:

subreg:SI ((and:DI (reg:DI 80)
(const_int 34359738368 [0x800000000]))  ==> always 0

is not identical to

subreg:SI ((zero_extract:DI (reg:DI 80)
(const_int 1 [0x1])
(const_int 35 [0x23]))  ==> 0 or 1

So this is the cause of the problem. The second actual argument of
make_compound_operation (combine.c:7701, r198433) should not be in_code.
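
The difference between the two subreg expressions can be checked with plain
integer arithmetic. The sketch below is an illustration in C++, not RTL
semantics code. The function names are hypothetical, and the cast to a 32-bit
type stands in for taking the low SImode subreg.

```cpp
#include <cassert>
#include <cstdint>

const std::uint64_t kBit35 = 34359738368ULL;  // 1 << 35

// Low 32 bits of (and:DI r80 (const_int 1 << 35)): bit 35 lies outside the
// low word, so the result is always 0.
std::uint32_t low32_of_and(std::uint64_t r80) {
  return static_cast<std::uint32_t>(r80 & kBit35);
}

// Low 32 bits of a one-bit zero_extract at position 35: the bit is moved down
// to bit 0, so the result is 0 or 1.
std::uint32_t low32_of_zero_extract(std::uint64_t r80) {
  return static_cast<std::uint32_t>((r80 >> 35) & 1u);
}
```

For any r80 with bit 35 set, the first function returns 0 and the second
returns 1, so replacing one by the other under the subreg changes the result.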

[Bug other/55353] [asan] the flag for asan should match the one used in clang

2012-11-18 Thread wmi at google dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55353

--- Comment #2 from wmi at google dot com 2012-11-19 05:54:44 UTC ---
Hi Kostya,

Ok, I will extract the change from the tsan patch and send out a
separate patch about it.

Regards,
Wei.

On Sun, Nov 18, 2012 at 9:20 PM, konstantin.s.serebryany at gmail dot
com  wrote:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55353
>
> Konstantin Serebryany  changed:
>
>What|Removed |Added
> 
>  CC||hjl.tools at gmail dot com,
>||wmi at gcc dot gnu.org
>
> --- Comment #1 from Konstantin Serebryany  dot com> 2012-11-19 05:20:24 UTC ---
> Wei, this needs to happen ASAP, otherwise there will be too many places with
> the old flag.

