[Bug rtl-optimization/87047] [7/8/9/10 Regression] performance regression because of if-conversion

2019-10-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047

--- Comment #14 from Alexander Monakov  ---
Author: amonakov
Date: Wed Oct  2 15:37:12 2019
New Revision: 276466

URL: https://gcc.gnu.org/viewcvs?rev=276466&root=gcc&view=rev
Log:
ifcvt: improve cost estimation (PR 87047)

PR rtl-optimization/87047
* ifcvt.c (average_cost): New static function.  Use it...
(noce_process_if_block): ... here.

testsuite/
* gcc.dg/pr87047.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr87047.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/ifcvt.c
trunk/gcc/ifcvt.h
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/87047] [7/8/9 Regression] performance regression because of if-conversion

2019-10-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047

Alexander Monakov  changed:

   What|Removed |Added

Summary|[7/8/9/10 Regression]   |[7/8/9 Regression]
   |performance regression  |performance regression
   |because of if-conversion|because of if-conversion

--- Comment #15 from Alexander Monakov  ---
Fixed on the trunk, I'll ask about backporting to gcc-9 branch after a month or
so.

[Bug other/91972] New: Bootstrap should use -Wmissing-declarations

2019-10-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91972

Bug ID: 91972
   Summary: Bootstrap should use -Wmissing-declarations
   Product: gcc
   Version: 10.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

In the good old days when gcc was written in C, bootstrap stage2/3 enabled
-Wmissing-prototypes and so it caught attempted definitions of functions that
should be static, but were not declared so.

The transition to C++ did not change -Wmissing-prototypes to
-Wmissing-declarations, so over time several violations crept in. In particular
this penalizes optimization during non-LTO bootstrap (the compiler has to
assume the function might be used in another TU, even though in reality all
uses are in the current file and it is simply missing the 'static' keyword).
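
To make the class of violation concrete, here is a minimal sketch in C. All
function names are hypothetical and not taken from the GCC sources. The first
helper is the kind of definition that -Wmissing-prototypes (C) or
-Wmissing-declarations (C++) is meant to flag; the second shows the intended
form.

```c
/* Sketch only: every name in this file is hypothetical.  */

/* No prior declaration and no 'static': this definition has external
   linkage, so the compiler must assume another translation unit may
   call it.  gcc -Wmissing-prototypes (C) and g++ -Wmissing-declarations
   (C++) warn here.  */
int scale_by_two (int x)
{
  return 2 * x;
}

/* Same kind of helper, correctly marked 'static': no warning, and the
   compiler knows that all callers are in this file.  */
static int scale_by_three (int x)
{
  return 3 * x;
}

/* A genuinely public function is declared first (normally in a header),
   which also keeps the warning quiet.  */
int scale_sum (int x);

int scale_sum (int x)
{
  return scale_by_two (x) + scale_by_three (x);
}
```

Compiling this file as C with -Wmissing-prototypes should produce a warning
for scale_by_two only.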

[Bug tree-optimization/91965] missing simplification for (C - a) << N

2019-10-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91965

--- Comment #1 from Alexander Monakov  ---
On a related thought, I wonder if we can canonicalize (x << CST) to (x * CST')
where CST' is 1<

[Bug tree-optimization/91965] missing simplification for (C - a) << N

2019-10-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91965

--- Comment #3 from Alexander Monakov  ---
(In reply to Marc Glisse from comment #2)
> What exact transformation do you want? Canonicalize the constant C to
> something like C % (1 << (bitsize - N))?

I'm thinking (C << N) >>> N where '>>>' is sign-extending right shift. In other
words, duplicate the bit at position (bitsize - 1 - N) to the left. In the
opening example, I want to go from

  (0xf - a) << 44

to

  (-1 - a) << 44

> For unsigned only... didn't we use to canonicalize in the reverse direction,
> i.e. from x*4 to x<<2?

For unsigned - because we promise not to use undefined overflow for signed left
shifts? Can we canonicalize to a wrapping multiplication?
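
A small sketch of the constant canonicalization described above, under some
assumptions: a 64-bit unsigned type with wrapping arithmetic (to stay clear of
the signed-overflow question), a shift count 0 < N < 64, and illustrative
values C = 0xfffff, N = 44 picked so that the low 64 - N bits of C are all
ones. The helper names are hypothetical. The sign extension is written with
the portable xor/subtract idiom rather than an arithmetic right shift.

```c
#include <stdint.h>

/* Hypothetical helper: keep the low (64 - n) bits of C and copy the top
   kept bit, at position 63 - n, into all higher bits.  This computes
   (C << n) >>> n without relying on an arithmetic right shift.
   Requires 0 < n < 64.  */
uint64_t canonicalize_shift_constant (uint64_t c, unsigned n)
{
  unsigned width = 64 - n;                      /* bits that survive << n */
  uint64_t mask = (UINT64_C (1) << width) - 1;  /* low WIDTH bits set */
  uint64_t sign = UINT64_C (1) << (width - 1);  /* top surviving bit */
  uint64_t low = c & mask;
  return (low ^ sign) - sign;                   /* sign-extend LOW */
}

/* The expression under discussion, in wrapping unsigned arithmetic.  */
uint64_t shifted_difference (uint64_t c, uint64_t a, unsigned n)
{
  return (c - a) << n;
}
```

With these values canonicalize_shift_constant (0xfffff, 44) yields the
all-ones constant (i.e. -1), and shifted_difference gives the same result for
the original and the canonicalized constant, because the two constants agree
in the low 64 - N bits and the shift discards everything above them.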

[Bug target/92030] Wrong asm code for aliases on MIPS.

2019-10-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92030

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
The alias attribute usually works fine with declarations of static variables.

So, have you tried

static struct memtype MTYPE_##name[1] __attribute__((alias("_mt_" #name)));

(i.e. with 'static' rather than 'extern')?

[Bug inline-asm/92151] Spurious register copying

2019-10-21 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92151

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
>> (or write actual assembly rather than using inline-asm).
> In this case, yes -- I now declare the function "naked" and avoid the issue.

I think this solution (hand-writing asm for the entire function) is generally
undesirable because you become responsible for ABI/calling convention, the
compiler won't help you with things like properly restoring callee-saved
registers, and violations may stay unnoticed as long as callers don't try to
use a particular callee-saved reg.

Here's a manually reduced variant that exhibits a similar issue at -O1:

void foo(int num, int c) {
    asm("# %0" : "+r"(num));
    while (--c)
        asm goto("# %0" :: "r"(num) :: l2);
l2:
    asm("# %0" :: "r"(num));
}

The main issue seems to be our 'asmcons' pass transforming RTL in such a way
that REG_DEAD notes are "behind" the actual death, so if the RA takes them
literally it operates on wrong (too conservative) lifetime information; e.g.,
for the first asm, just before IRA we have:

(insn 29 4 8 2 (set (reg:SI 84 [ num ])
(reg:SI 85)) "./example.c":3:5 -1
 (nil))
(insn 8 29 7 2 (parallel [
(set (reg:SI 84 [ num ])
(asm_operands:SI ("# %0") ("=r") 0 [
(reg:SI 84 [ num ])
]
 [
(asm_input:SI ("0") ./example.c:3)
]
 [] ./example.c:3))
(clobber (reg:CC 17 flags))
]) "./example.c":3:5 -1
 (expr_list:REG_DEAD (reg:SI 85)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil

but register 85 actually dies in insn 29, not in insn 8.

[Bug middle-end/92250] valgrind: ira_traverse_loop_tree – Conditional jump or move depends on uninitialised value

2019-10-28 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92250

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
Be sure to enable Valgrind annotations (configure with
--enable-valgrind-annotations); otherwise false positives on sparseset
functions are expected: the sparse set algorithm accesses uninitialized memory
by design (an explanation is available at e.g.
https://research.swtch.com/sparse).
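
For readers unfamiliar with the data structure, here is a minimal sketch of a
sparse set in C. It is not GCC's sparseset implementation and all names are
hypothetical. One deliberate difference: a real sparse set leaves sparse[]
uninitialized (which is what Valgrind flags); to keep this sketch well-defined
in ISO C, sparse[] is instead filled with an arbitrary caller-supplied junk
value standing in for uninitialized memory. The membership test gives the
right answer whatever that value is.

```c
#include <stdbool.h>
#include <stdlib.h>

struct sparse_set
{
  unsigned *dense;   /* members, packed in insertion order */
  unsigned *sparse;  /* sparse[e]: claimed position of e in dense[] */
  unsigned members;  /* number of valid entries in dense[] */
  unsigned size;     /* universe is 0 .. size - 1 */
};

/* SIZE must be nonzero.  JUNK stands in for the arbitrary contents that
   an uninitialized sparse[] would have in a real implementation.  */
bool sparse_set_init (struct sparse_set *s, unsigned size, unsigned junk)
{
  s->dense = malloc (size * sizeof *s->dense);
  s->sparse = malloc (size * sizeof *s->sparse);
  s->members = 0;
  s->size = size;
  if (!s->dense || !s->sparse)
    {
      free (s->dense);
      free (s->sparse);
      s->dense = s->sparse = NULL;
      s->size = 0;
      return false;
    }
  for (unsigned i = 0; i < size; i++)
    s->sparse[i] = junk;
  return true;
}

/* E is a member only if sparse[e] points into the valid part of dense[]
   and that slot points back at E.  Arbitrary values in sparse[e] fail
   one of the two checks.  */
bool sparse_set_contains (const struct sparse_set *s, unsigned e)
{
  if (e >= s->size)
    return false;
  unsigned idx = s->sparse[e];  /* the "uninitialized" read in real code */
  return idx < s->members && s->dense[idx] == e;
}

void sparse_set_add (struct sparse_set *s, unsigned e)
{
  if (e >= s->size || sparse_set_contains (s, e))
    return;
  s->dense[s->members] = e;
  s->sparse[e] = s->members++;
}

/* Clearing is O(1): stale entries in both arrays become harmless.  */
void sparse_set_clear (struct sparse_set *s)
{
  s->members = 0;
}

void sparse_set_free (struct sparse_set *s)
{
  free (s->dense);
  free (s->sparse);
}
```

The read marked in sparse_set_contains is the access that a memory checker
reports when the array is truly uninitialized, even though the result of the
function does not depend on the value read.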

[Bug rtl-optimization/87047] [7/8/9 Regression] performance regression because of if-conversion

2019-11-05 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047

--- Comment #16 from Alexander Monakov  ---
I'd like to backport this to gcc-9 branch and then close this bug (Richi
already indicated that further backports are not desirable). Thoughts?

[Bug rtl-optimization/87047] [7/8/9 Regression] performance regression because of if-conversion

2019-11-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87047

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #19 from Alexander Monakov  ---
Nothing left to do then, closing.

[Bug tree-optimization/92283] [10 Regression] 454.calculix miscomparison since r276645 with -O2 -march=znver2

2019-11-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92283

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #17 from Alexander Monakov  ---
(In reply to Richard Biener from comment #16)
> interestingly 66:66 and 67:67 generate exactly the same code and
> 66:67 add a single loop.  That's totally odd but probably an
> artifact of a bug in dbg_cnt_is_enabled which does
> 
> bool
> dbg_cnt_is_enabled (enum debug_counter index)
> {
>   unsigned v = count[index];
>   return v > limit_low[index] && v <= limit_high[index];
> }
> 
> where it should be v >= limit_low[index].

This is intentional: the idea is that a:b denotes a half-open interval with the
right bound (b) not included.  So 66:66 and 67:67 are both simply empty
intervals.

dbg_cnt_is_enabled tests left bound with '>' and right bound with '<=' because
its caller (dbg_cnt) incremented the counter before the call.
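
The interaction between the increment and the two comparisons can be checked
with a simplified single-counter model. This is a sketch, not the GCC source:
the *_model names and the enabled_hits helper are hypothetical, and only the
comparison itself is copied from the function quoted above.

```c
#include <stdbool.h>

/* Simplified model with one counter; all names are hypothetical.  */
static unsigned count_model;
static unsigned limit_low_model;
static unsigned limit_high_model;

/* Same comparison as the quoted dbg_cnt_is_enabled.  */
static bool dbg_cnt_is_enabled_model (void)
{
  unsigned v = count_model;
  return v > limit_low_model && v <= limit_high_model;
}

/* The caller increments first, so hit number N (counting from zero) is
   tested with v == N + 1.  Hence 'v > low && v <= high' is the same as
   'low <= N && N < high', a half-open interval.  */
static bool dbg_cnt_model (void)
{
  count_model++;
  return dbg_cnt_is_enabled_model ();
}

/* Count how many of the first CALLS hits are enabled for LOW:HIGH.  */
unsigned enabled_hits (unsigned low, unsigned high, unsigned calls)
{
  unsigned enabled = 0;
  count_model = 0;
  limit_low_model = low;
  limit_high_model = high;
  for (unsigned i = 0; i < calls; i++)
    if (dbg_cnt_model ())
      enabled++;
  return enabled;
}
```

In this model enabled_hits (66, 66, 100) and enabled_hits (67, 67, 100) are
both 0, while enabled_hits (66, 67, 100) is 1, matching the "single loop"
observation in the quoted comment.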

[Bug inline-asm/84861] -flto with asm() optimizes too much

2018-03-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84861

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
PR 57703 seems to be the "canonical instance" for the toplevel-asms-with-lto
issue.

[Bug sanitizer/84761] AddressSanitizer is not compatible with glibc 2.27 on x86

2018-03-19 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84761

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #7 from Alexander Monakov  ---
Is it possible that a distribution would backport glob() changes together with
its symver update (without also backporting the regparm change)? In that case
the dlvsym check shown above will be wrong I think.

Would the approach with confstr query for glibc version (discussed on irc) be
less fragile?

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
Vadim, can you please check if the issue is reproducible on preprocessed (-E)
input as well, and if so, attach the preprocessed testcase so people can try to
repro it without downloading Debian's MinGW headers? Thanks.

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

--- Comment #12 from Alexander Monakov  ---
I can reproduce it with downloaded Debian's cc1plus, and for me -Wnonnull alone
is sufficient to cause diverging codegen. It diverges very early, in the
frontend: diff of .tu dumps starts with:

--- a/1/16795.cpp.001t.tu
+++ b/2/16795.cpp.001t.tu
@@ -110354,336 +110354,337 @@
 @56158  bind_exprtype: @27  body: @59125
 @56159  cond_exprtype: @27  op 0: @5106op 1: @59126
  op 2: @59127
-@56160  cleanup_point_expr type: @27  op 0: @59128
-@56161  convert_expr type: @27  op 0: @59129
-@56162  call_exprtype: @109 fn  : @59130   0   : @59131
- 1   : @59132
-@56163  expr_stmttype: @27  line: 732  expr: @59133
-@56164  cleanup_point_expr type: @109 op 0: @59134
+@56160  cond_exprtype: @27  op 0: @5106op 1: @59128
+ op 2: @59129
+@56161  convert_expr type: @27  op 0: @59130
+@56162  call_exprtype: @109 fn  : @59131   0   : @59132
+ 1   : @59133
+@56163  expr_stmttype: @27  line: 732  expr: @59134
+@56164  cleanup_point_expr type: @109 op 0: @59135

and .original diff has the following hunk:

@@ -17695,8 +17695,11 @@ return  = __out;
   <;
 }
-  <;
+}
 }


(in the diffs, plus-lines correspond to -Wnonnull added to command line)

[Bug c++/85091] Compiler generates different code depending on whether -Wnonnull -Woverloaded-virtual given or not

2018-03-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85091

--- Comment #13 from Alexander Monakov  ---
> (in the diffs, plus-lines correspond to -Wnonnull added to command line)

No, sorry, it was the other way around. Here's the reverse diff with more
context:

   if (0)
 {
   <;
 }
-  if (0)
-{
-  <;
-}
 }

It corresponds to

if(!(!std::signbit(bourn_cast( From(0) { lmi_test::record_error();
};
if(!(std::signbit(bourn_cast(-From(0) { lmi_test::record_error();
};

in template instantiation test_floating_conversions.
Essentially, with -Wnonnull the second condition seems to be folded to truth
value.

[Bug tree-optimization/85275] New: copyheader peels off almost the entire iteration

2018-04-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85275

Bug ID: 85275
   Summary: copyheader peels off almost the entire iteration
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

I expected predcom to eliminate one of the loads in this loop at -O3:

int is_sorted(int *a, int n)
{
  for (int i = 0; i < n - 1; i++)
    if (a[i] > a[i + 1])
      return 0;
  return 1;
}

Unfortunately, predcom bails out since the loads it sees are not
always-executed. Ideally loop header copying would make this a suitable
do-while loop, but in this case it duplicates too much:


;; Loop 1
;;  header 5, latch 4
;;  depth 1, outer 0
;;  nodes: 5 4 3
;; 2 succs { 5 }
;; 3 succs { 6 4 }
;; 4 succs { 5 }
;; 5 succs { 3 6 }
;; 6 succs { 1 }
Analyzing loop 1
Loop 1 is not do-while loop: latch is not empty.
Will duplicate bb 5
Will duplicate bb 3
  Not duplicating bb 4: it is single succ.
Duplicating header of the loop 1 up to edge 3->4, 12 insns.
[...]
   [local count: 114863532]:
  _17 = n_12(D) + -1;
  if (_17 > 0)
goto ; [94.50%]
  else
goto ; [5.50%]

   [local count: 108546038]:
  _18 = 0;
  _19 = _18 * 4;
  _20 = a_13(D) + _19;
  _21 = *_20;
  _22 = _18 + 1;
  _23 = _22 * 4;
  _24 = a_13(D) + _23;
  _25 = *_24;
  if (_21 > _25)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 906139986]:
  _1 = (long unsigned int) i_15;
  _2 = _1 * 4;
  _3 = a_13(D) + _2;
  _4 = *_3;
  _5 = _1 + 1;
  _6 = _5 * 4;
  _7 = a_13(D) + _6;
  _8 = *_7;
  if (_4 > _8)
goto ; [5.50%]
  else
goto ; [94.50%]

   [local count: 958878293]:
  # i_26 = PHI <0(3), i_15(4)>
  i_15 = i_26 + 1;
  _9 = n_12(D) + -1;
  if (_9 > i_15)
goto ; [94.50%]
  else
goto ; [5.50%]



(throttling it down with --param max-loop-header-insns=5 gives the expected
optimization)
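
For reference, the result hoped for from header copying plus predcom can be
written by hand roughly as follows. This is a sketch of the intended shape
(a do-while loop where each iteration reuses the previous iteration's second
load), not actual compiler output; the function name is hypothetical.

```c
/* Hand-written equivalent of is_sorted: only the loop test is peeled
   into a guard, and a[i + 1] from one iteration is carried over as
   a[i] of the next, so each iteration performs a single load.  */
int is_sorted_reuse (const int *a, int n)
{
  if (!(0 < n - 1))          /* copied loop header: just the test */
    return 1;

  int prev = a[0];           /* the only load outside the loop */
  int i = 0;
  do
    {
      int next = a[i + 1];   /* one load per iteration */
      if (prev > next)
        return 0;
      prev = next;           /* reuse instead of reloading a[i] */
      i++;
    }
  while (i < n - 1);
  return 1;
}
```

Compared with the dump above, the peeled part here contains only the
comparison of n - 1 against 0, not a full copy of the loop body.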

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

Alexander Monakov  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org
Summary|[6/7/8 Regression] ICE: |[6/7 Regression] ICE:
   |Segmentation fault (stack   |Segmentation fault (stack
   |overflow in bb_note) w/ |overflow in bb_note) w/
   |selective scheduling|selective scheduling

--- Comment #3 from Alexander Monakov  ---
Fixed on the trunk. Unfortunately the Changelog entry had a typo in the PR#:

Author: amonakov
Date: Wed Apr 11 10:40:07 2018
New Revision: 259313

URL: https://gcc.gnu.org/viewcvs?rev=259313&root=gcc&view=rev
Log:
sel-sched: run cleanup_cfg just before loop_optimizer_init (PR 84659)

  PR rtl-optimization/84659
  * sel-sched-ir.c (sel_init_pipelining): Invoke cleanup_cfg.

testsuite/
  * gcc.dg/pr84659.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr84659.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 10:48:42 2018
New Revision: 259314

URL: https://gcc.gnu.org/viewcvs?rev=259314&root=gcc&view=rev
Log:
fix PR 84659 references in ChangeLog files

Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog

[Bug target/84301] [6/7/8 Regression] ICE in create_pre_exit, at mode-switching.c:451

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

--- Comment #5 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 14:32:32 2018
New Revision: 259321

URL: https://gcc.gnu.org/viewcvs?rev=259321&root=gcc&view=rev
Log:
sched-rgn: run add_branch_dependencies for sel-sched (PR 84301)

PR target/84301
* sched-rgn.c (add_branch_dependences): Move sel_sched_p check here...
(compute_block_dependences): ... from here.

testsuite/
* gcc.target/i386/pr84301.c: New test.


Added:
trunk/gcc/testsuite/gcc.target/i386/pr84301.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sched-rgn.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Wed Apr 11 14:36:04 2018
New Revision: 259322

URL: https://gcc.gnu.org/viewcvs?rev=259322&root=gcc&view=rev
Log:
sched-deps: respect deps->readonly in macro-fusion (PR 84566)

PR rtl-optimization/84566
* sched-deps.c (sched_analyze_insn): Check deps->readonly when invoking
sched_macro_fuse_insns.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/sched-deps.c

[Bug target/84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301

Alexander Monakov  changed:

   What|Removed |Added

  Known to work||8.0
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org
Summary|[6/7/8 Regression] ICE in   |[6/7 Regression] ICE in
   |create_pre_exit, at |create_pre_exit, at
   |mode-switching.c:451|mode-switching.c:451
  Known to fail|8.0 |

--- Comment #6 from Alexander Monakov  ---
Fixed on the trunk.

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 84566, which changed state.

Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on 
aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/82407] [meta-bug] qsort_chk fallout tracking

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82407
Bug 82407 depends on bug 84566, which changed state.

Bug 84566 Summary: error: qsort comparator not anti-commutative: -1, -1 on 
aarch64 in sched1
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/84566] error: qsort comparator not anti-commutative: -1, -1 on aarch64 in sched1

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84566

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

Alexander Monakov  changed:

   What|Removed |Added

 CC||abel at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
Thanks. Judging from the backtrace, we shouldn't call cleanup_cfg after
dominators are computed: it will invalidate dominators without freeing or
fixing them. I wonder if that's "by design".

A simple way out is to run cleanup_cfg early enough. I'll bootstrap/regtest the
following on gcc112:

diff --git a/gcc/sel-sched-ir.c b/gcc/sel-sched-ir.c
index 50a7daafba6..ee970522890 100644
--- a/gcc/sel-sched-ir.c
+++ b/gcc/sel-sched-ir.c
@@ -30,7 +30,6 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfgrtl.h"
 #include "cfganal.h"
 #include "cfgbuild.h"
-#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "recog.h"
@@ -6122,9 +6121,6 @@ make_regions_from_loop_nest (struct loop *loop)
 void
 sel_init_pipelining (void)
 {
-  /* Remove empty blocks: their presence can break assumptions elsewhere,
- e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
-  cleanup_cfg (0);
   /* Collect loop information to be used in outer loops pipelining.  */
   loop_optimizer_init (LOOPS_HAVE_PREHEADERS
| LOOPS_HAVE_FALLTHRU_PREHEADERS
diff --git a/gcc/sel-sched.c b/gcc/sel-sched.c
index cd29df35666..59762964c6e 100644
--- a/gcc/sel-sched.c
+++ b/gcc/sel-sched.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "regs.h"
 #include "cfgbuild.h"
+#include "cfgcleanup.h"
 #include "insn-config.h"
 #include "insn-attr.h"
 #include "params.h"
@@ -7661,6 +7662,10 @@ sel_sched_region (int rgn)
 static void
 sel_global_init (void)
 {
+  /* Remove empty blocks: their presence can break assumptions elsewhere,
+ e.g. the logic to invoke update_liveness_on_insn in sel_region_init.  */
+  cleanup_cfg (0);
+
   calculate_dominance_info (CDI_DOMINATORS);
   alloc_sched_pools ();

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Thu Apr 12 15:40:44 2018
New Revision: 259348

URL: https://gcc.gnu.org/viewcvs?rev=259348&root=gcc&view=rev
Log:
sel-sched: move cleanup_cfg before calculate_dominance_info (PR 85354)

PR rtl-optimization/85354
* sel-sched-ir.c (sel_init_pipelining): Move cfg_cleanup call...
* sel-sched.c (sel_global_init): ... here.



Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/sel-sched.c

[Bug rtl-optimization/85354] [8 regression] ICE with gcc.dg/graphite/pr84872.c starting with r259313

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Blocks||84659
 Resolution|--- |FIXED

--- Comment #5 from Alexander Monakov  ---
Fixed.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
[Bug 84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in
bb_note) w/ selective scheduling

[Bug rtl-optimization/84659] [6/7 Regression] ICE: Segmentation fault (stack overflow in bb_note) w/ selective scheduling

2018-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84659
Bug 84659 depends on bug 85354, which changed state.

Bug 85354 Summary: [8 regression] ICE with gcc.dg/graphite/pr84872.c starting 
with r259313
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85354

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

Alexander Monakov  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org

--- Comment #6 from Alexander Monakov  ---
Candidate patch for gcc-9 stage 1:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..4708fc328c6 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,7 +3207,8 @@ df_insn_refs_collect (struct df_collection_rec
*collection_rec,
   if (CALL_P (insn_info->insn))
 df_get_call_refs (collection_rec, bb, insn_info, flags);

-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (asm_noperands (PATTERN (insn_info->insn)) >= 0
+  && volatile_insn_p (PATTERN (insn_info->insn)))
 for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (global_regs[i])
{

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #4 from Alexander Monakov  ---
Can you please share tree and rtl dumps for the nice testcase in comment #3 by
re-running it with -fdump-tree-all -fdump-rtl-all and attaching a tar.gz with
those? I could not reproduce it either, so having the dumps might help us see
what's different on our side.

(and an additional archive for a non-failing run without
-fselective-scheduling2 might be helpful too)

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #7 from Alexander Monakov  ---
The testcase is not easily reproducible because the rs6000 backend has some
implicit dependencies on capabilities of configure-time binutils, and they are
not visible as 'gcc -v' flags.

So, to reproduce this we need to know the version and configure flags of cross
binutils that were found and checked by gcc's configure.

[Bug rtl-optimization/84842] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #8 from Alexander Monakov  ---
Or, as Jakub (thanks!) noted on IRC, gcc/auto-host.h from the build tree may
also be helpful and simpler for us to work with.

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #8 from Alexander Monakov  ---
Can you also run the tests under 'perf stat'?

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

--- Comment #13 from Alexander Monakov  ---
This is most likely a variant of 

  https://bugzilla.redhat.com/show_bug.cgi?id=1421121

so hitting this bug requires a specific CPU model.

It looks as if SSE-AVX transition penalties appear when switching between
pure-SSE sinf code and VEX-prefixed SSE code in the main program after the
ld.so runtime resolver affects AVX state tracking in the CPU.

I'm not sure if any patches have landed on Glibc side to avoid this, but in any
case this should be re-reported against Glibc if needed, GCC cannot improve the
situation.

An easy workaround would be to pass -Wl,-z,now when linking.

[Bug tree-optimization/85416] Massive performance regression when switching on "-march=native"

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85416

Alexander Monakov  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #14 from Alexander Monakov  ---
Ah, the linked report actually says very clearly that fixes landed in Glibc
2.25, so I'll close this bug: nothing to do on GCC side about this.

[Bug rtl-optimization/84842] [7/8 Regression] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-17 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-04-17
 Ever confirmed|0   |1

--- Comment #11 from Alexander Monakov  ---
Thanks, I managed to reproduce it. The unusual thing here is hardreg 63 being
considered call-clobbered in its reg_raw_mode=TImode but not narrower modes. We
have

(insn 97 29 98 4 (set (reg:DI 63 31 [160])
(unspec:DI [
(reg:SI 29 29)
] UNSPEC_LFIWAX)) "pr84842.i":5 344 {lfiwax}
 (expr_list:REG_DEAD (reg:SI 29 29)
(nil)))

and sched-deps noting a REG_DEP_OUTPUT dependence on regno 63 against a
preceding call insn according to rs6000_hard_regno_call_part_clobbered
(regno=63, mode=E_TImode). I assume what the backend is conveying there is that
only the low part of the register will be preserved by callees.

However, when we move up the instruction we don't have a dependence. The LHS is
DImode, so that seems correct as well: sched-deps had a more conservative
answer because its dependence lists are not separated per mode.

Andrey, does the above make sense? Can the assert be relaxed?

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-20 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #7 from Alexander Monakov  ---
Or rather like this:

diff --git a/gcc/df-scan.c b/gcc/df-scan.c
index 95e1e0df2d5..732705c0385 100644
--- a/gcc/df-scan.c
+++ b/gcc/df-scan.c
@@ -3207,11 +3207,11 @@ df_insn_refs_collect (struct df_collection_rec
*collection_rec,
   if (CALL_P (insn_info->insn))
 df_get_call_refs (collection_rec, bb, insn_info, flags);

-  if (asm_noperands (PATTERN (insn_info->insn)) >= 0)
+  if (GET_CODE (PATTERN (insn_info->insn)) == ASM_INPUT)
 for (unsigned i = 0; i < FIRST_PSEUDO_REGISTER; i++)
   if (global_regs[i])
{
- /* As with calls, asm statements reference all global regs. */
+ /* As with calls, basic asms reference all global regs. */
  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],
 NULL, bb, insn_info, DF_REF_REG_USE, flags);
  df_ref_record (DF_REF_BASE, collection_rec, regno_reg_rtx[i],

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-04-21 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #8 from Alexander Monakov  ---
Unfortunately the above doesn't fully address the issue, as schedulers and
other passes still have no idea that DF makes those assumptions and will allow
reordering of asms:

register int r asm("ebx");

int f(int x, int y)
{
    int t = x/y/r;
    asm("#asm");
    return t-x;
}

_Z1fii:
#APP
        #asm
#NO_APP
        movl    %edi, %eax
        cltd
        idivl   %esi
        cltd
        idivl   %ebx
        subl    %edi, %eax
        ret

See how the asm is first, even though from DF point of view it should remain
after the read of %ebx for division by r; here cprop_hardreg makes the
offending propagation.

So currently GCC has a rather split personality when it comes to deps w.r.t.
global reg vars in asm statements. The documentation should spell out the
intended behavior. My suggestion is to require that references are exposed to
the compiler via constraints, which would allow removing the ad-hoc treatment
in DF. I intend to do that early in stage 1.

[Bug rtl-optimization/85423] [8 Regression] ICE in code_motion_process_successors, at sel-sched.c:6403

2018-04-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Blocks||80463
 Resolution|--- |FIXED

--- Comment #7 from Alexander Monakov  ---
Thanks. I've added one more "Blocks" edge to indicate that this should be taken
when backporting the earlier patch.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463
[Bug 80463] [6/7 Regression] ICE with -fselective-scheduling2 and
-fvar-tracking-assignments

[Bug rtl-optimization/80463] [6/7 Regression] ICE with -fselective-scheduling2 and -fvar-tracking-assignments

2018-04-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80463
Bug 80463 depends on bug 85423, which changed state.

Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at 
sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-04-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 85423, which changed state.

Bug 85423 Summary: [8 Regression] ICE in code_motion_process_successors, at 
sel-sched.c:6403
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85423

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug inline-asm/85546] GCC assumes volatile asm block returns same value in loop

2018-04-27 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85546

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #3 from Alexander Monakov  ---
I'm not sure Richard is correct about the definition of volatile asms: similar
to reads of volatile objects, volatile asms can produce different output on
each invocation (iow they are not pure/const).

In any case the inline asm in io() is missing clobbers for rcx, r11 and memory,
which makes the bug invalid.
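
For illustration only, here is a minimal sketch of a wrapper that does list
those clobbers. The io() function from the report is not reproduced here;
raw_write is a made-up name, and the sketch assumes GCC-style extended asm on
x86-64 Linux, where the syscall instruction overwrites rcx and r11 and the
kernel may read or write user memory.

```c
#include <stddef.h>

/* Hypothetical wrapper (not the io() from the report): invokes the Linux
   x86-64 write system call (number 1) directly.  */
static long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)             /* result comes back in rax */
                      : "0"(1L),              /* syscall number goes in rax */
                        "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11",         /* overwritten by 'syscall' */
                        "memory");            /* kernel accesses memory */
    return ret;
}
```

Without the register clobbers the compiler may keep live values in rcx or r11
across the asm; without "memory" it may reorder or drop stores to the buffer.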

[Bug rtl-optimization/84842] [7/8/9 Regression] ICE in verify_target_availability, at sel-sched.c:1569

2018-04-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84842

--- Comment #14 from Alexander Monakov  ---
Thanks. I think the root cause on this x86_64 testcase is different.

Arseny, in the meantime if by chance you have another x86_64 variant of this
failure that doesn't require -funroll-all-loops, please post it as well.

[Bug rtl-optimization/85673] ICE in create_pre_exit, at mode-switching.c:451

2018-05-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85673

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2018-05-06
 CC||abel at gcc dot gnu.org,
   ||amonakov at gcc dot gnu.org
 Blocks||84301
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
PR 84301 is related (not backported to 6/7 so failure is expected there).

The fix was incomplete because 'cant_move' insn flag only restricts inter-block
motion (argh!), so sel-sched is still free to move %eax assignment up. Oops.

Perhaps we can additionally set sched_group_p in add_branch_dependences for
pre-RA sel-sched to ensure insns stay at the end of the basic block; after
reload that would also pin mutex_p cond-exec insns to the BB end.

(apropos: flag_sched_group_heuristic should be removed, the way it's used in
rank_for_schedule is not a heuristic, but a correctness requirement)

Overall I'm concerned that mode-switching is making unreasonable assumptions:
if it really needs some insns to stay in sequence just before function
return, they should be arranged to have a barrier insn or SCHED_GROUP_P from
the beginning. So maybe it's better to adjust mode-switching instead, but
unfortunately it's not quite obvious how it works :)


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84301
[Bug 84301] [6/7 Regression] ICE in create_pre_exit, at mode-switching.c:451

[Bug target/85683] [8 Regression] GCC 8 stopped using RMW (Read Modify Write) instructions on x86[_64]

2018-05-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85683

Alexander Monakov  changed:

   What|Removed |Added

   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-05-07
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
Smaller testcase:

void f(void);

void g(int *p)
{
    if (!--*p)
        f();
}

On gcc-7.3 this is optimized by the peephole2 pass, so it doesn't really help
with register pressure (the combine pass seems more suitable for that); I don't
know why the peephole doesn't trigger on gcc-8.

[Bug tree-optimization/85757] New: tree optimizers fail to fully clean up fixed-size memcpy

2018-05-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85757

Bug ID: 85757
   Summary: tree optimizers fail to fully clean up fixed-size
memcpy
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

This is minimized from one of suboptimal stack consumption issues in gcc_qsort.

gcc_qsort uses code similar to this to move potentially-unaligned data:

void f(int n, char *p0, char *p1, char *p2, char *o)
{
    int t0, t1;
    __builtin_memcpy(&t0, p0, 1);
    __builtin_memcpy(&t1, p1, 1);
    if (n==3) __builtin_memcpy(o+2, p2, 1);
    __builtin_memcpy(o+0, &t0, 1);
    __builtin_memcpy(o+1, &t1, 1);
}

Note the mismatch between memcpy size (1) and temporaries' size (4).

If the sizes match, there's no problem. If not, tree optimizers fail to fully
clean up the copies (and, unlike in this minimal testcase, in full gcc_qsort
RTL optimizers can't clean it up either and we get dead stack stores). The
.optimized dump reads (note dead writes to t0 and t1 in BB 2):

f (int n, char * p0, char * p1, char * p2, char * o)
{
  int t1;
  int t0;
  unsigned char _4;
  unsigned char _7;
  unsigned char _12;

   [local count: 1073741825]:
  _4 = MEM[(char * {ref-all})p0_3(D)];
  MEM[(char * {ref-all})&t0] = _4;
  _7 = MEM[(char * {ref-all})p1_6(D)];
  MEM[(char * {ref-all})&t1] = _7;
  if (n_9(D) == 3)
goto ; [34.00%]
  else
goto ; [66.00%]

   [local count: 365072220]:
  _12 = MEM[(char * {ref-all})p2_11(D)];
  MEM[(char * {ref-all})o_10(D) + 2B] = _12;

   [local count: 1073741825]:
  MEM[(char * {ref-all})o_10(D)] = _4;
  MEM[(char * {ref-all})o_10(D) + 1B] = _7;
  t0 ={v} {CLOBBER};
  t1 ={v} {CLOBBER};
  return;

}
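
For comparison, a sketch of the matching-size variant that the report says is
not affected. The name f_matched is made up for this illustration; the only
change from the testcase above is that the temporaries are char, so their size
equals the memcpy size.

```c
/* Same data movement as f() in the report, but with 1-byte temporaries
   so that temporary size and copy size agree.  */
void f_matched(int n, char *p0, char *p1, char *p2, char *o)
{
    char t0, t1;
    __builtin_memcpy(&t0, p0, 1);
    __builtin_memcpy(&t1, p1, 1);
    if (n == 3)
        __builtin_memcpy(o + 2, p2, 1);
    __builtin_memcpy(o + 0, &t0, 1);
    __builtin_memcpy(o + 1, &t1, 1);
}
```

Both variants copy the same bytes; only the declared type of the temporaries
differs.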

[Bug tree-optimization/85758] New: questionable bitwise folding (missing single use check?)

2018-05-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85758

Bug ID: 85758
   Summary: questionable bitwise folding (missing single use
check?)
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

The following should be translated as-is:

void f(int a, int b);
void g(int a, int b, int m, int s)
{
    m &= s;
    a += m;
    m ^= s;
    b += m;
    f(a, b);
}

However instead of and/add/xor/add we get mov/not/and/and/add/add:

        movl    %edx, %eax
        notl    %edx
        andl    %ecx, %eax
        andl    %edx, %ecx
        addl    %eax, %edi
        addl    %ecx, %esi
        jmp     f

This is because forwprop applies an identity to m = (m & s) ^ s:

g (int a, int b, int m, int s)
{
   :
  m_3 = m_1(D) & s_2(D);
  a_5 = a_4(D) + m_3;
  m_6 = m_3 ^ s_2(D);
  b_8 = b_7(D) + m_6;
  f (a_5, b_8);
  return;
}

gimple_simplified to _11 = ~m_1(D);
m_6 = s_2(D) & _11;
g (int a, int b, int m, int s)
{
  int _11;

   :
  m_3 = m_1(D) & s_2(D);
  a_5 = m_3 + a_4(D);
  _11 = ~m_1(D);
  m_6 = s_2(D) & _11;
  b_8 = m_6 + b_7(D);
  f (a_5, b_8);
  return;
}

However, since m_3 still has another use (in a_5), this is more costly. Shouldn't this folding check
for single use of the intermediate expr? From a quick look, this is probably
match.pd:/* Fold (X & Y) ^ Y and (X ^ Y) & Y as ~X & Y.  */
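
As a sanity check of the identity itself (not of the cost question), here is a
small sketch that compares both forms over all 8-bit operand pairs. The
function name is made up for this illustration.

```c
/* Counts the 8-bit (x, y) pairs for which (x & y) ^ y differs from ~x & y.
   The identity holds bit by bit, so the expected count is zero.  */
int count_identity_mismatches(void)
{
    int mismatches = 0;
    for (unsigned x = 0; x < 256; x++)
        for (unsigned y = 0; y < 256; y++) {
            unsigned lhs = ((x & y) ^ y) & 0xff;
            unsigned rhs = (~x & y) & 0xff;
            if (lhs != rhs)
                mismatches++;
        }
    return mismatches;
}
```

The rewrite is therefore value-preserving; the concern above is only that with
x & y still needed, the folded form costs an extra not and a second and.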

[Bug target/41084] Filling xmm register with all bit set is not optimized

2018-05-15 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41084

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #2 from Alexander Monakov  ---
Starting from gcc-4.5 (released in 2010) GCC emits pcmpeq for the
explicit-constructor variant (where it would previously emit a load) as well as
for a more concise form:

  __m128i r = {-1, -1};

The implicit variant with _mm_cmpeq_epi32 is optimized as expected starting
with gcc-5 (released in 2015).

So as far as I can see both issues raised in this report have been addressed in
the meantime. If there are other cases that are not well optimized, please let
us know (they deserve separate bug reports).
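
For reference, a sketch of two of the forms discussed, assuming an x86 target
with SSE2 and GCC's vector extensions. The exact testcases from the report are
not quoted here, so the function names and the self-comparison below are
illustrative assumptions.

```c
#include <emmintrin.h>

/* The concise vector-literal form quoted above: two 64-bit lanes of -1.  */
__m128i all_ones_literal(void)
{
    __m128i r = {-1, -1};
    return r;
}

/* Comparing a value with itself for equality sets every bit.  */
__m128i all_ones_cmpeq(__m128i x)
{
    return _mm_cmpeq_epi32(x, x);
}

/* Returns 1 if all 128 bits of v are set, 0 otherwise.  */
int is_all_ones(__m128i v)
{
    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, _mm_set1_epi8(-1))) == 0xffff;
}
```

Which instruction the compiler picks for each form depends on the GCC version
and options, as described in the comment above.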

[Bug c++/85783] alloc-size-larger-than fires incorrectly with new[] and can't be disabled

2018-05-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85783

Alexander Monakov  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2018-05-16
 CC||amonakov at gcc dot gnu.org
 Resolution|WONTFIX |---
 Ever confirmed|0   |1

--- Comment #10 from Alexander Monakov  ---
Reopening: the request to be able to disable the warning (via
-Wno-alloc-size-larger-than) is valid and should be addressed.

[Bug rtl-optimization/80318] GCC takes too much RAM and time compiling a template file (var-tracking)

2018-05-22 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80318

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
Second largest seems to be the frontend, as with -fsyntax-only we still need
18s and 1.8GB (this is 8.1 with release checking):

Time variable                               usr           sys          wall                GGC
 phase setup                    :   0.00 (  0%)   0.00 (  0%)   0.00 (  0%)     1381 kB (  0%)
 phase parsing                  :   4.01 ( 22%)   0.80 ( 30%)   4.82 ( 23%)   519422 kB ( 27%)
 phase lang. deferred           :  13.96 ( 78%)   1.83 ( 70%)  15.82 ( 77%)  1414614 kB ( 73%)
 |name lookup                   :   1.89 ( 11%)   0.35 ( 13%)   2.08 ( 10%)    99986 kB (  5%)
 |overload resolution           :   8.94 ( 50%)   1.29 ( 49%)  10.10 ( 49%)   934750 kB ( 48%)
 garbage collection             :   1.79 ( 10%)   0.00 (  0%)   1.80 (  9%)        0 kB (  0%)
 preprocessing                  :   0.14 (  1%)   0.12 (  5%)   0.37 (  2%)     2890 kB (  0%)
 parser (global)                :   0.58 (  3%)   0.21 (  8%)   0.73 (  4%)   115783 kB (  6%)
 parser struct body             :   0.74 (  4%)   0.09 (  3%)   0.77 (  4%)    81383 kB (  4%)
 parser enumerator list         :   0.01 (  0%)   0.00 (  0%)   0.00 (  0%)      364 kB (  0%)
 parser function body           :   0.05 (  0%)   0.03 (  1%)   0.07 (  0%)     4688 kB (  0%)
 parser inl. func. body         :   0.10 (  1%)   0.01 (  0%)   0.12 (  1%)     6402 kB (  0%)
 parser inl. meth. body         :   0.41 (  2%)   0.06 (  2%)   0.39 (  2%)    27538 kB (  1%)
 template instantiation         :  13.86 ( 77%)   2.06 ( 78%)  16.02 ( 78%)  1694216 kB ( 88%)
 constant expression evaluation :   0.26 (  1%)   0.04 (  2%)   0.27 (  1%)      729 kB (  0%)
 varconst                       :   0.01 (  0%)   0.00 (  0%)   0.03 (  0%)       39 kB (  0%)
 symout                         :   0.02 (  0%)   0.01 (  0%)   0.07 (  0%)        0 kB (  0%)
 TOTAL                          :  17.97          2.63         20.65         1935427 kB

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

--- Comment #9 from Alexander Monakov  ---
Author: amonakov
Date: Wed May 23 15:01:28 2018
New Revision: 260613

URL: https://gcc.gnu.org/viewcvs?rev=260613&root=gcc&view=rev
Log:
df-scan: remove ad-hoc handling of global regs in asms

PR rtl-optimization/79985
* df-scan.c (df_insn_refs_collect): Remove special case for
global registers and asm statements.

testsuite/
* gcc.dg/pr79985.c: New testcase.

Added:
trunk/gcc/testsuite/gcc.dg/pr79985.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/df-scan.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/79985] ICE in code_motion_path_driver, at sel-sched.c:6580

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #10 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2018-05-23 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 79985, which changed state.

Bug 79985 Summary: ICE in code_motion_path_driver, at sel-sched.c:6580
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79985

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug bootstrap/85921] /gcc/c-family/c-warn.c fails to build

2018-05-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85921

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #5 from Alexander Monakov  ---
Glibc bits/sigcontext.h should not include Linux asm/sigcontext.h (but it used
to on i386).

This was fixed back in 2012 for Glibc 2.16 by this Glibc commit:
https://sourceware.org/git/?p=glibc.git;a=commit;h=48495318fa5ae223a8b777ed144bd769d9f6c67f

I doubt this warrants a change on GCC side, given that a workaround is simple.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

Alexander Monakov  changed:

   What|Removed |Added

 Status|RESOLVED|REOPENED
   Last reconfirmed||2018-05-29
 CC||amonakov at gcc dot gnu.org
 Resolution|DUPLICATE   |---
 Ever confirmed|0   |1

--- Comment #7 from Alexander Monakov  ---
Reopening, the issue here is way more subtle than bug 323 and points to a
possible issue in DOM. Hopefully Richi can have a look and comment.

It appears dom2 pass performs something like jump threading based on
compile-time-evaluated floating-point expression values without also
substituting those expressions in IR. At run time, they are evaluated to
different values, leading to an inconsistency. Namely, dom2 creates bb 10:

  <bb 9>:
  # iftmp.1_1 = PHI <"true"(7), "false"(8), "true"(10)>
  printf ("(a6 == b6) = %s\n", iftmp.1_1);
  return 0;

  <bb 10>:
  _24 = __n2_13 * 1.0e+6;
  b6_25 = (guint64) _24;
  printf ("a6 = %llu\n", 1);
  printf ("b6 = %llu\n", b6_25);
  goto <bb 9>;

where jump to bb 9 implies that _24 evaluates to 1.0 and b6_25 to 1, but they
are not substituted as such, and at run time evaluate to 0.99... and 0 due to
excess precision.

The following reduced testcase demonstrates the same issue, but requires
-fdisable-tree-dom3 (on gcc-6 at least, as otherwise dom3 substitutes results
of compile-time evaluation).

__attribute__((noinline,noclone))
static double f(void)
{
  return 1e-6;
}

int main(void)
{
  double a = 1e-6, b = f();

  if (a != b) __builtin_printf("uneq");

  unsigned long long ia = a * 1e6, ib = b * 1e6;

  __builtin_printf("%lld %s %lld\n", ia, ia == ib ? "==" : "!=", ib);
}

[Bug target/85961] scratch register rsi used after function call

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85961

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
You'd need to disable IPA-RA after forcing -O2 with the pragma, i.e.:

#pragma GCC optimize "O2"
#pragma GCC optimize "no-ipa-ra"

We already have logic to disable IPA-RA when instrumentation/profiling is
active, but it's done once in toplev.c. Here the pragma re-enables IPA-RA after
toplev.c:process_options() has disabled it.

Do we want to adjust it given that "pragma optimize" is documented as "not
suitable for production use"?

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #8 from Alexander Monakov  ---
To expand a bit: DOM makes the small testcase behave as if 'b' and 'ib' are
evaluated twice:

* one time, 'b' is evaluated in precision matching 'a' (either infinite or
double), and 'ib' is evaluated to 1; this instance is used in 'ia == ib'
comparison;
* a second time, 'b' is evaluated in extended precision and 'ib' is evaluated
to 0; this instance is passed as the last argument to printf.

This is surprising as the original program clearly evaluates 'b' and 'ib' just
once.

If there's no bug in DOM and the observed transformation is allowed to happen
when -fexcess-precision=fast is in effect, I think it would be nice to mention
that in the compiler manual.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #9 from Alexander Monakov  ---
Sorry, the above comment should have said 'b * 1e6' every time it said 'b'.

[Bug c/85957] i686: Integers appear to be different, but compare as equal

2018-05-29 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85957

--- Comment #10 from Alexander Monakov  ---
Also note that both the original and the reduced testcase can be tweaked to
exhibit the surprising transformation even when -fexcess-precision=standard is
enabled. A "lazy" way is via -mpc64, but I think it's possible even without the
additional option (by making the code more convoluted to enforce rounding to
double). Here's what happens on the reduced testcase:

$ gcc -m32 d.c -O -fdisable-tree-dom3 && ./a.out 
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
1 == 0

$ gcc -m32 d.c -O -fdisable-tree-dom3 -fexcess-precision=standard -mpc64 && ./a.out 
cc1: note: disable pass tree-dom3 for functions in the range of [0, 4294967295]
0 == 1
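
As background on "enforcing rounding to double": one common way to do that in
source code is to route the value through a volatile double. This is only a
sketch of that idea with a made-up helper name; it is not the tweaked testcase
referred to above and is not claimed to reproduce the DOM behavior.

```c
/* Hypothetical helper: the store to a volatile double forces the product
   to be rounded to double precision before the integer conversion.  */
unsigned long long scale_rounded(double x)
{
    volatile double product = x * 1e6;
    return (unsigned long long)product;
}
```

With the product rounded to double, 1e-6 * 1e6 becomes exactly 1.0 and converts
to 1, whereas an extended-precision intermediate stays slightly below 1.0 and
truncates to 0, which is the discrepancy described in the earlier comments.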

[Bug target/85994] Comparison failure in 64-bit libgcc *_{sav,res}ms64*.o on Solaris/x86

2018-05-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85994

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #2 from Alexander Monakov  ---
Why does this affect only new files, i.e. how did existing libgcc .S files
avoid running into the same issue?

[Bug c/86026] Document and/or change allowed operations on integer representation of pointers

2018-06-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
Please add full testcase source, the snippet is missing (at least) declarations
of 'g' and 't'. The Godbolt link does not work correctly for me right now, and
in general such links are not reliable long-term.

[Bug c/86026] Document and/or change allowed operations on integer representation of pointers

2018-06-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86026

--- Comment #3 from Alexander Monakov  ---
Tree optimizations already manage to avoid "optimizing" f_intadd, but
unfortunately at the RTL level types and casts are not visible in the IR, and
various passes make no distinction between (char*)((uintptr_t)t + o) and (t + o).

Perhaps GCC should consider lowering pointer-to-integer casts to a
non-transparent assignment, making the result alias all for the purposes of RTL
alias analysis, akin to

char __attribute__ ((noinline)) f_intadd1(ptrdiff_t o) {
  g = 1;
  uintptr_t t1 = (uintptr_t)t;
  asm("" : "+g"(t1));
  *(char*)(t1 + o) = 2;
  return g;
}

[Bug rtl-optimization/49330] Integer arithmetic on addresses optimised with pointer arithmetic rules

2018-06-04 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=49330

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #16 from Alexander Monakov  ---
What do you think about the suggestion made in the most recent duplicate,
namely expanding GIMPLE pointer-to-integer casts to non-transparent RTL
assignments, i.e. going from

  val = (intptr_t) ptr;

to

  asm ("" : "=g" (rval) : "0" (rptr));

Wouldn't this plug the hole in one shot instead of chasing down missing
REG_POINTERs in multiple RTL passes?
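
At the source level the same idea can be sketched as follows. This is a
hand-written approximation with made-up helper names, assuming GCC-style
extended asm; it is not what the proposed RTL expansion would literally emit.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert a pointer to an integer and pass the result
   through an empty asm, so that later passes cannot trace the integer
   back to the original pointer.  */
static inline uintptr_t opaque_ptr_to_int(void *ptr)
{
    uintptr_t val = (uintptr_t)ptr;
    __asm__ ("" : "+g"(val));
    return val;
}

/* Stores 'value' at base + offset using integer arithmetic on the address.  */
void store_via_integer(char *base, ptrdiff_t offset, char value)
{
    uintptr_t addr = opaque_ptr_to_int(base);
    *(char *)(addr + offset) = value;
}
```

The empty asm emits no instructions; it only makes the assignment
non-transparent to the optimizers.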

[Bug tree-optimization/86071] -O0 -foptimize-sibling-calls doesn't optimize

2018-06-06 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86071

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
In GCC there's no way to selectively enable a few optimizations with their -f
flags at -O0 level: -O0 means that optimizations are completely disabled,
regardless of -f flags. This is mentioned in the manual:

  "Most optimizations are only enabled if an -O level is set on the command
line.  Otherwise they are disabled, even if individual optimization flags are
specified."


Tail call optimization sometimes is not applied because there's an escaping
local variable (possibly from an inlined function), and GCC does not take into
account its life range. This might be what you're seeing at -O3. There's a
recent report: PR 86050.
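
A minimal sketch of the kind of code meant here, with made-up function names.
Whether a given GCC version actually emits a jump or a call for the final call
is not asserted; the example only shows the shape of the problem.

```c
static int sink;

/* Hypothetical callee that receives the address of a caller's local.  */
__attribute__((noinline)) void record(int *p)
{
    sink = *p;
}

__attribute__((noinline)) int twice(int x)
{
    return 2 * x;
}

int caller(int x)
{
    int local = x + 1;
    record(&local);     /* the address of 'local' escapes here */
    return twice(x);    /* tail-call candidate; 'local' is dead by now */
}
```

Because the address of 'local' has escaped, the compiler has to assume it might
still be referenced unless it reasons about the variable's live range.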

[Bug tree-optimization/86072] Poor codegen with atomics

2018-06-07 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86072

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
As for the segfault mentioned in comment 0, this is not a compiler bug: it's
the assembler segfaulting, and it segfaults even with an empty source, so it's
probably an issue/misconfiguration on the godbolt.org side.

[Bug c/86093] [8/9 Regression] volatile ignored on pointer in C

2018-06-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86093

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-06-08
 CC||amonakov at gcc dot gnu.org
  Known to work||7.3.0
Summary|volatile ignored on pointer |[8/9 Regression] volatile
   |in C|ignored on pointer in C
 Ever confirmed|0   |1
  Known to fail||8.1.0, 9.0

--- Comment #1 from Alexander Monakov  ---
gcc-7 got this right.

[Bug c++/86094] New: [8/9 Regression] Call ABI changed for small objects with defaulted ctor

2018-06-08 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094

Bug ID: 86094
   Summary: [8/9 Regression] Call ABI changed for small objects
with defaulted ctor
   Product: gcc
   Version: 8.0
Status: UNCONFIRMED
  Keywords: ABI, wrong-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

When compiling the following with -O2 -std=c++11:

struct S {
    S(S&&) = default;
    int i;
};

S foo(S s)
{
    return s;
}

gcc-7 and earlier emit

_Z3foo1S:
        movl    %edi, %eax
        ret

but gcc-8 and trunk emit

_Z3foo1S:
        movl    (%rsi), %edx
        movq    %rdi, %rax
        movl    %edx, (%rdi)
        ret

i.e. the object is now passed in memory rather than in a register. This appears
to be a silent ABI change.

(Clang generates the same code as gcc-7)

[Bug rtl-optimization/86096] ICE: qsort checking failed (error: qsort comparator non-negative on sorted output: 0)

2018-06-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86096

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2018-06-09
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
df_mw_compare has:

  if (mw1->mw_reg != mw2->mw_reg)
    return mw1->mw_order - mw2->mw_order;

Note mw_reg in the 'if' vs mw_order in the 'return'. This is invalid.

It's simpler and more efficient to just use mw_order as the last tie-breaker
regardless of mw_reg value.
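
A sketch of a comparator written that way. The struct and its field names are
made up for illustration and are not the actual df_mw_hardreg layout; the point
is only that the final tie-breaker is applied unconditionally, so the same
fields decide both whether and how two entries differ.

```c
#include <stdlib.h>

/* Hypothetical record: two sort keys plus a unique insertion order.  */
struct range_rec {
    unsigned start;
    unsigned end;
    unsigned order;
};

static int range_rec_compare(const void *p1, const void *p2)
{
    const struct range_rec *a = p1, *b = p2;

    if (a->start != b->start)
        return a->start < b->start ? -1 : 1;
    if (a->end != b->end)
        return a->end < b->end ? -1 : 1;
    /* Final tie-breaker, used regardless of any other field.  */
    return (a->order > b->order) - (a->order < b->order);
}

void sort_range_recs(struct range_rec *v, size_t n)
{
    qsort(v, n, sizeof *v, range_rec_compare);
}
```

Since 'order' is unique, two distinct entries never compare equal, and swapping
the arguments always flips the sign of the result.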

[Bug c++/86094] [8/9 Regression] Call ABI changed for small objects with defaulted ctor

2018-06-11 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86094

--- Comment #3 from Alexander Monakov  ---
-fabi-version=12 is not documented, not mentioned in release notes, and not
wired up in -Wabi.

[Bug c/86150] Trunk Segmentation Fault

2018-06-14 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86150

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
This is the *assembler* segfaulting, not the *compiler*. The assembly produced
by trunk is not different from gcc-8 output on empty input, so it's probably
some weird issue with Binutils installation for gcc-trunk worker(s) on Godbolt
side.

[Bug c/86174] Poor vectorization/register allocation with omp simd, FMA

2018-06-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86174

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #1 from Alexander Monakov  ---
It might be useful to note that what the testcase "wants" to happen is for the
compiler to notice that the temporary array 'double C[Si][Sk]' does not need to
live in memory - ideally it would correspond to 8 256-bit (or 4 512-bit)
registers.

[Bug lto/86175] LTO code generator does not respect ld -u option to force symbol inclusion in the link product

2018-06-18 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86175

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
> but really gets an empty blob from the LTO plugin for foo.

Are you sure about this? Compiling with -save-temps shows that the symbol is
present in GCC's assembly output; specifying --print-gc-sections also shows
that the linker is discarding it:

/usr/bin/ld.bfd: Removing unused section '.text.KeepMe' in file
'/tmp/ccWbtSKK.ltrans0.ltrans.o'


Gold linker does not exhibit this (try -fuse-ld=gold). Can you report it
against the BFD linker at sourceware.org/bugzilla?

[Bug rtl-optimization/87273] [8/9 Regression] ICE in merge_fences, at sel-sched-ir.c:708

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87273

--- Comment #6 from Alexander Monakov  ---
Author: amonakov
Date: Mon Apr  1 15:20:13 2019
New Revision: 270059

URL: https://gcc.gnu.org/viewcvs?rev=270059&root=gcc&view=rev
Log:
sel-sched: remove assert in merge_fences (PR 87273)

2019-04-01  Andrey Belevantsev  

PR rtl-optimization/87273
* sel-sched-ir.c (merge_fences): Remove assert.

* gcc.dg/pr87273.c: New test.

Added:
trunk/gcc/testsuite/gcc.dg/pr87273.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/87273] [8 Regression] ICE in merge_fences, at sel-sched-ir.c:708

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87273

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org
Summary|[8/9 Regression] ICE in |[8 Regression] ICE in
   |merge_fences, at|merge_fences, at
   |sel-sched-ir.c:708  |sel-sched-ir.c:708

--- Comment #7 from Alexander Monakov  ---
Fixed on the trunk.

[Bug rtl-optimization/86928] ICE in compute_live, at sel-sched.c:3097

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928

--- Comment #4 from Alexander Monakov  ---
Author: amonakov
Date: Mon Apr  1 16:32:24 2019
New Revision: 270061

URL: https://gcc.gnu.org/viewcvs?rev=270061&root=gcc&view=rev
Log:
sel-sched: update liveness in redirect_edge_and_branch hooks (PR 86928)

2019-04-01  Andrey Belevantsev  

PR rtl-optimization/86928
* sel-sched-ir.c (sel_redirect_edge_and_branch_force): Invoke
compute_live if necessary.
(sel_redirect_edge_and_branch): Likewise.

* gcc.dg/pr86928.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr86928.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/86928] ICE in compute_live, at sel-sched.c:3097

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #5 from Alexander Monakov  ---
I didn't have any better ideas, so fixed via comment #2.

[Bug rtl-optimization/85099] [meta-bug] selective scheduling issues

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85099
Bug 85099 depends on bug 86928, which changed state.

Bug 86928 Summary: ICE in compute_live, at sel-sched.c:3097
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug rtl-optimization/85412] [8/9 Regression] ICE in put_TImodes, at sel-sched.c:7191

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85412

--- Comment #14 from Alexander Monakov  ---
Author: amonakov
Date: Mon Apr  1 18:05:08 2019
New Revision: 270065

URL: https://gcc.gnu.org/viewcvs?rev=270065&root=gcc&view=rev
Log:
sel-sched: correct reset of reset_sched_cycles_p (PR 85412)

2019-04-01  Andrey Belevantsev  

PR rtl-optimization/85412
* sel-sched.c (sel_sched_region): Assign reset_sched_cycles_p before
sel_sched_region_1, not after.

* gcc.dg/pr85412.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr85412.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/85412] [8 Regression] ICE in put_TImodes, at sel-sched.c:7191

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85412

Alexander Monakov  changed:

   What|Removed |Added

Summary|[8/9 Regression] ICE in |[8 Regression] ICE in
   |put_TImodes, at |put_TImodes, at
   |sel-sched.c:7191|sel-sched.c:7191

--- Comment #15 from Alexander Monakov  ---
Fixed on the trunk.

[Bug testsuite/89916] New test case gcc.dg/pr86928.c fails on 64 bit targets (r270061)

2019-04-01 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89916

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2019-04-02
 Blocks||86928
   Assignee|unassigned at gcc dot gnu.org  |amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Alexander Monakov  ---
Thanks. I assume the test should not attempt to add -m32 and this line needs to
be removed:

/* { dg-additional-options "-m32" { target powerpc*-*-* } } */


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928
[Bug 86928] ICE in compute_live, at sel-sched.c:3097

[Bug testsuite/89916] New test case gcc.dg/pr86928.c fails on 64 bit targets (r270061)

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89916

--- Comment #2 from Alexander Monakov  ---
Author: amonakov
Date: Tue Apr  2 11:04:22 2019
New Revision: 270087

URL: https://gcc.gnu.org/viewcvs?rev=270087&root=gcc&view=rev
Log:
testsuite: do not try to add -m32 (PR 89916)

PR testsuite/89916
* gcc.dg/pr86928.c: Do not attempt to add -m32.


Modified:
trunk/gcc/testsuite/ChangeLog
trunk/gcc/testsuite/gcc.dg/pr86928.c

[Bug rtl-optimization/86928] ICE in compute_live, at sel-sched.c:3097

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86928
Bug 86928 depends on bug 89916, which changed state.

Bug 89916 Summary: New test case gcc.dg/pr86928.c fails on 64 bit targets 
(r270061)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89916

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

[Bug testsuite/89916] New test case gcc.dg/pr86928.c fails on 64 bit targets (r270061)

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89916

Alexander Monakov  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/85876] ICE in move_op_ascend, at sel-sched.c:6164

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85876

--- Comment #2 from Alexander Monakov  ---
Author: amonakov
Date: Tue Apr  2 15:39:22 2019
New Revision: 270095

URL: https://gcc.gnu.org/viewcvs?rev=270095&root=gcc&view=rev
Log:
sel-sched: fixup reset of first_insn (PR 85876)

2019-04-02  Andrey Belevantsev  

PR rtl-optimization/85876
* sel-sched.c (code_motion_path_driver): Avoid unwinding first_insn
beyond the original fence.

* gcc.dg/pr85876.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr85876.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched.c
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/85876] ICE in move_op_ascend, at sel-sched.c:6164

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85876

--- Comment #3 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/84206] ICE in get_all_loop_exits, at sel-sched-ir.h:1138

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84206

--- Comment #2 from Alexander Monakov  ---
Author: amonakov
Date: Tue Apr  2 15:45:57 2019
New Revision: 270096

URL: https://gcc.gnu.org/viewcvs?rev=270096&root=gcc&view=rev
Log:
sel-sched: skip outer loop in get_all_loop_exits (PR 84206)

2019-04-02  Andrey Belevantsev  

PR rtl-optimization/84206
* sel-sched-ir.h (get_all_loop_exits): Avoid the outer loop when
iterating over loop headers.

* gcc.dg/pr84206.c: New test.


Added:
trunk/gcc/testsuite/gcc.dg/pr84206.c
Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched-ir.h
trunk/gcc/testsuite/ChangeLog

[Bug rtl-optimization/84206] ICE in get_all_loop_exits, at sel-sched-ir.h:1138

2019-04-02 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84206

Alexander Monakov  changed:

   What|Removed |Added

 CC||amonakov at gcc dot gnu.org

--- Comment #3 from Alexander Monakov  ---
Fixed.

[Bug rtl-optimization/90007] [9 Regression] ICE in extract_constrain_insn_cached, at recog.c:2223

2019-04-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

--- Comment #2 from Alexander Monakov  ---
We have a pseudo:SI<-hardreg:SI assignment followed by
pseudo:DF<-float(pseudo:SI) conversion, and we substitute the latter through
the former, creating a pseudo:DF<-float(hardreg:SI) insn that fails in recog.

I'm not exactly sure why RA would reject reloading the operand when it's a
hardreg, but happily reload when it's a pseudo. Am I missing something obvious,
or are such constraints written down somewhere?

[Bug rtl-optimization/90007] [9 Regression] ICE in extract_constrain_insn_cached, at recog.c:2223

2019-04-10 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90007

--- Comment #4 from Alexander Monakov  ---
Well, often sel-sched just does not discriminate hardregs and pseudos when
checking if renaming/substitution may be applied. Sure, as a matter of
efficiency we should probably disallow substitution through such mixed
pseudo=hardreg assignments.

Nevertheless, if it's not only a matter of optimization, but also of internal
consistency, then I'd like to understand it better. Hence the question in
comment #2.

[Bug translation/90061] ARM cortex-M hard fault on 64 bit sized object store to unaligned address

2019-04-12 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90061

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2019-04-12
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #2 from Alexander Monakov  ---
Please provide an example, as a simple smoke-test is compiled correctly:

long f(struct hardwareExample *h)
{
    return h->a + h->b;
}

produces

f:
        ldr     r2, [r0, #1]    @ unaligned
        ldr     r0, [r0, #5]    @ unaligned
        add     r0, r0, r2
        bx      lr

[Bug c/90106] builtin sqrt() ignoring libm's sqrt call result

2019-04-16 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90106

Alexander Monakov  changed:

   What|Removed |Added

 Status|RESOLVED|NEW
   Last reconfirmed||2019-04-16
 CC||amonakov at gcc dot gnu.org
 Resolution|INVALID |---
 Ever confirmed|0   |1

--- Comment #6 from Alexander Monakov  ---
Reopening and confirming, GCC's code looks less efficient than possible for no
good reason.

CDCE does

    y = sqrt (x);
  ==>
    y = IFN_SQRT (x);
    if (__builtin_isless (x, 0))
      sqrt (x);

but it could do

    y = IFN_SQRT (x);
    if (__builtin_isless (x, 0))
      y = sqrt (x);

(note two assignments to y)

or to mimic LLVM's approach:

    if (__builtin_isless (x, 0))
      y = sqrt (x);
    else
      y = IFN_SQRT (x);

[Bug inline-asm/90193] [8/9 Regression] asm goto with TLS "m" input operand generates incorrect assembler in O1 and O2

2019-04-19 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90193

Alexander Monakov  changed:

   What|Removed |Added

 Target||x86_64-*-*, i?86-*-*
 Status|UNCONFIRMED |NEW
  Known to work||7.3.0
   Keywords||wrong-code
   Last reconfirmed||2019-04-20
 CC||amonakov at gcc dot gnu.org
 Ever confirmed|0   |1
Summary|asm goto with TLS "m" input |[8/9 Regression] asm goto
   |operand generates incorrect |with TLS "m" input operand
   |assembler in O1 and O2  |generates incorrect
   ||assembler in O1 and O2
  Known to fail||8.3.0, 9.0

--- Comment #1 from Alexander Monakov  ---
split1 transforms JUMP_INSN with the asm into a plain INSN, after which the cfg
becomes corrupted in various ways.

[Bug c/90253] New: no warning for cv-qualified selectors in _Generic

2019-04-25 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90253

Bug ID: 90253
   Summary: no warning for cv-qualified selectors in _Generic
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

The controlling expression in _Generic undergoes lvalue conversion, so its
const/volatile qualifiers are stripped. Therefore, a qualified selector cannot
possibly match, and it might make sense to warn when a user writes one.

In the following example the function returns 0, even though the user might
have expected to distinguish 'char' vs. 'const char':

int f(const char *c)
{
    return _Generic(*c, const char: 1, char: 0);
}
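
A minimal sketch of the behaviour described above, plus one possible way to
keep the distinction. The function names are hypothetical and not part of the
report. Selecting on the pointer type works because lvalue conversion only
strips top-level qualifiers, and the 'const' in 'const char *' is not
top-level:

```c
/* Same shape as the function in the report: the controlling expression
   '*c' has its 'const' stripped by lvalue conversion, so the
   'const char' association can never be selected. */
int classify_by_value(const char *c)
{
    return _Generic(*c, const char: 1, char: 0);
}

/* Hypothetical alternative: select on the pointer type instead.
   The pointed-to qualifier is part of the pointer type and survives. */
int classify_by_pointer(const char *c)
{
    return _Generic(c, const char *: 1, char *: 0);
}
```

With a conforming C11 compiler, the first function should select the 'char'
association and the second the 'const char *' association.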

[Bug tree-optimization/90292] GCC Fails to hoist loop invariant in nested loops

2019-04-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90292

Alexander Monakov  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||amonakov at gcc dot gnu.org
 Resolution|--- |INVALID

--- Comment #1 from Alexander Monakov  ---
The compiler cannot perform this hoisting, because the computation 'n*(i) +
(j)' happens in 'unsigned int' type, where wrapping overflow matters when
pointers are 64-bit.

If the testcase is changed to either use 'int' (undefined overflow) or
'unsigned long' (same as pointer size), the desired hoisting is performed, as
far as I can tell. Therefore, closing as invalid.

(it's not rare that using size_t for array indexes helps optimization; this is
one such example)
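
The original testcase is not quoted in this thread, so the following is only a
hypothetical reconstruction of the pattern being discussed. In the first
function the index is computed in 'unsigned int' and may wrap modulo 2^32
before it is widened to a 64-bit offset; in the second the index already has
pointer width. Whether a given GCC version actually hoists 'n*i' in either
variant is not something this sketch demonstrates:

```c
#include <stddef.h>

/* Index arithmetic in 'unsigned int': wrapping is well-defined, so the
   compiler must preserve it when forming the 64-bit address. */
long sum_unsigned_index(const int *a, unsigned n, unsigned m)
{
    long s = 0;
    for (unsigned i = 0; i < m; i++)
        for (unsigned j = 0; j < n; j++)
            s += a[n * i + j];
    return s;
}

/* Index arithmetic in 'size_t': the index has the same width as a
   pointer on typical 64-bit targets, so no narrower wrap is involved. */
long sum_size_t_index(const int *a, size_t n, size_t m)
{
    long s = 0;
    for (size_t i = 0; i < m; i++)
        for (size_t j = 0; j < n; j++)
            s += a[n * i + j];
    return s;
}
```

Both functions compute the same sum for in-range sizes; they differ only in
what the optimizer is allowed to assume about the index.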

[Bug tree-optimization/90292] GCC Fails to hoist loop invariant in nested loops

2019-04-30 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90292

--- Comment #3 from Alexander Monakov  ---
When changing the iterators to 'int', you need to change n to 'int' as well;
otherwise, in 'n*(i) + (j)', i and j are converted to unsigned anyway.
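
The conversion can be checked with _Generic, which reports the type of the
index expression without evaluating it. This is an illustrative sketch with
hypothetical function names, not code from the bug report:

```c
/* n is unsigned, i and j are int: the usual arithmetic conversions make
   the whole expression 'unsigned int'. Returns 1 if that is the case. */
int mixed_index_is_unsigned(unsigned n, int i, int j)
{
    return _Generic(n * i + j, unsigned: 1, int: 0);
}

/* All three operands are int: the expression stays 'int', so signed
   overflow is undefined. Returns 1 if the type is int. */
int all_int_index_is_signed(int n, int i, int j)
{
    return _Generic(n * i + j, unsigned: 0, int: 1);
}
```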

[Bug rtl-optimization/88879] [9/10 Regression] ICE in sel_target_adjust_priority, at sel-sched.c:3332

2019-05-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88879

--- Comment #10 from Alexander Monakov  ---
Author: amonakov
Date: Thu May  9 18:13:28 2019
New Revision: 271039

URL: https://gcc.gnu.org/viewcvs?rev=271039&root=gcc&view=rev
Log:
sel-sched: allow negative insn priority (PR 88879)

PR rtl-optimization/88879
* sel-sched.c (sel_target_adjust_priority): Remove assert.


Modified:
trunk/gcc/ChangeLog
trunk/gcc/sel-sched.c

[Bug rtl-optimization/88879] [9 Regression] ICE in sel_target_adjust_priority, at sel-sched.c:3332

2019-05-09 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88879

Alexander Monakov  changed:

   What|Removed |Added

Summary|[9/10 Regression] ICE in|[9 Regression] ICE in
   |sel_target_adjust_priority, |sel_target_adjust_priority,
   |at sel-sched.c:3332 |at sel-sched.c:3332

--- Comment #11 from Alexander Monakov  ---
Fixed on the trunk.

[Bug c/90452] New: no warning for misaligned pointer to #pragma-pack'ed fields

2019-05-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90452

Bug ID: 90452
   Summary: no warning for misaligned pointer to #pragma-pack'ed
fields
   Product: gcc
   Version: 9.0
Status: UNCONFIRMED
  Keywords: diagnostic
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: amonakov at gcc dot gnu.org
  Target Milestone: ---

GCC 9 introduced a new warning (-Waddress-of-packed-member) for situations
where the code tries to assign the address of an under-aligned struct field to
a normal pointer (which should point to a properly aligned object).

However, the new warning doesn't trigger if unaligned fields appear under
#pragma pack (although, oddly, it works fine when -fpack-struct is given on
the command line):

#pragma pack(push,1)
struct
//__attribute__((packed))
s {
    char c;
    long l;
} *s;
#pragma pack(pop)

long *f()
{
    return &s->l;
}

[Bug target/90061] ARM cortex-M hard fault on 64 bit sized object store to unaligned address

2019-05-13 Thread amonakov at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90061

Alexander Monakov  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |INVALID

--- Comment #5 from Alexander Monakov  ---
One common cause for such issues is taking the address of an under-aligned
field, assigning the misaligned address to a normal 'long *' pointer, and then
trying to access the field via that pointer. Sometimes the compiler propagates
alignment info for such pointers, but in general it can't (the code should use
a typedef for a misaligned pointer).
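
A minimal sketch of the typedef approach mentioned above, assuming GCC (or a
compatible compiler) where the 'aligned' attribute on a typedef may lower the
alignment of the type. The struct and all names here are hypothetical and
stand in for the reporter's code, which is not available:

```c
#pragma pack(push, 1)
struct packed_pair {
    char c;
    long l;     /* placed at offset 1, so under-aligned */
};
#pragma pack(pop)

/* 'long' with its alignment requirement reduced to 1 byte. */
typedef long unaligned_long __attribute__((aligned(1)));

/* Access the field through a pointer whose type carries the reduced
   alignment, so the compiler does not assume natural alignment. */
long read_packed_long(const struct packed_pair *p)
{
    const unaligned_long *q = &p->l;
    return *q;
}
```

By contrast, storing '&p->l' in a plain 'long *' discards the alignment
information, which is the misuse described in this comment.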

GCC 9 implemented a new warning, -Waddress-of-packed-member, to catch such
misuse (although sadly it wouldn't trigger here due to #pragma pack, see PR
90452).

Of course without seeing an example it's impossible to say what went wrong.
Closing, please reopen or file a new bug when a testcase is available.
