[Bug bootstrap/112497] [14 Regression] Bootstrap comparison failure: gcc/analyzer/constraint-manager.o differs on loongarch64-linux-gnu

2023-11-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112497

--- Comment #5 from Jeffrey A. Law  ---
This failure means the stage1 and stage2 compilers generated different code for
the same input.

So when I need to debug this I usually start by first getting that source code.
Based on the title of this bugzilla you're going to want the .ii file for
constraint-manager as built by either the stage1 or stage2 compiler.

Then I feed that into the stage1 and stage2 compiler with the same optimization
options to verify that they indeed generate different code.  Sometimes that
doesn't work when the issue is debug insns, but that's where I start.

Once I have confirmed the two compilers generate different code, then I try to
isolate where/why.  This can often be done by looking at debug dumps to narrow
things down to a pass that's behaving differently.  Alternately you can replace
objects in the stage2 compiler with those from the stage1 compiler to narrow it
down to a single .o that causes the compiler's behavior to diverge.

Then it's usually a matter of going into the debugger and understanding why the
given pass is behaving differently.

It's a long, painful process.

*Sometimes* you can just build the stage1 compiler and run the testsuite and
see if there are new failures on your target.  It doesn't always generate
something useful, but when it does it's often faster than the process I
mentioned above.

[Bug bootstrap/112497] [14 Regression] Bootstrap comparison failure: gcc/analyzer/constraint-manager.o differs on loongarch64-linux-gnu

2023-11-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112497

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #3 from Jeffrey A. Law  ---
If at all possible, cc Jin Ma on this since it's his change; I just reviewed
and committed the bits on Jin's behalf.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #41 from Jeffrey A. Law  ---
I would agree.  In fact, the whole point of the f-m-o pass is to bring those
immediates into the memory reference.  It'd be really useful to know why that
isn't happening.

The only thing I can think of would be if multiple instructions needed the %r20
in the RTL you attached.  Which might point to a refinement we should make in
f-m-o, specifically the transformation isn't likely profitable if we aren't
able to fold away a term or fold a constant term into the actual memory
reference.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-09 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #31 from Jeffrey A. Law  ---
IIRC r21 is call-clobbered.  So I guess the question turns into what was the
sequence before f-m-o got involved -- was it assuming r21 would be preserved,
or did f-m-o make r21 live across the call?

[Bug tree-optimization/112468] New: [14 Regression] Missed phi-opt after recent change

2023-11-09 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112468

Bug ID: 112468
   Summary: [14 Regression] Missed phi-opt after recent change
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 3f176e1adc6bc9cc2c21222d776b51d9f43cb66b (HEAD)
Author: Tamar Christina 
Date:   Thu Nov 9 13:59:39 2023 +

middle-end: optimize fneg (fabs (x)) to copysign (x, -1) [PR109154]

This patch transforms fneg (fabs (x)) into copysign (x, -1) which is more
canonical and allows a target to expand this sequence efficiently.  Such
sequences are common in scientific code working with gradients.

There is an existing canonicalization of copysign (x, -1) to fneg (fabs (x))
which I remove since this is a less efficient form.  The testsuite is also
updated in light of this.

gcc/ChangeLog:

PR tree-optimization/109154
* match.pd: Add new neg+abs rule, remove inverse copysign rule.

gcc/testsuite/ChangeLog:

PR tree-optimization/109154
* gcc.dg/fold-copysign-1.c: Updated.
* gcc.dg/pr55152-2.c: Updated.
* gcc.dg/tree-ssa/abs-4.c: Updated.
* gcc.dg/tree-ssa/backprop-6.c: Updated.
* gcc.dg/tree-ssa/copy-sign-2.c: Updated.
* gcc.dg/tree-ssa/mult-abs-2.c: Updated.
* gcc.target/aarch64/fneg-abs_1.c: New test.
* gcc.target/aarch64/fneg-abs_2.c: New test.
* gcc.target/aarch64/fneg-abs_3.c: New test.
* gcc.target/aarch64/fneg-abs_4.c: New test.
* gcc.target/aarch64/sve/fneg-abs_1.c: New test.
* gcc.target/aarch64/sve/fneg-abs_2.c: New test.
* gcc.target/aarch64/sve/fneg-abs_3.c: New test.
* gcc.target/aarch64/sve/fneg-abs_4.c: New test.



Is causing a testsuite regression on moxie-elf.  This is a scan dump failure,
so you don't need a full toolchain, just a cross compiler.

moxie-sim: gcc.dg/tree-ssa/phi-opt-24.c scan-tree-dump-not phiopt2 "if"
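
For reference, the two forms named in the commit quoted above compute the same
value: both force the sign bit on.  The following is only a minimal sketch of
that equivalence.  The function names are made up for illustration, the
__builtin_* calls are GCC builtins (used here to avoid depending on libm), and
the bit comparison assumes a 64-bit double.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative names only; none of these appear in the GCC sources.  */

/* The "fneg (fabs (x))" form.  */
double
neg_abs (double x)
{
  return -__builtin_fabs (x);
}

/* The "copysign (x, -1)" form the commit canonicalizes to.  */
double
neg_copysign (double x)
{
  return __builtin_copysign (x, -1.0);
}

/* Return 1 if both forms give bit-identical results for X.
   Assumes double is 64 bits wide.  */
int
forms_agree (double x)
{
  double a = neg_abs (x);
  double b = neg_copysign (x);
  uint64_t ua, ub;

  memcpy (&ua, &a, sizeof ua);
  memcpy (&ub, &b, sizeof ub);
  return ua == ub;
}
```

Calling forms_agree on a handful of positive, negative and zero inputs is a
quick way to convince yourself the rewrite preserves the sign of zero as well.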

[Bug target/112462] New: RISC-V zicond cost model enhancements

2023-11-09 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112462

Bug ID: 112462
   Summary: RISC-V zicond cost model enhancements
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

Currently the costing of zicond always returns COSTS_N_INSNS (1) which can be
inaccurate.  I see two primary issues that need to be fixed.

First, for conditions which are not equality comparisons against zero the
expander will need to emit a sCC insn.  That additional instruction needs to be
included in the cost.

Second, the expander needs to look at the true/false arms and potentially emit
additional code because of the limitations of the czero instruction.  Those
additional instructions need to be included in the cost as well.

It's unclear if we should refactor the expander logic so that its basic
structure can be used to drive costing as well as expansion logic or if we
should just mirror the basic structure with new code and keep it in sync with
the expander logic.
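
A rough sketch of what the two adjustments described above could look like.
This is not the RISC-V backend's code: the enums and the function name are
hypothetical, the local COSTS_N_INSNS definition only mirrors the scaling of
GCC's macro of the same name, and the "two extra instructions when neither arm
is zero" figure is an assumption (a second czero plus an operation to merge the
two results).

```c
/* Mirrors the scaling of GCC's COSTS_N_INSNS; redefined locally so the
   sketch is self-contained.  */
#define COSTS_N_INSNS(n) ((n) * 4)

/* Hypothetical classification of the condition and of each arm.  */
enum cond_kind { COND_EQ_NE_ZERO, COND_OTHER };
enum arm_kind { ARM_ZERO, ARM_NONZERO };

/* Estimate the cost of a conditional move implemented with czero.  */
int
zicond_cost_sketch (enum cond_kind cond, enum arm_kind true_arm,
                    enum arm_kind false_arm)
{
  int insns = 1;                /* The czero itself.  */

  /* Anything other than an equality test against zero needs an sCC
     insn to materialize the condition first.  */
  if (cond == COND_OTHER)
    insns += 1;

  /* Assumption: when neither arm is zero, a second czero and a merge
     instruction are needed.  */
  if (true_arm == ARM_NONZERO && false_arm == ARM_NONZERO)
    insns += 2;

  return COSTS_N_INSNS (insns);
}
```

Whether logic like this lives in a shared helper or is duplicated next to the
expander is exactly the open question raised above.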

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #26 from Jeffrey A. Law  ---
As a compiler junkie, I tend to think compiler first until I can prove it
otherwise.  I wouldn't get too hung up on aliasing issues and such at this
point.

Do we already have a dump for the key function?  Presumably f-m-o doesn't
trigger *that* much.  And if this is triggering w/o LTO we can probably move to
cross debugging and analysis of those dump files and assembly code with and
without f-m-o enabled, narrowing our focus on the key function.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #19 from Jeffrey A. Law  ---
f-m-o runs post-allocation, so the scope of where its behavior can change
things is narrower.  So testing with -fno-schedule-insns isn't going to be
useful, but -fno-schedule-insns2 might.

I'm a bit concerned that we can't turn off f-m-o with an attribute.  That would
indicate something isn't wired up right in the options handling.

[Bug rtl-optimization/112415] [14 regression] Python 3.11 miscompiled on HPPA with new RTL fold mem offset pass, since r14-4664-g04c9cf5c786b94

2023-11-06 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112415

--- Comment #6 from Jeffrey A. Law  ---
Do we have assembly code around the faulting point (x/20i $pc) and a register
dump (i r)?  The biggest concern I'd have with f-m-o on the PA would be the
implicit segment selection that happens on the base register -- but it would
only be an issue if we are faulting on an unscaled indexed addressing mode and
only if the linux-gnu port was actually putting different values into the space
registers.

WRT testing -- we did test this on hppa1.1-linux-gnu.  Just a bootstrap and
regression test of the compiler itself.

[Bug target/111311] RISC-V regression testsuite errors with --param=riscv-autovec-preference=scalable

2023-11-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111311

--- Comment #14 from Jeffrey A. Law  ---
As Andrew said, if there's a test that depends on behavior of -INT_MIN, then
the test needs to be fixed.  That's undefined behavior.
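
As an illustration of how a test can avoid relying on -INT_MIN, here is a
minimal sketch of a guarded negation.  The helper name is hypothetical and is
not taken from the testsuite.

```c
#include <limits.h>

/* Store -X in *OUT and return 1, or return 0 and leave *OUT untouched
   when X is INT_MIN.  With a two's complement int, -INT_MIN is not
   representable, so evaluating it is undefined behavior.  */
int
checked_negate (int x, int *out)
{
  if (x == INT_MIN)
    return 0;
  *out = -x;
  return 1;
}
```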

[Bug rtl-optimization/109035] meaningless memory store on RISC-V and LoongArch

2023-11-02 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109035

--- Comment #8 from Jeffrey A. Law  ---
No spills on rv64 either.

[Bug rtl-optimization/104387] aarch64: Redundant SXTH for “bag of bits” moves

2023-11-01 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104387

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
As noted in bz111384, this can be addressed via Joern's extension DCE pass that
we're beating on right now.  Conceptually it tracks liveness of sub-word
objects within a register and when it encounters an extension that sets bits
that are never read, it eliminates the extension. Conceptually simple and we've
confirmed it addresses the issue in 111384.  I strongly suspect it would fix
this one as well.

It's still got bugs and isn't really ready for integration, but to date Joern's
basic approach seems the most viable for eliminating unnecessary extensions.
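
The liveness idea described above can be modeled in a few lines.  This is only
a toy model of the stated concept, not Joern's pass: the function name and the
representation of liveness as a 64-bit mask are assumptions made for
illustration.

```c
#include <stdint.h>

/* LIVE_BITS has bit i set when bit i of the destination register is
   read by some later use.  An extension from FROM_BITS to 64 bits only
   defines bits FROM_BITS..63, so it can be dropped when none of those
   bits is live.  Returns 1 if the extension is dead, 0 otherwise.  */
int
extension_is_dead (uint64_t live_bits, unsigned from_bits)
{
  /* Nothing above bit 63 exists; also avoids an out-of-range shift.  */
  if (from_bits >= 64)
    return 1;

  uint64_t upper = ~UINT64_C (0) << from_bits;
  return (live_bits & upper) == 0;
}
```

For example, a 16-bit value that is only ever stored with a halfword store has
live bits 0xFFFF, so an extension from 16 bits would be reported as dead.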

[Bug tree-optimization/112320] [14 Regression] crash from insert_debug_temp_for_var_def since r14-5032-ge3da1d7bb288c8

2023-10-31 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112320

--- Comment #6 from Jeffrey A. Law  ---
Created attachment 56480
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=56480&action=edit
Testcase for fr30-elf -Os -g

[Bug tree-optimization/112320] [14 Regression] crash from insert_debug_temp_for_var_def since r14-5032-ge3da1d7bb288c8

2023-10-31 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112320

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 CC||law at gcc dot gnu.org
   Last reconfirmed||2023-11-01
 Ever confirmed|0   |1

--- Comment #5 from Jeffrey A. Law  ---
I've bisected a failure on fr30-elf to the same commit.  The failure mode is
different, but given it's the same commit, I'm attaching the testcase to this
BZ.

0xba6a67 phi_nodes_ptr(basic_block_def*)
/home/jlaw/test/gcc/gcc/gimple.h:4700
0xba6a67 gsi_start_phis(basic_block_def*)
/home/jlaw/test/gcc/gcc/gimple-iterator.cc:935
0xba6a67 gsi_for_stmt(gimple*)
/home/jlaw/test/gcc/gcc/gimple-iterator.cc:620
0xf477c1 replace_uses_by(tree_node*, tree_node*)
/home/jlaw/test/gcc/gcc/tree-cfg.cc:2055
0x111b731 clean_up_loop_closed_phi(function*)
/home/jlaw/test/gcc/gcc/tree-ssa-propagate.cc:1296
0xd23348 loop_optimizer_finalize(function*, bool)
/home/jlaw/test/gcc/gcc/loop-init.cc:146
0x10e1448 tree_ssa_loop_done
/home/jlaw/test/gcc/gcc/tree-ssa-loop.cc:478
0x10e1498 execute
/home/jlaw/test/gcc/gcc/tree-ssa-loop.cc:507

Compile with -Os -g

[Bug target/112298] Poor code for DImode operations on H8 port

2023-10-30 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112298

Jeffrey A. Law  changed:

   What|Removed |Added

 Target||h8300
   Priority|P3  |P4

[Bug target/112298] New: Poor code for DImode operations on H8 port

2023-10-30 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112298

Bug ID: 112298
   Summary: Poor code for DImode operations on H8 port
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

long long foo(long long x) { return x << 1; }

Highlights several code inefficiencies WRT DImode values on the H8.

I would expect that defining a reasonable adddi3 and some DImode shifts would
likely help this problem considerably.

I'm not currently working on this problem.

[Bug libstdc++/107885] H8/300: libsupc++/hash_bytes.cc fix shift-count-overflow warning

2023-10-28 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107885

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jeffrey A. Law  ---
No plans to backport this.

[Bug target/111466] RISC-V: redundant sign extensions despite ABI guarantees

2023-10-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111466

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #5 from Jeffrey A. Law  ---
Fixed on the trunk now.

[Bug tree-optimization/111798] New: [14 Regression] Recent change causing testsuite regression and poor code on mcore-elf

2023-10-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111798

Bug ID: 111798
   Summary: [14 Regression] Recent change causing testsuite
regression and poor code on mcore-elf
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 6decda1a35be5764101987c210b5693a0d914e58
Author: Richard Biener 
Date:   Thu Oct 12 11:34:57 2023 +0200

tree-optimization/111779 - Handle some BIT_FIELD_REFs in SRA

The following handles byte-aligned, power-of-two and byte-multiple
sized BIT_FIELD_REF reads in SRA.  In particular this should cover
BIT_FIELD_REFs created by optimize_bit_field_compare.

For gcc.dg/tree-ssa/ssa-dse-26.c we now SRA the BIT_FIELD_REF
appearing there leading to more DSE, fully eliding the aggregates.

This results in the same false positive -Wuninitialized as the
older attempt to remove the folding from optimize_bit_field_compare,
fixed by initializing part of the aggregate unconditionally.

PR tree-optimization/111779
gcc/
* tree-sra.cc (sra_handled_bf_read_p): New function.
(build_access_from_expr_1): Handle some BIT_FIELD_REFs.
(sra_modify_expr): Likewise.
(make_fancy_name_1): Skip over BIT_FIELD_REF.

gcc/fortran/
* trans-expr.cc (gfc_trans_assignment_1): Initialize
lhs_caf_attr and rhs_caf_attr codimension flag to avoid
false positive -Wuninitialized.

gcc/testsuite/
* gcc.dg/tree-ssa/ssa-dse-26.c: Adjust for more DSE.
* gcc.dg/vect/vect-pr111779.c: New testcase.

Causes execute/20040709-2.c to fail on mcore-elf at -O2.  It also results in
what appears to be significantly poorer code generation.

Note I haven't managed to get mcore-elf-gdb to work, so debugging is, umm,
painful.  And I wouldn't put a lot of faith in the simulator correctness.

I have simplified the test to this:
extern void abort (void);
extern void exit (int);

unsigned int
myrnd (void)
{
  static unsigned int s = 1388815473;
  s *= 1103515245;
  s += 12345;
  return (s / 65536) % 2048;
}

struct __attribute__((packed)) K
{
  unsigned int k:6, l:1, j:10, i:15;
};

struct K sK;

unsigned int
fn1K (unsigned int x)
{
  struct K y = sK;
  y.k += x;
  return y.k;
}

void
testK (void)
{
  int i;
  unsigned int mask, v, a, r;
  struct K x;
  char *p = (char *) &sK;
  for (i = 0; i < sizeof (sK); ++i)
    *p++ = myrnd ();
  v = myrnd ();
  a = myrnd ();
  sK.k = v;
  x = sK;
  r = fn1K (a);
  if (x.j != sK.j || x.l != sK.l)
    abort ();
}

int
main (void)
{
  testK ();
  exit (0);
}


Which should at least make the poor code gen obvious.  I don't expect to have
time to debug this further anytime in the near future.

[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849

2023-10-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #11 from Jeffrey A. Law  ---
Fixed by Mary's patch on the trunk.

[Bug middle-end/111777] [14 regression] build breaks after r14-4558-g400efdddf3d849

2023-10-11 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111777

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-10-12
 Ever confirmed|0   |1

--- Comment #6 from Jeffrey A. Law  ---
I would hazard a guess that Mary doesn't have a bugzilla account.  I'll drop
her a direct email.

[Bug target/93062] Failed to generate indirect branch for long branches on riscv

2023-10-10 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93062

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED
 CC||law at gcc dot gnu.org

--- Comment #3 from Jeffrey A. Law  ---
This should be fixed on the trunk now.  No plans to backport to the release
branches.

[Bug bootstrap/111664] [14 regression] Fails to build with mawk (error in gcc/opt-read.awk) after r14-4354-ge4a4b8e983bac8

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111664

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #6 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug rtl-optimization/111384] missed optimization: GCC adds extra any extend when storing subreg#0 multiple times

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111384

Jeffrey A. Law  changed:

   What|Removed |Added

   Last reconfirmed||2023-10-07
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #4 from Jeffrey A. Law  ---
So this is something we've been pondering over in rv64 land.  Joern has an
extension to DCE which tracks subobjects in an attempt to determine if bits set
by sign/zero extensions are never read.  If they aren't read, then the
extension can be eliminated.

[Bug target/109414] RISC-V: unnecessary sext.w in rv64

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109414

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED
 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
These code generation inefficiencies have been fixed.  I didn't bisect, but I
would hazard a guess it was Jivan's work on exposing the widening nature of the
32 bit operations and extracting the result via a promoted subreg.

ie, for the first example we now generate this during expand:

(insn 2 5 3 2 (set (reg/v:DI 136 [ x ])
(reg:DI 10 a0 [ x ])) "j.c":1:26 -1
 (nil))
(insn 3 2 4 2 (set (reg/v:DI 137 [ n ])
(reg:DI 11 a1 [ n ])) "j.c":1:26 -1
 (nil))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:DI 140)
(sign_extend:DI (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0)
(const_int 1 [0x1])))) "j.c":2:12 -1
 (nil))
(insn 8 7 9 2 (set (reg:SI 139)
(subreg/s/u:SI (reg:DI 140) 0)) "j.c":2:12 -1
 (expr_list:REG_EQUAL (plus:SI (subreg/s/u:SI (reg/v:DI 136 [ x ]) 0)
(const_int 1 [0x1]))
(nil)))
(insn 9 8 10 2 (set (reg:DI 141)
(xor:DI (reg/v:DI 137 [ n ])
(subreg:DI (reg:SI 139) 0))) "j.c":2:17 -1
 (nil))
(insn 10 9 11 2 (set (reg:DI 142)
(sign_extend:DI (subreg:SI (reg:DI 141) 0))) "j.c":2:17 discrim 1 -1
 (nil))
(insn 11 10 15 2 (set (reg:DI 135 [ <retval> ])
(reg:DI 142)) "j.c":2:17 discrim 1 -1
 (nil))
(insn 15 11 16 2 (set (reg/i:DI 10 a0)
(reg:DI 135 [ <retval> ])) "j.c":3:1 -1
 (nil))
(insn 16 15 0 2 (use (reg/i:DI 10 a0)) "j.c":3:1 -1
 (nil))


Which is much easier for combine to analyze and prove the trailing sign
extension is unnecessary.

[Bug target/106271] Bootstrap on RISC-V on Ubuntu 22.04 LTS: bits/libc-header-start.h: No such file or directory

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106271

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|WAITING |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jeffrey A. Law  ---
I wasn't aware of this BZ when I made the commit referenced in c#6.  But yes,
the whole point of that commit was to fix this problem.

[Bug target/64215] -Os misses an opportunity to merge two ret instructions

2023-10-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64215

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
Andrew, the reason the patch you referenced doesn't help this case is because
we don't have an unconditional jump to a return only block.

To optimize this case we'd have to detect that we have a return only block that
is immediately preceded by another return block after bbro.

ie:

(note 48 23 59 6 [bb 6] NOTE_INSN_BASIC_BLOCK)
(insn 59 48 49 6 (use (reg/i:SI 10 a0)) -1
 (nil))
(jump_insn 49 59 37 6 (simple_return) 346 {simple_return}
 (nil)
 -> simple_return)
;; lr  out   1 [ra] 2 [sp] 10 [a0]
;; live  out 1 [ra] 2 [sp] 10 [a0]

;;  succ:   EXIT [always]  count:52738306 (estimated locally, freq 0.4591)

;; basic block 7, loop depth 0, count 6317494 (estimated locally, freq 0.0550), maybe hot
;;  prev block 6, next block 1, flags: (REACHABLE, RTL)
;;  pred:   2 [5.5% (guessed)]  count:6317494 (estimated locally, freq 0.0550) (CAN_FALLTHRU)
;; bb 7 artificial_defs: { }
;; bb 7 artificial_uses: { u-1(2){ }}
;; lr  in    1 [ra] 2 [sp] 10 [a0]
;; lr  use   2 [sp] 10 [a0]
;; lr  def
;; live  in  1 [ra] 2 [sp] 10 [a0]
;; live  gen
;; live  kill

(code_label 37 49 36 7 4 (nil) [1 uses])
(note 36 37 60 7 [bb 7] NOTE_INSN_BASIC_BLOCK)
(insn 60 36 51 7 (use (reg/i:SI 10 a0)) -1
 (nil))
(jump_insn 51 60 41 7 (simple_return) 346 {simple_return}
 (nil)
 -> simple_return)

[Bug target/111670] H8/300 SX uses incorrect code sequences

2023-10-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111670

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P4
 Target||h8300

[Bug target/111670] New: H8/300 SX uses incorrect code sequences

2023-10-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111670

Bug ID: 111670
   Summary: H8/300 SX uses incorrect code sequences
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

The H8/SX port can create sequences like

(set (mem (autoinc (reg sp))) (reg sp))

Here autoinc is a PRE_DECREMENT or PRE_INCREMENT addressing mode.

This is invalid RTL.

I believe this is the root cause of the following H8/SX failures in the
testsuite:


h8300-sim/-msx/-mint32: gcc.c-torture/execute/920501-6.c   -O1  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/920501-6.c   -Os  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/pr20466-1.c   -O1  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/pr20466-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/pr20466-1.c   -O2 -flto
-fuse-linker-plugin -fno-fat-lto-objects  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/pr39339.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/ssad-run.c   -O1  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/ssad-run.c   -Os  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/usad-run.c   -O1  execution test
h8300-sim/-msx/-mint32: gcc.c-torture/execute/usad-run.c   -Os  execution test

I suspect we need to break the "Q" constraint into two variants: one that
allows autoinc addressing modes and one that does not.

For movsi/movhi  we would use the version which does not allow autoinc
addressing modes and instead use the Z0/ZA approach like the other H8 variants
are using.

I'm not currently working on this.

[Bug rtl-optimization/111467] REE failing to eliminate redundant extension due to multiple reaching def(s)

2023-09-18 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111467

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law  ---
I thought REE handled multiple reaching definitions. So this is a bit of a
surprise.

[Bug target/82666] [11/12/13/14 regression]: sum += (x>128 ? x : 0) puts the cmov on the critical path (at -O2)

2023-08-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82666

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #14 from Jeffrey A. Law  ---
A better approach might be to try and create COND_EXPRs for the conditional
move in the gimple code.  The biggest problem I see with that is the
gimple->rtl converters aren't great at creating efficient code on targets
without conditional moves.

Meaning that we could well end up improving x86, but making several other
targets worse.  

I know this because I was recently poking at a similar problem.  We expressed a
conditional move of 0, C as a multiply of a boolean by C in gimple.  It really
should just have been a COND_EXPR, but when we generate that form targets
without good conditional move expanders will end up recreating branchy code :(
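
To make the two gimple-level shapes mentioned above concrete, here is a small C
sketch of the same selection written both ways.  The function names are
illustrative only; which form a given target expands well is the open question,
not something this code settles.

```c
/* The COND_EXPR-like form: select C when COND is nonzero, else 0.  */
int
select_cond (int cond, int c)
{
  return cond ? c : 0;
}

/* The same selection expressed as a multiply of a boolean by C.  */
int
select_mult (int cond, int c)
{
  return (cond != 0) * c;
}
```

Both functions return the same value for every input; they differ only in the
shape handed to the expanders.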

[Bug driver/77576] gcc-ar doesn't work if all options are read from file

2023-07-31 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77576

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
 Status|NEW |RESOLVED
 Resolution|--- |FIXED

--- Comment #9 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/110748] RISC-V: optimize store of DF 0.0

2023-07-20 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
I'd bet it's const_0_operand not allowing CONST_DOUBLE.

The question is what unintended side effects we'd have if we allowed
CONST_DOUBLE 0.0 in const_0_operand.

[Bug tree-optimization/105832] [13/14 Regression] Dead Code Elimination Regression at -O3 (trunk vs. 12.1.0)

2023-07-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105832

--- Comment #11 from Jeffrey A. Law  ---
Looks viable to me.  Are you thinking match.pd?

[Bug target/110559] Bad mask_load/mask_store codegen of RVV

2023-07-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110559

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-07-07
 Ever confirmed|0   |1

--- Comment #2 from Jeffrey A. Law  ---
Yea, we definitely want pressure sensitive scheduling.  While it's more
valuable for scalar cases, it can help with some vector cases as well.  Also
note there are two variants of the pressure sensitive scheduler support.  I
think we use the newer one which is supposed to be better, but I don't think
we've really evaluated one over the other.

Setting issue rate to 1 for the first pass scheduler is a bit of a hack, though
not terribly uncommon.  It's something I've wanted to go back and review, so I
fully support you digging into that as well.

[Bug tree-optimization/110460] New: [14 Regression] ft32 ICE on 931110-1.c with new TYPE_PRECISION checking

2023-06-28 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110460

Bug ID: 110460
   Summary: [14 Regression] ft32 ICE on 931110-1.c with new
TYPE_PRECISION checking
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

commit fe48f2651334bc4d96b6df6b2bb6b29fcb732a83
Author: Richard Biener 
Date:   Fri Jun 9 09:31:14 2023 +0200

Prevent TYPE_PRECISION on VECTOR_TYPEs

The following makes sure that using TYPE_PRECISION on VECTOR_TYPE
ICEs when tree checking is enabled.  This should avoid wrong-code
in cases like PR110182 and instead ICE.

It also introduces a TYPE_PRECISION_RAW accessor and adjusts
places I found that are eligible to use that.

* tree.h (TYPE_PRECISION): Check for non-VECTOR_TYPE.
(TYPE_PRECISION_RAW): Provide raw access to the precision
field.
* tree.cc (verify_type_variant): Compare TYPE_PRECISION_RAW.
(gimple_canonical_types_compatible_p): Likewise.
* tree-streamer-out.cc (pack_ts_type_common_value_fields):
Stream TYPE_PRECISION_RAW.
* tree-streamer-in.cc (unpack_ts_type_common_value_fields):
Likewise.
* lto-streamer-out.cc (hash_tree): Hash TYPE_PRECISION_RAW.

gcc/lto/
* lto-common.cc (compare_tree_sccs_1): Use TYPE_PRECISION_RAW.


One example on ft32-elf:

Tests that now fail, but worked before (13 tests):

ft32-sim: gcc.c-torture/execute/931110-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess
errors)
ft32-sim: gcc.c-torture/execute/931110-1.c   -O3 -g  (test for excess errors)
ft32-sim: gcc.dg/pr108095.c (test for excess errors)

And if you dig into the 931110-1.c failure you find:
ft32-sim: gcc.c-torture/execute/931110-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (internal compiler
error: tree check: expected none of vector_type, have vector_type in
type_has_mode_precision_p, at tree.h:6644)
ft32-sim: gcc.c-torture/execute/931110-1.c   -O3 -g  (internal compiler error:
tree check: expected none of vector_type, have vector_type in
type_has_mode_precision_p, at tree.h:6644)

It looks like SCALAR_DEST in vectorizable_operation is actually a vector type
-- meaning that STMT was already vectorized.

This is the patch I'm testing.  There are other failures that don't seem to be
fixed by this patch.  Anyway, the whole point of the change is to find these
lurking bugs.

diff --git a/gcc/tree-vect-stmts.cc b/gcc/tree-vect-stmts.cc
index d642d3c257f..3dd8a284577 100644
--- a/gcc/tree-vect-stmts.cc
+++ b/gcc/tree-vect-stmts.cc
@@ -6481,6 +6481,10 @@ vectorizable_operation (vec_info *vinfo,
   scalar_dest = gimple_assign_lhs (stmt);
   vectype_out = STMT_VINFO_VECTYPE (stmt_info);

+  /* STMT may have already been vectorized.  */
+  if (VECTOR_TYPE_P (TREE_TYPE (scalar_dest)))
+    return false;
+
   /* Most operations cannot handle bit-precision types without extra
  truncations.  */
   bool mask_op_p = VECTOR_BOOLEAN_TYPE_P (vectype_out);

[Bug rtl-optimization/110423] Redundant constants not getting eliminated on RISCV.

2023-06-28 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110423

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #2 from Jeffrey A. Law  ---
So there is another broad approach we can take here.

As Vineet mentioned, this isn't really a job for PRE/LCM as those are
formulated around a requirement that they never insert an expression evaluation
in any path that did not have an evaluation before.  ie no speculative constant
loads.

We could potentially relax that condition.  I'm not sure we'd formulate it as a
PRE/LCM problem, but it gives you a sense of how we could tackle this.  The
difficulty would be in the heuristics for when to apply this transformation
since it will make some codes slower and may increase register pressure.  This
is derived heavily from Click's work in the 90s.  This would happen in gimple
most likely, though I guess one could do it in RTL if they have a high pain
threshold.

The simplest way to think about the placement algorithm is to find the
blocks where all the uses of any given constant C occur.  A trivially correct
placement of the load of that constant would be the entry block as it must
dominate every block in that set.  Of course that would make the placement quite
speculative and lengthen live ranges.  That's usually referred to as an early
placement.

Next find the latest placement for the constant load that covers all the uses. 
That will be the lowest common ancestor in the dominator tree of the set of
blocks that use the constant.

If you were to imagine a path through the dominator tree starting at the early
placement (entry) and ending at the lowest common ancestor, any block on that
path could be selected for generating the constant load and would cover every
use with that single load.  Within the set of blocks on that path, find the set
with the lowest loop nesting, then within that reduced set find those with the
deepest control nesting (or lowest estimated frequency counts).  There may be
more than one block in that final set.  Any are valid and "reasonable" choices.
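
A minimal sketch of that selection, assuming a simplified block record.  The
struct and the names dom_lca and place_constant are hypothetical and are not
GCC's own data structures or API; it only illustrates the walk from the latest
placement up to the entry block.  Ties are resolved in favor of the block
closest to the uses, which is just one of the "reasonable" choices.

```c
#include <stddef.h>

/* Hypothetical, simplified basic-block record (not GCC's basic_block).  */
struct block
{
  struct block *idom;  /* Immediate dominator; NULL for the entry block.  */
  int dom_depth;       /* Depth in the dominator tree; entry is 0.  */
  int loop_depth;      /* Loop nesting level.  */
  int freq;            /* Estimated execution frequency.  */
};

/* Lowest common ancestor of A and B in the dominator tree.  */
static struct block *
dom_lca (struct block *a, struct block *b)
{
  while (a != b)
    {
      if (a->dom_depth >= b->dom_depth)
        a = a->idom;
      else
        b = b->idom;
    }
  return a;
}

/* Choose one block for the load of a constant used in USES[0..N-1].
   Start at the latest placement (the LCA of all uses) and walk up the
   dominator tree to the entry block, preferring the lowest loop depth
   and then the lowest estimated frequency.  */
static struct block *
place_constant (struct block **uses, size_t n)
{
  if (n == 0)
    return NULL;

  struct block *latest = uses[0];
  for (size_t i = 1; i < n; i++)
    latest = dom_lca (latest, uses[i]);

  struct block *best = latest;
  for (struct block *bb = latest->idom; bb != NULL; bb = bb->idom)
    if (bb->loop_depth < best->loop_depth
        || (bb->loop_depth == best->loop_depth && bb->freq < best->freq))
      best = bb;

  return best;
}
```

With two uses inside a loop body, the latest placement is the loop header and
the walk moves the load out to the nearest dominating block outside the loop.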


Click's paper is much more general, but the same concepts apply.  His paper
doesn't cover anything like bifurcating the graph (thus allowing multiple
constant loads in an effort to reduce undesired speculation or register
allocation conflicts).

We might be able to get away with this precisely because these are constant
loads and thus subject to rematerialization later if register pressure is high.

https://courses.cs.washington.edu/courses/cse501/06wi/reading/click-pldi95.pdf

[Bug debug/110308] [14 Regression] ICE on audiofile-0.3.6: RTL: vartrack: Segmentation fault in mode_to_precision(machine_mode)

2023-06-20 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110308

--- Comment #9 from Jeffrey A. Law  ---
Right.  It's fairly common with fold-mem-offsets to end up rewriting the
address arithmetic such that we'll have an sp->gpr copy of some sort in the IL.
 We'd really like to be able to cprop that copy away.

After Manolis's fixes to that code it seemed independently commit-able so I
acked it while we iterate on the fold-mem-offsets work.  It's tickled a few
problems, but nothing that seems unmanageable right now.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #4 from Jeffrey A. Law  ---
Yea, the tests aren't great.  They'll be better shortly.  They'll test
non-constant arguments and out-of-range constants, expecting a suitable
diagnostic.  They'll also test the extrema of valid constants.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-06-19
 Status|UNCONFIRMED |NEW

--- Comment #1 from Jeffrey A. Law  ---
It looks like some of the aes patterns have the same problem.  It may just have
been Liao not understanding the difference between an operand constraint and an
operand predicate.

[Bug target/110264] internal compiler error: riscv_vector::vector_insn_info::get_avl_reg_rtx

2023-06-17 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110264

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-06-17
 Ever confirmed|0   |1
 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
Note that Pan can cherry pick it into gcc-13.  Typically folks wait a week or
so after the patch is on the trunk to see if there's any fallout.  Given that I
don't expect gcc-13.2 until late summer, we've certainly got time.

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-17 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #23 from Jeffrey A. Law  ---
risc-v doesn't have any special instructions to implement add-with-carry or
subtract-with-borrow.  Depending on who you talk to, it's either a feature or a
mis-design.

[Bug tree-optimization/110218] sink pass heuristic not working in practice

2023-06-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110218

--- Comment #2 from Jeffrey A. Law  ---
So what I think was happening was that we would sink past a bunch of
conditionals that were never going to be true thinking that we were moving to a
deeper control nest.  So the idea was to use the frequency information to avoid
movements that weren't likely to improve anything.

I don't remember how I selected the param's value though.  I've got no
objection to adjusting how this works.

[Bug rtl-optimization/110163] [14 Regression] Comparing against a constant string is inefficient on some targets

2023-06-09 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110163

--- Comment #2 from Jeffrey A. Law  ---
It is a regression for rv64.  So probably P4 would be most appropriate.

[Bug rtl-optimization/110163] New: [14 Regression] Comparing against a constant string is inefficient on some targets

2023-06-07 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110163

Bug ID: 110163
   Summary: [14 Regression] Comparing against a constant string is
inefficient on some targets
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

Comparing against a constant string is expanded by inline_string_cmp and on
some targets the generated code can be inefficient.  This can be seen in
spec2017's omnetpp benchmark, particularly when the inline string comparison
limits are increased.

The problem is the expansion code arranges to do all the arithmetic and tests
in SImode.  On RV64 this introduces a sign extension for each test due to how
RV64 expresses 32bit ops.

It would be better to do all the computations in word_mode, then convert the
final result to SImode, at least for RV64 and likely for other targets.

I experimented with starting to build out cost checks to determine what mode to
use for the internal computations.  That ran afoul of x86 where the cost of a
byte load is different than the cost of an extended byte load, even though they
use the exact same instruction.

There's also a need to cost out the computations, test & branch in the
different modes as well once the x86 hurdle is behind us.

I've set work on this aside for now.  But the discussion can be found in these
two threads:

https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620601.html
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/620577.html

#include <string.h>
int
foo (char *x)
{
   return strcmp (x, "lowerLayout");
}

Compiled with -O2 --param builtin-string-cmp-inline-length=100 on rv64 should
show the issue.
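
To make the mode question concrete, here is a rough C model of the two
strategies.  It is only a sketch: the real expansion is unrolled RTL rather
than a loop, and the names cmp_simode and cmp_wordmode are made up for
illustration.  The first keeps every difference in a 32-bit int, the second
does the arithmetic in a word-sized type (long, assuming an LP64 target such
as rv64) and narrows once at the end.

```c
#include <string.h>

static const char lit[] = "lowerLayout";

/* Every byte difference is computed and tested as a 32-bit int,
   which models doing the arithmetic and tests in SImode.  */
int
cmp_simode (const char *x)
{
  const unsigned char *p = (const unsigned char *) x;
  for (size_t i = 0; i < sizeof lit; i++)
    {
      int diff = p[i] - (unsigned char) lit[i];
      if (diff != 0)
        return diff;
    }
  return 0;
}

/* The same comparison done in a word-sized type; only the final
   result is converted to int.  */
int
cmp_wordmode (const char *x)
{
  const unsigned char *p = (const unsigned char *) x;
  for (size_t i = 0; i < sizeof lit; i++)
    {
      long diff = (long) p[i] - (long) (unsigned char) lit[i];
      if (diff != 0)
        return (int) diff;
    }
  return 0;
}
```

Both functions return the same value for every input; the difference is only
in which mode the intermediate computations would be carried out.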

[Bug target/110109] RISC-V: ICE when build the Intrinsic code

2023-06-04 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110109

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||law at gcc dot gnu.org
 Resolution|--- |FIXED

--- Comment #4 from Jeffrey A. Law  ---
Should be fixed on the trunk.

[Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension

2023-05-30 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #10 from Jeffrey A. Law  ---
Created attachment 55218
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55218&action=edit
(Incomplete) Patch

[Bug tree-optimization/108041] ivopts results in extra instruction in simple loop

2023-05-30 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108041

--- Comment #4 from Jeffrey A. Law  ---
Patch was for a different problem.  Sorry.

[Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension

2023-05-30 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #9 from Jeffrey A. Law  ---
Weird, I don't see the attachment either.  I'll extract & upload it again.

WRT costing.  fwprop and combine will both query the target rtx costs and will
reject when the target costing model indicates the change isn't actually
profitable.

As you'd noted before, combine will internally transform a sign/zero extension
into a pair of shifts.  The whole point of that internal canonicalization is to
expose cases where the shifts can combine with other nearby operations.  So
there's no significant risk to detecting and creating the extension form
earlier.

[Bug tree-optimization/108041] ivopts results in extra instruction in simple loop

2023-05-29 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108041

--- Comment #3 from Jeffrey A. Law  ---
Created attachment 55185
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55185&action=edit
(Incomplete) Patch

[Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension

2023-05-29 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #7 from Jeffrey A. Law  ---
Attached is what I cobbled together.  It doesn't use magic numbers.  But it
doesn't yet handle zero extensions in the simplify-rtx code.  But I think it
shows the overall direction fairly well.

[Bug tree-optimization/106888] [RISCV] Negative optimization that excess andi instructions are generated in gcc.dg/pr90838.c

2023-05-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106888

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #12 from Jeffrey A. Law  ---
Should be fixed with Raphael's patch on the trunk.

[Bug tree-optimization/109848] New: [14 Regression] Recent change causing testsuite ICE on csky port

2023-05-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109848

Bug ID: 109848
   Summary: [14 Regression] Recent change causing testsuite ICE on
csky port
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This patch:

commit cc0e22b3f25d4b2a326322bce711179c02377e6c
Author: Richard Biener 
Date:   Fri May 12 13:43:27 2023 +0200

tree-optimization/64731 - extend store-from CTOR lowering to TARGET_MEM_REF

The following also covers TARGET_MEM_REF when decomposing stores from
CTORs to supported elementwise operations.  This avoids spilling
and cleans up after vector lowering which doesn't touch loads or
stores.  It also mimics what we already do for loads.

PR tree-optimization/64731
* tree-ssa-forwprop.cc (pass_forwprop::execute): Also
handle TARGET_MEM_REF destinations of stores from vector
CTORs.

* gcc.target/i386/pr64731.c: New testcase.


Is causing the csky port to abort in forwprop with a verify_ssa failure:
FAIL: gcc.dg/torture/pr52407.c   -O2  (internal compiler error: verify_ssa
failed)
FAIL: gcc.dg/torture/pr52407.c   -O2  (test for excess errors)
Excess errors:
/home/jlaw/test/gcc/gcc/testsuite/gcc.dg/torture/pr52407.c:22:1: error:
definition in block 3 follows the use
for SSA_NAME: _38 in statement:
_24 = [(vl_t *)_38];
during GIMPLE pass: forwprop
/home/jlaw/test/gcc/gcc/testsuite/gcc.dg/torture/pr52407.c:22:1: internal
compiler error: verify_ssa failed
0x11a93bf verify_ssa(bool, bool)
/home/jlaw/test/gcc/gcc/tree-ssa.cc:1203
0xe5f8a5 execute_function_todo
/home/jlaw/test/gcc/gcc/passes.cc:2105
0xe5e4de do_per_function
/home/jlaw/test/gcc/gcc/passes.cc:1694
0xe5fa4e execute_todo
/home/jlaw/test/gcc/gcc/passes.cc:2152


The test is gcc.dg/torture/pr52407 and the failure can be seen with just a
cross compiler.

[Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension

2023-05-11 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #6 from Jeffrey A. Law  ---
I would still rather not introduce special cases for SUBREGs if we can avoid
it.  I think the question remains whether or not patching simplify-rtx's
canonicalize_shift is sufficient to fix this problem (perhaps with the
adjustment to fwprop as well).  If they are, then they would be much preferred
over the original patch which special cased SUBREGs.

[Bug target/109777] [14 regression] Compare-debug failure after recent changes

2023-05-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109777

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P4

--- Comment #4 from Jeffrey A. Law  ---
If it's inside the bfin bundling code, let's just mark it as a p4 and we can
chase it down whenever it's convenient.   My primary motivation is to catch
generic issues.  A target specific issue on a barely used target just isn't
that interesting IMHO.

[Bug testsuite/109776] [14 Regression] pr81192 fails on some targets after recent propagator changes

2023-05-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109776

--- Comment #7 from Jeffrey A. Law  ---
Thanks.  That took care of the xstormy16 issues.

[Bug tree-optimization/109777] New: [14 regression] Compare-debug failure after recent changes

2023-05-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109777

Bug ID: 109777
   Summary: [14 regression] Compare-debug failure after recent
changes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This change:

commit 21e2ef2dc25de318de29ec32d5390350c6717c6a (refs/bisect/bad)
Author: Andrew Pinski 
Date:   Tue May 2 00:10:46 2023 -0700

Move substitute_and_fold over to use simple_dce_from_worklist

While looking into a different issue, I noticed that it
would take until the second forwprop pass to do some
forward proping and it was because the ssa name was
used more than once but the second statement was
"dead" and we don't remove that until much later.

So this uses simple_dce_from_worklist instead of manually
removing of the known unused statements instead.
Propagate engine does not do a cleanupcfg afterwards either but manually
cleans up possible EH edges so simple_dce_from_worklist
needs to communicate that back to the propagate engine.

Some testcases needed to be updated/changed even because of better
optimization.
gcc.dg/pr81192.c even had to be changed to be using the gimple FE so it
would
be less fragile in the future too.
gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being
matched
but in those cases, the result was not being used so both __atomic_fetch_
and
__atomic_x_and_fetch_ are valid choices and would not make a code
generation difference.
evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slightly change as the
removal message
is different slightly.
kernels-alias-8.c: ccp1 is able to remove an unused load which causes
ealias to have
one less load to analysis so update the expected scan #.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/109691
* tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
argument.
If the removed statement can throw, have need_eh_cleanup
include the bb of that statement.
* tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
* tree-ssa-propagate.cc (struct prop_stats_d): Remove
num_dce.
(substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
Initialize dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
Destore dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::before_dom_children):
Set dceworklist instead of adding to stmts_to_remove.
(substitute_and_fold_engine::substitute_and_fold):
Call simple_dce_from_worklist instead of poping
from the list.
Don't update the stat on removal statements.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/evrp7.c: Update for output change.
* gcc.dg/tree-ssa/evrp8.c: Likewise.
* gcc.dg/tree-ssa/vrp35.c: Likewise.
* gcc.dg/tree-ssa/vrp36.c: Likewise.
* gcc.dg/tree-ssa/pr98737-1.c: Update scan-tree-dump-not
to check for assignment too instead of just a call.
* c-c++-common/goacc/kernels-alias-8.c: Update test
for removal of load.
* gcc.dg/pr81192.c: Rewrite testcase in gimple based test.


Is triggering a compare-debug failure on the bfin-elf port:
bfin-sim: gcc.dg/pr44023.c (test for excess errors)

If you dig into the log file:

xgcc: error: /home/jlaw/test/gcc/gcc/testsuite/gcc.dg/pr44023.c:
'-fcompare-debug' failure (length)

[Bug testsuite/109776] New: [14 Regression] pr81192 fails on some targets after recent propagator changes

2023-05-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109776

Bug ID: 109776
   Summary: [14 Regression] pr81192 fails on some targets after
recent propagator changes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

pr81192 is failing on some targets (xstormy16-elf for example) after this
change:

commit 21e2ef2dc25de318de29ec32d5390350c6717c6a
Author: Andrew Pinski 
Date:   Tue May 2 00:10:46 2023 -0700

Move substitute_and_fold over to use simple_dce_from_worklist

While looking into a different issue, I noticed that it
would take until the second forwprop pass to do some
forward proping and it was because the ssa name was
used more than once but the second statement was
"dead" and we don't remove that until much later.

So this uses simple_dce_from_worklist instead of manually
removing of the known unused statements instead.
Propagate engine does not do a cleanupcfg afterwards either but manually
cleans up possible EH edges so simple_dce_from_worklist
needs to communicate that back to the propagate engine.

Some testcases needed to be updated/changed even because of better
optimization.
gcc.dg/pr81192.c even had to be changed to be using the gimple FE so it
would
be less fragile in the future too.
gcc.dg/tree-ssa/pr98737-1.c was failing because __atomic_fetch_ was being
matched
but in those cases, the result was not being used so both __atomic_fetch_
and
__atomic_x_and_fetch_ are valid choices and would not make a code
generation difference.
evrp7.c, evrp8.c, vrp35.c, vrp36.c: just needed a slightly change as the
removal message
is different slightly.
kernels-alias-8.c: ccp1 is able to remove an unused load which causes
ealias to have
one less load to analysis so update the expected scan #.

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

PR tree-optimization/109691
* tree-ssa-dce.cc (simple_dce_from_worklist): Add need_eh_cleanup
argument.
If the removed statement can throw, have need_eh_cleanup
include the bb of that statement.
* tree-ssa-dce.h (simple_dce_from_worklist): Update declaration.
* tree-ssa-propagate.cc (struct prop_stats_d): Remove
num_dce.
(substitute_and_fold_dom_walker::substitute_and_fold_dom_walker):
Initialize dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::~substitute_and_fold_dom_walker):
Destore dceworklist instead of stmts_to_remove.
(substitute_and_fold_dom_walker::before_dom_children):
Set dceworklist instead of adding to stmts_to_remove.
(substitute_and_fold_engine::substitute_and_fold):
Call simple_dce_from_worklist instead of poping
from the list.
Don't update the stat on removal statements.

[ ... ]

The compiler is complaining with this message:

/home/jlaw/test/gcc/gcc/testsuite/gcc.dg/pr81192.c: In function 'fn2':
/home/jlaw/test/gcc/gcc/testsuite/gcc.dg/pr81192.c:50:1: error: type mismatch
in binary expression
long int

long int

int

iftmp2_8_14 = j_6(D) + 1;
/home/jlaw/test/gcc/gcc/testsuite/gcc.dg/pr81192.c:50:1: error: mismatching
comparison operand types
long int
int
if (c0_1_13 != 0)
compiler exited with status 1


I suspect the testsuite needs further twiddling to work on 16bit int targets.

[Bug tree-optimization/109721] New: [14 Regression] predcom-2 fails after recent changes

2023-05-03 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109721

Bug ID: 109721
   Summary: [14 Regression] predcom-2 fails after recent changes
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

arc-elf target.

FAIL: gcc.dg/tree-ssa/predcom-2.c scan-tree-dump-times pcom "Unrolling 2
times." 2

Bisection points to:

f385252b2336a4a57a30fddf82e558c73bcc85cc is the first bad commit
commit f385252b2336a4a57a30fddf82e558c73bcc85cc
Author: Richard Biener 
Date:   Tue May 2 10:34:48 2023 +0200

tree-optimization/109672 - properly check emulated plus during vect

The following refactors the check for emulated vector support for
the cases of plus, minus and negate.  In the PR we end up with
a SImode plus, supported by the target but emulated and in this
context fail to verify we are dealing with exactly word_mode.

PR tree-optimization/109672
* tree-vect-stmts.cc (vectorizable_operation): For plus,
minus and negate always check the vector mode is word mode.

It should be visible with a cross-compiler.  No need for a full toolchain
stack.

[Bug tree-optimization/109672] [14 regression] many ICEs after r14-323-g977a43f5ba778b

2023-04-29 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109672

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-04-29
 CC||law at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #1 from Jeffrey A. Law  ---
Similar failures on arc-elf:

arc-sim: gcc.c-torture/execute/pr36691.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess
errors)
arc-sim: gcc.c-torture/execute/pr36691.c   -O3 -g  (test for excess errors)
arc-sim: gcc.dg/pr53749.c (test for excess errors)
arc-sim: gcc.dg/pr83480.c (test for excess errors)
arc-sim: gcc.dg/torture/pr98117.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
arc-sim: gcc.dg/torture/pr98117.c   -O3 -fomit-frame-pointer -funroll-loops
-fpeel-loops -ftracer -finline-functions  (test for excess errors)
arc-sim: gcc.dg/torture/pr98117.c   -O3 -g  (test for excess errors)
arc-sim: gcc.dg/torture/pr98117.c   -O3 -g  (test for excess errors)

If you dig inside, they're tripping the new checking for conversions too.

[Bug testsuite/109549] [14 Regression] Conditional move regressions after r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a

2023-04-29 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109549

Jeffrey A. Law  changed:

   What|Removed |Added

 Target|x86_64-*-*  |s390
Summary|[14 Regression] cmov6.c |[14 Regression] Conditional
   |test fail after commit  |move regressions after
   |r14-53-g675b1a7f113adb1d737 |r14-53-g675b1a7f113adb1d737
   |adaf78b4fd90be7a0ed1a   |adaf78b4fd90be7a0ed1a

--- Comment #9 from Jeffrey A. Law  ---
WRT the s390 failures:

gcc.target/s390/arch13/sel-1.c scan-assembler-times \tselgr(?:h|le)\t 1
gcc.target/s390/arch13/sel-1.c scan-assembler-times \tselr(?:h|le)\t 1
gcc.target/s390/ifcvt-one-insn-bool.c scan-assembler lochinh\t%r.?,1
gcc.target/s390/ifcvt-one-insn-char.c scan-assembler locrnh\t%r.?,%r.?
gcc.target/s390/loc-1.c scan-assembler \tlochine\t%r2,-1
gcc.target/s390/loc-1.c scan-assembler \tlocrne\t%r2,%r4
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
eq:\n[^:]*\twfcdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochie\t%r2,1
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
ge:\n[^:]*\twfkdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochihe\t%r2,1
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
gt:\n[^:]*\twfkdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochih\t%r2,1
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
le:\n[^:]*\twfkdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochile\t%r2,1
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
lt:\n[^:]*\twfkdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochil\t%r2,1
gcc.target/s390/vector/vec-scalar-cmp-1.c scan-assembler
ne:\n[^:]*\twfcdb\t%v[0-9]*,%v[0-9]*\n\t[^:]+\tlochine\t%r2,1

These are also cases where the s390 cost model says these particular
if-conversion opportunities aren't profitable.   Basically the backend has no
costing model for (set (if_then_else ...))  so it recursively computes the cost
of all the sub-rtxs which ultimately turns out to be higher than the cost of
the branchy code.

I'm not qualified to address this problem as I have no sense of s390 costing. 
I'm going to have the tester regenerate new baselines for s390, but I'm not
going to actively try to fix this problem.  Also note my testing was s390, not
s390x which may be behaving differently.

[Bug target/106585] RISC-V: Mis-optimized code gen for zbs

2023-04-28 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585

--- Comment #11 from Jeffrey A. Law  ---
Coming back to this.

WRT extension elimination.  I've been pondering if we want a late pass to do a
bit of this that can't be handled by REE.

So let's take the case of a Zbs instruction operating on a variable bit in
RV64.

I think we can probably agree that in the absence of additional information we
can't do those kinds of bit manipulations because we could potentially change
bit 31 and have the result escape as a parameter to a function call, return
value or get used in a compare type instruction.


So to make use of the Zbs instructions that manipulate a variable bit we could
emit a suitable sign extension after each such operation.  That, of
course, has the potential to be expensive.

But if we chase down the uses we can probably eliminate a lot of these
extensions.  Essentially we need to know if the extension reaches a comparison,
one of the ABI escape points or a real 64bit operation.  If not, then the
extension is unnecessary and can be dropped.

Ideally we'd find that a significant number of extensions could be dropped.
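
A small model of the bit 31 problem, assuming the usual RV64 convention that
32-bit values are kept sign-extended in 64-bit registers.  The helper names
(bset64, is_sext32, sext_w) are invented for this sketch, and the narrowing
casts rely on GCC's modulo behavior for out-of-range conversions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of a 64-bit bset: set bit (n & 63) of the register.  */
static uint64_t
bset64 (uint64_t reg, unsigned n)
{
  return reg | (UINT64_C (1) << (n & 63));
}

/* True if REG holds a properly sign-extended 32-bit value.  */
static bool
is_sext32 (uint64_t reg)
{
  return reg == (uint64_t) (int64_t) (int32_t) (uint32_t) reg;
}

/* Model of sext.w: sign extend the low 32 bits of REG.  */
static uint64_t
sext_w (uint64_t reg)
{
  return (uint64_t) (int64_t) (int32_t) (uint32_t) reg;
}
```

Setting any bit below 31 in a sign-extended value keeps the register
sign-extended.  Setting bit 31 of a non-negative value does not, so the result
needs the extra extension before it can reach a comparison, an ABI escape point
or a real 64bit operation.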

We're not actively working on this, but it is something rattling around in the
empty space between my ears.

[Bug rtl-optimization/109592] Failure to recognize shifts as sign/zero extension

2023-04-28 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

--- Comment #4 from Jeffrey A. Law  ---
If we need to handle subregs here, I would suggest something like this

if (SUBREG_P (XEXP (op0, 0))
&& subreg_lowpart_p (op0)
... other tests ...

That way we know we're extracting the low word of the subreg.  But I'm not sure
at all why we need to handle them in this code.  I would expect generic
optimizers to strip away the subregs in the result if they are extraneous.

It's not clear why you check the size of the subreg modes.  It seems like this
optimization should work even for a paradoxical subreg (bitsize of inner will
be smaller than bitsize of outer).  

In general if you only have one statement in an arm of an IF-THEN-ELSE, then it
need not be inside a { } block.

Rather than using magic numbers like

INTVAL (op1) + 8 == 32

Instead use mode information.

INTVAL (op1) + GET_MODE_BITSIZE (QImode) == GET_MODE_BITSIZE (SImode)
// code for QI->SI expansion

Then repeat for the other mode combinations.

Note that we probably should go ahead and support QI->HI.  While it doesn't
happen for RISC-V, it could likely happen on other architectures.  So you end
up wanting to support

QI->HI, QI->SI QI->DI
HI->SI, HI->DI
SI->DI

I don't know if it happens in practice, so check first to see what we do for a
zero extension variant of your original test.  If we need to handle that too,
it can be easily done by changing the shifts we recognize.
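
The mode-based test can be illustrated outside of GCC with a stand-alone
sketch.  The mode_bits table stands in for GET_MODE_BITSIZE of QImode, HImode,
SImode and DImode, and extension_inner_bits is a hypothetical helper, not
something in simplify-rtx.cc.

```c
#include <stddef.h>

/* Stand-ins for the bit sizes of QImode, HImode, SImode and DImode.  */
static const int mode_bits[] = { 8, 16, 32, 64 };

/* If a shift pair with shift count COUNT in a mode OUTER_BITS wide is
   an extension from a narrower integer mode, return the width of that
   narrower mode.  Otherwise return 0.  */
static int
extension_inner_bits (int count, int outer_bits)
{
  for (size_t i = 0; i < sizeof mode_bits / sizeof mode_bits[0]; i++)
    if (mode_bits[i] < outer_bits
        && count + mode_bits[i] == outer_bits)
      return mode_bits[i];
  return 0;
}
```

A shift count of 24 in a 32-bit mode maps to QI->SI, 48 in a 64-bit mode maps
to HI->DI, and a count such as 3 matches no mode pair and is rejected.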

Anyway, it looks like you're on the right track.  I would suggest further
discussions happen on gcc-patches.

[Bug tree-optimization/106888] [RISCV] Negative optimization that excess andi instructions are generated in gcc.dg/pr90838.c

2023-04-21 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106888

--- Comment #10 from Jeffrey A. Law  ---
The sign_extend later gets turned into zero_extend.  Presumably because we know
the value is never negative.  That in and of itself wouldn't be a big deal as
it should be easily recognizable using any_extend.  But combine steps in and
scrambles the RTL in various unhelpful ways.

[Bug rtl-optimization/109592] New: Failure to recognize shifts as sign/zero extension

2023-04-21 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109592

Bug ID: 109592
   Summary: Failure to recognize shifts as sign/zero extension
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: law at gcc dot gnu.org
  Target Milestone: ---

This is a trivial sign extension:

int sextb32(int x)
{ return (x << 24) >> 24; }

Yet on RV64 with ZBB enabled we get:

sextb32:
slliw   a0,a0,24# 6 [c=4 l=4]  ashlsi3
sraiw   a0,a0,24# 13[c=8 l=4]  *ashrsi3_extend
ret # 21[c=0 l=4]  simple_return

We actually get a good form to optimize in simplify_binary_operation_1:

> #0  simplify_context::simplify_binary_operation (this=0x7fffda68, 
> code=ASHIFTRT, mode=E_SImode, op0=0x7fffea11eb40, op1=0x7fffea009610) at 
> /home/jlaw/riscv-persist/ventana/gcc/gcc/simplify-rtx.cc:2558
> 2558  gcc_assert (GET_RTX_CLASS (code) != RTX_COMPARE);
> (gdb) p code
> $24 = ASHIFTRT
> (gdb) p mode
> $25 = E_SImode
> (gdb) p debug_rtx (op0)
> (ashift:SI (subreg/s/u:SI (reg/v:DI 74 [ x ]) 0)
> (const_int 24 [0x18]))
> $26 = void
> (gdb) p debug_rtx (op1)
> (const_int 24 [0x18])
> $27 = void

So that's (ashiftrt (ashift (object) 24) 24), ie sign extension. 

I suspect if we fix simplify_binary_operation_1 then we'll see this get
simplified by fwprop.  I also suspect we could construct a zero extension
variant.
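
For reference, a sketch of both shift pairs next to the extension each one
should be equivalent to.  The function names are made up for this example.
The left shift goes through an unsigned type to avoid signed overflow, and the
code relies on GCC's arithmetic right shift of negative values and its modulo
behavior when narrowing to a signed type.

```c
#include <stdint.h>

/* Sign extension of the low byte written as a shift pair.  */
static int32_t
sext8_shifts (int32_t x)
{
  return (int32_t) ((uint32_t) x << 24) >> 24;
}

/* The same operation written as a narrowing conversion.  */
static int32_t
sext8_cast (int32_t x)
{
  return (int8_t) x;
}

/* Zero extension variant: a logical right shift instead of an
   arithmetic one.  */
static uint32_t
zext8_shifts (uint32_t x)
{
  return (x << 24) >> 24;
}

/* The same operation written as a mask.  */
static uint32_t
zext8_mask (uint32_t x)
{
  return x & 0xff;
}
```

Ideally each pair of functions would compile to the same code, a single
sext.b for the signed pair with ZBB and a single andi for the unsigned pair.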

[Bug tree-optimization/106888] [RISCV] Negative optimization that excess andi instructions are generated in gcc.dg/pr90838.c

2023-04-21 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106888

--- Comment #8 from Jeffrey A. Law  ---
So coming back to this after a couple months, I'm confident the match.pd change
is unnecessary and in fact wrong.  So we definitely want to set that aside.

[Bug tree-optimization/106888] [RISCV] Negative optimization that excess andi instructions are generated in gcc.dg/pr90838.c

2023-04-21 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106888

--- Comment #6 from Jeffrey A. Law  ---
Comment on attachment 54905
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54905
proposed patch

So that's a subset of what we've done.  We initially thought that was going to
be enough to solve this class of problems.   But it's actually deeper than just
having a zero_extension variant of this pattern. 

I'll officially submit the zero_extension pattern and the match.pd bits.  The
other pattern we wrote is fugly and I'd like to look at it one more time.

[Bug tree-optimization/106888] [RISCV] Negative optimization that excess andi instructions are generated in gcc.dg/pr90838.c

2023-04-20 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106888

--- Comment #4 from Jeffrey A. Law  ---
Vineet, we've got some bits here you might want to play with.  I'm about to
leave for the evening, but I'll put you in touch with Raphael tomorrow
afternoon.

[Bug target/108247] Missed opportunity to generate shNadd on risc-v

2023-04-20 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108247

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #3 from Jeffrey A. Law  ---
Per c#1 and c#2.

[Bug target/108248] Some insns in the risc-v backend do not have mappings to functional units

2023-04-20 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108248

Jeffrey A. Law  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #8 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/109549] [14 Regression] cmov6.c test fail after commit r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a

2023-04-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109549

--- Comment #6 from Jeffrey A. Law  ---
And just an FYI, the tester is flagging conditional move failures for mips64-*,
rx-elf and s390-linux-gnu.  Most likely these are additional cases where the
hook is indicating the transformation isn't profitable, but the target tests
are expecting the transformation to happen.

I'll debug those other targets and take appropriate action.  At this point I
don't see anything that would strongly suggest reversion of the patch, just
that we need a bit of testsuite adjustment.

[Bug target/109549] [14 Regression] cmov6.c test fail after commit r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a

2023-04-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109549

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-04-19

--- Comment #5 from Jeffrey A. Law  ---
Yea, that's exactly what's kicking in here.  The converted sequence looks like
this:

(insn 29 0 28 (set (reg:SI 86)
        (const_int 10 [0xa])) 83 {*movsi_internal}
     (nil))

(insn 28 29 30 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 83 [ c ])
            (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
     (nil))

(insn 30 28 32 (set (reg/v:SI 85 [ e ])
        (if_then_else:SI (eq (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 85 [ e ])
            (reg:SI 86))) 1318 {*movsicc_noc}
     (nil))

(insn 32 30 31 (set (reg:SI 87)
        (const_int 20 [0x14])) 83 {*movsi_internal}
     (nil))

(insn 31 32 33 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg/v:SI 83 [ c ])
            (const_int 0 [0]))) 7 {*cmpsi_ccno_1}
     (nil))

(insn 33 31 0 (set (reg/v:SI 84 [ d ])
        (if_then_else:SI (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (reg/v:SI 84 [ d ])
            (reg:SI 87))) 1318 {*movsicc_noc}
     (nil))


Note the two movsicc_* patterns.

So the question now is what to do about it.  It looks like things are behaving
as expected, so my first inclination would be to adjust the test.  Actually
splitting it into two would likely be even better.  One would verify that by
default we do not generate a pair of cmovs for this code, the other would turn
the tuning bit off and verify that we do generate the pair of cmovs.

Happy to do whatever the x86 maintainers want here.
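
For readers who don't read RTL daily, the converted sequence above can be
paraphrased in C roughly as follows.  This is only a sketch: the source shape
in branchy() is inferred from the RTL rather than copied from cmov6.c, and
the function names and the r86/r87 temporaries are illustrative.

```c
#include <assert.h>

/* Assumed shape of the original code: one store on each arm.  */
void branchy (int c, int *d, int *e)
{
  if (c != 0)
    *e = 10;
  else
    *d = 20;
}

/* The if-converted form: two compares, two conditional moves.  */
void if_converted (int c, int *d, int *e)
{
  int r86 = 10;                 /* insn 29 */
  *e = (c == 0) ? *e : r86;     /* insns 28 and 30 */
  int r87 = 20;                 /* insn 32 */
  *d = (c != 0) ? *d : r87;     /* insns 31 and 33 */
}

/* Check that both forms leave d and e in the same state for a given c.  */
void check_agree (int c)
{
  int d1 = 1, e1 = 2, d2 = 1, e2 = 2;
  branchy (c, &d1, &e1);
  if_converted (c, &d2, &e2);
  assert (d1 == d2 && e1 == e2);
}
```

Each arm of the branch becomes its own conditional move, which is the pair
the tuning is meant to avoid by default.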

[Bug target/109549] [14 Regression] cmov6.c test fail after commit r14-53-g675b1a7f113adb1d737adaf78b4fd90be7a0ed1a

2023-04-19 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109549

--- Comment #4 from Jeffrey A. Law  ---
x86's tuning does have some support for avoiding multiple cmovs in a single
if-converted sequence.  I'll double check if that's kicking in here.

[Bug target/109508] [13 Regression] ICE: in extract_insn, at recog.cc:2791 with -mcpu=sifive-s76 on riscv64

2023-04-16 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109508

Jeffrey A. Law  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED

--- Comment #3 from Jeffrey A. Law  ---
Fixed on the trunk.

[Bug target/108807] [11/12 regression] gcc.target/powerpc/vsx-builtin-10d.c fails after r11-6857-gb29225597584b6 on power 9 BE

2023-04-15 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108807

--- Comment #7 from Jeffrey A. Law  ---
Once you've committed to the active release branches where this bug is active
(11/12 in this case), you can just close the bug as resolved/fixed.  No need to
update the summary/title in that case.

Thanks,
Jeff

[Bug target/108807] [11/12/13 regression] gcc.target/powerpc/vsx-builtin-10d.c fails after r11-6857-gb29225597584b6 on power 9 BE

2023-04-14 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108807

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org

--- Comment #5 from Jeffrey A. Law  ---
Kewen, is this BZ fixed on the trunk?  If so we should update the title by
dropping the "/13" so that's not flagged as a gcc-13 regression.

[Bug target/109508] [13 Regression] ICE: in extract_insn, at recog.cc:2791 with -mcpu=sifive-s76 on riscv64

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109508

Jeffrey A. Law  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |law at gcc dot gnu.org
   Priority|P3  |P4

--- Comment #1 from Jeffrey A. Law  ---
Trivial issue in the riscv backend.  We just need to fix the operand on the
movXXcc pattern.

[Bug target/109508] [13 Regression] ICE: in extract_insn, at recog.cc:2791 with -mcpu=sifive-s76 on riscv64

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109508

Jeffrey A. Law  changed:

   What|Removed |Added

 Ever confirmed|0   |1
   Last reconfirmed||2023-04-14
 Status|UNCONFIRMED |NEW

[Bug analyzer/103602] [11/12/13 regression] Analyzer takes excessive amount of memory and time linking GNU grep with LTO

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103602

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug middle-end/103637] [12/13 Regression] missing warning writing past the end of one of multiple elements of the same array

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103637

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug rtl-optimization/103829] [10/11/12/13 Regression] missing shrink wrapping for simple/obvious code

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103829

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug analyzer/107943] [11/12/13 Regression] gcc -fanalyzer hangs in openssl curve25519.c since r11-3840-gaf66094d03779377

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107943

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug analyzer/109027] [13 Regression] ICE: SIGSEGV (infinite recursion in ana::constraint_manager::eval_condition / ana::constraint_manager::impossible_derived_conditions_p) with -fanalyzer since r13-

2023-04-13 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109027

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug middle-end/109478] FAIL: g++.dg/other/pr104989.C -std=gnu++14 (internal compiler error: Segmentation fault)

2023-04-12 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109478

--- Comment #2 from Jeffrey A. Law  ---
The pa.cc bits look reasonable.  It's been forever since I looked at this code,
but clearly using a HOST_WIDE_INT is the right thing to be doing.  While it may
not fix this bug completely, consider it pre-approved.

My PA-fu isn't what it used to be, but I strongly suspect we can't add that
constant directly.  It'd need to be broken down into a multi-instruction
sequence.  Not sure if ldil+ldo is sufficient there or not.

[Bug c/105628] [12/13 Regression] False positive with -Waddress

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105628

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug rtl-optimization/105715] [13 Regression] missed RTL if-conversion with COND_EXPR change

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105715

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug tree-optimization/105832] [13 Regression] Dead Code Elimination Regression at -O3 (trunk vs. 12.1.0)

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105832

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/105834] [13 Regression] Dead Code Elimination Regression at -O2 (trunk vs. 12.1.0)

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105834

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug target/106240] [13 Regression] missed vectorization opportunity (cond move) on mips since r13-707-g68e0063397ba82

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106240

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2

[Bug tree-optimization/106511] [13 Regression] New -Werror=maybe-uninitialized since r13-1268-g8c99e307b20c502e

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106511

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug target/107270] [10/11/12/13 Regression] return for structure is not as good as before

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107270

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/107823] [13 Regression] Dead Code Elimination Regression at -Os (trunk vs. 12.2.0)

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107823

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/108197] [12/13 Regression] -Wstringop-overread emitted on simple boost small_vector code

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108197

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/108351] [13 Regression] Dead Code Elimination Regression at -O3 since r13-4240-gfeeb0d68f1c708

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108351

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug tree-optimization/108355] [13 Regression] Dead Code Elimination Regression at -O2 since r13-2772-g9baee6181b4e42

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108355

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/108358] [13 Regression] Dead Code Elimination Regression at -Os since r13-3378-gf6c168f8c06047

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108358

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P2

[Bug tree-optimization/108360] [13 Regression] Dead Code Elimination Regression at -Os since r13-2048-g418b71c0d535bf

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108360

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P2
 CC||law at gcc dot gnu.org

[Bug libgomp/108895] [13.0.1 (exp)] Fortran + gfx90a !$acc update device produces a segfault.

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108895

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P4
 CC||law at gcc dot gnu.org

[Bug target/108947] [13 Regression] wrong code with -O2 -fno-forward-propagate and vector compare on riscv64

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108947

Jeffrey A. Law  changed:

   What|Removed |Added

   Priority|P3  |P1

--- Comment #4 from Jeffrey A. Law  ---
P1 as this looks like a latent issue in combine or simplification routines.

[Bug target/109104] [13 Regression] ICE: in gen_reg_rtx, at emit-rtl.cc:1171 with -fzero-call-used-regs=all -march=rv64gv

2023-04-08 Thread law at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109104

Jeffrey A. Law  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P4
