[Bug c/113973] New: Pleas issue a warning when using plain character values in bitwise operations

2024-02-17 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113973

Bug ID: 113973
   Summary: Pleas issue a warning when using plain character
values in bitwise operations
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
  Target Milestone: ---

This example program compiles without any kind of warning in gcc:

static char x = 0xD8;

int main(void)
{
return 0x1200 | x;
}

The value returned from main is 0xFFD8 on architectures with 32-bit int and
signed characters by default. After just fixing a bug that was caused by an
unexpected sign expansion when building an int from individual bytes, I'd
rather have a warning if

1) A variable of type char is promoted to int.
2) The int value is used in an bitwise expression
3) More than 8 bits of the results are actually used
4) More than 8 bits may be non-zero

Because of condition 3, this will yiels no warning on "char y = x | 0x40;" (top
bits truncated, so condition 3 fails) and no warning on "int y = x & 0x40;"
(all high bits are guaranteed to be zero, so condition 4 fails).

The real-world bug that motivates this enhancement proposal is
https://github.com/hfst/hfst-ospell/issues/43

[Bug c/108483] gcc warns about suspicious constructs for unevaluted ?: operand

2023-01-20 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483

--- Comment #3 from Michael Karcher  ---
Thanks for the pointer to #4210. Note that 4210 is slightly different, though.
In that report, the condition and the warnable expression are in different
statements, and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=4210#c13
explicitly mentioned using a ternary expression to enable gcc to see the
deadness of the code, using a flag called "skip_evaluation".

This PR concerns a case that uses ?:, so I wonder whether skip_evaluation still
exists, and could be used to gate the sizeof-pointer-div warning.

[Bug c/108483] New: gcc warns about suspicious constructs for unevaluted ?: operand

2023-01-20 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108483

Bug ID: 108483
   Summary: gcc warns about suspicious constructs for unevaluted
?: operand
   Product: gcc
   Version: 10.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
  Target Milestone: ---

Created attachment 54318
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=54318=edit
minimal example

A well-known construct to determine array sizes at compile time is

#define ARRAY_SIZE(x) (sizeof(x)/sizeof(*(x)))

gcc helpfully warns for dangerous mis-use of this macro, as it only works with
real arrays, not with pointer, for example. Assuming NULL is defined as
((void*)0), ARRAY_SIZE(NULL) yields a valid C expression, as long as we use the
gcc extension that sizeof(void) equals one:

ARRAY_SIZE(NULL) is expanded to essentially sizeof(void*)/sizeof(void)

which yields 8 on usual 64-bit systems and 4 on usual 32-bit system. While this
expression is valid, the result of this expression is likely not what the
programmer intended, so the gcc warning "division ‘sizeof (void *) / sizeof
(void)’ does not compute the number of array elements" is warranted.

The Linux kernel contains a macro essentially being

#define ARRAY_SIZE_MAYBENULL(x)  ( __builtin_types_compatible_p(typeof(x),
void*) ? 0 : (sizeof(x)/sizeof(*x)) )

which is intended to be invocable using actual array operands (returning the
array size) or the compile-time constant NULL (returning zero).

gcc correctly evaluates ARRAY_SIZE_MAYBENULL(NULL) to zero, but emits about the
suspicious pattern in the third operand of the ternary operator. This is not
helpful for the programmer, and breaks builds using -Wall -Werror.

This is a feature request to omit warnings about dubious constructs like this
if it can be statically determined that they are not evaluated.

The example in the attachment compiles correctly and initializes x to 1, but
emits the spurious warning about the unevaluated sizeof pattern.

[Bug target/94504] On powerpc, -ffunction-sections -fdata-sections is not as effective as expected for non-PIE executables.

2020-04-07 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94504

--- Comment #5 from Michael Karcher  ---
I got the command line of gcc wrong. "-pie" just sets the linker flags for PIE
linking, but it does *not* compile source code as PIE. If I use "-fpie",
garbage collection does what it is supposed to do.

As I found out, the acutal problem I have is completely unrelated to gcc (sorry
for the noise here), because it concerns object files created by rustc (which
has an llvm backend) linked by binutil's ld tool. At the moment I was able to
reproduce the same symptom I have with those rust objects with a ten-line C
program, I considered fixing the problem (if possible) for C first a good idea.

As it stands now, the issue in gcc (no effective garbage collection for non-PIE
executables with function pointers in global data structures) still stands, so
I am not closing the bug right away.

[Bug target/94504] On powerpc, -ffunction-sections -fdata-sections is not as effective as expected for PIE executables.

2020-04-07 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94504

--- Comment #3 from Michael Karcher  ---
(In reply to Richard Biener from comment #2)
> Huh, looking at the assembly & the object file this seems to be fully a linker
> issue who seems to be responsible for building the GOT.  I suggest to move
> this over to sourceware.  Alan?

I'm not so sure about this. I think it might actually be a limitation of ELF on
ppc32:

If I compare the code generated by gcc on ppc32, gcc does output a GOT fragment
in the .got2 section that references all globals that are used in the current
object:

$ objdump -r -j .got2 bla32.o

bla32.o: file format elf32-powerpc

RELOCATION RECORDS FOR [.got2]:
OFFSET   TYPE  VALUE
 R_PPC_ADDR32  fptr
0004 R_PPC_ADDR32  x

This section can not be elided, because it is referenced from main:

$ objdump -r -j .text.main bla32.o

bla32.o: file format elf32-powerpc

RELOCATION RECORDS FOR [.text.main]:
OFFSET   TYPE  VALUE
0022 R_PPC_REL16_HA.got2+0x8006
0026 R_PPC_REL16_LO.got2+0x800a

The linker has no obvious way to detect which GOT slots are actually used
inside main, because the offset(s) of the slot(s) inside .got2 used by main are
hardcoded inside main.

On the other hand, on ppc64, there is no GOT fragment generated by the
compiler, but instead the compiler just asks the linker to create a GOT slot
and fill in the necessary information:

$ objdump -r -j .text.main bla64.o

bla64.o: file format elf64-powerpc

RELOCATION RECORDS FOR [.text.main]:
OFFSET   TYPE  VALUE
000e R_PPC64_TOC16_HA  x
0012 R_PPC64_TOC16_LO  x

[Bug target/94504] New: On powerpc, -ffunction-sections -fdata-sections is not as effective as expected for PIE executables.

2020-04-06 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94504

Bug ID: 94504
   Summary: On powerpc, -ffunction-sections -fdata-sections is not
as effective as expected for PIE executables.
   Product: gcc
   Version: 9.3.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
  Target Milestone: ---

I try to compile the following test program using

gcc -ffunction-sections -fdata-sections -pie -Wl,--gc-sections input.c

The linking step fails, because g is not defined. On most architectures except
powerpc (32 bit), the garbage collection is able to discard both fptr and f,
and so the reference to g vanishes. This is not the case on powerpc, where f is
referencing fptr through a GOT entry in the .got2 section.

This issue (I don't dare to call it "bug" yet) is the cause of of
https://bugs.debian.org/955845. The librsvg build process is quirky and
building the tests only works if garbage collection is able to collect a hughe
amount of dead code. Garbage collection is able to do that on all architectures
Debian tried it on except for powerpc (and possibly ppc64, see
https://bugs.debian.org/895723). The attached example program does compile fine
on ppc64, though.

/* dead code */
extern void g(int x, ...);
extern void (*fptr)();
void f()
{
/* using fptr creates a GOT entry for fptr */
g(0, fptr);
}
/* fptr is reference from the GOT. Let's reference f from fptr */
void (*fptr)() = f;

/* Non-dead code */
int x = 5;
int main(void)
{
/* using x goes through the GOT. This prevents the GC to kill it */
return x;
}

[Bug target/88592] New: Passing of packed structures on sparc64 different than in clang

2018-12-25 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88592

Bug ID: 88592
   Summary: Passing of packed structures on sparc64 different than
in clang
   Product: gcc
   Version: 8.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
  Target Milestone: ---

As noted while filing a bug on rustc because rustc does not correctly implement
the Sparc64 ABI regarding floating point fields of "small" structures, I
stumbled across a difference between gcc an clang, as shown in
https://github.com/rust-lang/rust/issues/57103 (the important piece is quoted
below):

struct str3 {
  float f;  // passed in most-significant half of %o0 (gcc) or in %f0 (clang)
  int i;// passed in least-significant half of %o0
} __attribute__((packed));

Without __attribute__((packed)), clang, gcc and the ABI standard agree to pass
f in %f0. The ABI standard doesn't contain anything about packed structures, so
I don't see a way to decide whether gcc or clang is right. I report a bug on
both products to raise awareness.

[Bug go/79037] gccgo: Binaries crash with parforsetup: pos is not aligned on m68k

2017-01-20 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79037

--- Comment #10 from Michael Karcher  ---
OK, I got it. I retract my last comment.

[Bug go/79037] gccgo: Binaries crash with parforsetup: pos is not aligned on m68k

2017-01-19 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79037

--- Comment #8 from Michael Karcher  ---
The patch looks like it should work fine, I guess John Paul Adrian Glaubitz is
going to test it soon. But I wonder whether the determination of alignment is
in types.cc really needed, as user-specified alignment directives can only make
alignment stricter. So always passing 4 to implicit variable should do no harm
and keep the code simpler.

[Bug go/79037] gccgo: Binaries crash with parforsetup: pos is not aligned on m68k

2017-01-18 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79037

--- Comment #5 from Michael Karcher  ---
The root issue now is that the ABI gcc implements on m68k is incompatible with
the Go runtime shipped with gcc.

The Go runtime uses the lowest two bits in the type information pointer as
flags (called PRECISE and LOOP) and relies on the fact that the actual type
information structure is aligned to a 32-bit boundary. The type descriptors are
Go literals created by the Go frontend as standard Go structures.

In the gcc backend for Go, the layout and alignment rules of Go structures
match the rules for C structures (which is most likely necessary for
interoperability). On m68k this means we get 16-bit alignment for historical
reasons. The current interface between the go frontend and its backend has no
interface to request stricter alignment, so the Go frontend is unable to ensure
that type descriptors are aligned to 32-bit boundaries.

A possible way of solving this would be to extend
Backend::struct_type(vector<...> fields) by an (optional?) second parameter to
communicate alignment requests, such that Gcc_backend::struct_type can use
DECL_ALIGN before calling fill_in_struct. This enables the Go frontend to
request the required alignment for its type descriptors. Requesting increased
alignment for Go type descriptors is impossible to break legacy ABIs on m68k as
there is no ABI involving Go type descriptors yet.

This change should possibly go align with adding "__attribute__((aligned(4)))"
to several types in libgo/runtime/go-type.h if type descriptors are ever
created from the C side of things.

Proof for the issue:

(sid-m68)root:~# nm --dynamic /usr/lib/m68k-linux-gnu/libgo.so.9 | grep __go_td
 | head -n 6
00bcc410 V __go_td_AAAN5_uint8ee1e
00bcc440 V __go_td_AAAN5_uint8ee1e$gc
00bd0058 V __go_td_AAAN5_uint8eee
00bd0080 V __go_td_AAAN5_uint8eee$gc
00bfc4e6 V __go_td_AAAN6_uint329e3e16e
00bfc5d2 V __go_td_AAAN6_uint329e3e16e$gc

shows unaligned type descriptors.

[Bug tree-optimization/68008] New: Pessimization of simple non-tail-recursive functions

2015-10-18 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68008

Bug ID: 68008
   Summary: Pessimization of simple non-tail-recursive functions
   Product: gcc
   Version: 5.1.1
   URL: http://stackoverflow.com/questions/32974625
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
  Target Milestone: ---

Created attachment 36537
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36537=edit
preprocessed source code

As discussed on Stack Overflow in http://stackoverflow.com/questions/32974625
(Misleading title: "Is gcc's asm volatile equivalent to the gfortran default
setting for recursions?"), gcc pessimizes the example fibonacci function, which
can be avoided by "-fno-optimize-sibling-calls" (which is the reason I assigned
the bug to tree-optimization)

The non-optimal code is generated at least by
  gcc 4.5.3 (Debian 4.5.3-12)
  gcc version 4.9.2 (Debian 4.9.2-10)
  gcc version 5.1.1 20150507 (Debian 5.1.1-5)

I will focus on gcc 5.1.1 in the remaining report.

Configured with: ../src/configure -v --with-pkgversion='Debian 5.1.1-5'
--with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr
--program-suffix=-5 --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=c++98 --disable-libstdcxx-dual-abi
--enable-gnu-unique-object --disable-vtable-verify --enable-libmpx
--enable-plugin --with-system-zlib --disable-browser-plugin
--enable-java-awt=gtk --enable-gtk-cairo
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-amd64/jre --enable-java-home
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-amd64
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-amd64
--with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--enable-objc-gc --enable-multiarch --with-arch-32=i586 --with-abi=m64
--with-multilib-list=m32,m64,mx32 --enable-multilib --with-tune=generic
--enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu

gcc is invoked as: g++-5 -O3 -march=core2 -std=c++0x
stack-overflow-32974625.cc; ./a.out
compared to g++-5 -O3 -march=core2 -std=c++0x -fno-optimize-sibling-calls
stack-overflow-32974625.cc; ./a.out


There are no error or warning messages printed by the compiler or linker (even
if warning would be enabled by -Wall -Wextra)

The file stack-overflow-32974625.ii is attached.

Timing with g++ 5.1.1 on a Core2Duo T7200 under 64-bit linux (cpu frequency
scaling set to "performance"):
With -fno-optimize-sibling-calls: 34.5us/iteration
Without -fno-optimize-sibling-calls: 68us/iteration


[Bug target/67002] [5] [SH]: Bootstrap stage 2/3 comparison failure - gcc/real.o differs

2015-08-05 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67002

Michael Karcher gcc-bugzilla at mkarcher dot dialup.fu-berlin.de changed:

   What|Removed |Added

 CC||gcc-bugzilla at mkarcher dot 
dialu
   ||p.fu-berlin.de

--- Comment #15 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Created attachment 36134
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=36134action=edit
even further reduced test case

I manually reduced the example even more. Interestingly it doesn't contain any
C++ specific statements any more, yet it only fails -fcompare-debug with -x
c++, but passes with -x c


[Bug target/67002] [5] [SH]: Bootstrap stage 2/3 comparison failure - gcc/real.o differs

2015-08-05 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67002

--- Comment #16 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
The bug seems to be quite similar to the infamous sloth that was dropped on
the head as a baby-bug Linus discovered (https://lkml.org/lkml/2014/7/24/584 ,
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61904). -fcompare-debug passes if
-fno-var-tracking-assignments is passed as compiler option.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-22 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #18 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
As I said, I did not try your patch, but just read the source. The assembly you
quoted convinces me that there is no problem in the code actually produced by
your patch, which is great. This is caused by the pattern
or-then-SImode-compare, as you explained.

The or-then-SImode-compare optimization has an adverse effect on the test
coverage, it seems. In both cases, GET_MODE(src_reg) and GET_MODE(dst_reg) are
SImode, so the DImode output branch is not tested by any of your two example
source files. Furthermore, it looks like make_not_reg_insn will actually
produce bad code if it were ever called with GET_MODE(src_reg) == DImode. So I
would strongly suggest to narrow it to only accept SImode input operands, so it
fails to apply instead of generate bad code if something manages to call it
with DImode input.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-22 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #20 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
(In reply to Oleg Endo from comment #19)
  The or-then-SImode-compare optimization has an adverse effect on the test
  coverage, it seems. In both cases, GET_MODE(src_reg) and GET_MODE(dst_reg)
  are SImode, so the DImode output branch is not tested by any of your two
  example source files.
 That is true as it stands now.  However, we already anticipate that there
 might be something going on with DImode stuff, so just adding the test might
 help debugging in the future.  Even if it doesn't add any value now, it
 doesn't hurt anyone either.

The test case is not a problem - but it would be helpful to have a testcase
that actually tests the DImode output case. I understand that it likely is not
possible with today's gcc to reach that branch, so it seems this has to stay
the way it is now. I am fine with it.

  Furthermore, it looks like make_not_reg_insn will
  actually produce bad code if it were ever called with GET_MODE(src_reg) ==
  DImode.
 Please do explain.

Of course. The instructions involving src_reg in make_not_reg_insn dealing with
src_reg are completely quoted here:

+  // On SH we can do only SImode and DImode comparisons.
+  if (! (GET_MODE (src_reg) == SImode || GET_MODE (src_reg) == DImode))
+return NULL;

In this fragment, you accept DImode source operands. So that code may be used
to replace a DImode compare.

+  emit_insn (gen_rtx_SET (VOIDmode, m_ccreg,
+  gen_rtx_fmt_ee (EQ, SImode, src_reg, const0_rtx)));

In this fragment, you are generating the replacement instruction, which is
always an SImode compare. Maybe I miss the point, but I fail to undestand how
an SImode compare might be acceptable on an DImode operand. Possibly, this even
ICEs, I don't know enough about gcc internals to know what happens if src_reg
is DImode which is passed to EQ in SImode.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-22 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #22 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
OK, in that case I retract my objections and I think the patch is fine. I am
sorry for that mistake.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-21 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #15 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
I did not get around to test your proposed patch yet, but it seems like the new
logical not operation always compares only the low 32 bit against zero, even
if there is a 64 bit operand. If my analysis is correct, the long long test
program should fail if you replace decision = 1; by decision =
0x1LL;


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-16 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #7 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
(In reply to Oleg Endo from comment #6)
  For the transformation to be valid, you would need a logical not instruction
  instead of the bitwise not instruction that sets the desination register to
  zero if the source register contains any non-zero value, not just if the
  source register contains -1.
 
 There is a 2 insn sequence to do that on SH:
tst  rm,rm
movt rn
 
 As far as I can see it, emitting this sequence instead of the bitwise not
 insn would fix the problem.  This can be done by changing the function
 sh_treg_combine::make_not_reg_insn.  It should be safe to emit that sequence
 (which clobbers the T reg) instead of the bitwise not.

This seems definitely true. But I wonder whether this is acutally needed, or
already caught perfectly in try_eliminate_cstores. Let's have a look:

The buggy branch in try_combine_comparisons applies only if (first loop)
 - All comparisons are the same type
 - The second operands are either all the same or all registers
 - Removal of those instruction is safe
Furthermore, there need to be mixed cstores and
 - All minority comparisons need to be comparisons
 - All minority comparisons need to be against the integral constant zero

So if one combines what applies to minority stores only in the test for
applicability of the make_not_reg_insn idea with the general applicability
check of the try_combine_comparisons pass, the end result is:

The buggy branch only applies if
 - All comparisons are equality comparisons
 - The second operand needs to be the integral constant zero.

Now this is exactly what you correctly commented in example 1:

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (reg:SI 167) (xor:SI (reg:SI 147 t) (const_int 1)))
 - bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 177) (const_int 0)))
 (set (reg:SI 167) (reg:SI 147 t))
 - bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 167) (const_int 0)))
 (set (pc) (if_then_else (ne (reg:SI 147 t) (const_int 0))
(label_ref:SI 50) (pc)))

You are going to transform this to, assuming bb 4 is considered minority. (The
first RTL instruction in BB4 gets transformed to test rm, rm, the second RTL
instruction gets transformed to movt rn.)

 [bb 3]
 (set (reg:SI 167) (reg:SI 173))
 - bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (reg:SI 167) (xor:SI (reg:SI 147 t) (const_int 1)))
 - bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 167) (const_int 0)))
 (set (pc) (if_then_else (ne (reg:SI 147 t) (const_int 0)))
 (label_ref:SI 50) (pc)))

It is afterwards quite likely that the set instruction in bb3 can be elminated
by renaming reg 167 to reg 173.

If this transformation is removed, though, the transformation of example 4
applies instead, which will result in

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 - bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 147 t) (xor:SI (reg:SI 147 t) (const_int 1)))
 - bb 5

 [bb 5]
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
 (label_ref:SI 50) (pc)))

and this gets (except SH2A with nott) transformed to (by define_insn_and_split
nott in the machine definition)

 [bb 3]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 - bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 200) (xor:SI (reg:SI 147 t) (const_int 1)))
 (set (reg:SI 147 t) (eq:SI (reg:SI 200) (const_int 0)))
 - bb 5

 [bb 5]
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
 (label_ref:SI 50) (pc)))

which, at least in my example, seems to be transformed by register renaming and
common code elimination (combining the to treg setting instructions at the end
of bb3 and bb4) to something like

 [bb 3]
 - bb 5

 [bb 4]
 (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
 (set (reg:SI 173) (xor:SI (reg:SI 147 t) (const_int 1)))
 - bb 5

 [bb 5]
 (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
 (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
 (label_ref:SI 50) (pc)))

which is exactly the result you also get with special-casing the
compare-to-zero case.

 OK, thanks for the confirmation.  I had a closer look at the current code
 and your patch.  I think that the bitwise-not code paths should not be
 removed entirely, as they handle the mixed normal and inverting cstores
 cases.  I will prepare a patch in a couple of days.

If I am not mistaken in my analysis above, the mixed cstores case handling in
try_eliminate_cstores does already what you need, while pushing the desicision
whether those are comparisons to zero and can be further optimized on common
subexpression elimination, instead of hand-written code checking

[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-16 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #8 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Actually, the whole issue got me curious - I will try prepare a different patch
along your suggestions and compare the compiler output. If I don't report back
today, I probably won't do that in time, so don't hold your breath, though.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-16 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #10 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Created attachment 33991
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33991action=edit
Fix logical negation of registers, SImode only

In fact, it turns out, you were right. I implemented the solution you suggest
(quite hacky, as I only implemented it for SImode, I was too lazy to look up
whether there are nice patterns for testing DImode operands into T), and
compared assembler outputs. In fact, the duplicate test instruction is NOT
eliminated in my first patch, but it is with your code after fixing.

So the approach of special-casing compare-to-zero seems to have a positive
effect.

Assembler output:

buggy:
tst r4,r4
bf  .L2  ; initial if, jumps if flag is nonzero
; Branch for zero flag
mov.l   .L10,r1  ; loads address of val
mov.l   @r1,r1   ; loads value of val
; Common code for zero or non-zero flag
.L3:
tst r1,r1; combined test
bt  .L8  ; if zero, skip increment
mov.l   .L11,r2  ; load address of true_count
mov.l   @r2,r1   ; load, increment, store true_count
add #1,r1
mov.l   r1,@r2
; Label for skipping increment of true_count
.L8:
rts 
.align 1
; branch for non-zero flag
.L2: 
mov.l   .L12,r1  ; load address of decision_result
mov.l   @r1,r1   ; load data of decision_result
tst r1,r1; tst/movt logical negation before
bra .L3  ; jumping to common code
movtr1


With my earlier patch removing the compare-to-zero logic completely, the
assembler output changes as following (CAPS comments on changes):

...
mov.l   @r1,r1   ; loads value of val
tst r1,r1; TEST ONLY FOR FLAG == 0
; Common code for zero or non-zero flag
.L3:
bt  .L8  ; if zero, skip increment
...
tst r1,r1; tst/movt logical negation before
movtr1
bra .L3  ; jumping to common code
tst r1,r1; TEST ONLY FOR FLAG != 0


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-16 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #11 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Putting things straight after trying it out:

(In reply to Michael Karcher from comment #7)
[...]
 and this gets (except SH2A with nott) transformed to (by
 define_insn_and_split nott in the machine definition)
 
  [bb 3]
  (set (reg:SI 147 t) (eq:SI (reg:SI 173) (const_int 0)))
  - bb 5
 
  [bb 4]
  (set (reg:SI 147 t) (eq:SI (reg:SI 176) (const_int 0)))
  (set (reg:SI 200) (xor:SI (reg:SI 147 t) (const_int 1)))
This is WRONG. It should read (set (reg:SI 200) (reg:SI 147 t)) instead!

  (set (reg:SI 147 t) (eq:SI (reg:SI 200) (const_int 0)))
  - bb 5
 
  [bb 5]
  (set (pc) (if_then_else (eq (reg:SI 147 t) (const_int 0))  // inverted
  (label_ref:SI 50) (pc)))
 
 which, at least in my example, seems to be transformed by register renaming
 and common code elimination (combining the to treg setting instructions at
 the end of bb3 and bb4) to something like
This turns out to be wrong too. The treg setting instructions are NOT combined.
See the previous comment.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-16 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #12 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Further digging into this showed that there actually is a pass that would merge
the two tst r1,r1 instructions - the jump2 pass in cfgclenup.c.

The optimization is called crossjumping in gcc, also known as tail merging.
For some reason[1], gcc is reluctant to perform crossjumping on small common
parts. In this case, the common part is just one instruction, and the parameter
min-crossjump-insns is five by default unless optimizing for size. With
-fparam-min-crossjump-insns=1 or -Os, my first patch removing all the
special-casing of zero compares produces the same result as the patch fixing
the logical negations. In the current situation, I see no advantage of not
cross-jumping in this case, so the minimally invasive solution is definitely
the second patch to fix logical negation, but still it somehow feels ugly to me
to have a limited reimplementation of crossjumping in the sh-treg-combine pass.
Replacing stuff of that pass by improving other optimizations is IMHO beyond
the scope of this bug, though.

[1] These are actually performance reasons:
https://gcc.gnu.org/ml/gcc-patches/2004-07/msg00495.html


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-15 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

--- Comment #5 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
(In reply to Oleg Endo from comment #4)
 I'm not sure about this.  The first hunk of your patch that removes the
 example in the top comment block should be valid, as far as I can see at the
 moment.  This is because the result of the bitwise not is then again tested
 for zero before the conditional jump.

As far as I understand the SH4 architecture, TST r0,r0 sets the T bit if r0
is zero, and clears it otherwise. So
  MOV   r1,r0; store r1 to r0, for exposition only
  TST   r0,r0; T bit set if r1 == 0, cleared if r1 != 0
  MOVT  r0   ; r0 set to 1 if r1 == 0, cleared if r1 != 0
  TST   r0,r0; T bit set if r1 != 0, cleared if r1 == 0
while on the other hand
  NOT   r1,r0; bitwise negate r1 to r0 (so r0 gets 0 if r1 == -1)
  TST   r0,r0; T bit set if r1 == -1, cleared if r1 != -1

The first block clear T only if r1 equals zero, the second block clears T for
all values of r1 except -1. So they are only equivalent if you can prove that
r1 is either zero or -1, which you typically can't, especially as using 1 for
true is typical C semantic. Because it seems that it is rare you can
successfully prove that r1 either 0 or -1, I decided to remove that
transformation alltogether. And to keep the comment accurate, I had to remove
it from there, too.

For the transformation to be valid, you would need a logical not instruction
instead of the bitwise not instruction that sets the desination register to
zero if the source register contains any non-zero value, not just if the source
register contains -1.

Possibly your confusion is created by NOTT (if it were available on SH4)
being in fact valid in that example to negate the condition. This is due to the
fact, that for 1-bit values (only!) bitwise negation and boolean negation are
the same.

 I haven't looked at the details (RTL dumps etc), but looking at the
 problematic code in the description, the problem could be that the variable
 'condition' gets the wrong value assigned.  'condition' should have a value
 of either 0 or 1.  But the sh_treg_combine pass changes the value.  If the
 conditional jump is eliminated and the 'condition' value is then added to
 'truecount' directly, it will produce wrong values. 

Looking at the generated assembly code, I verified that indeed the pattern
above is used, so condition gets stored in the T bit, and a conditional
branch (BF) is used to skip the incrementation if it is false, i.e. cleared.
After applying the transformation shown above, the value 1 in decision_result
gets negated to -2 (0xFFFE), which is non-zero, so truecount gets increased
if even though decision_result should be false after boolean negation.


[Bug target/63783] [4.9/5 Regression] [SH] Miscompilation of boolean negation on SH4 using -O2

2014-11-12 Thread gcc-bugzilla at mkarcher dot dialup.fu-berlin.de
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63783

Michael Karcher gcc-bugzilla at mkarcher dot dialup.fu-berlin.de changed:

   What|Removed |Added

 CC||gcc-bugzilla at mkarcher dot 
dialu
   ||p.fu-berlin.de

--- Comment #3 from Michael Karcher gcc-bugzilla at mkarcher dot 
dialup.fu-berlin.de ---
Created attachment 33954
  -- https://gcc.gnu.org/bugzilla/attachment.cgi?id=33954action=edit
Remove assumption that not is logical negate

Indeed the treg combine pass is broken. Thanks to the nicely documented code,
it could easily be determined, that this optimization pass contains the wrong
assumption that not is an assembler instruction that can be used for logical
negation of a register. As this is a bitwise negate instruction, it can not be
used that way.

As I was unable to find a machine instruction that performs logical negation, I
prepared a patch that completely removes the parts that rely on a logical
negate instruction, while keeping all other aspects on the optimization pass
intact.