[Bug target/102783] New: [powerpc] FPSCR manipulations cannot be relied upon

2021-10-15 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102783

Bug ID: 102783
   Summary: [powerpc] FPSCR manipulations cannot be relied upon
   Product: gcc
   Version: 4.9.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

On all Power targets which support hardware floating-point, there are a few
manipulations of the Floating-Point Status and Control Register (FPSCR) that
have side effects on subsequent floating-point computation: for example,
changing the floating-point rounding mode, or changing whether floating-point
exceptions are enabled.

There are many ways to effect those manipulations:
- The set of fenv(1) calls
- A handful of builtins:
  __builtin_fpscr_set_rn
  __builtin_mtfsf
  __builtin_mtfsb{0,1}
- Inline asm using the appropriate instructions (mffsce, mffscdrn{i},
mffscrn{i}, mtfsf{i}, mtfsb{0,1})

The problem is that unless one of the above methods is effected in an
out-of-line function, there is no way at present to restrict instruction
scheduling such that nearby floating-point computations are prevented from
moving before or after the FPSCR changes. This can result in computation
using a wrong rounding mode, or in unexpected FP exceptions.

With asm statements, one could add artificial read and write dependencies
between the input or output (if any) of the FPSCR manipulations and the
previous/subsequent FP computations, but this is not always practicable.
(Current glibc is an example.)

[Bug target/102485] -Wa,-many no longer has any effect

2021-10-04 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102485

--- Comment #2 from Paul Clarke  ---
GCC putting the base ".machine" directive at the beginning of the file makes
any command-line use of "-many" (-Wa,-many) be ignored.  Is that OK?  "-many"
is supposed to make those black boxes just work.  This worked before recent
changes to binutils/GCC.  Is there any valid use of "-Wa,-many" now?

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-09-27 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

Paul Clarke  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #23 from Paul Clarke  ---
Tested (trunk), works for me.

[Bug target/102485] New: -Wa,-many no longer has any effect

2021-09-25 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102485

Bug ID: 102485
   Summary: -Wa,-many no longer has any effect
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

The assembler option "-many" tells the assembler to support assembly of
instructions from any vintage of processor. This can be passed through the GCC
compiler using the command line option "-Wa,-many".

The "-many" functionality has been under attack of late. ;-)

It once was that GCC would always pass "-many" to the assembler.  This was
stopped with commit e154242724b084380e3221df7c08fcdbd8460674 "Don't pass -many
to the assembler".

A recent change to binutils, commit b25f942e18d6ecd7ec3e2d2e9930eb4f996c258a
"ignore sticky options for .machine" stopped preserving "sticky"
options across a base `.machine` directive. This change caused sequences like:
.machine altivec
.machine power5
...to disable AltiVec instructions afterward, because "power5" did not support
AltiVec, and "power5" is a base ".machine" directive.

A perhaps unintended consequence is that using GCC to pass "-many" to the
assembler (via "-Wa,-many") has no effect, because GCC adds a base ".machine"
directive to every(?) assembler file given to the assembler, but only passes
"-many" (no ".machine" directives are added). The assembler sees the "-many"
parameter, then sees the base ".machine" directive, and suppresses any impact
of the "-many" parameter.
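
The interaction described above can be sketched with a toy model in C. This is
only an illustration of the reasoning: the names and feature bits below are
hypothetical and are not taken from binutils or GCC.

```c
#include <stdbool.h>

/* Toy model only: hypothetical feature bits, not binutils internals. */
enum {
    TOY_PPC     = 1u << 0,  /* baseline instructions */
    TOY_ALTIVEC = 1u << 1,  /* a "sticky" extension */
    TOY_NEWER   = 1u << 2,  /* stands in for opcodes such as mfppr32 */
    TOY_ANY     = 1u << 3   /* set by "-many": accept everything */
};
#define TOY_STICKY (TOY_ALTIVEC | TOY_ANY)

/* A base ".machine" directive selects a new base CPU. In this model, older
   assemblers keep the sticky bits and newer ones drop them. */
static unsigned toy_base_machine(unsigned state, unsigned base, bool keep_sticky)
{
    return base | (keep_sticky ? (state & TOY_STICKY) : 0u);
}

/* An opcode is accepted if "any" is active or all needed bits are present. */
static bool toy_accepts(unsigned state, unsigned needed)
{
    return (state & TOY_ANY) != 0 || (state & needed) == needed;
}

/* Models "gcc -Wa,-many": "-many" on the command line, then the base
   ".machine ppc" at the top of the file, then an mfppr32-like opcode. */
static bool toy_many_then_machine_ppc(bool keep_sticky)
{
    unsigned state = TOY_PPC | TOY_ANY;                      /* as -many */
    state = toy_base_machine(state, TOY_PPC, keep_sticky);   /* .machine ppc */
    return toy_accepts(state, TOY_NEWER);
}
```

In this model, toy_many_then_machine_ppc(true) corresponds to the older
behaviour, where the opcode is accepted, and toy_many_then_machine_ppc(false)
to the newer behaviour, where it is rejected.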

-- mfppr32.c:
long f () {
  long ppr;
  asm volatile ("mfppr32 %0" : "=r"(ppr));
  return ppr;
}
--
$ gcc -c ./mfppr32.c
/tmp/ccAShoDb.s: Assembler messages:
/tmp/ccAShoDb.s:18: Error: unrecognized opcode: `mfppr32'
$ gcc -Wa,-many ./mfppr32.c
/tmp/cc0tRDPx.s: Assembler messages:
/tmp/cc0tRDPx.s:18: Error: unrecognized opcode: `mfppr32'
$ gcc -S -Wa,-many -O ./mfppr32.c
$ cat mfppr32.s
[edited for brevity]
.file   "mfppr32.c"
.machine ppc
.section ".text"
.globl f
.type   f, @function
f:
mfppr32 3
blr
$ as mfppr32.s
mfppr32.s: Assembler messages:
mfppr32.s:12: Error: unrecognized opcode: `mfppr32'

With older binutils, this worked:
$ older-as mfppr32.s
$
--

If the binutils assembler (as) is doing the right thing now with respect to
base ".machine" directives and sticky ".machine" directives, then it would
perhaps be GCC's responsibility to build an assembler file that allows the
"-many" assembler command-line option to be passed through GCC and to
continue to work as users likely expect.

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-31 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #11 from Paul Clarke  ---
This does produce the issue for me:
--
$ git checkout remotes/vendors/ibm/gcc-11-branch gcc-AT
$ mkdir gcc-AT-build
$ cd gcc-AT-build
$ ../gcc-AT/configure --enable-languages=c,c++ --disable-libada
--disable-libsanitizer --disable-libssp --disable-libgomp --disable-libvtv
--disable-nls --prefix=/home/pc/gcc-AT-install
$ make
$ make install
$ ~/gcc-AT-install/bin/gcc -S -O3 -mcpu=power10 -fverbose-asm r12test2.c
$ grep --before-context=15 bctr r12test2.s
mtctr 12 # func, func
 # r12test2.c:3030: }
lwz 12,8(1)  #,
 # r12test2.c:3013: ++*p_format;
addi 9,9,1   # tmp251, *p_format_31(D),
std 9,0(31)  # *p_format_31(D), tmp251
 # r12test2.c:3030: }
ld 31,-8(1)  #,
mtcrf 8,12   #,
.cfi_restore 72
.cfi_restore 31
.cfi_restore 30
.cfi_restore 28
.cfi_restore 27
 # r12test2.c:3014: return (*func)();
bctr # func
--

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-30 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #8 from Paul Clarke  ---
$ /opt/at15.0/bin/gcc --version
gcc (GCC) 11.2.1 20210802 (Advance-Toolchain 15.0-0) [ebcfb7a665c2]

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-30 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #5 from Paul Clarke  ---
Fails with "-mcpu=power10" and "-O2" or "-O3".

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-28 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #4 from Paul Clarke  ---
Created attachment 51372
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51372&action=edit
preprocessed source (yet a bit smaller)

I was able to remove one of the cases of the switch statement in the function
which exhibits the issue. Interestingly, removing any of the others hides the
issue.

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-28 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #3 from Paul Clarke  ---
Created attachment 51371
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51371&action=edit
preprocessed source (a bit smaller)

I was able to cut out a bit more than half of the original code. It gets more
difficult from here. If this is still "too big", I can hack at it some more.

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-27 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

--- Comment #2 from Paul Clarke  ---
Created attachment 51369
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51369&action=edit
creduced version (tiny, but ugly)

[Bug target/102107] protocol register (r12) corrupted before a tail call

2021-08-27 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

Paul Clarke  changed:

   What|Removed |Added

  Attachment #51367|0   |1
is obsolete||

--- Comment #1 from Paul Clarke  ---
Created attachment 51368
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51368&action=edit
preprocessed source (large)

Attach correct file. :-/

[Bug target/102107] New: protocol register (r12) corrupted before a tail call

2021-08-27 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107

Bug ID: 102107
   Summary: protocol register (r12) corrupted before a tail call
   Product: gcc
   Version: 11.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

Created attachment 51367
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=51367&action=edit
preprocessed source (large)

I've been working on an effort to improve Python performance, and hit an issue
when running with a libpython.so that was built with "-mcpu=power10". The
problem appears to be not correctly setting up (and preserving) register 12
before calling into a dynamically loaded, non-PCrel Python module in the form
of a shared object.

GDB shows the following instruction stream:
=> 0x77d25014 :  ld      r12,0(r9)
=> 0x77d25018 :  addi    r1,r1,112
r12            0x7fffe921af60      140737104686944
=> 0x77d2501c :  std     r10,0(r30)
=> 0x77d25020 :  ld      r3,8(r9)
=> 0x77d25024 :  ld      r9,0(r31)
=> 0x77d25028 :  ld      r29,-24(r1)
=> 0x77d2502c :  ld      r30,-16(r1)
=> 0x77d25030 :  mtctr   r12
=> 0x77d25034 :  lwz     r12,8(r1)
r12            0x4000              16384
=> 0x77d25038 :  addi    r9,r9,1
=> 0x77d2503c :  std     r9,0(r31)
=> 0x77d25040 :  ld      r31,-8(r1)
=> 0x77d25044 :  mtocrf  8,r12
=> 0x77d25048 :  bctr
=> 0x7fffe921af60 :  addis   r2,r12,4
=> 0x7fffe921af64 :  addi    r2,r2,-12384
=> 0x7fffe921af68 :  nop
=> 0x7fffe921af6c :  ld      r3,-32728(r2)
Program received signal SIGSEGV, Segmentation fault.
0x7fffe921af6c in _Py_INCREF (op=) at
../Python-3.9.6/Include/object.h:408
408 op->ob_refcnt++;

After r12 is set to the address of the callee (by the load at 0x77d25014),
the load at 0x77d25034 overwrites it with the CR save value just before the
tail call (bctr) at 0x77d25048. The callee then derives its TOC pointer from
the wrong r12 value, resulting in the badness when setting up and using the
TOC.
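
To make the arithmetic concrete, here is a small sketch in C that replays the
callee's first instructions from the GDB trace above (addis r2,r12,4; addi
r2,r2,-12384; ld r3,-32728(r2)). The function names are hypothetical; the
constants are the ones visible in the trace.

```c
#include <stdint.h>

/* r2 after "addis r2,r12,4" followed by "addi r2,r2,-12384".
   addis adds the immediate shifted left by 16 bits. */
static uint64_t toc_from_r12(uint64_t r12)
{
    return r12 + ((uint64_t)4 << 16) - 12384;
}

/* Effective address of "ld r3,-32728(r2)", the faulting instruction. */
static uint64_t first_load_address(uint64_t r12)
{
    return toc_from_r12(r12) - 32728;
}
```

With the intended r12 (0x7fffe921af60), the TOC pointer works out to
0x7fffe9257f00 and the load address to 0x7fffe924ff28. With the clobbered r12
(0x4000), the load address works out to 0x38fc8, which is presumably unmapped
and would be consistent with the SIGSEGV shown.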

I suspect some sort of instruction scheduling issue?

I've attached a rather large pre-processed C file. It's complicated to reduce
because of functions calling other functions. I gave "creduce" a shot at it,
but it's challenging (for me, at least) to craft a script that knows what to
look for. I'll also attach the best I could get from creduce, but shield your
eyes before looking at it.

[Bug target/101893] There is no vgbbd on p7

2021-08-12 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101893

--- Comment #1 from Paul Clarke  ---
I'll take ownership of this, except I'm not sure how to effect that.

The fix has been posted at
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577023.html and awaits
review/approval.

[Bug debug/98875] DWARF5 as default causes perf probe to hang

2021-02-18 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98875

--- Comment #3 from Paul Clarke  ---
The IBM Advance Toolchain supports SLES 15, where the latest version of libdw
is 0.168. We'll work around the issue by reverting the commit for the version
of GCC included with the Advance Toolchain.

I didn't see any update to the GCC documentation regarding the disruptive
nature of the change causing the problem other than "[DWARF] Version 5 requires
GDB 8.0 or higher".

Should there be something about libdw as well?  Anything else?

[Bug debug/98875] New: DWARF5 as default causes perf probe to hang

2021-01-28 Thread pc at us dot ibm.com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98875

Bug ID: 98875
   Summary: DWARF5 as default causes perf probe to hang
   Product: gcc
   Version: 11.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: debug
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

I sent this to gcc-patches, but realized I should open a bug report:
--
The subject commit, 3804e937b0e252a7e42632fe6d9f898f1851a49c, causes a
failure in the test suite for the IBM Advance Toolchain.  The test in
question uses "perf probe" to set a tracepoint at "main" in a newly built
(with GCC 11) binary of "/bin/ld".  With the patch applied, the command
enters an infinite loop, calling libdw1 functions but making no progress.

The infinite loop can be found in the Linux kernel
tools/perf/util/probe-finder.c:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/util/probe-finder.c?h=v5.11-rc5#n1190

Reverting this patch permits the command to succeed.
--
$ grep VERSION= /etc/os-release
VERSION="15-SP2"
$ uname -r
5.3.18-22-default
$ perf --version
perf version 5.3.18

Top of the GCC tree used: ATSRC_PACKAGE_REV=eb9883c1312c

Reversion patch:
--
$ cat ~/projects/gcc/gcc/gcc-revert-dwarf-5.patch 
diff --git a/gcc/common.opt b/gcc/common.opt
index a8a2b67a99d..7aff4ac6079 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3179,7 +3179,7 @@ Common Driver JoinedOrMissing Negative(gdwarf-)
 Generate debug information in default version of DWARF format.

 gdwarf-
-Common Driver Joined UInteger Var(dwarf_version) Init(5) Negative(gstabs)
+Common Driver Joined UInteger Var(dwarf_version) Init(4) Negative(gstabs)
 Generate debug information in DWARF v2 (or later) format.

 gdwarf32
--

Failing command:
$ perf probe -v -x /path/to/AT/at-next-15.0-0-alpha/bin/ld ldmain=main

[Bug target/95082] LE implementations of vec_cnttz_lsbb and vec_cntlz_lsbb are wrong

2020-05-14 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95082

Paul Clarke  changed:

   What|Removed |Added

 CC||pc at us dot ibm.com

--- Comment #2 from Paul Clarke  ---
This is a dup of bug 95070.
(I am unable to mark it as such.)

[Bug target/95070] New: vec_cntlz_lsbb implementation uses BE semantics on LE

2020-05-11 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95070

Bug ID: 95070
   Summary: vec_cntlz_lsbb implementation uses BE semantics on LE
   Product: gcc
   Version: 8.3.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

Created attachment 48512
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48512&action=edit
test case

This:
--
vector unsigned char a = { 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 };
int r = vec_cntlz_lsbb (a);
--
returns 14 on LE and 0 on BE. It should return 0 on both.

vec_cntlz_lsbb counts bytes with least significant bits of 0 *starting from the
lowest element number*.  In the above code, a[0] == 0xFF, so the count should
find 0 bytes.

The same issue occurs with vec_cnttz_lsbb (which should find 14 bytes in the
above example on both LE and BE, but finds 0 and 14, respectively).
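
The expected semantics described above can be written as scalar reference
models in C. This is a sketch for checking results, not the GCC
implementation; the function names and the array name example_a are
hypothetical.

```c
/* The vector from the report: a[0] and a[1] are 0xFF, the rest are 0x00. */
static const unsigned char example_a[16] = { 0xFF, 0xFF };

/* Count bytes whose least significant bit is 0, starting from element 0
   and stopping at the first byte whose least significant bit is 1. */
static int ref_cntlz_lsbb(const unsigned char v[16])
{
    int n = 0;
    for (int i = 0; i < 16 && (v[i] & 1) == 0; i++)
        n++;
    return n;
}

/* Same count, but starting from the highest element number (15). */
static int ref_cnttz_lsbb(const unsigned char v[16])
{
    int n = 0;
    for (int i = 15; i >= 0 && (v[i] & 1) == 0; i--)
        n++;
    return n;
}
```

For example_a these models give 0 for ref_cntlz_lsbb and 14 for
ref_cnttz_lsbb, matching the values the report expects on both LE and BE.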

[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32

2018-04-04 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402

--- Comment #8 from Paul Clarke  ---
(In reply to Steven Munroe from comment #7)
> Please try the same test with AT11 gcc7. I know I hit this!

voila!

$ /opt/at11.0/bin/gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall
-mcpu=power8 -O3
In file included from
/opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h:62:0,
 from 83402.c:2:
/opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h: In
function ‘main’:
/opt/at11.0/lib/gcc/powerpc64le-linux-gnu/7.3.1/include/emmintrin.h:1513:20:
error: argument 1 must be a 5-bit signed literal
  lshift = (__v4su) vec_splat_s32(__B);
^
$ rpm -q advance-toolchain-at11.0-devel
advance-toolchain-at11.0-devel-11.0-3.ppc64le

Now the question is whether to bother fixing this:
1. in GCC 8's rs6000/emmintrin.h, since it's not really "broken" there, and
backport that AT 11 (sounds a little silly)
2. backport the GCC 8 change that fixes this to AT 11 (sounds hard)
3. change only AT 11's emmintrin.h

Any strong opinions?  Otherwise, I'm leaning toward option (3).

[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32

2018-04-04 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402

--- Comment #6 from Paul Clarke  ---
(In reply to Steven Munroe from comment #5)
> You need to look at the generated asm code. And see what the compiler is
> doing.
> 
> Basically it should be generating a vspltisw vr,si for vec_splat_s32.
> 
> But if the immediate signed int (si) is greater than 15, should failure with:
> 
> error: argument 1 must be a 5-bit signed literal

I was hoping you'd tell me the scenario with which you saw that error.  :-)

> The vec_splats should work for any value as it will load a const vector from
> storage.
> 
> Perhaps the compiler is generating bad code and not reporting it.
> 
> Or the compiler is too smart and converting the vec_splat_s32 to the more
> general vec_splats under the covers.

I think the compiler is doing this.  Here's an extract from a (new) simple test
case:
--
out(a);
a = _mm_slli_epi32( a, 7 );
out(a);
a = _mm_slli_epi32( a, 31 );
out(a);
--
 li      r0,32
 stvx    v31,r1,r0
 bl      1628
 addis   r9,r2,-2
 vspltisw v0,7
 addi    r9,r9,-30976
 lvx     v31,0,r9
 vslw    v31,v31,v0
 xxlor   vs34,vs63,vs63
 bl      1628
 addis   r9,r2,-2
 addi    r9,r9,-30960
 lvx     v2,0,r9
 vslw    v2,v31,v2
 bl      1628
--
So, if the shift value is < 16, it uses vspltisw.
If the shift value is >= 16, it loads a const vector from memory.
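
The split follows from the immediate field of vspltisw, which is a 5-bit
signed value (-16 through 15). A minimal sketch of that range check in C, with
a hypothetical helper name:

```c
#include <stdbool.h>

/* True if imm can be encoded as the 5-bit signed immediate of vspltisw. */
static bool fits_5bit_signed(int imm)
{
    return imm >= -16 && imm <= 15;
}
```

Here fits_5bit_signed(7) is true and fits_5bit_signed(31) is false, which
matches the two shift counts in the test extract above.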

Is this issue now moot?

[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32

2018-04-03 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402

--- Comment #4 from Paul Clarke  ---
Created attachment 43829
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=43829&action=edit
unhelpful testcase

$ gcc --version
gcc (GCC) 8.0.1 20180402 (experimental)
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall -O3 -mcpu=power8
$ gcc -o 83402 83402.c -DNO_WARN_X86_INTRINSICS -Wall -mcpu=power8
$

[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32

2018-04-03 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402

--- Comment #3 from Paul Clarke  ---
(In reply to Steven Munroe from comment #0)
> The rs6000/emmintrin.h implementation of _mm_slli_epi32 reports:
>   error: argument 1 must be a 5-bit signed literal
> 
> For constant shift values > 15.

I thought this would be trivial to reproduce, but I am not able to provoke
it.  Do you have a testcase?  I will attach the one I tried.

[Bug target/83402] PPC64 implementation of ./rs6000/emmintrin.h gives out of range for _mm_slli_epi32

2017-12-14 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83402

--- Comment #2 from Paul Clarke  ---
I'd like to take a stab at fixing this.

[Bug tree-optimization/53991] _mm_popcnt_u64 fails with -O3 -fgnu-tm

2016-09-21 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53991

--- Comment #11 from Paul Clarke  ---
We use TM for a multi-producer-multi-consumer queue implementation, and ran
into the issue reported in this bug.  (I had opened bug 77681 before
discovering this report.)  This report is surprisingly old.  Is there any
chance this could get bumped to higher priority?

[Bug c++/77681] failing to inline simple function when using -fgnu-tm

2016-09-21 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681

Paul Clarke  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Paul Clarke  ---
I'll move over to the already-reported bug.

*** This bug has been marked as a duplicate of bug 53991 ***

[Bug tree-optimization/53991] _mm_popcnt_u64 fails with -O3 -fgnu-tm

2016-09-21 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53991

Paul Clarke  changed:

   What|Removed |Added

 CC||pc at us dot ibm.com

--- Comment #10 from Paul Clarke  ---
*** Bug 77681 has been marked as a duplicate of this bug. ***

[Bug c++/77681] failing to inline simple function when using -fgnu-tm

2016-09-21 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681

--- Comment #1 from Paul Clarke  ---
shoot. this may be a dup of bug 53991

[Bug c++/77681] New: failing to inline simple function when using -fgnu-tm

2016-09-21 Thread pc at us dot ibm.com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77681

Bug ID: 77681
   Summary: failing to inline simple function when using -fgnu-tm
   Product: gcc
   Version: 6.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pc at us dot ibm.com
  Target Milestone: ---

Created attachment 39671
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=39671&action=edit
testcase where always_inline fails with -fgnu-tm

I've used several different versions of GCC, including 4.8.5 and 6.2.1, on
different architectures (x86_64 and ppc64le).

I first noticed that a very simple "static inline" function was not being
inlined.  When I added "__attribute__((always_inline))", an error was produced
(see below). In narrowing down the testcase, I also tried to narrow down the
command line, and discovered that the error is only produced when "-fgnu-tm" is
present.  The narrowed-down testcase makes no use of transactional memory, so
there appears to be some inlining interference caused by "-fgnu-tm".
--
$ g++ -O3 -c always-inline.cpp -fgnu-tm -o /dev/null
always-inline.cpp: In member function ‘T* spsc::pop() [with T = int]’:
always-inline.cpp:9:1: error: inlining failed in call to always_inline ‘void*
_ZL13SPHGetFreePtrPv.constprop.0()’: 
 SPHGetFreePtr (void *H) {
 ^
always-inline.cpp:19:32: error: called from here
   T** p = (T**) SPHGetFreePtr(0);
$ g++ -O3 -c always-inline.cpp -o /dev/null
$
--