Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]
On 17/07/24 14:00, Andrew Pinski wrote:
> On Wed, Jul 17, 2024 at 5:24 AM Richard Biener wrote:
>>
>> On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta wrote:
>>>
>>> This patch adds a match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
>>> In the forwprop1 pass, depending on the type of `a` and `b`, GCC produces
>>> `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
>>> TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
>>>
>>> The patch adds match patterns for a, b:
>>> (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE : FALSE
>>> (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
>>> (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
>>> (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE : FALSE
>>
>> OK.
>
> Pushed as r15-2106-g44fcc1ca11e7ea (with one small change to the
> commit message in the changelog, where tabs should be used before the
> *; most likely a copy and paste error).

It seems that this change triggered an ICE in Linaro CI on 32-bit arm [1]:

--
Executing on host: /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/destdir/x86_64-pc-linux-gnu/bin/arm-eabi-g++ /home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C -mthumb -march=armv8.1-m.main+mve.fp+fp.dp -mtune=cortex-m55 -mfloat-abi=hard -mfpu=auto -fdiagnostics-plain-output -nostdinc++ -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include/arm-eabi -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/builds/x86_64-pc-linux-gnu/arm-eabi/gcc-gcc.git~master-stage2/arm-eabi/libstdc++-v3/include -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/libsupc++ -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/include/backward -I/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libstdc++-v3/testsuite/util -fmessage-length=0 -std=gnu++98 -O1 -fdump-tree-forwprop1 -Wno-psabi -S -o pr50.s (timeout = 600)

/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C: In function 'v4si f1_(v4si, v4si, v4si, v4si, v4si, v4si)':
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1: error: unrecognizable insn:
(insn 22 21 26 2 (set (reg:V4SI 120 [ ])
        (unspec:V4SI [
                (reg:V4SI 136)
                (reg:V4SI 137)
                (subreg:V4BI (reg:HI 135) 0)
            ] VPSELQ_S)) "/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C":12:17 -1
     (nil))
during RTL pass: vregs
/home/tcwg-buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/gcc/testsuite/g++.dg/tree-ssa/pr50.C:13:1: internal compiler error: in extract_insn, at recog.cc:2848
0x21fd635 internal_error(char const*, ...)
	../../../../../../gcc/gcc/diagnostic-global-context.cc:491
0x9a0958 fancy_abort(char const*, int, char const*)
	../../../../../../gcc/gcc/diagnostic.cc:1725
0x840e4d _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
	../../../../../../gcc/gcc/rtl-error.cc:108
0x840e6f _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
	../../../../../../gcc/gcc/rtl-error.cc:116
0x83f76b extract_insn(rtx_insn*)
	../../../../../../gcc/gcc/recog.cc:2848
0xf1a805 instantiate_virtual_regs_in_insn
	../../../../../../gcc/gcc/function.cc:1612
0xf1a805 instantiate_virtual_regs
	../../../../../../gcc/gcc/function.cc:1995
0xf1a805 execute
	../../../../../../gcc/gcc/function.cc:2042
--

Should I open a bug report?

[1] https://ci.linaro.org/job/tcwg_gnu_embed_check_gcc--master-thumb_m55_hard_eabi-build/517/artifact/artifacts/00-sumfiles/
Re: [PATCH] arm: Update fp16-aapcs-[24].c after insn_propagation patch
On 19/07/24 11:25, Richard Earnshaw (lists) wrote:
> On 11/07/2024 19:31, Richard Sandiford wrote:
>> These tests used to generate:
>>
>> bl swap
>> ldr r2, [sp, #4]
>> mov r0, r2 @ __fp16
>>
>> but g:9d20529d94b23275885f380d155fe8671ab5353a means that we can
>> load directly into r0:
>>
>> bl swap
>> ldrh r0, [sp, #4] @ __fp16
>>
>> This patch updates the tests to "defend" this change.
>>
>> While there, the scans include:
>>
>> mov\tr1, r[03]}
>>
>> But if the spill of r2 occurs first, there's no real reason why
>> r2 couldn't be used as the temporary, instead of r3.
>>
>> The patch tries to update the scans while preserving the spirit
>> of the originals.
>>
>> Spot-checked with armv8l-unknown-linux-gnueabihf. OK to install?
>>
>> Richard
>
> OK.
>
> I'm not sure that these tests are really doing very much; it would probably
> be better if they could be rewritten using the gcc.target/arm/aapcs
> framework. But that's for another day.
>
> R.

Hi Richard,

It seems that this is not fully fixed on all configurations; I am still seeing:

FAIL: gcc.target/arm/fp16-aapcs-1.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-2.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1
FAIL: gcc.target/arm/fp16-aapcs-3.c scan-assembler vmov\\.f32\\ts1, s0
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler-times mov\\tr[01], r[0-9]+ 2
FAIL: gcc.target/arm/fp16-aapcs-4.c scan-assembler str\\tr2, ([^\\n]*).*ldrh\\tr0, \\1

GCC is configured with --with-float=hard --with-fpu=vfpv3-d16 --with-mode=thumb --with-tune=cortex-a9 --with-arch=armv7-a

https://ci.linaro.org/job/tcwg_gnu_cross_check_gcc--master-arm-build/1561/artifact/artifacts/00-sumfiles/
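For context, the scans under discussion are DejaGnu final-scan directives embedded in the tests. A hypothetical relaxation in the spirit Richard describes, accepting either r2 or r3 as the temporary, might look like this (illustrative only, not the actual committed pattern):

```c
/* Accept whichever of r2/r3 register allocation picked as the temporary.  */
/* { dg-final { scan-assembler {mov\tr1, r[23]} } } */
```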
Re: [to-be-committed][RISC-V][V3] DCE analysis for extension elimination
On 02/07/24 16:40, Jeff Law wrote:
> [ Actually attaching the patch this time... ]
>
> The pre-commit testing showed that making ext-dce only active at -O2 and
> above would require minor edits to the tests. In some cases we had specified
> -O1 in the test or specified no optimization level at all. Those need to be
> bumped to -O2. In one test we had one set of dg-options overriding another.
>
> The other approach that could have been taken would be to drop the -On
> argument, add an explicit -fext-dce and add dg-skip-if options. I originally
> thought that was going to be the way to go, but the dg-skip-if aspect was
> going to get ugly, as things like the interaction between unrolling, peeling
> and -ftracer would have to be accounted for and would likely need
> semi-regular adjustment.
>
> Changes since V2:
> Testsuite changes to deal with the pass only being enabled at -O2 or
> higher.

Hi Jeff,

It seems that this commit has triggered an issue on 32-bit ARM [1]:

FAIL: gcc.dg/plugin/must-tail-call-1.c -fplugin=./must_tail_call_plugin.so (internal compiler error: in df_refs_verify, at df-scan.cc:4013)

Should I open a bug report for this?

[1] https://git-us.linaro.org/toolchain/ci/interesting-commits.git/plain/gcc/sha1/98914f9eba5f19d3eb93fbce8726b5264631cba0/tcwg_bootstrap_check/master-arm-check_bootstrap/details.txt
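The testsuite edits Jeff describes amount to changes of roughly this shape (hypothetical directives shown for illustration; `-fdump-rtl-ext_dce` is assumed here to be the dump flag of the new pass):

```c
/* Before: the pass does not run below -O2, so its dump scan would fail.  */
/* { dg-options "-O1 -fdump-rtl-ext_dce" } */

/* After: bump to -O2, where ext-dce is enabled.  */
/* { dg-options "-O2 -fdump-rtl-ext_dce" } */
```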
Re: New TLS usage in libgcc_s.so.1, compatibility impact
On 15/01/24 09:46, Szabolcs Nagy wrote: > The 01/13/2024 13:49, Florian Weimer wrote: >> This commit >> >> commit 8abddb187b33480d8827f44ec655f45734a1749d >> Author: Andrew Burgess >> Date: Sat Aug 5 14:31:06 2023 +0200 >> >> libgcc: support heap-based trampolines >> >> Add support for heap-based trampolines on x86_64-linux, aarch64-linux, >> and x86_64-darwin. Implement the __builtin_nested_func_ptr_created and >> __builtin_nested_func_ptr_deleted functions for these targets. >> >> Co-Authored-By: Maxim Blinov >> Co-Authored-By: Iain Sandoe >> Co-Authored-By: Francois-Xavier Coudert >> >> added TLS usage to libgcc_s.so.1. The way that libgcc_s is currently >> built, it ends up using a dynamic TLS variant on the Linux targets. >> This means that there is no up-front TLS allocation with glibc (but >> there would be one with musl). >> >> There is still a compatibility impact because glibc assigns a TLS module >> ID upfront. This seems to be what causes the >> ust/libc-wrapper/test_libc-wrapper test in lttng-tools to fail. We end >> up with an infinite regress during process termination because >> libgcc_s.so.1 has been loaded, resulting in a DTV update. 
When this >> happens, the bottom of the stack looks like this: >> >> #4447 0x77f288f0 in free () from >> /lib64/liblttng-ust-libc-wrapper.so.1 >> #4448 0x77fdb142 in free (ptr=) >> at ../include/rtld-malloc.h:50 >> #4449 _dl_update_slotinfo (req_modid=3, new_gen=2) at ../elf/dl-tls.c:822 >> #4450 0x77fdb214 in update_get_addr (ti=0x77f2bfc0, >> gen=) at ../elf/dl-tls.c:916 >> #4451 0x77fddccc in __tls_get_addr () >> at ../sysdeps/x86_64/tls_get_addr.S:55 >> #4452 0x77f288f0 in free () from >> /lib64/liblttng-ust-libc-wrapper.so.1 >> #4453 0x77fdb142 in free (ptr=) >> at ../include/rtld-malloc.h:50 >> #4454 _dl_update_slotinfo (req_modid=2, new_gen=2) at ../elf/dl-tls.c:822 >> #4455 0x77fdb214 in update_get_addr (ti=0x77f39fa0, >> gen=) at ../elf/dl-tls.c:916 >> #4456 0x77fddccc in __tls_get_addr () >> at ../sysdeps/x86_64/tls_get_addr.S:55 >> #4457 0x77f36113 in lttng_ust_cancelstate_disable_push () >>from /lib64/liblttng-ust-common.so.1 >> #4458 0x77f4c2e8 in ust_lock_nocheck () from /lib64/liblttng-ust.so.1 >> #4459 0x77f5175a in lttng_ust_cleanup () from >> /lib64/liblttng-ust.so.1 >> #4460 0x77fca0f2 in _dl_call_fini ( >> closure_map=closure_map@entry=0x77fbe000) at dl-call_fini.c:43 >> #4461 0x77fce06e in _dl_fini () at dl-fini.c:114 >> #4462 0x77d82fe6 in __run_exit_handlers () from /lib64/libc.so.6 >> >> Cc:ing for awareness. >> >> The issue also requires a recent glibc with changes to DTV management: >> commit d2123d68275acc0f061e73d5f86ca504e0d5a344 ("elf: Fix slow tls >> access after dlopen [BZ #19924]"). If I understand things correctly, >> before this glibc change, we didn't deallocate the old DTV, so there was >> no call to the free function. > > with 19924 fixed, after a dlopen or dlclose every thread updates > its dtv on the next dynamic tls access. > > before that, dtv was only updated up to the generation of the > module being accessed for a particular tls access. 
> > so hitting the free in the dtv update path is now more likely > but the free is not new, it was there before. > > also note that this is unlikely to happen on aarch64 since > tlsdesc only does dynamic tls access after a 512-byte static > tls reservation runs out. > >> >> On the glibc side, we should recommend that intercepting mallocs and their >> dependencies use initial-exec TLS because that kind of TLS does not use >> malloc. If intercepting mallocs using dynamic TLS works at all, that's >> totally by accident, and was in the past helped by glibc bug 19924. (I > > right. > >> don't think there is anything special about libgcc_s.so.1 that triggers >> the test failure above, it is just an object with dynamic TLS that is >> implicitly loaded via dlopen at the right stage of the test.) In this >> particular case, we can also paper over the test failure in glibc by not >> calling free at all, because the argument is a null pointer: >> >> diff --git a/elf/dl-tls.c b/elf/dl-tls.c >> index 7b3dd9ab60..14c71cbd06 100644 >> --- a/elf/dl-tls.c >> +++ b/elf/dl-tls.c >> @@ -819,7 +819,8 @@ _dl_update_slotinfo (unsigned long int req_modid, size_t >> new_gen) >> dtv entry free it. Note: this is not AS-safe. */ >>/* XXX Ideally we will at some point create a memory >> pool. */ >> - free (dtv[modid].pointer.to_free); >> + if (dtv[modid].pointer.to_free != NULL) >> +  free (dtv[modid].pointer.to_free); >>dtv[modid].pointer.val = TLS_DTV_UNALLOCATED; >>dtv[modid].pointer.to_free = NULL; > > can be done, but != NULL is more likely since we do modid reuse > after dlclose. > > there is
Re: [PATCH] longlong.h: Do not use asm input cast for clang
On 12/12/22 20:52, Segher Boessenkool wrote:
> On Mon, Dec 12, 2022 at 02:10:16PM -0300, Adhemerval Zanella Netto wrote:
>> On 30/11/22 20:24, Segher Boessenkool wrote:
>>> I understand that the casts should be no-ops on the asm side (maybe they
>>> change the sign) and they are present as type-checking. Can we implement
>>> this type-checking in a different (portable) way? I think the macro you use
>>> should be named like __asm_output_check_type (..) or so to indicate the
>>> intended purpose.
>
> I didn't write that. Please quote correctly. Thanks!
>
>> I do not think trying to leverage it on the clang side would yield much; it
>> seems that it really does not want to support this extension. I am not
>> sure we can really make it portable; the best option I can think of would be
>> to add a mix of __builtin_classify_type and typeof prior to the asm call (we
>> do something similar for the powerpc64 syscall code in glibc), although it
>> would still require some gcc-specific builtins.
>>
>> I am open to ideas, since getting this header to be clang-compatible in
>> glibc requires getting it into gcc first.
>
> How do you intend to modify all the existing copies of the header that
> haven't been updated for over a decade already?
>
> If you think changing all user code that uses longlong.h is a good idea,
> please change it to not use inline asm: use builtins in some cases, but
> mostly just rewrite things in plain C. But GCC cannot rewrite user code
> (not preemptively anyway ;-) ) -- and longlong.h as encountered in the
> wild (not the one in our libgcc source code) is user code.
>
> If you think changing the copy in libgcc is a good idea, please change
> the original in glibc first?

That's my original intention [1], but Joseph stated that GCC is the
upstream source of this file.

Joseph, would you be ok with a similar patch for glibc, since gcc is
reluctant to accept it?

[1] https://sourceware.org/pipermail/libc-alpha/2022-October/143050.html
Re: [PATCH] longlong.h: Do not use asm input cast for clang
On 30/11/22 20:24, Segher Boessenkool wrote:
> Hi!
>
> On Wed, Nov 30, 2022 at 03:16:25PM -0300, Adhemerval Zanella via Gcc-patches wrote:
>> clang by default rejects the input casts with:
>>
>> error: invalid use of a cast in a inline asm context requiring an
>> lvalue: remove the cast or build with -fheinous-gnu-extensions
>>
>> And even with -fheinous-gnu-extensions clang still throws a warning
>> and also states that this option might be removed in the future.
>> For gcc the casts are still somewhat useful [1], so just remove them
>> when clang is used.
>
> This is one of the things in inline asm that is tightly tied to GCC
> internals. You should emulate GCC's behaviour faithfully if you want
> to claim you implement the inline asm GNU C extension.

Agreed; that's why I just made it a no-op for clang, which suggests that
clang does not see much use for this extension.

> I understand that the casts should be no-ops on the asm side (maybe they
> change the sign) and they are present as type-checking. Can we implement
> this type-checking in a different (portable) way? I think the macro you use
> should be named like __asm_output_check_type (..) or so to indicate the
> intended purpose.

I do not think trying to leverage it on the clang side would yield much; it
seems that clang really does not want to support this extension. I am not
sure we can really make it portable; the best option I can think of would be
to add a mix of __builtin_classify_type and typeof prior to the asm call (we
do something similar for the powerpc64 syscall code in glibc), although it
would still require some gcc-specific builtins.

I am open to ideas, since getting this header to be clang-compatible in
glibc requires getting it into gcc first.

>
>> --- a/include/ChangeLog
>> +++ b/include/ChangeLog
>
> That should not be part of the patch? Changelog entries should be
> verbatim in the message you send.
>
> The size of this patch already makes clear this is a bad idea, imo.
> This code is already hard enough to read.
Indeed, I forgot that CL entries were not part of the commit.
[PATCH] longlong.h: Do not use asm input cast for clang
clang by default rejects the input casts with: error: invalid use of a cast in a inline asm context requiring an lvalue: remove the cast or build with -fheinous-gnu-extensions And even with -fheinous-gnu-extensions clang still throws an warning and also states that this option might be removed in the future. For gcc the cast are still useful somewhat [1], so just remove it clang is used. [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581722.html --- include/ChangeLog | 60 ++ include/longlong.h | 524 +++-- 2 files changed, 325 insertions(+), 259 deletions(-) diff --git a/include/ChangeLog b/include/ChangeLog index dda005335c0..747fc923ef5 100644 --- a/include/ChangeLog +++ b/include/ChangeLog @@ -1,3 +1,63 @@ +2022-11-30 Adhemerval Zanella + + * include/longlong.h: Modified. + [(__GNUC__) && ! NO_ASM][( (__i386__) || (__i486__)) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][( (__i386__) || (__i486__)) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][( (__i386__) || (__i486__)) && W_TYPE_SIZE == 32](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][( (__i386__) || (__i486__)) && W_TYPE_SIZE == 32](udiv_qrnnd): Modified. + [(__GNUC__) && ! NO_ASM][(( (__sparc__) && (__arch64__)) || (__sparcv9)) && W_TYPE_SIZE == 64](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(( (__sparc__) && (__arch64__)) || (__sparcv9)) && W_TYPE_SIZE == 64](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(( (__sparc__) && (__arch64__)) || (__sparcv9)) && W_TYPE_SIZE == 64](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__M32R__) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__M32R__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__arc__) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__arc__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! 
__thumb__) && W_TYPE_SIZE == 32][(__ARM_ARCH_2__) || (__ARM_ARCH_2A__) || (__ARM_ARCH_3__)](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! __thumb__) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__arm__) && ( (__thumb2__) || ! __thumb__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__hppa) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__hppa) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__i960__) && W_TYPE_SIZE == 32](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__i960__) && W_TYPE_SIZE == 32](__umulsidi3): Modified. + [(__GNUC__) && ! NO_ASM][(__ibm032__) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__ibm032__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__ibm032__) && W_TYPE_SIZE == 32](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__ibm032__) && W_TYPE_SIZE == 32](count_leading_zeros): Modified. + [(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32][(__mc88110__)](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32][(__mc88110__)](udiv_qrnnd): Modified. + [(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](add_ss): Modified. + [(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](sub_ddmmss): Modified. + [(__GNUC__) && ! NO_ASM][(__m88000__) && W_TYPE_SIZE == 32](count_leading_zeros): Modified. + [(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][!((__mcoldfire__))](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](udiv_qrnnd): Modified. + [(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][( (__mc68020__) && ! __mc68060__)](sdiv_qrnnd): Modified. + [(__GNUC__) && ! 
NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32][(__mcoldfire__)](umul_ppmm): Modified. + [(__GNUC__) && ! NO_ASM][(__mc68000__) && W_TYPE_SIZE == 32](add_ss
Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function
On 17/11/2021 10:40, Florian Weimer wrote: > * Adhemerval Zanella via Libc-alpha: > >> However the code is somewhat complex and I would like to have some feedback >> if gcc will be willing to accept this change (I assume it would require >> this code merge on glibc beforehand). > > There's a long review queue on the GCC side due to the stage1 close. > It may still be considered for GCC 12. Jakub has also requested that > we hold off committing the glibc side until the GCC side is reviewed. > > I'll flesh out the commit message and NEWS entry once we have agreed > upon the interface. > >>> new file mode 100644 >>> index 00..c7313c122d >>> --- /dev/null >>> +++ b/elf/dl-find_eh_frame.c > >>> +/* Data for the main executable. There is usually a large gap between >>> + the main executable and initially loaded shared objects. Record >>> + the main executable separately, to increase the chance that the >>> + range for the non-closeable mappings below covers only the shared >>> + objects (and not also the gap between main executable and shared >>> + objects). */ >>> +static uintptr_t _dl_eh_main_map_start attribute_relro; >>> +static struct dl_eh_frame_info _dl_eh_main_info attribute_relro; >>> + >>> +/* Data for initally loaded shared objects that cannot be unlaoded. >> >> s/initally/initially and s/unlaoded/unloaded. > > Fixed. > >> >>> + The mapping base addresses are stored in address order in the >>> + _dl_eh_nodelete_mappings_bases array (containing >>> + _dl_eh_nodelete_mappings_size elements). The EH data for a base >>> + address is stored in the parallel _dl_eh_nodelete_mappings_infos. >>> + These arrays are not modified after initialization. 
*/ >>> +static uintptr_t _dl_eh_nodelete_mappings_end attribute_relro; >>> +static size_t _dl_eh_nodelete_mappings_size attribute_relro; >>> +static uintptr_t *_dl_eh_nodelete_mappings_bases attribute_relro; >>> +static struct dl_eh_frame_info *_dl_eh_nodelete_mappings_infos >>> + attribute_relro; >>> + >>> +/* Mappings created by dlopen can go away with dlclose, so a data >>> + dynamic data structure with some synchronization is needed. >> >> This sounds strange ("a data dynamic data"). > > I dropped the first data. > >> >>> + Individual segments are similar to the _dl_eh_nodelete_mappings >> >> Maybe use _dl_eh_nodelete_mappings_*, because '_dl_eh_nodelete_mappings' >> itself is not defined anywhere. > > Right. > >>> + Adding new elements to this data structure is another source of >>> + quadratic behavior for dlopen. If the other causes of quadratic >>> + behavior are eliminated, a more complicated data structure will be >>> + needed. */ >> >> This worries me, especially since we have reports that python and other dynamic >> environments use a lot of plugins and generate a lot of dlopen() calls. >> What kind of performance implications do you foresee here? > > The additional overhead is not disproportionate to the other sources of > quadratic behavior. With 1,000 dlopen'ed objects, overall run-time > seems to be comparable to the strcmp time required for soname matching, for > example, and is quite difficult to measure. So we could fix the > performance regression if we used a hash table for that … > > It's just an undesirable complexity class. The implementation is not > actually slow because it's a mostly-linear copy (although a backwards > one). Other parts of dlopen involve pointer chasing and are much > slower. Right, I agree this probably won't incur performance issues; I was just curious whether you had any numbers on it.
> > Oops, it's now: > > /* Allocate an empty segment that is at least SIZE large. PREVIOUS >points to the chain of previously allocated segments and can be >NULL. */ > >>> +/* Update the version to reflect that an update is happening. This >>> + does not change the bit that controls the active segment chain. >>> + Returns the index of the currently active segment chain. */ >>> +static inline unsigned int >>> +_dl_eh_mappings_begin_update (void) >>> +{ >>> + unsigned int v >>> += __atomic_wide_counter_fetch_add_relaxed >>> (&_dl_eh_loaded_mappings_version
Re: [PATCH 3/3] elf: Add _dl_find_eh_frame function
On 03/11/2021 13:28, Florian Weimer via Gcc-patches wrote:
> This function is similar to __gnu_Unwind_Find_exidx as used on arm.
> It can be used to speed up the libgcc unwinder.

Besides the terse patch description, the design seems ok to accomplish the lock-free read and update. There are some questions and remarks below, and I still need to revise the tests.

However, the code is somewhat complex, and I would like to have some feedback on whether gcc would be willing to accept this change (I assume it would require this code to be merged into glibc beforehand).

> ---
> NEWS | 4 +
> bits/dlfcn_eh_frame.h | 33 +
> dlfcn/Makefile | 2 +-
> dlfcn/dlfcn.h | 2 +
> elf/Makefile | 31 +-
> elf/Versions | 3 +
> elf/dl-close.c | 4 +
> elf/dl-find_eh_frame.c | 864 ++
> elf/dl-find_eh_frame.h | 90 ++
> elf/dl-find_eh_frame_slow.h | 55 ++
> elf/dl-libc_freeres.c | 2 +
> elf/dl-open.c | 5 +
> elf/rtld.c | 7 +
> elf/tst-dl_find_eh_frame-mod1.c | 10 +
> elf/tst-dl_find_eh_frame-mod2.c | 10 +
> elf/tst-dl_find_eh_frame-mod3.c | 10 +
> elf/tst-dl_find_eh_frame-mod4.c | 10 +
> elf/tst-dl_find_eh_frame-mod5.c | 11 +
> elf/tst-dl_find_eh_frame-mod6.c | 11 +
> elf/tst-dl_find_eh_frame-mod7.c | 10 +
> elf/tst-dl_find_eh_frame-mod8.c | 10 +
> elf/tst-dl_find_eh_frame-mod9.c | 10 +
> elf/tst-dl_find_eh_frame-threads.c | 237 +
> elf/tst-dl_find_eh_frame.c | 179 
> include/atomic_wide_counter.h | 14 +
> include/bits/dlfcn_eh_frame.h | 1 +
> include/link.h | 3 +
> manual/Makefile | 2 +-
> manual/dynlink.texi | 69 ++
> manual/libdl.texi | 10 -
> manual/probes.texi | 2 +-
> manual/threads.texi | 2 +-
> sysdeps/i386/bits/dlfcn_eh_frame.h | 34 +
> sysdeps/mach/hurd/i386/ld.abilist | 1 +
> sysdeps/nios2/bits/dlfcn_eh_frame.h | 34 +
> sysdeps/unix/sysv/linux/aarch64/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/alpha/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/arc/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/arm/be/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/arm/le/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/csky/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/hppa/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/i386/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/ia64/ld.abilist | 1 +
> .../unix/sysv/linux/m68k/coldfire/ld.abilist | 1 +
> .../unix/sysv/linux/m68k/m680x0/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/microblaze/ld.abilist | 1 +
> .../unix/sysv/linux/mips/mips32/ld.abilist | 1 +
> .../sysv/linux/mips/mips64/n32/ld.abilist | 1 +
> .../sysv/linux/mips/mips64/n64/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/nios2/ld.abilist | 1 +
> .../sysv/linux/powerpc/powerpc32/ld.abilist | 1 +
> .../linux/powerpc/powerpc64/be/ld.abilist | 1 +
> .../linux/powerpc/powerpc64/le/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/riscv/rv32/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/riscv/rv64/ld.abilist | 1 +
> .../unix/sysv/linux/s390/s390-32/ld.abilist | 1 +
> .../unix/sysv/linux/s390/s390-64/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/sh/be/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/sh/le/ld.abilist | 1 +
> .../unix/sysv/linux/sparc/sparc32/ld.abilist | 1 +
> .../unix/sysv/linux/sparc/sparc64/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/x86_64/64/ld.abilist | 1 +
> sysdeps/unix/sysv/linux/x86_64/x32/ld.abilist | 1 +
> 64 files changed, 1795 insertions(+), 16 deletions(-)
> create mode 100644 bits/dlfcn_eh_frame.h
> create mode 100644 elf/dl-find_eh_frame.c
> create mode 100644 elf/dl-find_eh_frame.h
> create mode 100644 elf/dl-find_eh_frame_slow.h
> create mode 100644 elf/tst-dl_find_eh_frame-mod1.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod2.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod3.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod4.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod5.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod6.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod7.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod8.c
> create mode 100644 elf/tst-dl_find_eh_frame-mod9.c
> create mode 100644
Re: [PATCH 2/3] elf: Introduce GLRO (dl_libc_freeres), called from __libc_freeres
Maybe add a comment explaining why this will be used. Reviewed-by: Adhemerval Zanella On 03/11/2021 13:27, Florian Weimer via Gcc-patches wrote: > --- > elf/Makefile | 2 +- > elf/dl-libc_freeres.c | 24 > elf/rtld.c | 1 + > malloc/set-freeres.c | 5 + > sysdeps/generic/ldsodefs.h | 7 +++ > 5 files changed, 38 insertions(+), 1 deletion(-) > create mode 100644 elf/dl-libc_freeres.c > > diff --git a/elf/Makefile b/elf/Makefile > index cb9bcfb799..1c768bdf47 100644 > --- a/elf/Makefile > +++ b/elf/Makefile > @@ -68,7 +68,7 @@ elide-routines.os = $(all-dl-routines) dl-support > enbl-secure dl-origin \ > rtld-routines= rtld $(all-dl-routines) dl-sysdep dl-environ > dl-minimal \ >dl-error-minimal dl-conflict dl-hwcaps dl-hwcaps_split dl-hwcaps-subdirs \ >dl-usage dl-diagnostics dl-diagnostics-kernel dl-diagnostics-cpu \ > - dl-mutex > + dl-mutex dl-libc_freeres > all-rtld-routines = $(rtld-routines) $(sysdep-rtld-routines) > > CFLAGS-dl-runtime.c += -fexceptions -fasynchronous-unwind-tables Ok. > diff --git a/elf/dl-libc_freeres.c b/elf/dl-libc_freeres.c > new file mode 100644 > index 00..68f305a6f9 > --- /dev/null > +++ b/elf/dl-libc_freeres.c > @@ -0,0 +1,24 @@ > +/* Deallocating malloc'ed memory from the dynamic loader. > + Copyright (C) 2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. 
> + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#include > + > +void > +__rtld_libc_freeres (void) > +{ > +} Ok. > diff --git a/elf/rtld.c b/elf/rtld.c > index be2d5d8e74..847141e21d 100644 > --- a/elf/rtld.c > +++ b/elf/rtld.c > @@ -378,6 +378,7 @@ struct rtld_global_ro _rtld_global_ro attribute_relro = > ._dl_catch_error = _rtld_catch_error, > ._dl_error_free = _dl_error_free, > ._dl_tls_get_addr_soft = _dl_tls_get_addr_soft, > +._dl_libc_freeres = __rtld_libc_freeres, > #ifdef HAVE_DL_DISCOVER_OSVERSION > ._dl_discover_osversion = _dl_discover_osversion > #endif Ok. > diff --git a/malloc/set-freeres.c b/malloc/set-freeres.c > index 5c19a2725c..856ff7831f 100644 > --- a/malloc/set-freeres.c > +++ b/malloc/set-freeres.c > @@ -21,6 +21,7 @@ > #include > #include > #include > +#include > > #include "../nss/nsswitch.h" > #include "../libio/libioP.h" > @@ -67,6 +68,10 @@ __libc_freeres (void) > >call_function_static_weak (__libc_dlerror_result_free); > > +#ifdef SHARED > + GLRO (dl_libc_freeres) (); > +#endif > + >for (p = symbol_set_first_element (__libc_freeres_ptrs); > !symbol_set_end_p (__libc_freeres_ptrs, p); ++p) > free (*p); OK. > diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h > index 1318c36dce..c26860430c 100644 > --- a/sysdeps/generic/ldsodefs.h > +++ b/sysdeps/generic/ldsodefs.h > @@ -712,6 +712,10 @@ struct rtld_global_ro > namespace. */ >void (*_dl_error_free) (void *); >void *(*_dl_tls_get_addr_soft) (struct link_map *); > + > + /* Called from __libc_shared to deallocate malloc'ed memory. */ > + void (*_dl_libc_freeres) (void); > + > #ifdef HAVE_DL_DISCOVER_OSVERSION >int (*_dl_discover_osversion) (void); > #endif > @@ -1416,6 +1420,9 @@ __rtld_mutex_init (void) > } > #endif /* !PTHREAD_IN_LIBC */ > > +/* Implementation of GL (dl_libc_freeres). 
*/ > +void __rtld_libc_freeres (void) attribute_hidden; > + > void __thread_gscope_wait (void) attribute_hidden; > # define THREAD_GSCOPE_WAIT() __thread_gscope_wait () > > OK.
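The hook wiring in this patch — __libc_freeres dispatching into the dynamic loader through a function pointer stored in a read-only table — is a common pattern and can be sketched generically. Everything below uses made-up fake_* names purely for illustration; it is not glibc's actual code.

```c
#include <stddef.h>

/* Counter so the behavior can be observed from outside.  */
static int freed_count;

/* Plays the role of __rtld_libc_freeres: the loader-side routine that
   would release ld.so's malloc'ed memory.  */
static void
fake_rtld_freeres (void)
{
  freed_count++;
}

/* Plays the role of struct rtld_global_ro: a table of function
   pointers the loader fills in at startup and libc reads later.  */
struct fake_rtld_global_ro
{
  void (*dl_libc_freeres) (void);   /* mirrors _dl_libc_freeres */
};

/* Initialized once, as rtld.c does with ._dl_libc_freeres = ...  */
static const struct fake_rtld_global_ro fake_glro = { fake_rtld_freeres };

/* Plays the role of __libc_freeres: in a shared build libc cannot call
   the loader's routine directly, so it goes through the table.  */
void
fake_libc_freeres (void)
{
  if (fake_glro.dl_libc_freeres != NULL)
    fake_glro.dl_libc_freeres ();
}
```

The indirection matters because in a shared build libc.so cannot reference ld.so's hidden symbols directly; the pointer in the relro table is the sanctioned entry point.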
Re: [PATCH 1/3] nptl: Extract from pthread_cond_common.c
On 03/11/2021 13:27, Florian Weimer via Libc-alpha wrote: > And make it an installed header. This addresses a few aliasing > violations (which do not seem to result in miscompilation due to > the use of atomics), and also enables use of wide counters in other > parts of the library. > > The debug output in nptl/tst-cond22 has been adjusted to print > the 32-bit values instead because it avoids a big-endian/little-endian > difference. LGTM, thanks. Reviewed-by: Adhemerval Zanella > --- > bits/atomic_wide_counter.h | 35 > include/atomic_wide_counter.h | 89 +++ > include/bits/atomic_wide_counter.h | 1 + > misc/Makefile | 3 +- > misc/atomic_wide_counter.c | 127 +++ > nptl/Makefile | 13 +- > nptl/pthread_cond_common.c | 204 > nptl/tst-cond22.c | 14 +- > sysdeps/nptl/bits/thread-shared-types.h | 22 +-- > 9 files changed, 310 insertions(+), 198 deletions(-) > create mode 100644 bits/atomic_wide_counter.h > create mode 100644 include/atomic_wide_counter.h > create mode 100644 include/bits/atomic_wide_counter.h > create mode 100644 misc/atomic_wide_counter.c > > diff --git a/bits/atomic_wide_counter.h b/bits/atomic_wide_counter.h > new file mode 100644 > index 00..0687eb554e > --- /dev/null > +++ b/bits/atomic_wide_counter.h > @@ -0,0 +1,35 @@ > +/* Monotonically increasing wide counters (at least 62 bits). > + Copyright (C) 2016-2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. 
> + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. */ > + > +#ifndef _BITS_ATOMIC_WIDE_COUNTER_H > +#define _BITS_ATOMIC_WIDE_COUNTER_H > + > +/* Counter that is monotonically increasing (by less than 2**31 per > + increment), with a single writer, and an arbitrary number of > + readers. */ > +typedef union > +{ > + __extension__ unsigned long long int __value64; > + struct > + { > +unsigned int __low; > +unsigned int __high; > + } __value32; > +} __atomic_wide_counter; > + > +#endif /* _BITS_ATOMIC_WIDE_COUNTER_H */ Ok, it would be included in multiple places so we can't tie to a specific header. > diff --git a/include/atomic_wide_counter.h b/include/atomic_wide_counter.h > new file mode 100644 > index 00..31f009d5e6 > --- /dev/null > +++ b/include/atomic_wide_counter.h > @@ -0,0 +1,89 @@ > +/* Monotonically increasing wide counters (at least 62 bits). > + Copyright (C) 2016-2021 Free Software Foundation, Inc. > + This file is part of the GNU C Library. > + > + The GNU C Library is free software; you can redistribute it and/or > + modify it under the terms of the GNU Lesser General Public > + License as published by the Free Software Foundation; either > + version 2.1 of the License, or (at your option) any later version. > + > + The GNU C Library is distributed in the hope that it will be useful, > + but WITHOUT ANY WARRANTY; without even the implied warranty of > + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > + Lesser General Public License for more details. > + > + You should have received a copy of the GNU Lesser General Public > + License along with the GNU C Library; if not, see > + <https://www.gnu.org/licenses/>. 
*/ > + > +#ifndef _ATOMIC_WIDE_COUNTER_H > +#define _ATOMIC_WIDE_COUNTER_H > + > +#include > +#include > + > +#if __HAVE_64B_ATOMICS > + > +static inline uint64_t > +__atomic_wide_counter_load_relaxed (__atomic_wide_counter *c) > +{ > + return atomic_load_relaxed (&c->__value64); > +} > + > +static inline uint64_t > +__atomic_wide_counter_fetch_add_relaxed (__atomic_wide_counter *c, > + unsigned int val) > +{ > + return atomic_fetch_add_relaxed (&c->__value64, val); > +} > + > +static inline uint64_t > +__atomic_wide_counter_fetch_a
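The 32-bit fallback that this hunk leads into (misc/atomic_wide_counter.c) is the interesting part. glibc's real algorithm is more subtle, but the underlying problem — reading a 64-bit monotonic counter stored as two 32-bit halves with a single writer — can be sketched with a simple retry loop over the high word. All names below are illustrative, not glibc's:

```c
#include <stdatomic.h>
#include <stdint.h>

/* One writer maintains a 64-bit monotonic counter as two 32-bit
   halves; readers must never observe a torn value.  */
typedef struct
{
  _Atomic uint32_t low;
  _Atomic uint32_t high;
} wide_counter;

/* Reader: retry if the high word changed while we sampled low.  */
uint64_t
wide_counter_load (wide_counter *c)
{
  for (;;)
    {
      uint32_t h1 = atomic_load_explicit (&c->high, memory_order_acquire);
      uint32_t l = atomic_load_explicit (&c->low, memory_order_acquire);
      uint32_t h2 = atomic_load_explicit (&c->high, memory_order_acquire);
      if (h1 == h2)
        return ((uint64_t) h1 << 32) | l;
    }
}

/* Single writer, increments below 2**31 (as the union's comment
   requires).  Readers may briefly see the value before the carry
   propagates; glibc's actual scheme closes that window.  */
void
wide_counter_add (wide_counter *c, uint32_t val)
{
  uint32_t old = atomic_load_explicit (&c->low, memory_order_relaxed);
  atomic_store_explicit (&c->low, old + val, memory_order_release);
  if (old + val < old)   /* low word wrapped around */
    atomic_fetch_add_explicit (&c->high, 1, memory_order_release);
}
```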
Re: GCC association with the FSF
On 12/04/2021 14:52, Alexandre Oliva wrote: > On Apr 12, 2021, Adhemerval Zanella wrote: > >> No, you are insinuating that the glibc community both as maintainer >> and contributors acted in a hateful way regarding the 'joke' >> removal. Sorry, but this is not true; > > Easy to say for someone who hasn't been the target of hate, but it's > just that it was there right then, it's *remains* there. Not exclusive > among glibc maintainers, and certainly not unanimous among them, but > there. I may even have earned it myself. But the one that Richard got > over incorrect assumptions that he commanded the reversal, that's just > another false piece of evidence often used to support the hate campaign. There was no "hate" campaign from glibc developers and maintainers; repeating the claim does not make it true. Since libc-alpha is a non-moderated list, there were a lot of unfriendly messages from undisclosed or non-representative people. What happened is that some glibc developers were *really* annoyed by the way *you* acted, not RMS; and they vocalized it. And you, instead of working toward consensus by making some concessions (as we currently try to run the glibc community), kept arguing to exhaustion that you acted for the benefit of the project. So the aforementioned 'hate' is just that we did not agree with the way *you* acted, which caused a lot of distress. > >> The main idea, which I was vocal about and shared with some glibc >> developers and maintainers, was that the "joke" has no place in a >> technical manual. > > I understand there is consensus about that now, but back then there were > too many unsettled policy issues to make that call consensually among > all relevant parties. > > The main disagreement was not over the issue proper, though. It was > about procedure, and then it was about whose opinions as much as > counted. No, the disagreement is about the way *you* did it. 
I have not seen contention and disarray like what you started since I began working on the project, about a decade ago. So please stop putting the blame for that episode on the glibc community as a whole. > > > It was a really trivial issue, but sufficiently hot-button and > triggering enough underlying issues that it got to be exploited > politically in several ugly ways. > > It can't really be understood without looking into broader contexts that > had long been mounting, and that again quite explicit in this list too. > > > But I hope we can all agree that it was a horrible mess. >
Re: GCC association with the FSF
On Sun, Apr 11, 2021 at 10:43 PM Alexandre Oliva wrote: > > On Apr 11, 2021, Adhemerval Zanella wrote: > > > All the other active maintainers suggested you shouldn't have done that, > > but you > > ignored it anyway. > > How could I possibly have ignored something that hadn't happened yet? > > > *we* glibc maintainers were fully aware that it was *you* that decided > > to act in that way > > There have been plenty of insinuations that contradict that assumption > and attempted to somehow blame it on RMS, but whether the record has > been set straight on this point now, or if it was straight already, the > point stands. No, you are insinuating that the glibc community, both maintainers and contributors, acted in a hateful way regarding the 'joke' removal. Sorry, but this is not true; there were messages that might be characterized as such, but they did not come from any of the main glibc developers or maintainers. > > As recently as a couple of weeks ago someone referred, in this list, to > RMS's voicing his objection to the removal of one of the many pieces he > wrote for the glibc manual, and then setting out to propose and discuss > policies that incided on the matter, as if those were horrible actions. > > That was almost as abhorrent as his asking a GNU developer a question > that he could have answered by just downloading the subproject's source > code and looking for the answer himself! Oh, the horror! > > > If that's not hatred, I don't really wish to know what is :-/ The main idea, which I was vocal about and shared with some glibc developers and maintainers, was that the "joke" has no place in a technical manual. You might disagree ideologically and politically with this assessment, but it is not "hatred", and this very rhetoric of trying to characterize it as such is what made me see that discussion as frustrating and disappointing.
Re: GCC association with the FSF
On Sun, Apr 11, 2021 at 8:06 PM Alexandre Oliva wrote: > > On Apr 11, 2021, Adhemerval Zanella wrote: > > > It was clear to me and others glibc maintainers that it was *you* who > > bypass the consensus to *not* reinstate the “joke”. > > I think you wrote it backwards: what I did was to revert the commit that > the person who put it in agreed shouldn't have been made at that point, > so that the debate about whether or not to install the patch could be > carried out without the fait accompli. To my surprise, it stopped. > > Then, a year or so later, when most of the GNU policies that incided on > that matter had already been discussed and approved, and they suggested > (at least to me) that the conclusion was likely that the patch was in > line with them, some other situation came up that reminded people of the > patch, it was discussed under the heat of the unrelated situation (which > I also found inappropriate), but it got applied AFAICT in accordance > with GNU and GLIBC policies. RMS briefly stated that he did not want the change to be applied; we considered his input back then, but we decided to remove the joke *regardless* of what he thought about the subject. And you used this to claim the change had no consensus and to reinstate it, in a way we had not operated in the project for a couple of years, which caused a lot of disarray. The problem was not that you did it, but how you did it. You then spent a lot of days trying to convince other glibc maintainers about your actions, to the point that Torvald and Siddhesh were fed up with your rhetoric. > > > maintainers said explicitly you shouldn’t do it. > > I do not see nor recall any such responses or reactions to my offer to > revert the patch in case the installer wouldn't do it, except the > installer saying they wouldn't do the reversal. Eventually I did it. > After the fact, some said I shouldn't have done it. > > > That's my recollection of the events. 
All the other active maintainers suggested you shouldn't have done that, but you ignored it anyway. And we did not want to start a potential back-and-forth of patch application and reversion over that petty discussion. But this is done, and I don't want to dig into it further. My point is that *we* glibc maintainers were fully aware that it was *you* who decided to act in that way, and my feeling is that the dominant response was not *hate*, but rather a lot of frustration and disappointment at how you acted.
Re: GCC association with the FSF
> On 11 Apr 2021, at 17:45, Alexandre Oliva via Gcc > wrote: > > Remember how much hate RMS got in glibc land for something I did? I > said I did it out of my own volition, I explained my why I did it, but > people wouldn't believe he had nothing to do with it! It was clear to me and other glibc maintainers that it was *you* who bypassed the consensus to *not* reinstate the “joke”. And there was no hate (at least not from my side), only *disappointment* that you used your status to do it even though most of the senior developers and maintainers said explicitly that you shouldn’t.
Re: unnormal Intel 80-bit long doubles and isnanl
On 24/11/2020 10:59, Siddhesh Poyarekar wrote: > On 11/24/20 7:11 PM, Szabolcs Nagy wrote: >> ideally fpclassify (and other classification macros) would >> handle all representations. >> >> architecturally invalid or trap representations can be a >> non-standard class but i think classifying them as FP_NAN >> would break the least amount of code. > > That's my impression too. > >>> glibc evaluates the bit pattern of the 80-bit long double and in the >>> process, ignores the integer bit, i.e. bit 63. As a result, it considers >>> the unnormal number as a valid long double and isnanl returns 0. >> >> i think m68k and x86 are different here. >> >>> >>> gcc on the other hand, simply uses the number in a floating point comparison >>> and uses the parity flag (which indicates an unordered compare, signalling a >>> NaN) to decide if the number is a NaN. The unnormal numbers behave like >>> NaNs in this respect, in that they set the parity flag and with >>> -fsignalling-nans, would result in an invalid-operation exception. As a >>> result, __builtin_isnanl returns 1 for an unnormal number. >> >> compiling isnanl to a quiet fp compare is wrong with >> -fsignalling-nans: classification is not supposed to >> signal exceptions for snan. > > I agree, but I think that issue with __builtin_isnanl is orthogonal to the > question about unnormals. Once that is fixed in gcc, we could actually use > __builtin_isnanl all the time in glibc for isnanl. > > Siddhesh What is the current take from gcc developers on this semantic change of __builtin_isnanl? Do they consider the current behavior of not classifying the 'unnormal' as NaN the expected one and expect glibc to follow it, or are they willing to align with glibc's behavior?
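For reference, the x87 80-bit format makes the leading integer bit explicit, which is what creates the 'unnormal' class in the first place: a non-zero, non-maximal biased exponent combined with a clear bit 63 is architecturally invalid. Below is a hedged sketch of a classifier that, unlike the isnanl behavior described above, does inspect bit 63. The encoding rules follow the Intel SDM; the enum and function names are invented for illustration, and this is neither glibc nor GCC code:

```c
#include <stdint.h>

/* Classes of an x87 80-bit extended value, decoded from its raw
   16-bit sign/exponent word and 64-bit significand.  */
enum x87_class
{
  X87_ZERO,
  X87_SUBNORMAL,   /* includes pseudo-denormals */
  X87_NORMAL,
  X87_INF,
  X87_NAN,
  X87_UNNORMAL,    /* nonzero exponent, integer bit clear: invalid */
  X87_PSEUDO       /* pseudo-NaN/pseudo-infinity: invalid */
};

enum x87_class
classify80 (uint16_t sign_exp, uint64_t significand)
{
  unsigned int exp = sign_exp & 0x7fff;
  int ibit = (significand >> 63) & 1;          /* the explicit bit 63 */
  uint64_t frac = significand & 0x7fffffffffffffffULL;

  if (exp == 0x7fff)                           /* inf/NaN exponent */
    {
      if (!ibit)
        return X87_PSEUDO;
      return frac == 0 ? X87_INF : X87_NAN;
    }
  if (exp == 0)
    return significand == 0 ? X87_ZERO : X87_SUBNORMAL;
  return ibit ? X87_NORMAL : X87_UNNORMAL;     /* the bit-63 check */
}
```

A classifier that ignored bit 63 would lump X87_UNNORMAL in with valid numbers, while the hardware compare treats such values as unordered — exactly the divergence described above.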
Re: [libgomp] Ask for help on an improvement for synchronization overhead
On 30/04/2020 18:12, Jakub Jelinek wrote: > On Thu, Apr 30, 2020 at 05:37:26PM -0300, Adhemerval Zanella via Gcc wrote: >> Hi all, I would like to check if someone could help me figure out >> an issue I am chasing in a libgomp patch intended to partially >> address the issue described in BZ#79784. >> >> I have identified that one of the bottlenecks is the global barrier >> used by both the thread pool and the team, which causes a lot of cache ping-pong >> on high-core-count machines. And it does not seem to be an aarch64-specific >> issue, as hinted by the bugzilla. > > This has been a topic of GSoC last year, but the student didn't deliver it > in usable form and disappeared. > See e.g. thread with "Work-stealing task scheduling" in subject from > last year on gcc-patches and other mails on the topic. In my understanding, what I am working on is not exactly related to OMP tasking, although I see that the global barrier is still an issue for omp task scheduling. What I am trying to optimize in this specific case is the barrier used in gomp_thread_pool for constructs like parallel for; maybe a per-thread barrier could also be extended to other libgomp places. > > So if you'd have time and motivation to do it properly, it would be greatly > appreciated. >
[libgomp] Ask for help on an improvement for synchronization overhead
Hi all, I would like to check if someone could help me figure out an issue I am chasing in a libgomp patch intended to partially address the issue described in BZ#79784 [1]. I have identified that one of the bottlenecks is the global barrier used by both the thread pool and the team, which causes a lot of cache ping-pong on high-core-count machines. And it does not seem to be an aarch64-specific issue, as hinted by the bugzilla. So the optimization I am implementing, similar to what the LLVM OpenMP implementation does, is to use a per-OMP-thread barrier to synchronize team/task creation. The activation scheme I have implemented so far is a simple linear one, where the master scans linearly over the children threads (LLVM OpenMP implements some fancier ones that I plan to take a look at as well). The patch I came up with so far is quite simple [2] and still requires some polish (documentation, code styling, etc.); however, there is one regression that has me scratching my head: cancel-parallel-2. It exercises OpenMP cancellation in an 'omp parallel' construct, and the issue is that I am failing to understand why the final team barrier (done in gomp_team_end, called by GOMP_parallel_end) is not synchronizing correctly with the team barrier in each OpenMP task. So any help on the design is appreciated (even if it means I should rethink it for libgomp). [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79784 [2] https://github.com/zatrazz/gcc/tree/azanella/libgomp-scalability
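The "simple linear activation" described above — the master scanning per-thread state instead of all threads contending on a single shared barrier word — can be sketched roughly as follows. This is only an illustration of the idea under stated assumptions (spin-waiting, fixed thread count, made-up names), not the actual patch in [2]:

```c
#include <stdatomic.h>

#define MAX_THREADS 8

/* Each team member gets its own flag in its own cache line, so
   arrival stores do not ping-pong a shared line between cores.  */
struct per_thread_flag
{
  _Atomic int arrived;
  char pad[64 - sizeof (_Atomic int)];   /* avoid false sharing */
};

static struct per_thread_flag flags[MAX_THREADS];

/* Worker side: signal arrival on our private flag.  */
void
worker_arrive (int tid)
{
  atomic_store_explicit (&flags[tid].arrived, 1, memory_order_release);
}

/* Master side: the linear scan.  Returns once every worker in
   [1, nthreads) has arrived, then resets the flags for reuse.
   A real implementation would eventually futex-wait instead of
   spinning, and would also need a matching release phase.  */
void
master_gather (int nthreads)
{
  for (int t = 1; t < nthreads; t++)
    while (!atomic_load_explicit (&flags[t].arrived, memory_order_acquire))
      ; /* spin */
  for (int t = 1; t < nthreads; t++)
    atomic_store_explicit (&flags[t].arrived, 0, memory_order_relaxed);
}
```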
Re: [EXTERNAL] Re: How to test aarch64 when building a cross-compiler?
On 25/11/2019 17:28, Andrew Dean via gcc wrote: >>> This completes successfully. However, when I then try to run the gcc tests >>> like so: >>> runtest --outdir . --tool gcc --srcdir /path/to/gcc/gcc/testsuite >>> aarch64.exp --target aarch64-linux-gnu --target_board aarch64-sim >>> --tool_exec >>> /path_to/build_dir/install/compilers/aarch64-linux-gnu/bin/aarch64-glibc-linux-gnu-gcc --verbose -v >>> >>> I get errors like this: >>> >>> aarch64-glibc-linux-gnu-gcc: fatal error: cannot read spec file >>> 'rdimon.specs': No such file or directory >>> >>> I can see that the rdimon.specs flag is added based on this line in >>> aarch64-sim.exp: >> >> Where does aarch64-sim.exp come from? > > /usr/share/dejagnu/baseboards/aarch64-sim.exp > >> >>> >>> set_board_info ldflags "[libgloss_link_flags] [newlib_link_flags] -specs=rdimon.specs" >>> >> I think this is for baremetal/newlib targets, ie. aarch64-elf, not for >> aarch64-linux-gnu. > > Hmm.. build-many-glibcs.py doesn't like either aarch64-elf or > aarch64-linux-elf... > I get a KeyError in build_compilers and build_glibcs when it tries to look up > the config with either of those values. > Unfortunately build-many-glibcs.py does not support baremetal builds yet (since it is a tool created to build cross-compiling toolchains using glibc).
Re: [PATCH v6] aarch64: Add split-stack support
Ping (with Szabolcs remarks fixed). On 07/02/2018 16:07, Adhemerval Zanella wrote: > Changes from previous version: > > - Changed the way __morestack is called to use a branch with link > instead of a simple branch. This allows using a call instruction and > avoids possible issues with later optimization passes which might > see a branch outside the instruction block (as noticed in previous > iterations while building a more complex workload such as speccpu2006). > > - Changed the return address to use the branch-with-link value and > set x12 to save x30. This simplifies the instructions required > to set up/save the return address. > > -- > > This patch adds split-stack support on aarch64 (PR #67877). As for > other ports, this patch should be used along with glibc and gold support. > > The support is done similarly to other architectures: a split-stack field > is allocated before the TCB by glibc, a target-specific __morestack implementation > and helper functions are added in libgcc, and compiler support is adjusted > (split-stack prologue, va_start for argument handling). I also plan to > send the gold support to adjust stack allocation across split-stack > and default code calls. > > The current approach is to set the final stack adjustment using 2 instructions > at most (mov/movk), which limits stack allocation to an upper limit of 4GB. > The morestack call is non-standard, with x10 holding the requested stack > pointer, x11 the argument pointer (if required), and x12 the return > continuation address. Unwinding is handled by a personality routine that > knows how to find stack segments. > > The split-stack prologue on function entry is as follows (this goes before the > usual function prologue): > > function: > mrs x9, tpidr_el0 > ldur x9, [x9, -8] > mov x10, > movk x10, #0x0, lsl #16 > sub x10, sp, x10 > mov x11, sp # if function has stacked arguments > cmp x9, x10 > bcc .LX > main_fn_entry: > [function prologue] > .LX: > bl __morestack > b main_fn_entry > > Notes: > > 1. 
Even if a function does not allocate a stack frame, a split-stack prologue > is created. This is to avoid issues with tail calls to external symbols > which might require linker adjustment (libgo/runtime/go-varargs.c). > > 2. Basic-block reordering (enabled with -O2) will move the split-stack TCB ldur > to after the required stack calculation. > > 3. Similar to powerpc, when the linker detects a call from split-stack to > non-split-stack code, it adds 16k (or more) to the value found in > "allocate" > instructions (so non-split-stack code gets a larger stack). The amount is > tunable by a linker option. This feature is only implemented in the GNU > gold linker. > > 4. AArch64 does not initially handle stacks larger than 4G, and although it is possible > to implement that, limiting to 4G allows materializing the allocation with > only 2 instructions (mov + movk), thus simplifying the linker > adjustments required. Supporting multiple threads each requiring more > than 4G of stack is probably not that important, and likely to OOM at > run time. > > 5. The TCB support on GLIBC is meant to be included in version 2.28. > > 6. Besides regression tests, I also checked with a SPECcpu2006 run with > the additional -fsplit-stack option. I saw no regression besides 416.gamess, > which fails on trunk as well (not sure if it is some misconfiguration in my > environment). > > libgcc/ChangeLog: > > * libgcc/config.host: Use t-stack and t-stack-aarch64 for > aarch64*-*-linux. > * libgcc/config/aarch64/morestack-c.c: New file. > * libgcc/config/aarch64/morestack.S: Likewise. > * libgcc/config/aarch64/t-stack-aarch64: Likewise. > * libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific > code. > > gcc/ChangeLog: > > * common/config/aarch64/aarch64-common.c > (aarch64_supports_split_stack): New function. > (TARGET_SUPPORTS_SPLIT_STACK): New macro. > * gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove > macro. 
> * gcc/config/aarch64/aarch64-protos.h: Add > aarch64_expand_split_stack_prologue and > aarch64_split_stack_space_check. > * gcc/config/aarch64/aarch64.c (aarch64_expand_builtin_va_start): Use > internal argument pointer instead of virtual_incoming_args_rtx. > (morestack_ref): New symbol. > (aarch64_load_split_stack_value): New function. > (aarch64_expand_split_stack_prologue): Likewise. > (aarch64_inte
Re: [PATCH] Add sanitizer support for AArch64 ILP32
On 23/02/2018 11:39, Jakub Jelinek wrote: > On Fri, Feb 23, 2018 at 11:27:19AM -0300, Adhemerval Zanella wrote: >> This patch adds asan support for aarch64 ilp32. It is based on the 'official' >> glibc support branch [1] (which is in turn based on the Linux branch [2]). >> >> The main changes for libsanitizer are in the kernel ABI support for internal >> syscalls. Differently from x32, ILP32 tries to leverage the 32-bit syscalls; >> syscalls whose kernel ABI takes 64-bit arguments (ftruncate, for instance) >> pass them using the new generic Linux way for 32 bits (by splitting >> high and low into two registers). >> >> So instead of adding more ad-hoc defines for ILP32, this patch extends >> SANITIZER_USES_CANONICAL_LINUX_SYSCALLS to represent both 64-bit >> argument syscalls (value of 1) and 32-bit ones (value of 2). >> >> The shadow offset used is similar to the arm one (29). >> >> I did a sanity check on aarch64-linux-gnu and x86_64-linux-gnu without >> any regressions. >> >> PS: I sent this change to llvm-commit, but it got stalled because llvm >> lacks aarch64 ilp32 support and no one could confirm no breakage due >> to a missing buildbot. Also the idea is not to sync it back to libsanitizer. > > It will be a nightmare for merges from upstream, but because the upstream is > so uncooperative probably there is no other way. What I meant is that the idea is to sync it back to libsanitizer, sorry for the misunderstanding. I can try to push it again for compiler-rt, but it took a long time to actually get any reply. > >> gcc/ >> >> * gcc/config/aarch64/aarch64.c (aarch64_asan_shadow_offset): Add > > No gcc/ prefixes in gcc/ChangeLog. Right, fixed locally. > >> TARGET_ILP32 support. >> >> libsanitizer/ > > Nor libsanitizer/ prefixes in libsanitizer/ChangeLog. 
>> --- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h >> +++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h >> @@ -117,9 +117,9 @@ typedef signed long long sptr; // NOLINT >> typedef unsigned long uptr; // NOLINT >> typedef signed long sptr; // NOLINT >> #endif // defined(_WIN64) >> -#if defined(__x86_64__) >> -// Since x32 uses ILP32 data model in 64-bit hardware mode, we must use >> -// 64-bit pointer to unwind stack frame. >> +#if defined(__x86_64__) || SANITIZER_AARCH64_ILP32 >> +// Since x32 adn AArch64 ILP32 use ILP32 data model in 64-bit hardware >> mode, > s/adn/and/ Ack. > >> @@ -445,11 +467,7 @@ uptr internal_execve(const char *filename, char *const >> argv[], >> // - sanitizer_common.h >> bool FileExists(const char *filename) { >> struct stat st; >> -#if SANITIZER_USES_CANONICAL_LINUX_SYSCALLS >> - if (internal_syscall(SYSCALL(newfstatat), AT_FDCWD, filename, &st, 0)) >> -#else >> if (internal_stat(filename, &st)) >> -#endif >> return false; >> // Sanity check: filename is a regular file. >> return S_ISREG(st.st_mode); > > I'm uneasy about this hunk, you change behavior not just on aarch64 ilp32, > but for many other targets too. Please don't. My understanding is that it is a no-op, since for SANITIZER_USES_CANONICAL_LINUX_SYSCALLS internal_stat would be an internal_syscall(newfstatat, ...) (also, the internal_* names exist exactly to abstract this kind of construction). > > Do you really want it for GCC8? We are in stage4 and this isn't a > regression... > > Jakub > I don't have a strong opinion about that; it would be good to have in GCC8, but I think I can wait.
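For context on the shadow offset values being discussed (1<<36 for aarch64 LP64, 1<<29 proposed for ILP32): ASan maps every 8 bytes of application memory to one shadow byte at a fixed offset. A sketch of that standard mapping follows; the offset constants mirror the patch under review and are quoted here for illustration only, not as a statement about what was finally committed upstream:

```c
#include <stdint.h>

/* The standard ASan shadow mapping: Shadow = (Mem >> 3) + Offset.
   One shadow byte describes the poisoning state of an 8-byte
   application granule.  */
#define ASAN_SHADOW_OFFSET_LP64  (1ULL << 36)  /* aarch64 LP64 */
#define ASAN_SHADOW_OFFSET_ILP32 (1ULL << 29)  /* aarch64 ILP32, per the patch */

static inline uint64_t
asan_mem_to_shadow (uint64_t addr, uint64_t offset)
{
  return (addr >> 3) + offset;
}
```

The offset has to be chosen so the shadow region fits inside the target's address space, which is why a 32-bit address space (ILP32) needs a much smaller constant than LP64.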
[PATCH] Add sanitizer support for AArch64 ILP32
This patch adds asan support for aarch64 ilp32. It is based on the 'official' glibc support branch [1] (which is in turn based on the Linux branch [2]). The main changes for libsanitizer are in the kernel ABI support for internal syscalls. Differently from x32, ILP32 tries to leverage the 32-bit syscalls; syscalls whose kernel ABI takes 64-bit arguments (ftruncate, for instance) pass them using the new generic Linux way for 32 bits (by splitting high and low into two registers). So instead of adding more ad-hoc defines for ILP32, this patch extends SANITIZER_USES_CANONICAL_LINUX_SYSCALLS to represent both 64-bit argument syscalls (value of 1) and 32-bit ones (value of 2). The shadow offset used is similar to the arm one (29). I did a sanity check on aarch64-linux-gnu and x86_64-linux-gnu without any regressions. PS: I sent this change to llvm-commit, but it got stalled because llvm lacks aarch64 ilp32 support and no one could confirm no breakage due to a missing buildbot. Also the idea is not to sync it back to libsanitizer. gcc/ * gcc/config/aarch64/aarch64.c (aarch64_asan_shadow_offset): Add TARGET_ILP32 support. libsanitizer/ * libsanitizer/sanitizer_common/sanitizer_internal_defs.h (uhwptr, OFF_T, operator_new_size_type): Define for SANITIZER_AARCH64_ILP32. * libsanitizer/sanitizer_common/sanitizer_linux.cc (SYSCALL_LL64): New define: wrap a 64-bit argument for syscall. (internal_ftruncate, internal_stat, internal_lstat, internal_unlink, internal_rename, internal_lseek): Add support for SANITIZER_USES_CANONICAL_LINUX_SYSCALLS equal to 2. (FileExists): Use internal_stat regardless. * libsanitizer/sanitizer_common/sanitizer_platform.h (SANITIZER_AARCH64_ILP32): Define. (SANITIZER_USES_CANONICAL_LINUX_SYSCALLS): Define to 1 or 2 depending on the expected architecture ABI. * libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.cc (CHECK_SIZE_AND_OFFSET(ipc_perm, mode)): Define for AArch64 ILP32. 
* libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h (struct_kernel_stat_sz, struct_kernel_stat64_sz, __sanitizer___kernel_uid_t, __sanitizer___kernel_gid_t): Likewise. [__sanitizer_dirent] (d_ino, d_off): Likewise. [1] https://sourceware.org/git/?p=glibc.git;a=shortlog;h=refs/heads/arm/ilp32 [2] git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git --- gcc/config/aarch64/aarch64.c | 5 ++- .../sanitizer_common/sanitizer_internal_defs.h | 12 +++ libsanitizer/sanitizer_common/sanitizer_linux.cc | 40 ++ libsanitizer/sanitizer_common/sanitizer_platform.h | 15 ++-- .../sanitizer_platform_limits_posix.cc | 3 +- .../sanitizer_platform_limits_posix.h | 13 +-- 6 files changed, 69 insertions(+), 19 deletions(-) diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 33c90ef..eebf85b 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -16171,7 +16171,10 @@ aarch64_split_dimode_const_store (rtx dst, rtx src) static unsigned HOST_WIDE_INT aarch64_asan_shadow_offset (void) { - return (HOST_WIDE_INT_1 << 36); + if (TARGET_ILP32) +return (HOST_WIDE_INT_1 << 29); + else +return (HOST_WIDE_INT_1 << 36); } static rtx diff --git a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h index edd6a21..aef83c2 100644 --- a/libsanitizer/sanitizer_common/sanitizer_internal_defs.h +++ b/libsanitizer/sanitizer_common/sanitizer_internal_defs.h @@ -117,9 +117,9 @@ typedef signed long long sptr; // NOLINT typedef unsigned long uptr; // NOLINT typedef signed long sptr; // NOLINT #endif // defined(_WIN64) -#if defined(__x86_64__) -// Since x32 uses ILP32 data model in 64-bit hardware mode, we must use -// 64-bit pointer to unwind stack frame. +#if defined(__x86_64__) || SANITIZER_AARCH64_ILP32 +// Since x32 adn AArch64 ILP32 use ILP32 data model in 64-bit hardware mode, +// we must use 64-bit pointer to unwind stack frame. 
typedef unsigned long long uhwptr; // NOLINT #else typedef uptr uhwptr; // NOLINT @@ -144,7 +144,7 @@ typedef int error_t; typedef int pid_t; #if SANITIZER_FREEBSD || SANITIZER_NETBSD || SANITIZER_MAC || \ -(SANITIZER_LINUX && defined(__x86_64__)) +(SANITIZER_LINUX && (defined(__x86_64__) || SANITIZER_AARCH64_ILP32)) typedef u64 OFF_T; #else typedef uptr OFF_T; @@ -154,8 +154,8 @@ typedef u64 OFF64_T; #if (SANITIZER_WORDSIZE == 64) || SANITIZER_MAC typedef uptr operator_new_size_type; #else -# if defined(__s390__) && !defined(__s390x__) -// Special case: 31-bit s390 has unsigned long as size_t. +# if (defined(__s390__) && !defined(__s390x__)) || SANITIZER_AARCH64_ILP32 +// Special case: 31-bit s390 and AArch64 ILP32 have unsigned long as size_t. typedef unsigned long operator_new_size_type; # else typedef
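The 64-bit-argument splitting that SANITIZER_USES_CANONICAL_LINUX_SYSCALLS == 2 is meant to model can be sketched as follows. The macro names are hypothetical (the patch's actual helper is SYSCALL_LL64, whose exact expansion is not shown here), and in the real generic 32-bit Linux ABI the register-pair ordering additionally depends on endianness and on the syscall's argument-alignment rules:

```c
#include <stdint.h>

/* Split a 64-bit syscall argument (e.g. ftruncate's length) into the
   two 32-bit halves passed in a register pair on a 32-bit kernel ABI.
   Which half goes in the lower-numbered register is endian-dependent
   in the real ABI; this sketch only shows the split itself.  */
#define SPLIT_LL_LOW(v)  ((uint32_t) ((uint64_t) (v) & 0xffffffffu))
#define SPLIT_LL_HIGH(v) ((uint32_t) ((uint64_t) (v) >> 32))
```

So a hypothetical `internal_ftruncate(fd, len)` on the 32-bit ABI would issue something like `syscall(nr, fd, SPLIT_LL_LOW(len), SPLIT_LL_HIGH(len))` on a little-endian target, rather than passing `len` in a single register as the 64-bit ABI does.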
Re: [PATCH v6] aarch64: Add split-stack support
On 13/02/2018 13:13, Szabolcs Nagy wrote: > On 07/02/18 18:07, Adhemerval Zanella wrote: > 5. The TCB support on GLIBC is meant to be included in version 2.28. >> > ... >> +/* -fsplit-stack uses a TCB field available on glibc-2.27. GLIBC also >> + exports symbol, __tcb_private_ss, to signal it has the field available >> + on TCB bloc. This aims to prevent binaries linked against newer >> + GLIBC to run on non-supported ones. */ > > > i suspect this needs to be updated since the glibc patch > is not committed yet. > > (i'll review the glibc patch, if it looks ok then it can > be committed after the gcc side is accepted.) I fixed the commit message locally, thanks for checking on this. > >> + >> +static bool >> +aarch64_supports_split_stack (bool report ATTRIBUTE_UNUSED, >> + struct gcc_options *opts ATTRIBUTE_UNUSED) >> +{ >> +#ifndef TARGET_GLIBC_MAJOR >> +#define TARGET_GLIBC_MAJOR 0 >> +#endif >> +#ifndef TARGET_GLIBC_MINOR >> +#define TARGET_GLIBC_MINOR 0 >> +#endif >> + /* Note: Can't test DEFAULT_ABI here, it isn't set until later. */ >> + if (TARGET_GLIBC_MAJOR * 1000 + TARGET_GLIBC_MINOR >= 2026) >> + return true; >> + >> + if (report) >> + error ("%<-fsplit-stack%> currently only supported on AArch64 GNU/Linux >> with glibc-2.27 or later"); >> + return false; >> +} >> + >> +#undef TARGET_SUPPORTS_SPLIT_STACK >> +#define TARGET_SUPPORTS_SPLIT_STACK aarch64_supports_split_stack >> +
[PATCH v6] aarch64: Add split-stack support
Changes from previous version:

- Changed the way __morestack is called to use a branch with link instead
  of a simple branch.  This allows using a call instruction and avoids
  possible issues with later optimization passes which might see a branch
  outside the instruction block (as noticed in previous iterations while
  building more complex workloads such as SPECcpu2006).

- Changed the return address to use the branch-with-link value and set x12
  to save x30.  This simplifies the instructions required to set up and
  save the return address.

--

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument handling).
I also plan to send the gold support to adjust stack allocation across
split-stack and default code calls.

The current approach is to set the final stack adjustment using at most
2 instructions (mov/movk), which limits stack allocation to an upper limit
of 4GB.  The morestack call is non-standard, with x10 holding the requested
stack pointer, x11 the argument pointer (if required), and x12 the
continuation address to return to.  Unwinding is handled by a personality
routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
	mrs	x9, tpidr_el0
	ldur	x9, [x9, -8]
	mov	x10,
	movk	x10, #0x0, lsl #16
	sub	x10, sp, x10
	mov	x11, sp		# if function has stacked arguments
	cmp	x9, x10
	bcc	.LX
main_fn_entry:
	[function prologue]
.LX:
	bl	__morestack
	b	main_fn_entry

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created.
   This is to avoid issues with tail calls for external symbols which
   might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldur to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option.  This feature is only
   implemented in the GNU gold linker.

4. AArch64 does not handle >4G stacks initially, and although it is
   possible to implement that, limiting to 4G allows materializing the
   allocation with only 2 instructions (mov + movk), thus simplifying the
   linker adjustments required.  Supporting multiple threads each requiring
   more than 4G of stack is probably not that important, and likely to OOM
   at run time.

5. The TCB support on GLIBC is meant to be included in version 2.28.

6. Besides the regression tests, I also checked with a SPECcpu2006 run with
   -fsplit-stack as an additional option.  I saw no regression besides
   416.gamess, which fails on trunk as well (not sure if due to some
   misconfiguration in my environment).

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
	code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_expand_builtin_va_start):
	Use internal argument pointer instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_load_split_stack_value): New function.
	(aarch64_expand_split_stack_prologue): Likewise.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_file_end): Emit the split-stack note sections.
	(aarch64_split_stack_space_check): Likewise.
	(TARGET_ASM_FILE_END): New macro.
	(TARGET_INTERNAL_ARG_POINTER): Likewise.
	* gcc/config/aarch64/aarch64.h (aarch64_frame): Add
	split_stack_arg_pointer to set up the argument pointer when using
	split-stack.
	* gcc/config/aarch64/aarch64.md (UNSPECV_STACK_CHECK): New define.
	(split_stack_prologue): New expand.
	(split_stack_space_check): Likewise.
---
Re: [PATCH] Enable ifunc attribute by default for ARM GNU/Linux
On 09/10/2017 19:20, Joseph Myers wrote:
> On Mon, 9 Oct 2017, Adhemerval Zanella wrote:
>
>>  *-*-linux*)
>>  	case ${target} in
>> -	aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
>> +	aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-* | arm*-*)
>
> I think the cases here are meant to be in alphabetical order.

Ack, I will change.  Is it ok to commit with the change?
[PATCH] Enable ifunc attribute by default for ARM GNU/Linux
Similar to other architectures with IFUNC binutils/glibc support, this
patch enables the ifunc attribute for ARM GNU/Linux.  Although not
required for building master GLIBC, the intention is to allow refactoring
its assembly implementations to C [1].

Tested compilation of glibc (in conjunction with a glibc patch to support
using the attribute on ARM) with build-many-glibcs.py (with a patch to add
an armv7 variant which enables multiarch).  I have not run the GCC tests
for ARM.

Adhemerval Zanella <adhemerval.zane...@linaro.org>

	* config.gcc (default_gnu_indirect_function): Default to yes
	for arm*-*-linux* with glibc.

[1] https://sourceware.org/ml/libc-alpha/2017-10/msg00334.html
---
 gcc/ChangeLog  | 5 +++++
 gcc/config.gcc | 2 +-
 2 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 91a55e8..26aa8f6 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3104,7 +3104,7 @@ case ${target} in
 	;;
 *-*-linux*)
 	case ${target} in
-	aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-*)
+	aarch64*-* | i[34567]86-* | powerpc*-* | s390*-* | sparc*-* | x86_64-* | arm*-*)
 		default_gnu_indirect_function=yes
 		;;
 	esac
-- 
2.7.4
Re: [PATCH v5] aarch64: Add split-stack initial support
Ping.

On 26/07/2017 16:08, Adhemerval Zanella wrote:
> This is an updated patch based on my previous submission [1].  The
> changes from the previous version are:
>
> - Update aarch64_supports_split_stack to return true and let the loader
>   actually emit an error if it does not provide the required TCB field.
>   The TCB field support is planned for GLIBC 2.27.
>
> - Some cleanup on morestack-c.c to avoid code duplication (ss_pointer
>   function).
>
> I am still seeing some sporadic failures, but they are due to missing
> linker support for split calls due to insufficient stack size (on pprof
> and other go tests that call backtrace on signal handling).  This could
> be mitigated by increasing the BACKOFF to a higher value, but to not
> deviate from other ports with split-stack support this patch is using
> the default values (initial stack size of 16k and backoff of 4k).  This
> should be correctly handled with proper gold support (as for other
> ports).
>
> --
>
> This patch adds split-stack support on aarch64 (PR #67877).  As for
> other ports this patch should be used along with glibc and gold support.
>
> The support is done similarly to other architectures: a split-stack
> field is allocated before the TCB by glibc, a target-specific
> __morestack implementation and helper functions are added in libgcc, and
> compiler support is adjusted (split-stack prologue, va_start for
> argument handling).  I also plan to send the gold support to adjust
> stack allocation across split-stack and default code calls.
>
> The current approach is to set the final stack adjustment using at most
> 2 instructions (mov/movk), which limits stack allocation to an upper
> limit of 4GB.  The morestack call is non-standard, with x10 holding the
> requested stack pointer, x11 the argument pointer (if required), and x12
> the continuation address to return to.  Unwinding is handled by a
> personality routine that knows how to find stack segments.
>
> The split-stack prologue on function entry is as follows (this goes
> before the usual function prologue):
>
> function:
> 	mrs	x9, tpidr_el0
> 	mov	x10,
> 	movk	x10, #0x0, lsl #16
> 	sub	x10, sp, x10
> 	mov	x11, sp		# if function has stacked arguments
> 	adrp	x12, main_fn_entry
> 	add	x12, x12, :lo12:.L2
> 	cmp	x9, x10
> 	b.lt
> 	b	__morestack
> main_fn_entry:
> 	[function prologue]
>
> Notes:
>
> 1. Even if a function does not allocate a stack frame, a split-stack
>    prologue is created.  This is to avoid issues with tail calls for
>    external symbols which might require linker adjustment
>    (libgo/runtime/go-varargs.c).
>
> 2. Basic-block reordering (enabled with -O2) will move the split-stack
>    TCB ldr to after the required stack calculation.
>
> 3. Similar to powerpc, when the linker detects a call from split-stack
>    to non-split-stack code, it adds 16k (or more) to the value found in
>    "allocate" instructions (so non-split-stack code gets a larger
>    stack).  The amount is tunable by a linker option.  The edit means
>    aarch64 does not need to implement __morestack_non_split, necessary
>    on x86 because insufficient space is available there to edit the
>    stack comparison code.  This feature is only implemented in the GNU
>    gold linker.
>
> 4. AArch64 does not handle >4G stacks initially, and although it is
>    possible to implement that, limiting to 4G allows materializing the
>    allocation with only 2 instructions (mov + movk), thus simplifying
>    the linker adjustments required.  Supporting multiple threads each
>    requiring more than 4G of stack is probably not that important, and
>    likely to OOM at run time.
>
> 5. The TCB support on GLIBC is meant to be included in version 2.26.
>
> 6. The continuation address materialized in x12 is done using 'adrp'
>    plus add and a static relocation.  Current code uses the
>    aarch64_expand_mov_immediate function and since a better alternative
>    would be 'adp', it could be a future optimization (not implemented
>    in this patch).
>
> libgcc/ChangeLog:
>
> 	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
> 	aarch64*-*-linux.
> 	* libgcc/config/aarch64/morestack-c.c: New file.
> 	* libgcc/config/aarch64/morestack.S: Likewise.
> 	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
> 	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
> 	code.
>
> gcc/ChangeLog:
>
> 	* common/config/aarch64/aarch64-common.c
> 	(aarch64_supports_split_stack): New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* gcc/config/aarch64/aar
[PATCH v5] aarch64: Add split-stack initial support
This is an updated patch based on my previous submission [1].  The changes
from the previous version are:

- Update aarch64_supports_split_stack to return true and let the loader
  actually emit an error if it does not provide the required TCB field.
  The TCB field support is planned for GLIBC 2.27.

- Some cleanup on morestack-c.c to avoid code duplication (ss_pointer
  function).

I am still seeing some sporadic failures, but they are due to missing
linker support for split calls due to insufficient stack size (on pprof
and other go tests that call backtrace on signal handling).  This could be
mitigated by increasing the BACKOFF to a higher value, but to not deviate
from other ports with split-stack support this patch is using the default
values (initial stack size of 16k and backoff of 4k).  This should be
correctly handled with proper gold support (as for other ports).

--

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument handling).
I also plan to send the gold support to adjust stack allocation across
split-stack and default code calls.

The current approach is to set the final stack adjustment using at most
2 instructions (mov/movk), which limits stack allocation to an upper limit
of 4GB.  The morestack call is non-standard, with x10 holding the requested
stack pointer, x11 the argument pointer (if required), and x12 the
continuation address to return to.  Unwinding is handled by a personality
routine that knows how to find stack segments.
The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
	mrs	x9, tpidr_el0
	mov	x10,
	movk	x10, #0x0, lsl #16
	sub	x10, sp, x10
	mov	x11, sp		# if function has stacked arguments
	adrp	x12, main_fn_entry
	add	x12, x12, :lo12:.L2
	cmp	x9, x10
	b.lt
	b	__morestack
main_fn_entry:
	[function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created.  This is to avoid issues with tail calls for
   external symbols which might require linker adjustment
   (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldr to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option.  The edit means aarch64 does
   not need to implement __morestack_non_split, necessary on x86 because
   insufficient space is available there to edit the stack comparison
   code.  This feature is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stacks initially, and although it is
   possible to implement that, limiting to 4G allows materializing the
   allocation with only 2 instructions (mov + movk), thus simplifying the
   linker adjustments required.  Supporting multiple threads each requiring
   more than 4G of stack is probably not that important, and likely to OOM
   at run time.

5. The TCB support on GLIBC is meant to be included in version 2.26.

6. The continuation address materialized in x12 is done using 'adrp' plus
   add and a static relocation.  Current code uses the
   aarch64_expand_mov_immediate function and since a better alternative
   would be 'adp', it could be a future optimization (not implemented in
   this patch).

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
	code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_gen_far_branch): Add support
	to emit 'b' instruction to rtx different than LABEL_REF.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_load_split_stack_value): New
[PATCH] sparc: Set noexecstack on mulsi3, divsi3, and modsi3
A recent GLIBC fix for sparc [1] caused some configurations to show an
executable stack for ld.so (shown by the elf/check-execstack testcase
failure).  This is because, with a sparc toolchain generated from
build-many-glibcs.py (a GLIBC script to produce cross-compiling
toolchains), the minimum supported sparc32 version is pre-v9, which
requires a software implementation of '.udiv'.  Since we are now using the
libgcc.a one instead, it must carry the '.note.GNU-stack' note so the
linker can properly mark the stack non-executable.

From a build using a toolchain from build-many-glibcs.py:

elf/librtld.os.map
[...]
.../sparc64-glibc-linux-gnu/6.2.1/32/libgcc.a(_divsi3.o)
.../sparc64-glibc-linux-gnu/6.2.1/32/libgcc.a(_udivdi3.o) (.udiv)
.../sparc64-glibc-linux-gnu/6.2.1/32/libgcc.a(_clz.o)
.../lib/gcc/sparc64-glibc-linux-gnu/6.2.1/32/libgcc.a(_udivdi3.o) (__clz_tab)
[...]

And dumping the _udivdi3.o section headers (note that no .note.GNU-stack
section is present, so the linker assumes an executable stack):

  [Nr] Name              Type      Off     Size    ES  Flg Lk Inf Al
  [ 0]                   NULL      000000  000000  00       0   0  0
  [ 1] .text             PROGBITS  000034  0002b0  00   AX  0   0  4
  [ 2] .data             PROGBITS  0002e4  000000  00   WA  0   0  1
  [ 3] .bss              NOBITS    0002e4  000000  00   WA  0   0  1
  [ 4] .debug_line       PROGBITS  0002e4  00010d  00       0   0  1
  [ 5] .rela.debug_line  RELA      0007c0  00000c  0c    I 12   4  4
  [ 6] .debug_info       PROGBITS  0003f1  0000ab  00       0   0  1
  [ 7] .rela.debug_info  RELA      0007cc  000030  0c    I 12   6  4
  [ 8] .debug_abbrev     PROGBITS  00049c  000014  00       0   0  1
  [ 9] .debug_aranges    PROGBITS  0004b0  000020  00       0   0  8
  [10] .rela.debug_arang RELA      0007fc  000018  0c    I 12   9  4
  [11] .shstrtab         STRTAB    000814  000070  00       0   0  1
  [12] .symtab           SYMTAB    0004d0  000220  10      13  32  4
  [13] .strtab           STRTAB    0006f0  0000cf  00       0   0  1

I am not seeing this on a native gcc build which I configured with
'--with-arch-directory=sparc64 --enable-multiarch --enable-targets=all
--with-cpu-32=ultrasparc --with-long-double-128 --enable-multilib'.
Both libgcc's __udivdi3 and __umoddi3 do not pull in .udiv, since for that
libgcc build both use hardware instructions:

elf/librtld.os.map
/home/azanella/gcc/install/lib/gcc/sparc64-linux-gnu/6.3.1/32/libgcc.a(_udivdi3.o)
/home/azanella/glibc/glibc-git-build-sparcv9/elf/dl-allobjs.os (__udivdi3)
/home/azanella/gcc/install/lib/gcc/sparc64-linux-gnu/6.3.1/32/libgcc.a(_umoddi3.o)
/home/azanella/glibc/glibc-git-build-sparcv9/elf/dl-allobjs.os (__umoddi3)

This patch adds the missing noexecstack marking to the sparc assembly
implementation.  I saw no regression in the gcc testsuite, and it fixes
the regression on the GLIBC side.

libgcc/
	* config/sparc/lb1spc.S [__ELF__ && __linux__]: Emit
	.note.GNU-stack section for a non-executable stack.

[1] https://sourceware.org/git/?p=glibc.git;a=commit;h=bdc543e338281da051b3dc06eae96c330a485ce6
---
 libgcc/ChangeLog             | 5 +++++
 libgcc/config/sparc/lb1spc.S | 6 ++++++
 2 files changed, 11 insertions(+)

diff --git a/libgcc/config/sparc/lb1spc.S b/libgcc/config/sparc/lb1spc.S
index b60bd57..e693864 100644
--- a/libgcc/config/sparc/lb1spc.S
+++ b/libgcc/config/sparc/lb1spc.S
@@ -5,6 +5,12 @@
    slightly edited to match the desired calling convention, and also to
    optimize them for our purposes.  */
 
+/* An executable stack is *not* required for these functions.  */
+#if defined(__ELF__) && defined(__linux__)
+.section .note.GNU-stack,"",%progbits
+.previous
+#endif
+
 #ifdef L_mulsi3
 .text
 .align 4
-- 
2.7.4
Re: [PATCH v4] aarch64: Add split-stack initial support
Ping.

On 15/02/2017 21:23, Adhemerval Zanella wrote:
> This is an updated patch from my previous version (v3), based mainly on
> glibc comments on the exported loader symbol [1].  The changes from the
> previous version are:
>
> - Remove the __tcb_private_ss call in libgcc and emit a data usage of
>   the glibc symbol when split-stack is used.  This removes the runtime
>   errors when running on older glibc and instead makes the loader fail
>   with a missing symbol.
>
> - Add a glibc version check to the split-stack support check.
>
> - Some comment fixes in morestack.S.
>
> - Remove some compile warnings in morestack-c.c.
>
> --
>
> This patch adds split-stack support on aarch64 (PR #67877).  As for
> other ports this patch should be used along with glibc and gold support.
>
> The support is done similarly to other architectures: a split-stack
> field is allocated before the TCB by glibc, a target-specific
> __morestack implementation and helper functions are added in libgcc, and
> compiler support is adjusted (split-stack prologue, va_start for
> argument handling).  I also plan to send the gold support to adjust
> stack allocation across split-stack and default code calls.
>
> The current approach is to set the final stack adjustment using at most
> 2 instructions (mov/movk), which limits stack allocation to an upper
> limit of 4GB.  The morestack call is non-standard, with x10 holding the
> requested stack pointer, x11 the argument pointer (if required), and x12
> the continuation address to return to.  Unwinding is handled by a
> personality routine that knows how to find stack segments.
>
> The split-stack prologue on function entry is as follows (this goes
> before the usual function prologue):
>
> function:
> 	mrs	x9, tpidr_el0
> 	mov	x10,
> 	movk	x10, #0x0, lsl #16
> 	sub	x10, sp, x10
> 	mov	x11, sp		# if function has stacked arguments
> 	adrp	x12, main_fn_entry
> 	add	x12, x12, :lo12:.L2
> 	cmp	x9, x10
> 	b.lt
> 	b	__morestack
> main_fn_entry:
> 	[function prologue]
>
> Notes:
>
> 1.
Even if a function does not allocate a stack frame, a split-stack
>    prologue is created.  This is to avoid issues with tail calls for
>    external symbols which might require linker adjustment
>    (libgo/runtime/go-varargs.c).
>
> 2. Basic-block reordering (enabled with -O2) will move the split-stack
>    TCB ldr to after the required stack calculation.
>
> 3. Similar to powerpc, when the linker detects a call from split-stack
>    to non-split-stack code, it adds 16k (or more) to the value found in
>    "allocate" instructions (so non-split-stack code gets a larger
>    stack).  The amount is tunable by a linker option.  The edit means
>    aarch64 does not need to implement __morestack_non_split, necessary
>    on x86 because insufficient space is available there to edit the
>    stack comparison code.  This feature is only implemented in the GNU
>    gold linker.
>
> 4. AArch64 does not handle >4G stacks initially, and although it is
>    possible to implement that, limiting to 4G allows materializing the
>    allocation with only 2 instructions (mov + movk), thus simplifying
>    the linker adjustments required.  Supporting multiple threads each
>    requiring more than 4G of stack is probably not that important, and
>    likely to OOM at run time.
>
> 5. The TCB support on GLIBC is meant to be included in version 2.26.
>
> 6. The continuation address materialized in x12 is done using 'adrp'
>    plus add and a static relocation.  Current code uses the
>    aarch64_expand_mov_immediate function and since a better alternative
>    would be 'adp', it could be a future optimization (not implemented
>    in this patch).
>
> [1] https://sourceware.org/ml/libc-alpha/2017-02/msg00272.html
>
> libgcc/ChangeLog:
>
> 	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
> 	aarch64*-*-linux.
> 	* libgcc/config/aarch64/morestack-c.c: New file.
> 	* libgcc/config/aarch64/morestack.S: Likewise.
> 	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
> 	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
> 	code.
>
> gcc/ChangeLog:
>
> 	* common/config/aarch64/aarch64-common.c
> 	(aarch64_supports_split_stack): New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
> 	macro.
> 	* gcc/config/aarch64/aarch64-protos.h: Add
> 	aarch64_expand_split_stack_prologue and
> 	aarch64_split_stack_space_check.
> 	* gcc/con
[PATCH v4] aarch64: Add split-stack initial support
This is an updated patch from my previous version (v3), based mainly on
glibc comments on the exported loader symbol [1].  The changes from the
previous version are:

- Remove the __tcb_private_ss call in libgcc and emit a data usage of the
  glibc symbol when split-stack is used.  This removes the runtime errors
  when running on older glibc and instead makes the loader fail with a
  missing symbol.

- Add a glibc version check to the split-stack support check.

- Some comment fixes in morestack.S.

- Remove some compile warnings in morestack-c.c.

--

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a split-stack field
is allocated before the TCB by glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument handling).
I also plan to send the gold support to adjust stack allocation across
split-stack and default code calls.

The current approach is to set the final stack adjustment using at most
2 instructions (mov/movk), which limits stack allocation to an upper limit
of 4GB.  The morestack call is non-standard, with x10 holding the requested
stack pointer, x11 the argument pointer (if required), and x12 the
continuation address to return to.  Unwinding is handled by a personality
routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
	mrs	x9, tpidr_el0
	mov	x10,
	movk	x10, #0x0, lsl #16
	sub	x10, sp, x10
	mov	x11, sp		# if function has stacked arguments
	adrp	x12, main_fn_entry
	add	x12, x12, :lo12:.L2
	cmp	x9, x10
	b.lt
	b	__morestack
main_fn_entry:
	[function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created.
   This is to avoid issues with tail calls for external symbols which
   might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldr to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option.  The edit means aarch64 does
   not need to implement __morestack_non_split, necessary on x86 because
   insufficient space is available there to edit the stack comparison
   code.  This feature is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stacks initially, and although it is
   possible to implement that, limiting to 4G allows materializing the
   allocation with only 2 instructions (mov + movk), thus simplifying the
   linker adjustments required.  Supporting multiple threads each requiring
   more than 4G of stack is probably not that important, and likely to OOM
   at run time.

5. The TCB support on GLIBC is meant to be included in version 2.26.

6. The continuation address materialized in x12 is done using 'adrp' plus
   add and a static relocation.  Current code uses the
   aarch64_expand_mov_immediate function and since a better alternative
   would be 'adp', it could be a future optimization (not implemented in
   this patch).

[1] https://sourceware.org/ml/libc-alpha/2017-02/msg00272.html

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
	code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_gen_far_branch): Add support
	to emit 'b' instruction to rtx different than LABEL_REF.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_load_split_stack_value): New function.
	(aarch64_expand_split_stack_prologue): Likewise.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_split_stack_space_check): Likewise.
	(aarch64_file_end): Emit the split-stack note sections.
[PATCH v3] aarch64: Add split-stack support
Changes from previous version:

- The split-stack area is now allocated before the thread pointer (instead
  of in tcbhead_t, which is positioned after it), as described in the
  glibc patch [1].  This has the advantage of not requiring linker changes
  to take the new tcbhead_t size into consideration, and it also fixed the
  regression I was seeing with the previous version of this patch at
  runtime/pprof: __splitstack_setcontext updates some internal TLS
  variables, and with split-stack trying to update a larger tcbhead_t
  field it clobbered the TLS variable when using a default static linker
  (which expected a default tcbhead_t structure of size 16).

- Function aarch64_split_stack_space_check is changed back to the previous
  implementation, because a wrong comment in my previous version led to
  wrong assumptions in [2].  The function will generate a jump to the
  'label' variable (which will call __morestack_allocate_stack_space), not
  the arch-specific __morestack implementation.  I corrected the comment
  as well.

- Change the stack adjustments to always use 2 instructions. [3]

I am not seeing any more issues when using split-stack, nor check-go
regressions, with this version.

[1] https://sourceware.org/ml/libc-alpha/2017-02/msg00247.html
[2] https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00093.html
[3] https://gcc.gnu.org/ml/gcc-patches/2017-02/msg00055.html

--

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss field
is added to the TCB in glibc, a target-specific __morestack implementation
and helper functions are added in libgcc, and compiler support is adjusted
(split-stack prologue, va_start for argument handling).  I also plan to
send the gold support to adjust stack allocation across split-stack and
default code calls.
The current approach is similar to the powerpc one: at most 2 GB of stack
allocation is supported, so stack adjustments can be done with 2
instructions (either just a movn plus nop, or a movn followed by movk).
The morestack call is non-standard, with x10 holding the requested stack
pointer, x11 the argument pointer, and x12 the continuation address to
return to.  Unwinding is handled by a personality routine that knows how
to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

function:
	mrs	x9, tpidr_el0
	mov	x10, -
	movk	0x0
	add	x10, sp, x10
	mov	x11, sp		# if function has stacked arguments
	adrp	x12, main_fn_entry
	add	x12, x12, :lo12:.L2
	cmp	x9, x10
	b.lt
	b	__morestack
main_fn_entry:
	[function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created.  This is to avoid issues with tail calls for
   external symbols which might require linker adjustment
   (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldr to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option.  The edit means aarch64 does
   not need to implement __morestack_non_split, necessary on x86 because
   insufficient space is available there to edit the stack comparison
   code.  This feature is only implemented in the GNU gold linker.

4. AArch64 does not handle >4G stacks initially, and although it is
   possible to implement that, limiting to 4G allows materializing the
   allocation with only 2 instructions (mov + movk), thus simplifying the
   linker adjustments required.  Supporting multiple threads each requiring
   more than 4G of stack is probably not that important, and likely to OOM
   at run time.

5.
The TCB support on GLIBC is meant to be included in version 2.25.

6. The continuation address materialized in x12 is done using 'adrp' plus
   add and a static relocation.  Current code uses the
   aarch64_expand_mov_immediate function and since a better alternative
   would be 'adp', it could be a future optimization (not implemented in
   this patch).

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add aarch64-specific
	code.

gcc/ChangeLog:

	PR 67877
	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
Re: [PATCH v2] aarch64: Add split-stack initial support
On 25/01/2017 10:10, Jiong Wang wrote:
> On 24/01/17 18:05, Adhemerval Zanella wrote:
>>
>> On 03/01/2017 13:13, Wilco Dijkstra wrote:
>>
>>> +  /* If function uses stacked arguments save the old stack value so
>>> morestack
>>> +     can return it.  */
>>> +  reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
>>> +  if (cfun->machine->frame.saved_regs_size
>>> +      || cfun->machine->frame.saved_varargs_size)
>>> +    emit_move_insn (reg11, stack_pointer_rtx);
>>>
>>> This doesn't look right - we could have many arguments even without
>>> varargs or saved regs.  This would need to check varargs as well as
>>> crtl->args.size (I believe that is the size of the arguments on the
>>> stack).  It's fine to omit this optimization in the first version -
>>> we already emit 2-3 extra instructions for the check anyway.
>>
>> I will check for a better solution.
>
> Hi Adhemerval
>
> My only concern on this patch is the initialization of R11 (internal
> arg pointer).  The current implementation looks to me to be generating
> wrong code for a testcase that simply returns the sum of ten int
> params.  I see the function body using R11 while there is no
> initialization of it in the split prologue, so if the execution flow is
> *not* through __morestack, then R11 is not initialized.
> As Wilco suggested, I feel using crtl->args.size instead of
> cfun->machine->frame.saved_regs_size might be the correct approach
> after checking assign_parms in function.c.

Hi Jiong,

Indeed the previous version, which used 'saved_regs_size', is wrong for
stacked parameters.  A simple 10-argument function call shows, at the
point reg11 is evaluated:

  cfun->machine->frame.saved_regs_size    = 0
  cfun->machine->frame.saved_varargs_size = 0
  crtl->args.size                         = 16

So indeed 'crtl->args.size' seems the correct value to use in this case
(which will trigger the correct reg11 set for the split-stack case).  In
this version I also removed some unused variables left in the previous
patch.
>From 30cbedc303d364dd94f0d35abee1d237a3671cdb Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
Date: Wed, 4 May 2016 21:13:39 +
Subject: [PATCH] aarch64: Add split-stack initial support

This patch adds the split-stack support on aarch64 (PR #67877).  As for
other ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss
field is added on the TCB in glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument
handling).  I also plan to send the gold support to adjust stack
allocation across split-stack and default code calls.

The current approach is similar to the powerpc one: at most 2 GB of
stack allocation is supported, so stack adjustments can be done with 2
instructions (either just a movn plus nop or a movn followed by movk).
The morestack call is non-standard, with x10 holding the requested stack
pointer, x11 the argument pointer, and x12 the return continuation
address.  Unwinding is handled by a personality routine that knows how
to find stack segments.

The split-stack prologue on function entry is as follows (this goes
before the usual function prologue):

function:
        mrs     x9, tpidr_el0
        mov     x10, -
        movk    0x0
        add     x10, sp, x10
        mov     x11, sp         # if function has stacked arguments
        adrp    x12, main_fn_entry
        add     x12, x12, :lo12:.L2
        cmp     x9, x10
        b.lt    main_fn_entry
        b       __morestack
main_fn_entry:
        [function prologue]

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
prologue is created.  This is to avoid issues with tail calls to
external symbols which might require linker adjustment
(libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack
TCB ldr to after the required stack calculation.

3.
Similar to powerpc, when the linker detects a call from split-stack to
non-split-stack code, it adds 16k (or more) to the value found in
"allocate" instructions (so non-split-stack code gets a larger stack).
The amount is tunable by a linker option.  The edit means aarch64 does
not need to implement __morestack_non_split, necessary on x86 because
insufficient space is available there to edit the stack comparison code.
This feature is only implemented in the GNU gold linker.

4. AArch64 does not initially handle >4G stacks, and although it is
possible to implement, limiting to 4G allows materializing the
allocation with only 2 instructions (mov + movk) and thus simplifies the
linker adjustments required.  Supporting multiple threads each
Re: [PATCH v2] aarch64: Add split-stack initial support
On 03/01/2017 13:13, Wilco Dijkstra wrote:
> Adhemerval Zanella wrote:
>
> Sorry for the late reply - but I think it's getting there.  A few more
> comments:

No worries.

> +  /* If function uses stacked arguments save the old stack value so
> morestack
> +     can return it.  */
> +  reg11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +  if (cfun->machine->frame.saved_regs_size
> +      || cfun->machine->frame.saved_varargs_size)
> +    emit_move_insn (reg11, stack_pointer_rtx);
>
> This doesn't look right - we could have many arguments even without
> varargs or saved regs.  This would need to check varargs as well as
> crtl->args.size (I believe that is the size of the arguments on the
> stack).  It's fine to omit this optimization in the first version - we
> already emit 2-3 extra instructions for the check anyway.

I will check for a better solution.

> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
> {
> +  rtx mem, ssvalue, cc, cmp, jump, temp;
> +  rtx requested = gen_reg_rtx (Pmode);
> +  /* Offset from thread pointer to __private_ss.  */
> +  int psso = 0x10;
> +
> +  /* Load __private_ss from TCB.  */
> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
>
> ssvalue doesn't need to be a hardcoded register.

Indeed, and it seems that this was not being triggered.  I have fixed it
in this version.

> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
> +  emit_move_insn (ssvalue, mem);
> +
> +  temp = gen_rtx_REG (Pmode, R10_REGNUM);
> +
> +  /* And compare it with frame pointer plus required stack.  */
> +  size = force_reg (Pmode, size);
> +  emit_move_insn (requested, gen_rtx_MINUS (Pmode, stack_pointer_rtx,
> size));
> +
> +  /* Jump to __morestack call if current __private_ss does not
> suffice.  */
> +  cc = aarch64_gen_compare_reg (LT, temp, ssvalue);
>
> This uses X10, but where is it set???

I fixed it in this version.
> +  cmp = gen_rtx_fmt_ee (GEU, VOIDmode, cc, const0_rtx);
> +  jump = emit_jump_insn (gen_condjump (cmp, cc, label));
> +  JUMP_LABEL (jump) = label;
> +}
>
> So neither X10 nor X12 are set before potentially calling __morestack,
> so I don't think it will work.  Could this be causing the crash you
> mentioned?

I do not think so; the issue is with the runtime/pprof libgo test that
fails with:

[Switching to LWP 18926]
0x0050c358 in runtime.sigtrampgo (sig=sig@entry=27,
    info=info@entry=0x7fb63d5da0, ctx=ctx@entry=0x7fb63d5e20)
    at ../../../gcc-git/libgo/go/runtime/signal_unix.go:221
221             setg(g.m.gsignal)

Where g.m is null.  Trying to obtain a stacktrace I am now seeing:

(gdb) bt
#0  0x0050c358 in runtime.sigtrampgo (sig=sig@entry=27,
    info=info@entry=0x7fb63d5da0, ctx=ctx@entry=0x7fb63d5e20)
    at ../../../gcc-git/libgo/go/runtime/signal_unix.go:221
#1  0x0056acb4 in runtime.sigtramp (sig=27, info=0x7fb63d5da0,
    context=0x7fb63d5e20) at ../../../gcc-git/libgo/runtime/go-signal.c:131
#2  <signal handler called>
#3  pprof_test.cpuHog1 () at pprof_test.go:52
#4  0x0040c814 in pprof_test.cpuHogger
    (f=f@entry=0x57c560 <runtime_pprof_test.cpuHog1$descriptor>, dur=)
    at pprof_test.go:37
#5  0x0040c9f8 in pprof_test.$nested1 (dur=) at pprof_test.go:75
#6  0x0040d038 in pprof_test.testCPUProfile (t=t@entry=0x420804e680,
    need=..., f=f@entry=0x57c600 <runtime_pprof_test.$nested1$descriptor>)
    at pprof_test.go:144
#7  0x0040c9a8 in runtime_pprof_test.TestCPUProfile (t=0x420804e680)
    at pprof_test.go:74
#8  0x00543bec in testing.tRunner (param=, fn=)
    at ../../../gcc-git/libgo/go/testing/testing.go:656
#9  0x00543c84 in testing.$thunk24 (__go_thunk_parameter=)
    at ../../../gcc-git/libgo/go/testing/testing.go:693
#10 0x0041a7dc in kickoff () at ../../../gcc-git/libgo/runtime/proc.c:258

/build/buildd/gdb-7.9/gdb/dwarf2-frame.c:1732: internal-error: add_cie:
Assertion `n < 1 || cie_table->entries[n - 1]->cie_pointer <
cie->cie_pointer' failed.
A problem internal to GDB has been detected,
further debugging may prove unreliable.
Quit this debugging session? (y or n)

Which may be the case if the morestack.S unwind info is not really
correct.  It could be an issue in gdb as well (I will check with a newer
gdb).

> Wilco

>From 09b51fb706a8b15ecaea4ec4b8a80d0a7903053d Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
Date: Wed, 4 May 2016 21:13:39 +
Subject: [PATCH] aarch64: Add split-stack initial support

This patch adds the split-stack support on aarch64 (PR #67877).  As for
other ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss
field is added on the TCB in glibc,
Re: [PATCH v2] aarch64: Add split-stack initial support
On 15/11/2016 16:38, Wilco Dijkstra wrote:
>
> On 07/11/2016 16:59, Adhemerval Zanella wrote:
>> On 14/10/2016 15:59, Wilco Dijkstra wrote:
>
>> There is no limit afaik on gold split-stack allocation handling,
>> and I think one could be added for each backend (in the method
>> override required to implement it).
>>
>> In fact it is not really required to tie the nop generation with the
>> instruction generated by 'aarch64_internal_mov_immediate', it is
>> just a matter of simplifying the linker code.
>
> If there is no easy limit and you'll still require a nop, I think it is
> best then to emit mov N+movk #0.  Then the scheduler won't be able to
> reorder them with the add/sub.

Good call, I have changed the patch to emit a mov N+movk #0 instead of
relying on emit_nop.

>>> Is there any need to detect underflow of x10 or is there a guarantee
>>> that stacks are never allocated in the low 2GB (given the maximum
>>> adjustment is 2GB)?  It's safe to do a signed comparison.
>>
>> I do not think so, at least none of the current backends that
>> implement split stack do so.
>
> OK, well a signed comparison like in your new version works for
> underflow.
>
> Now to the patch:
>
> @@ -3316,6 +3339,28 @@ aarch64_expand_prologue (void)
>    aarch64_save_callee_saves (DFmode, callee_offset, V0_REGNUM,
> V31_REGNUM,
>    callee_adjust != 0 || frame_pointer_needed);
>    aarch64_sub_sp (IP1_REGNUM, final_adjust, !frame_pointer_needed);
> +
> +  if (split_stack_arg_pointer_used_p ())
> +    {
> +      /* Setup the argument pointer (x10) for -fsplit-stack code.  If
> +	 __morestack was called, it will have left the arg pointer to
> +	 the old stack in x28.  Otherwise, the argument pointer is the
> +	 top of the current frame.
	 */
> +      rtx x11 = gen_rtx_REG (Pmode, R11_REGNUM);
> +      rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
> +      rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
> +
> +      rtx not_more = gen_label_rtx ();
> +
> +      rtx cmp = gen_rtx_fmt_ee (LT, VOIDmode, cc_reg, const0_rtx);
> +      rtx jump = emit_jump_insn (gen_condjump (cmp, cc_reg, not_more));
> +      JUMP_LABEL (jump) = not_more;
> +      LABEL_NUSES (not_more) += 1;
> +
> +      emit_move_insn (x11, x28);
> +
> +      emit_label (not_more);
> +    }
>
> If you pass the old sp in x11 when called from __morestack you can
> remove the above thunk completely.

Indeed this snippet does not make sense anymore; I removed it.

> +  /* It limits total maximum stack allocation on 2G so its value can
> +     be materialized using two instructions at most (movn/movk).  It
> +     might be used by the linker to add some extra space for split
> +     calling non-split stack functions.  */
> +  allocate = cfun->machine->frame.frame_size;
> +  if (allocate > ((HOST_WIDE_INT) 1 << 31))
> +    {
> +      sorry ("Stack frame larger than 2G is not supported for
> -fsplit-stack");
> +      return;
> +    }
>
> Note a 2-instruction mov/movk can generate any immediate up to 4GB and
> if we need even larger sizes, we could round up to a multiple of 64KB
> so that 2 instructions are enough for a 48-bit stack size...

I think we can set a limit of 4GB (the powerpc64 backend limits to 2GB
and it seems fine).  I corrected the comment.

> +  int ninsn = aarch64_internal_mov_immediate (reg10, GEN_INT (-allocate),
> +	true, Pmode);
> +  gcc_assert (ninsn == 1 || ninsn == 2);
> +  if (ninsn == 1)
> +    emit_insn (gen_nop ());
>
> To avoid any issues with the nop being scheduled, it's best to emit an
> explicit movk here (0x if allocate > 0, or 0 if zero) using
> gen_insv_immdi.

Right, I changed to that.

> +void
> +aarch64_split_stack_space_check (rtx size, rtx label)
>
> Isn't very similar code used in aarch64_expand_split_stack_prologue?
> Any possibility to share/reuse?
I thought about it, but it would require splitting into two subparts
(one to load __private_ss from the TCB and another to jump to
__morestack) and both are basically 4 lines.  In the end I think the
current approach should be simpler.

> +static void
> +aarch64_live_on_entry (bitmap regs)
> +{
> +  if (flag_split_stack)
> +    bitmap_set_bit (regs, R11_REGNUM);
> +}
>
> I'm wondering whether you need extra code in aarch64_can_eliminate to
> deal with the argument pointer?  Also do we need to define a fixed
> register, or will GCC automatically allocate it to a callee-save if
> necessary?

Now that you asked, I think we can get rid of this live marking.  I used
as base for the initial patch the powerpc backend
Re: [PATCH v2] aarch64: Add split-stack initial support
Ping.

On 07/11/2016 16:59, Adhemerval Zanella wrote:
>
> On 14/10/2016 15:59, Wilco Dijkstra wrote:
>> Hi,
>
> Thanks for the thoughtful review and sorry for the late response.
>
>>> Split-stack prologue on function entry is as follows (this goes
>>> before the usual function prologue):
>>>
>>> 	mrs	x9, tpidr_el0
>>> 	mov	x10, -
>>
>> As Jiong already remarked, the nop won't work.  Do we know the maximum
>> adjustment that the linker is allowed to make?  If so, and we can
>> limit the adjustment to 16MB in most cases, emitting 2 subtracts is
>> best.  Larger offsets need mov/movk/sub but that should be extremely
>> rare.
>
> There is no limit afaik on gold split-stack allocation handling,
> and I think one could be added for each backend (in the method
> override required to implement it).
>
> In fact it is not really required to tie the nop generation with the
> instruction generated by 'aarch64_internal_mov_immediate', it is
> just a matter of simplifying the linker code.
>
> And although 16MB should be rare, the nilptr2.go test allocates
> 134217824 bytes, so this test fails with this low stack limit.  I am
> not sure how heavy the stack usage in 'go' is, but I think we should
> at least support the current testcase scenario.  So for the current
> iteration I kept my current approach, but I am open to suggestions.
>
>>> 	nop/movk
>>>
>>> 	add	x10, sp, x10
>>> 	ldr	x9, [x9, 16]
>>
>> Is there any need to detect underflow of x10 or is there a guarantee
>> that stacks are never allocated in the low 2GB (given the maximum
>> adjustment is 2GB)?  It's safe to do a signed comparison.
>
> I do not think so, at least none of the current backends that
> implement split stack do so.
>
>>> 	cmp	x10, x9
>>> 	b.cs	enough
>>
>> Why save/restore x30 and the call x30+8 trick when we could pass the
>> continuation address and use a tailcall?  That also avoids emitting
>> extra unwind info.
>>> 	stp	x30, [sp, -16]
>>> 	bl	__morestack
>>> 	ldp	x30, [sp], 16
>>> 	ret
>>
>> This part doesn't make any sense - both x28 and carry flag as an
>> input, and spread across the prolog - why???
>>
>>> enough:
>>> 	mov	x10, sp
>>> 	[prolog]
>>> 	b.cs	continue
>>> 	mov	x10, x28
>>> continue:
>>> 	[rest of function]
>>
>> Why not do this?
>>
>> function:
>> 	mrs	x9, tpidr_el0
>> 	sub	x10, sp, N & 0xfff000
>> 	sub	x10, x10, N & 0xfff
>> 	ldr	x9, [x9, 16]
>> 	adr	x12, main_fn_entry
>> 	mov	x11, sp		[if function has stacked arguments]
>> 	cmp	x10, x9
>> 	b.ge	main_fn_entry
>> 	b	__morestack
>> main_fn_entry:	[x11 is argument pointer]
>> 	[prolog]
>> 	[rest of function]
>>
>> In __morestack you need to save x8 as well (another argument
>> register!) and x12 (the continuation address).  After returning from
>> the call x8 doesn't need to be preserved.
>
> Indeed this strategy is way better and I adjusted the code to follow
> it.  The only change is I am using:
>
> 	[...]
> 	cmp	x9, x10
> 	b.lt	main_fn_entry
> 	b	__morestack
> 	[...]
>
> So I can issue a 'cmp , 0' on __morestack to indicate the function
> was called.
>
>> There are several issues with unwinding in __morestack.  x28 is not
>> described as a callee-save so will be corrupted if unwinding across
>> a __morestack call.  This won't unwind correctly after the ldp as
>> the unwinder will use the restored frame pointer to try to restore
>> x29/x30:
>>
>> +	ldp	x29, x30, [x28, STACKFRAME_BASE]
>> +	ldr	x28, [x28, STACKFRAME_BASE + 80]
>> +
>> +	.cfi_remember_state
>> +	.cfi_restore 30
>> +	.cfi_restore 29
>> +	.cfi_def_cfa 31, 0
>
> Indeed, it misses the x28 save/restore.  I think I have added the
> missing bits, but I must confess that I am not well versed in CFI
> directives.  I will appreciate it if you could help me on this new
> version.
>
>> This stores a random x30 value on the stack, what is the purpose of
>> this?  Nothing can unwind to here:
>>
>> +	# Start using new stack
>> +	stp	x29, x30, [x0, -16]!
>> +	mov	sp, x0
>>
>> Also we no longer need split_stack_arg_pointer_used_p () or any code
>> that uses it (functions that don't have any arguments passed on the
>> stack could omit the mov x11, sp).
>
> Right, with the new strategy you proposed to do a branch, this is
> indeed not really required.  I removed it in this new patch.
>
>> Wilco
>>
Re: [PATCH v2] aarch64: Add split-stack initial support
On 14/10/2016 15:59, Wilco Dijkstra wrote:
> Hi,

Thanks for the thoughtful review and sorry for the late response.

>> Split-stack prologue on function entry is as follows (this goes
>> before the usual function prologue):
>>
>> 	mrs	x9, tpidr_el0
>> 	mov	x10, -
>
> As Jiong already remarked, the nop won't work.  Do we know the maximum
> adjustment that the linker is allowed to make?  If so, and we can limit
> the adjustment to 16MB in most cases, emitting 2 subtracts is best.
> Larger offsets need mov/movk/sub but that should be extremely rare.

There is no limit afaik on gold split-stack allocation handling, and I
think one could be added for each backend (in the method override
required to implement it).

In fact it is not really required to tie the nop generation with the
instruction generated by 'aarch64_internal_mov_immediate'; it is just a
matter of simplifying the linker code.

And although 16MB should be rare, the nilptr2.go test allocates
134217824 bytes, so this test fails with this low stack limit.  I am not
sure how heavy the stack usage in 'go' is, but I think we should at
least support the current testcase scenario.  So for the current
iteration I kept my current approach, but I am open to suggestions.

>> 	nop/movk
>>
>> 	add	x10, sp, x10
>> 	ldr	x9, [x9, 16]
>
> Is there any need to detect underflow of x10 or is there a guarantee
> that stacks are never allocated in the low 2GB (given the maximum
> adjustment is 2GB)?  It's safe to do a signed comparison.

I do not think so, at least none of the current backends that implement
split stack do so.

>> 	cmp	x10, x9
>> 	b.cs	enough
>
> Why save/restore x30 and the call x30+8 trick when we could pass the
> continuation address and use a tailcall?  That also avoids emitting
> extra unwind info.
>
>> 	stp	x30, [sp, -16]
>> 	bl	__morestack
>> 	ldp	x30, [sp], 16
>> 	ret
>
> This part doesn't make any sense - both x28 and carry flag as an
> input, and spread across the prolog - why???
>> enough:
>> 	mov	x10, sp
>> 	[prolog]
>> 	b.cs	continue
>> 	mov	x10, x28
>> continue:
>> 	[rest of function]
>
> Why not do this?
>
> function:
> 	mrs	x9, tpidr_el0
> 	sub	x10, sp, N & 0xfff000
> 	sub	x10, x10, N & 0xfff
> 	ldr	x9, [x9, 16]
> 	adr	x12, main_fn_entry
> 	mov	x11, sp		[if function has stacked arguments]
> 	cmp	x10, x9
> 	b.ge	main_fn_entry
> 	b	__morestack
> main_fn_entry:	[x11 is argument pointer]
> 	[prolog]
> 	[rest of function]
>
> In __morestack you need to save x8 as well (another argument
> register!) and x12 (the continuation address).  After returning from
> the call x8 doesn't need to be preserved.

Indeed this strategy is way better and I adjusted the code to follow it.
The only change is I am using:

	[...]
	cmp	x9, x10
	b.lt	main_fn_entry
	b	__morestack
	[...]

So I can issue a 'cmp , 0' on __morestack to indicate the function was
called.

> There are several issues with unwinding in __morestack.  x28 is not
> described as a callee-save so will be corrupted if unwinding across a
> __morestack call.  This won't unwind correctly after the ldp as the
> unwinder will use the restored frame pointer to try to restore
> x29/x30:
>
> +	ldp	x29, x30, [x28, STACKFRAME_BASE]
> +	ldr	x28, [x28, STACKFRAME_BASE + 80]
> +
> +	.cfi_remember_state
> +	.cfi_restore 30
> +	.cfi_restore 29
> +	.cfi_def_cfa 31, 0

Indeed, it misses the x28 save/restore.  I think I have added the
missing bits, but I must confess that I am not well versed in CFI
directives.  I will appreciate it if you could help me on this new
version.

> This stores a random x30 value on the stack, what is the purpose of
> this?  Nothing can unwind to here:
>
> +	# Start using new stack
> +	stp	x29, x30, [x0, -16]!
> +	mov	sp, x0
>
> Also we no longer need split_stack_arg_pointer_used_p () or any code
> that uses it (functions that don't have any arguments passed on the
> stack could omit the mov x11, sp).

Right, with the new strategy you proposed to do a branch, this is indeed
not really required.  I removed it in this new patch.
> Wilco

>From dd2927aa5deb8d609c748014f3b566962fb852c5 Mon Sep 17 00:00:00 2001
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
Date: Wed, 4 May 2016 21:13:39 +
Subject: [PATCH 2/2] aarch64: Add split-stack initial support

This patch adds the split-stack support on aarch64 (PR #67877).  A
[PATCH v3 2/2] aarch64: Add split-stack initial support
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>

Changes from previous version:

  * Add missing new function comments;
  * Use gen_rtx_fmt_ee instead of gen_rtx_IF_THEN_ELSE to create
    conditional jumps;
  * Some typos and whitespace issues.

--

This patch adds the split-stack support on aarch64 (PR #67877).  As for
other ports this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss
field is added on the TCB in glibc, a target-specific __morestack
implementation and helper functions are added in libgcc, and compiler
support is adjusted (split-stack prologue, va_start for argument
handling).  I also plan to send the gold support to adjust stack
allocation across split-stack and default code calls.

The current approach is similar to the powerpc one: at most 2 GB of
stack allocation is supported, so stack adjustments can be done with 2
instructions (either just a movn plus nop or a movn followed by movk).
The morestack call is non-standard, with x10 holding the requested stack
pointer and x11 the required stack size to be copied.  Also, function
arguments on the old stack are accessed based on a value relative to the
stack pointer, so x10 is used to hold the old stack value.  Unwinding is
handled by a personality routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes
before the usual function prologue):

	mrs	x9, tpidr_el0
	mov	x10, -
	nop/movk
	add	x10, sp, x10
	ldr	x9, [x9, 16]
	cmp	x10, x9
	b.cs	enough
	stp	x30, [sp, -16]
	mov	x11,
	mov	x11,
	bl	__morestack
	ldp	x30, [sp], 16
	ret
enough:
	mov	x10, sp
	[...]
	b.cs	function
	mov	x10, x28
function:

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
prologue is created.  This is to avoid issues with tail calls to
external symbols which might require linker adjustment
(libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack
TCB ldr to after the required stack calculation.

3.
Similar to powerpc, when the linker detects a call from split-stack to
non-split-stack code, it adds 16k (or more) to the value found in
"allocate" instructions (so non-split-stack code gets a larger stack).
The amount is tunable by a linker option.  The edit means aarch64 does
not need to implement __morestack_non_split, necessary on x86 because
insufficient space is available there to edit the stack comparison code.
This feature is only implemented in the GNU gold linker.

4. AArch64 does not initially handle >2G stacks, and although it is
possible to implement, limiting to 2G allows materializing the
allocation with only 2 instructions (mov + movk) and thus simplifies the
linker adjustments required.  Supporting multiple threads each requiring
more than 2G of stack is probably not that important, and likely to OOM
at run time.

5. The TCB support on GLIBC is meant to be included in version 2.25.

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add
	aarch64-specific code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END):
	Remove macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (split_stack_arg_pointer_used_p):
	New function.
	(aarch64_expand_prologue): Setup the argument pointer (x10) for
	split-stack.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_expand_split_stack_prologue): New function.
	(aarch64_file_end): Emit the split-stack note sections.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_live_on_entry): Set the argument pointer for
	split-stack.
	(aarch64_split_stack_space_check): Likewise.
	(TARGET_ASM_FILE_END): New macro.
	(TARGET_EXTRA_LIVE_ON_ENTRY): Likewise.
	(TARGET_INTERNAL_ARG_POINTER): Likewise.
	* gcc/config/aarch64/aarch64.h (aarch64_frame): Add
	split_stack_arg_pointer to setup the argument pointer when using
	split-stack.
	* gcc/config/aarch64/aarch64.md (UNSPEC_STACK_CHECK): New
	unspec.
	(UNSPECV_SPLIT_STAC
[PATCH 1/2] Fix GCC split-stack variadic argument tests
This patch adds the expected va_end after va_start on the split-stack
tests.

gcc/ChangeLog:

	* gcc/testsuite/gcc.dg/split-3.c (down): Call va_end after
	va_start.
	* gcc/testsuite/gcc.dg/split-6.c (down): Likewise.
---
 gcc/ChangeLog                  | 5 +++++
 gcc/testsuite/gcc.dg/split-3.c | 1 +
 gcc/testsuite/gcc.dg/split-6.c | 1 +
 3 files changed, 7 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/split-3.c b/gcc/testsuite/gcc.dg/split-3.c
index 64bbb8c..5ba7616 100644
--- a/gcc/testsuite/gcc.dg/split-3.c
+++ b/gcc/testsuite/gcc.dg/split-3.c
@@ -40,6 +40,7 @@ down (int i, ...)
       || va_arg (ap, int) != 9
       || va_arg (ap, int) != 10)
     abort ();
+  va_end (ap);
 
   if (i > 0)
     {
diff --git a/gcc/testsuite/gcc.dg/split-6.c b/gcc/testsuite/gcc.dg/split-6.c
index b32cf8d..b3016ba 100644
--- a/gcc/testsuite/gcc.dg/split-6.c
+++ b/gcc/testsuite/gcc.dg/split-6.c
@@ -37,6 +37,7 @@ down (int i, ...)
       || va_arg (ap, int) != 9
       || va_arg (ap, int) != 10)
     abort ();
+  va_end (ap);
 
   if (i > 0)
     {
-- 
2.7.4
Re: [PATCH v2] aarch64: Add split-stack initial support
On 07/10/2016 05:28, Kyrill Tkachov wrote:
> Hi Adhemerval,
>
> CC'ing the aarch64 maintainers/reviewers.
> I have some comments inline, mostly about the GCC coding conventions.

Thanks for the review.

>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index df6514d..c689440 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -3149,6 +3149,49 @@ aarch64_restore_callee_saves (machine_mode mode,
>>      }
>>  }
>>
>> +static bool
>> +split_stack_arg_pointer_used_p (void)
>> +{
>
> New function needs a function comment.

Ack, I will add one.

>> +  bool using_split_stack = (flag_split_stack
>> 			    && (lookup_attribute ("no_split_stack",
>> 				  DECL_ATTRIBUTES (cfun->decl))
>> 				== NULL));
>> +  if (using_split_stack == false)
>> +    return false;
>> +
>> +  /* If the pseudo holding the arg pointer is no longer a pseudo,
>> +     then the arg pointer is used.  */
>> +  if (cfun->machine->frame.split_stack_arg_pointer != NULL_RTX
>> +      && (!REG_P (cfun->machine->frame.split_stack_arg_pointer)
>> 	  || (REGNO (cfun->machine->frame.split_stack_arg_pointer)
>> 	      < FIRST_PSEUDO_REGISTER)))
>> +    return true;
>> +
>> +  /* Unfortunately we also need to do some code scanning, since
>> +     x10 may have been substituted for the pseudo.
     */
>> +  rtx_insn *insn;
>> +  basic_block bb = ENTRY_BLOCK_PTR_FOR_FN (cfun)->next_bb;
>> +  FOR_BB_INSNS (bb, insn)
>> +    if (NONDEBUG_INSN_P (insn))
>> +      {
>> +	df_ref use;
>> +	FOR_EACH_INSN_USE (use, insn)
>> +	  {
>> +	    rtx x = DF_REF_REG (use);
>> +	    if (REG_P (x) && REGNO (x) == R10_REGNUM)
>> +	      return true;
>> +	  }
>> +	df_ref def;
>> +	FOR_EACH_INSN_DEF (def, insn)
>> +	  {
>> +	    rtx x = DF_REF_REG (def);
>> +	    if (REG_P (x) && REGNO (x) == R10_REGNUM)
>> +	      return false;
>> +	  }
>> +      }
>> +  return bitmap_bit_p (DF_LR_OUT (bb), R10_REGNUM);
>> +}
>> +
>>  /* AArch64 stack frames generated by this compiler look like:
>>
>>     +---+
>>
>> @@ -3204,6 +3247,7 @@ aarch64_expand_prologue (void)
>>    unsigned reg1 = cfun->machine->frame.wb_candidate1;
>>    unsigned reg2 = cfun->machine->frame.wb_candidate2;
>>    rtx_insn *insn;
>> +  bool split_stack_arg_pointer_used = split_stack_arg_pointer_used_p ();
>>
>>    if (flag_stack_usage_info)
>>      current_function_static_stack_size = frame_size;
>>
>> @@ -3220,6 +3264,10 @@ aarch64_expand_prologue (void)
>>        aarch64_emit_probe_stack_range (STACK_CHECK_PROTECT, frame_size);
>>      }
>>
>> +  /* Save split-stack argument pointer before stack adjustment.  */
>> +  if (split_stack_arg_pointer_used)
>> +    emit_move_insn (gen_rtx_REG (Pmode, R10_REGNUM), stack_pointer_rtx);
>> +
>>    aarch64_add_constant (Pmode, SP_REGNUM, IP0_REGNUM, -initial_adjust,
>> 			true);
>>
>>    if (callee_adjust != 0)
>>
>> @@ -3243,6 +3291,30 @@ aarch64_expand_prologue (void)
>>    callee_adjust != 0 || frame_pointer_needed);
>>    aarch64_add_constant (Pmode, SP_REGNUM, IP1_REGNUM, -final_adjust,
>> 			!frame_pointer_needed);
>> +
>> +  if (split_stack_arg_pointer_used_p ())
>> +    {
>> +      /* Setup the argument pointer (x10) for -fsplit-stack code.  If
>> +	 __morestack was called, it will have left the arg pointer to
>> +	 the old stack in x28.  Otherwise, the argument pointer is the
>> +	 top of the current frame.
	 */
>> +      rtx x10 = gen_rtx_REG (Pmode, R10_REGNUM);
>> +      rtx x28 = gen_rtx_REG (Pmode, R28_REGNUM);
>> +      rtx not_more = gen_label_rtx ();
>> +      rtx cc_reg = gen_rtx_REG (CCmode, CC_REGNUM);
>> +      rtx jump;
>> +
>> +      jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
>> 				   gen_rtx_GEU (VOIDmode, cc_reg,
>> 						const0_rtx),
>> 				   gen_rtx_LABEL_REF (VOIDmode, not_more),
>> 				   pc_rtx);
>> +      jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
>
> Are you trying to emit a conditional jump here?
> If so it would be cleaner to use the pattern name you have in mind
> directly like so (barring typos ;):
>   rtx cmp = gen_rtx_fmt_ee (GEU, VOIDmode, cc_reg, const0_rtx);
>   jump = emit_jump_insn (gen_condjump (cmp, cc_reg, not_more));
> This will avoid constructing the SET and IF_THEN_ELSE rtxes manually.

Yes, the idea is to emit a conditional jump.  I have no preference here;
I used the IF_THEN_ELSE functions mainly because the other arches are
using them, but I think your suggestion should be simpler.

>> +      JUMP_LABEL (jump) = not_more;
>> +      LABEL_NUSES (not_more) += 1;
>> +      emit_move_insn (x10, x28);
>> +      emit_label (not_more);
>> +    }
>>  }
>>
>>  /* Return TRUE if we can use a simple_return insn.
>>
>> @@ -9677,7 +9749,7 @@ aarch64_expand_builtin_va_start (tree valist, rtx
[PATCH v2] aarch64: Add split-stack initial support
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>

Changes from previous version:

- Rewrite how the variadic argument pointer is set up: instead of using
  the hard_fp_offset value to set up x10, save the frame pointer value
  before the stack allocation.  This allows the linker/gold to avoid
  scanning and changing code in the case of a split to non-split call
  (only the split prologue will require adjustments).

--

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss field
is added to the TCB in glibc, a target-specific __morestack implementation
and helper functions are added in libgcc, and compiler adjustments are made
(split-stack prologue, va_start for argument handling).  I also plan to
send the gold support to adjust stack allocation across split-stack and
default code calls.

The current approach is similar to the powerpc one: at most 2 GB of stack
allocation is supported, so stack adjustments can be done with 2
instructions (either just a movn plus nop or a movn followed by movk).
The morestack call is non-standard, with x10 holding the requested stack
pointer and x11 the required stack size to be copied.  Also, function
arguments on the old stack are accessed based on a value relative to the
stack pointer, so x10 is used to hold the old stack value.  Unwinding is
handled by a personality routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

	mrs	x9, tpidr_el0
	mov	x10, -
	nop/movk
	add	x10, sp, x10
	ldr	x9, [x9, 16]
	cmp	x10, x9
	b.cs	enough
	stp	x30, [sp, -16]
	bl	__morestack
	ldp	x30, [sp], 16
	ret
enough:
	mov	x10, sp
	[...]
	b.cs	function
	mov	x10, x28
function:

I checked with a bootstrap build on aarch64 and saw no regression.  The go
tests pass with the exception of cplx2.go and runtime/pprof/check (both
unrelated to the patch afaiu).

Notes:

1.
Even if a function does not allocate a stack frame, a split-stack prologue
is created.  This is to avoid issues with tail calls to external symbols
which might require linker adjustment (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldr to after the required stack calculation.

3. Similar to powerpc, when the linker detects a call from split-stack to
   non-split-stack code, it adds 16k (or more) to the value found in
   "allocate" instructions (so non-split-stack code gets a larger stack).
   The amount is tunable by a linker option.  The edit means aarch64 does
   not need to implement __morestack_non_split, necessary on x86 because
   insufficient space is available there to edit the stack comparison
   code.  This feature is only implemented in the GNU gold linker.

4. AArch64 does not handle >2G stacks initially, and although it is
   possible to implement that, limiting to 2G allows materializing the
   allocation with only 2 instructions (mov + movk) and thus simplifies
   the linker adjustments required.  Supporting multiple threads each
   requiring more than 2G of stack is probably not that important, and
   likely to OOM at run time.

5. The TCB support on GLIBC is meant to be included in version 2.25.

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add
	aarch64-specific code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (split_stack_arg_pointer_used_p):
	New function.
	(aarch64_expand_prologue): Setup the argument pointer (x10) for
	split-stack.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(morestack_ref): New symbol.
	(aarch64_expand_split_stack_prologue): New function.
	(aarch64_file_end): Emit the split-stack note sections.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_live_on_entry): Set the argument pointer for split-stack.
	(aarch64_split_stack_space_check): Likewise.
	(TARGET_ASM_FILE_END): New macro.
	(TARGET_EXTRA_LIVE_
Re: [PATCH] aarch64: Add split-stack initial support
On 08/08/2016 07:58, Jiong Wang wrote:
>
> Adhemerval Zanella writes:
>
>>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>>> index e56398a..2cf239f 100644
>>> --- a/gcc/config/aarch64/aarch64.c
>>> +++ b/gcc/config/aarch64/aarch64.c
>>> @@ -3227,6 +3227,34 @@ aarch64_expand_prologue (void)
>>> +      emit_move_insn (x11, GEN_INT (hard_fp_offset));
>>> +      emit_insn (gen_add3_insn (x10, x29, x11));
>>> +      jump = gen_rtx_IF_THEN_ELSE (VOIDmode,
>>> +                                   gen_rtx_GEU (VOIDmode, cc_reg,
>>> +                                                const0_rtx),
>>> +                                   gen_rtx_LABEL_REF (VOIDmode, not_more),
>>> +                                   pc_rtx);
>>> +      jump = emit_jump_insn (gen_rtx_SET (pc_rtx, jump));
>>> +      JUMP_LABEL (jump) = not_more;
>>> +      LABEL_NUSES (not_more) += 1;
>>> +      emit_move_insn (x10, x28);
>>> +      emit_label (not_more);
>>> +    }
>>> }
>
> This part needs a rebase, there are major changes in the AArch64 prologue
> code recently.

Right, I see that 'hard_fp_offset' is not defined locally anymore.

>>>
>>>  /* Return TRUE if we can use a simple_return insn.
>>> @@ -3303,6 +3331,7 @@ aarch64_expand_epilogue (bool for_sibcall)
>>>        offset = offset - fp_offset;
>>>      }
>>>
>>> +
>
> Unnecessary new line.

Ack.

>>> +
>>> +  /* Load __private_ss from TCB.  */
>>> +  ssvalue = gen_rtx_REG (Pmode, R9_REGNUM);
>>> +  emit_insn (gen_aarch64_load_tp_hard (ssvalue));
>>> +  mem = gen_rtx_MEM (Pmode, plus_constant (Pmode, ssvalue, psso));
>>> +  emit_move_insn (ssvalue, mem);
>>> +
>>> +  temp = gen_rtx_REG (Pmode, R10_REGNUM);
>>> +
>>> +  /* Always emit two insns to calculate the requested stack, so the linker
>>> +     can edit them when adjusting size for calling non-split-stack code.  */
>>> +  ninsn = aarch64_internal_mov_immediate (temp, GEN_INT (-frame_size),
>>> +                                          true, Pmode);
>>> +  gcc_assert (ninsn == 1 || ninsn == 2);
>>> +  if (ninsn == 1)
>>> +    emit_insn (gen_nop ());
>
> Will there be trouble for the linker if the following add is scheduled
> before the nop?

In theory yes, although I haven't seen gcc splitting it.
What would be the correct way to tie the nop generation so it is emitted
right after the mov immediate?

>>> +
>>> +	# Set up for a call to the target function.
>>> +	#ldp	x29, x30, [x28, STACKFRAME_BASE]
>>> +	ldr	x30, [x28, STACKFRAME_BASE + 8]
>>> +	ldp	x0, x1, [x28, STACKFRAME_BASE + 16]
>>> +	ldp	x2, x3, [x28, STACKFRAME_BASE + 32]
>>> +	ldp	x4, x5, [x28, STACKFRAME_BASE + 48]
>>> +	ldp	x6, x7, [x28, STACKFRAME_BASE + 64]
>>> +	add	x9, x30, 8
>>> +	cmp	x30, x9
>
> Can you explain why we need this "cmp" before jumping to the target
> function?

This is due to the function prologue addition for var args handling:

	mov	x11,
	sub	sp, sp,
	add	x10, x29, x11
	b.cs	function
	mov	x10, x28
function:

If __morestack is called, it will use the 'b.cs' to set up the correct
var arg pointer.

Below is the last iteration of the patch; however, I am now seeing an
issue similar to one s390 hit when building libgo:

../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: error: flow control insn inside a basic block
(jump_insn 90 89 91 14 (set (pc)
        (if_then_else (geu (reg:CC 66 cc)
                (const_int 0 [0]))
            (label_ref 92)
            (pc))) ../../../gcc-git/libgo/go/syscall/socket_linux.go:90 -1
     (nil)
 -> 92)
../../../gcc-git/libgo/go/syscall/socket_linux.go:90:1: internal compiler error: in rtl_verify_bb_insns, at cfgrtl.c:2658
0xac35af _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)

It shows only with -O2, which I think is due to how the block is
reorganized internally with regard to the pseudo-return instruction
inserted by split-stack.  I am still debugging the issue and how to
properly fix it, so if you have any advice I am open to suggestions.

diff --git a/gcc/common/config/aarch64/aarch64-common.c b/gcc/common/config/aarch64/aarch64-common.c
index 08e7959..01c3239 100644
--- a/gcc/common/config/aarch64/aarch64-common.c
+++ b/gcc/common/config/aarch64/aarch64-common.c
@@ -106,6 +106,21 @@ aarch64_handle_option (struct gcc_options *opts,
     }
 }

+/* -fsplit-stack uses a TCB field available on glibc-2.25.
GLIBC also
+   exports symbol, __tcb_private_ss, to signal
Re: [PATCH] aarch64: Add split-stack initial support
Ping.

On 28/07/2016 17:36, Adhemerval Zanella wrote:
> From: Adhemerval Zanella <adhemerval.zane...@linaro.org>
>
> This patch adds split-stack support on aarch64 (PR #67877).  As for
> other ports, this patch should be used along with glibc and gold support.
>
> The support is done similarly to other architectures: a __private_ss
> field is added to the TCB in glibc, a target-specific __morestack
> implementation and helper functions are added in libgcc, and compiler
> adjustments are made (split-stack prologue, va_start for argument
> handling).  I also plan to send the gold support to adjust stack
> allocation across split-stack and default code calls.
>
> The current approach is similar to the powerpc one: at most 2 GB of stack
> allocation is supported, so stack adjustments can be done with 2
> instructions (either just a movn plus nop or a movn followed by movk).
> The morestack call is non-standard, with x10 holding the requested stack
> pointer and x11 the required stack size to be copied.  Also, function
> arguments on the old stack are accessed based on a value relative to the
> stack pointer, so x10 is used to hold the old stack value.  Unwinding is
> handled by a personality routine that knows how to find stack segments.
>
> The split-stack prologue on function entry is as follows (this goes
> before the usual function prologue):
>
> 	mrs	x9, tpidr_el0
> 	mov	x10, -
> 	nop/movk
> 	add	x10, sp, x10
> 	ldr	x9, [x9, 16]
> 	cmp	x10, x9
> 	b.cs	enough
> 	stp	x30, [sp, -16]
> 	mov	x11,
> 	mov	x11,
> 	bl	__morestack
> 	ldp	x30, [sp], 16
> 	ret
> enough:
> 	# usual function prologue, modified a little at the end to set up the
> 	# arg_pointer in x10, starts here.  The arg_pointer is initialized,
> 	# if it is used, with
> 	mov	x11,
> 	add	x10, x29, x11
> 	b.cs	function
> 	mov	x10, x28
> function:
>
> Notes:
>
> 1. Even if a function does not allocate a stack frame, a split-stack
>    prologue is created.
>    This is to avoid issues with tail calls to external symbols
>    which might require linker adjustment (libgo/runtime/go-varargs.c).
>
> 2. Basic-block reordering (enabled with -O2) will move the split-stack
>    TCB ldr to after the required stack calculation.
>
> 3. Similar to powerpc, when the linker detects a call from split-stack to
>    non-split-stack code, it adds 16k (or more) to the value found in
>    "allocate" instructions (so non-split-stack code gets a larger stack).
>    The amount is tunable by a linker option.  The edit means aarch64 does
>    not need to implement __morestack_non_split, necessary on x86 because
>    insufficient space is available there to edit the stack comparison
>    code.  This feature is only implemented in the GNU gold linker.
>
> 4. AArch64 does not handle >2G stacks initially, and although it is
>    possible to implement that, limiting to 2G allows materializing the
>    allocation with only 2 instructions (mov + movk) and thus simplifies
>    the linker adjustments required.  Supporting multiple threads each
>    requiring more than 2G of stack is probably not that important, and
>    likely to OOM at run time.
>
> 5. The TCB support on GLIBC is meant to be included in version 2.25 [1].
>
> I tested bootstrapping on aarch64-linux-gnu and although still digesting
> the results I saw no regression.  All cgo tests are passing, although
> based on previous reports on other archs, gold support should be present
> to avoid issues with split code calling non-split code.
>
> libgcc/ChangeLog:
>
> 	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
> 	aarch64*-*-linux.
> 	* libgcc/config/aarch64/morestack-c.c: New file.
> 	* libgcc/config/aarch64/morestack.S: Likewise.
> 	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
> 	* libgcc/generic-morestack.c (__splitstack_find): Add
> 	aarch64-specific code.
>
> gcc/ChangeLog:
>
> 	* common/config/aarch64/aarch64-common.c
> 	(aarch64_supports_split_stack): New function.
> 	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
> 	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
> 	macro.
> 	* gcc/config/aarch64/aarch64-protos.h: Add
> 	aarch64_expand_split_stack_prologue and
> 	aarch64_split_stack_space_check.
> 	* gcc/config/aarch64/aarch64.c (aarch64_expand_prologue): Setup the
> 	argument pointer (x10) for split-stack.
> 	(aarch64_expand_builtin_va_start): Use internal argument pointer
> 	instead of virtual_incoming_args_rtx.
> 	(aarch64_e
[PATCH] aarch64: Add split-stack initial support
From: Adhemerval Zanella <adhemerval.zane...@linaro.org>

This patch adds split-stack support on aarch64 (PR #67877).  As for other
ports, this patch should be used along with glibc and gold support.

The support is done similarly to other architectures: a __private_ss field
is added to the TCB in glibc, a target-specific __morestack implementation
and helper functions are added in libgcc, and compiler adjustments are made
(split-stack prologue, va_start for argument handling).  I also plan to
send the gold support to adjust stack allocation across split-stack and
default code calls.

The current approach is similar to the powerpc one: at most 2 GB of stack
allocation is supported, so stack adjustments can be done with 2
instructions (either just a movn plus nop or a movn followed by movk).
The morestack call is non-standard, with x10 holding the requested stack
pointer and x11 the required stack size to be copied.  Also, function
arguments on the old stack are accessed based on a value relative to the
stack pointer, so x10 is used to hold the old stack value.  Unwinding is
handled by a personality routine that knows how to find stack segments.

The split-stack prologue on function entry is as follows (this goes before
the usual function prologue):

	mrs	x9, tpidr_el0
	mov	x10, -
	nop/movk
	add	x10, sp, x10
	ldr	x9, [x9, 16]
	cmp	x10, x9
	b.cs	enough
	stp	x30, [sp, -16]
	mov	x11,
	mov	x11,
	bl	__morestack
	ldp	x30, [sp], 16
	ret
enough:
	# usual function prologue, modified a little at the end to set up the
	# arg_pointer in x10, starts here.  The arg_pointer is initialized,
	# if it is used, with
	mov	x11,
	add	x10, x29, x11
	b.cs	function
	mov	x10, x28
function:

Notes:

1. Even if a function does not allocate a stack frame, a split-stack
   prologue is created.  This is to avoid issues with tail calls to
   external symbols which might require linker adjustment
   (libgo/runtime/go-varargs.c).

2. Basic-block reordering (enabled with -O2) will move the split-stack TCB
   ldr to after the required stack calculation.

3.
Similar to powerpc, when the linker detects a call from split-stack to
non-split-stack code, it adds 16k (or more) to the value found in
"allocate" instructions (so non-split-stack code gets a larger stack).
The amount is tunable by a linker option.  The edit means aarch64 does not
need to implement __morestack_non_split, necessary on x86 because
insufficient space is available there to edit the stack comparison code.
This feature is only implemented in the GNU gold linker.

4. AArch64 does not handle >2G stacks initially, and although it is
   possible to implement that, limiting to 2G allows materializing the
   allocation with only 2 instructions (mov + movk) and thus simplifies
   the linker adjustments required.  Supporting multiple threads each
   requiring more than 2G of stack is probably not that important, and
   likely to OOM at run time.

5. The TCB support on GLIBC is meant to be included in version 2.25 [1].

I tested bootstrapping on aarch64-linux-gnu and although still digesting
the results I saw no regression.  All cgo tests are passing, although
based on previous reports on other archs, gold support should be present
to avoid issues with split code calling non-split code.

libgcc/ChangeLog:

	* libgcc/config.host: Use t-stack and t-stack-aarch64 for
	aarch64*-*-linux.
	* libgcc/config/aarch64/morestack-c.c: New file.
	* libgcc/config/aarch64/morestack.S: Likewise.
	* libgcc/config/aarch64/t-stack-aarch64: Likewise.
	* libgcc/generic-morestack.c (__splitstack_find): Add
	aarch64-specific code.

gcc/ChangeLog:

	* common/config/aarch64/aarch64-common.c
	(aarch64_supports_split_stack): New function.
	(TARGET_SUPPORTS_SPLIT_STACK): New macro.
	* gcc/config/aarch64/aarch64-linux.h (TARGET_ASM_FILE_END): Remove
	macro.
	* gcc/config/aarch64/aarch64-protos.h: Add
	aarch64_expand_split_stack_prologue and
	aarch64_split_stack_space_check.
	* gcc/config/aarch64/aarch64.c (aarch64_expand_prologue): Setup the
	argument pointer (x10) for split-stack.
	(aarch64_expand_builtin_va_start): Use internal argument pointer
	instead of virtual_incoming_args_rtx.
	(aarch64_expand_split_stack_prologue): New function.
	(aarch64_file_end): Emit the split-stack note sections.
	(aarch64_internal_arg_pointer): Likewise.
	(aarch64_live_on_entry): Set the argument pointer for split-stack.
	(aarch64_split_stack_space_check): Likewise.
	(TARGET_ASM_FILE_END): New macro.
	(TARGET_INTERNAL_ARG_POINTER): Likewise.
	* gcc/config/aarch64/aarch64.h (aarch64_frame): Add
	split_stack_arg_pointer to setup the argument
Split-stack support for aarch64
Hi all,

Since BZ#67877 [1] does not have much information on it, I would like to
ask for some input on what the requirements are for implementing
split-stack on aarch64 besides 'feature parity'.  I am asking because the
PR states the main use is gccgo, and afaik there is some usage in other
programs, but the widespread use is not really convincing (for instance,
rust now allows targets to be built without it [2]).  I also noted that Go
from 1.4 onward does not use split-stack anymore, stating it suffers from
a performance issue ("hot stack split"), and some searching on the
internet suggests that for 64-bit targets split-stack is not really an
efficient way to manage stack growth.

So is this only for gccgo support?  Is gccgo stuck on a pre-1.4 design
and/or not willing to remove split-stack usage (as go itself did)?

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67877
[2] https://github.com/rust-lang/rust/issues/16980
Re: [PATCH 1/7] Libsanitizer merge from upstream r249633.
On 14-10-2015 04:54, Jakub Jelinek wrote:
> On Tue, Oct 13, 2015 at 07:54:33PM +0300, Maxim Ostapenko wrote:
>> On 13/10/15 14:15, Maxim Ostapenko wrote:
>>> This is the raw merge itself. I'm bumping SONAME to libasan.so.3.
>>>
>>> -Maxim
>>
>> I have just noticed that I've misused autoconf stuff (used wrong version).
>> Here a fixed version of the same patch. Sorry for inconvenience.
>
> Is libubsan, libtsan backwards compatible, or do we want to change SONAME
> there too?
>
> The aarch64 changes are terrible, not just because it doesn't yet have
> runtime decision on what VA to use or that it doesn't support 48-bit VA,
> but also that for the 42-bit VA it uses a different shadow offset from
> 39-bit VA.  But on the compiler side we have just one...
> Though, only the 39-bit VA is enabled right now by default, so out of the
> box the state is as bad as we had in 5.x - users wanting 42-bit VA or
> 48-bit VA have to patch libsanitizer.

Yes, we are aware of the current deficiencies for aarch64 with a 39-bit
and 42-bit VMA and the lack of support for a 48-bit VMA.  On the LLVM
side, the current approach is to build compiler support for either 39 or
42 bit (and again, we are aware this is not the ideal approach).  This
approach was used mainly for validation and buildbot enablement.

I am currently working on a way to make the sanitizers (asan and msan)
VMA-independent on aarch64 (aiming at the currently supported 39/42 and
later a 48-bit VMA), and the approach we decided to use is to change the
instrumentation to use a parametrized value to compute the shadow memory
regions (based on the VMA address), which will be initialized externally
by libsanitizer.  TSAN is somewhat easier since the instrumentation does
not take the VMA into consideration (only the libsanitizer itself does).
The idea is to avoid compiler switches and make it transparent to run the
binary regardless of the VMA on the system.
The downside is the instrumentation will require more steps (to load the
parametrized value to compute the shadow memory) and thus be slower.

> Have you verified libbacktrace sanitization still works properly (that is
> something upstream does not test)?
>
> Do you plan to update the asan tests we have to reflect the changes in
> upstream?
>
> 	Jakub
>
[PATCH] rs6000: Fix HTM tcheck assembly encoding
gcc/ChangeLog:

	* config/rs6000/htm.md (tcheck): Fix assembly encoding.

gcc/testsuite/ChangeLog:

	* gcc.target/powerpc/htm-builtin-1.c: Fix tcheck expected value.

---
diff --git a/gcc/config/rs6000/htm.md b/gcc/config/rs6000/htm.md
index 2c4689f..79fb740 100644
--- a/gcc/config/rs6000/htm.md
+++ b/gcc/config/rs6000/htm.md
@@ -252,7 +252,7 @@
 	(unspec_volatile:CC [(match_operand 0 "u3bit_cint_operand" "n")]
 			    UNSPECV_HTM_TCHECK))]
   "TARGET_HTM"
-  "tcheck. %0"
+  "tcheck %0"
   [(set_attr "type" "htm")
    (set_attr "length" "4")])
diff --git a/gcc/testsuite/gcc.target/powerpc/htm-builtin-1.c b/gcc/testsuite/gcc.target/powerpc/htm-builtin-1.c
index e58816a..62d64e6 100644
--- a/gcc/testsuite/gcc.target/powerpc/htm-builtin-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/htm-builtin-1.c
@@ -10,7 +10,7 @@
 /* { dg-final { scan-assembler-times "tabortdci\\." 1 } } */
 /* { dg-final { scan-assembler-times "tabortwc\\." 1 } } */
 /* { dg-final { scan-assembler-times "tabortwci\\." 2 } } */
-/* { dg-final { scan-assembler-times "tcheck\\." 1 } } */
+/* { dg-final { scan-assembler-times "tcheck" 1 } } */
 /* { dg-final { scan-assembler-times "trechkpt\\." 1 } } */
 /* { dg-final { scan-assembler-times "treclaim\\." 1 } } */
 /* { dg-final { scan-assembler-times "tsr\\." 3 } } */
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
On 03-09-2014 11:01, Maciej W. Rozycki wrote:

On Tue, 2 Sep 2014, Adhemerval Zanella wrote:

Ping.

On 19-08-2014 13:54, Adhemerval Zanella wrote:

Ping.

On 06-08-2014 17:21, Adhemerval Zanella wrote:

On 01-08-2014 12:31, Joseph S. Myers wrote:

On Thu, 31 Jul 2014, David Edelsohn wrote:

Thanks for implementing the FENV support.  The patch generally looks good
to me.  My one concern is a detail in the implementation of update.  I do
not have enough experience with GENERIC to verify the details, and it
seems like it is missing building an outer COMPOUND_EXPR containing
update_mffs and the CALL_EXPR for the update mtfsf.

I suppose what's actually odd there is that you have

+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

so you build a MODIFY_EXPR in void_type_node but then convert it with a
VIEW_CONVERT_EXPR.  If you'd built the MODIFY_EXPR in double_type_node
then the VIEW_CONVERT_EXPR would be meaningful (the value of an assignment
a = b being the new value of a), but reinterpreting a void value doesn't
make sense.  Or you could probably just use call_mffs directly in the
VIEW_CONVERT_EXPR without explicitly creating the old_fenv variable.

Thanks for the review, Joseph.  I have changed it to avoid the void
reinterpretation and use call_mffs directly.  I have also removed the mask
generation in 'clear' from your previous message; it is now reusing the
mask used in feholdexcept.  The testcase patch is the same as before.

Checked on both linux-powerpc64/powerpc64le and no regressions found.

--

2014-08-06  Adhemerval Zanella  <azane...@linux.vnet.ibm.com>

gcc:
	* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New
	function.

gcc/testsuite:
	* gcc.dg/atomic/c11-atomic-exec-5.c
	(test_main_long_double_add_overflow): Define and run only for
	LDBL_MANT_DIG != 106.
	(test_main_complex_long_double_add_overflow): Likewise.
	(test_main_long_double_sub_overflow): Likewise.
	(test_main_complex_long_double_sub_overflow): Likewise.

FWIW I pushed it through regression testing across my usual set of
powerpc-linux-gnu multilibs with the results (for c11-atomic-exec-5.c) as
follows:

	-mcpu=603e						PASS
	-mcpu=603e -msoft-float					UNSUPPORTED
	-mcpu=8540 -mfloat-gprs=single -mspe=yes -mabi=spe	UNSUPPORTED
	-mcpu=8548 -mfloat-gprs=double -mspe=yes -mabi=spe	UNSUPPORTED
	-mcpu=7400 -maltivec -mabi=altivec			PASS
	-mcpu=e6500 -maltivec -mabi=altivec			PASS
	-mcpu=e5500 -m64					PASS
	-mcpu=e6500 -m64 -maltivec -mabi=altivec		PASS

Thanks for testing it, I'll try to add these permutations to my own
testbench.

(Floating-point environment is of course unsupported for soft-float
targets, and for the SPE FPU another change is required to implement
floating-point environment handling to complement the one proposed here.)
No regressions otherwise.

While at it, may I propose another change on top of this?  I've noticed
the test case is rather slow; it certainly takes much more time than the
average one.  I've seen elapsed times of well over a minute on reasonably
fast hardware and occasionally a timeout midway through even though the
test case was otherwise progressing just fine.  I think lock contention or
unrelated system activity such as hardware interrupts (think a busy
network!) may contribute to it for systems using LL/SC loops for
atomicity.

So I think the default timeout that's used for really quick tests should
be extended a bit.  I propose a factor of 2, just not to make it too
excessive, at least for the beginning (maybe it'll have to be higher
eventually).  Do you mind if I incorporate this change in my patchset?

OK?

2014-09-03  Maciej W. Rozycki  <ma...@codesourcery.com>

	gcc/testsuite/
	* gcc.dg/atomic/c11-atomic-exec-5.c (dg-timeout-factor): New
	setting.
  Maciej

gcc-test-c11-atomic-exec-5-timeout-factor.diff
Index: gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c
===
--- gcc-fsf-trunk-quilt.orig/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c	2014-09-02 17:34:06.718927043 +0100
+++ gcc-fsf-trunk-quilt/gcc/testsuite/gcc.dg/atomic/c11-atomic-exec-5.c	2014-09-03 14:51:12.958927233 +0100
@@ -9,6 +9,7 @@
 /* { dg-additional-options "-D_XOPEN_SOURCE=600" { target *-*-solaris2.1[0-9]* } } */
 /* { dg-require-effective-target fenv_exceptions } */
 /* { dg-require-effective-target pthread } */
+/* { dg-timeout-factor 2 } */
 #include <fenv.h>
 #include <float.h>
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
Ping.

On 19-08-2014 13:54, Adhemerval Zanella wrote:

Ping.

On 06-08-2014 17:21, Adhemerval Zanella wrote:

On 01-08-2014 12:31, Joseph S. Myers wrote:

On Thu, 31 Jul 2014, David Edelsohn wrote:

Thanks for implementing the FENV support.  The patch generally looks good
to me.  My one concern is a detail in the implementation of update.  I do
not have enough experience with GENERIC to verify the details, and it
seems like it is missing building an outer COMPOUND_EXPR containing
update_mffs and the CALL_EXPR for the update mtfsf.

I suppose what's actually odd there is that you have

+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

so you build a MODIFY_EXPR in void_type_node but then convert it with a
VIEW_CONVERT_EXPR.  If you'd built the MODIFY_EXPR in double_type_node
then the VIEW_CONVERT_EXPR would be meaningful (the value of an assignment
a = b being the new value of a), but reinterpreting a void value doesn't
make sense.  Or you could probably just use call_mffs directly in the
VIEW_CONVERT_EXPR without explicitly creating the old_fenv variable.

Thanks for the review, Joseph.  I have changed it to avoid the void
reinterpretation and use call_mffs directly.  I have also removed the mask
generation in 'clear' from your previous message; it is now reusing the
mask used in feholdexcept.  The testcase patch is the same as before.

Checked on both linux-powerpc64/powerpc64le and no regressions found.

--

2014-08-06  Adhemerval Zanella  <azane...@linux.vnet.ibm.com>

gcc:
	* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New
	function.

gcc/testsuite:
	* gcc.dg/atomic/c11-atomic-exec-5.c
	(test_main_long_double_add_overflow): Define and run only for
	LDBL_MANT_DIG != 106.
	(test_main_complex_long_double_add_overflow): Likewise.
	(test_main_long_double_sub_overflow): Likewise.
	(test_main_complex_long_double_sub_overflow): Likewise.
---
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d088ff6..7d66eb1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1631,6 +1631,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv

 /* Processor table.  */
@@ -7,6 +33340,80 @@ emit_fusion_gpr_load (rtx *operands)
   return "";
 }

+/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook.  */
+
+static void
+rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  if (!TARGET_HARD_FLOAT || !TARGET_FPRS)
+    return;
+
+  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
+  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree call_mffs = build_call_expr (mffs, 0);
+
+  /* Generates the equivalent of feholdexcept (&fenv_var)
+
+       fenv_var = __builtin_mffs ();
+       double fenv_hold;
+       *(uint64_t*)&fenv_hold = *(uint64_t*)&fenv_var & 0x0007LL;
+       __builtin_mtfsf (0xff, fenv_hold);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT hold_exception_mask =
+    HOST_WIDE_INT_UC (0x0007);
+
+  tree fenv_var = create_tmp_var (double_type_node, NULL);
+
+  tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs);
+
+  tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+                              build_int_cst (uint64_type_node,
+                                             hold_exception_mask));
+
+  tree fenv_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node,
+                            fenv_llu_and);
+
+  tree hold_mtfsf = build_call_expr (mtfsf, 2,
+                                     build_int_cst (unsigned_type_node, 0xff),
+                                     fenv_mtfsf);
+
+  *hold = build2 (COMPOUND_EXPR, void_type_node, hold_mffs, hold_mtfsf);
+
+  /* Reload the value of fenv_hold to clear the exceptions.
*/
+
+  *clear = build_call_expr (mtfsf, 2,
+                            build_int_cst (unsigned_type_node, 0xff),
+                            fenv_mtfsf);
+
+  /* Generates the equivalent of feupdateenv (&fenv_var)
+
+       double old_fenv = __builtin_mffs ();
+       double fenv_update;
+       *(uint64_t*)&fenv_update = (*(uint64_t*)&old_fenv & 0x1f00LL) |
+                                  (*(uint64_t*)&fenv_var & 0x1ff80fff);
+       __builtin_mtfsf (0xff, fenv_update);  */
+
+  const unsigned HOST_WIDE_INT update_exception_mask =
+    HOST_WIDE_INT_UC (0x1f00);
+  const unsigned HOST_WIDE_INT new_exception_mask =
+    HOST_WIDE_INT_UC (0x1ff80fff);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, call_mffs);
+  tree old_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+                             old_llu
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
Ping.

On 06-08-2014 17:21, Adhemerval Zanella wrote:

On 01-08-2014 12:31, Joseph S. Myers wrote:

On Thu, 31 Jul 2014, David Edelsohn wrote:

Thanks for implementing the FENV support.  The patch generally looks good
to me.  My one concern is a detail in the implementation of update.  I do
not have enough experience with GENERIC to verify the details, and it
seems like it is missing building an outer COMPOUND_EXPR containing
update_mffs and the CALL_EXPR for the update mtfsf.

I suppose what's actually odd there is that you have

+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

so you build a MODIFY_EXPR in void_type_node but then convert it with a
VIEW_CONVERT_EXPR.  If you'd built the MODIFY_EXPR in double_type_node
then the VIEW_CONVERT_EXPR would be meaningful (the value of an assignment
a = b being the new value of a), but reinterpreting a void value doesn't
make sense.  Or you could probably just use call_mffs directly in the
VIEW_CONVERT_EXPR without explicitly creating the old_fenv variable.

Thanks for the review, Joseph.  I have changed it to avoid the void
reinterpretation and use call_mffs directly.  I have also removed the mask
generation in 'clear' from your previous message; it is now reusing the
mask used in feholdexcept.  The testcase patch is the same as before.

Checked on both linux-powerpc64/powerpc64le and no regressions found.

--

2014-08-06  Adhemerval Zanella  <azane...@linux.vnet.ibm.com>

gcc:
	* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New
	function.

gcc/testsuite:
	* gcc.dg/atomic/c11-atomic-exec-5.c
	(test_main_long_double_add_overflow): Define and run only for
	LDBL_MANT_DIG != 106.
	(test_main_complex_long_double_add_overflow): Likewise.
	(test_main_long_double_sub_overflow): Likewise.
	(test_main_complex_long_double_sub_overflow): Likewise.
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
On 01-08-2014 12:31, Joseph S. Myers wrote: On Thu, 31 Jul 2014, David Edelsohn wrote: Thanks for implementing the FENV support. The patch generally looks good to me. My one concern is a detail in the implementation of update. I do not have enough experience with GENERIC to verify the details, and it seems like it is missing building an outer COMPOUND_EXPR containing update_mffs and the CALL_EXPR for the update mtfsf.

I suppose what's actually odd there is that you have

+ tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+ tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);

so you build a MODIFY_EXPR in void_type_node but then convert it with a VIEW_CONVERT_EXPR. If you'd built the MODIFY_EXPR in double_type_node then the VIEW_CONVERT_EXPR would be meaningful (the value of an assignment a = b being the new value of a), but reinterpreting a void value doesn't make sense. Or you could probably just use call_mffs directly in the VIEW_CONVERT_EXPR without explicitly creating the old_fenv variable.

Thanks for the review, Joseph. I have changed it to avoid the void reinterpretation and to use call_mffs directly. I have also removed the mask generation in 'clear' from your previous message; it now reuses the mask used in feholdexcept. The testcase patch is the same as before. Checked on both linux-powerpc64 and powerpc64le with no regressions found.

--

2014-08-06 Adhemerval Zanella azane...@linux.vnet.ibm.com

gcc:
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New function.

gcc/testsuite:
* gcc.dg/atomic/c11-atomic-exec-5.c (test_main_long_double_add_overflow):
Define and run only for LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.
---
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d088ff6..7d66eb1 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1631,6 +1631,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv

 /* Processor table.  */
@@ -33337,6 +33340,80 @@ emit_fusion_gpr_load (rtx *operands)
   return "";
 }

+/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook.  */
+
+static void
+rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  if (!TARGET_HARD_FLOAT || !TARGET_FPRS)
+    return;
+
+  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
+  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree call_mffs = build_call_expr (mffs, 0);
+
+  /* Generates the equivalent of feholdexcept (&fenv_var)
+
+     *fenv_var = __builtin_mffs ();
+     double fenv_hold;
+     *(uint64_t*)&fenv_hold = *(uint64_t*)&fenv_var & 0xffffffff00000007LL;
+     __builtin_mtfsf (0xff, fenv_hold);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT hold_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff00000007);
+
+  tree fenv_var = create_tmp_var (double_type_node, NULL);
+
+  tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs);
+
+  tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, hold_exception_mask));
+
+  tree fenv_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node, fenv_llu_and);
+
+  tree hold_mtfsf = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf);
+
+  *hold = build2 (COMPOUND_EXPR, void_type_node, hold_mffs, hold_mtfsf);
+
+  /* Reload the value of fenv_hold to clear the exceptions.  */
+
+  *clear = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf);
+
+  /* Generates the equivalent of feupdateenv (&fenv_var)
+
+     double old_fenv = __builtin_mffs ();
+     double fenv_update;
+     *(uint64_t*)&fenv_update = (*(uint64_t*)&old & 0xffffffff1fffff00LL) |
+       (*(uint64_t*)&fenv_var & 0x1ff80fff);
+     __builtin_mtfsf (0xff, fenv_update);  */
+
+  const unsigned HOST_WIDE_INT update_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff1fffff00);
+  const unsigned HOST_WIDE_INT new_exception_mask =
+    HOST_WIDE_INT_UC (0x1ff80fff);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, call_mffs);
+  tree old_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+    old_llu, build_int_cst (uint64_type_node, update_exception_mask));
+
+  tree new_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, new_exception_mask
[COMMITTED] Add myself to MAINTAINERS file
Hi,

This patch adds myself to the MAINTAINERS file. Committed as revision 212652.

--

2014-07-16 Adhemerval Zanella azane...@linux.vnet.ibm.com

* MAINTAINERS (Write After Approval): Add myself.

---
--- trunk/MAINTAINERS 2014/07/16 14:21:34 212651
+++ trunk/MAINTAINERS 2014/07/16 14:23:03 212652
@@ -583,6 +583,7 @@
 David Yuste        david.yu...@gmail.com
 Kirill Yukhin      kirill.yuk...@gmail.com
 Kenneth Zadeck     zad...@naturalbridge.com
+Adhemerval Zanella azane...@linux.vnet.ibm.com
 Yufeng Zhang       yufeng.zh...@arm.com
 Shujing Zhao       pearly.z...@oracle.com
 Jon Ziegler        j...@apple.com
Re: [PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
Ping.

On 03-07-2014 18:08, Adhemerval Zanella wrote:

This patch implements the TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook for powerpc-fpu. I had to adjust the current c11-atomic-exec-5 testcase because for IBM long double, 0 += LDBL_MAX might generate overflow/underflow in the internal __gcc_qadd calculations. The c11-atomic-exec-5 test now passes for linux/powerpc; checked on powerpc32-linux-fpu, powerpc64-linux, and powerpc64le-linux.

--

2014-07-03 Adhemerval Zanella azane...@linux.vnet.ibm.com

gcc:
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New function.

gcc/testsuite:
* gcc.dg/atomic/c11-atomic-exec-5.c (test_main_long_double_add_overflow):
Define and run only for LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.
[PATCH] PowerPC: Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV
This patch implements the TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook for powerpc-fpu. I had to adjust the current c11-atomic-exec-5 testcase because for IBM long double, 0 += LDBL_MAX might generate overflow/underflow in the internal __gcc_qadd calculations. The c11-atomic-exec-5 test now passes for linux/powerpc; checked on powerpc32-linux-fpu, powerpc64-linux, and powerpc64le-linux.

--

2014-07-03 Adhemerval Zanella azane...@linux.vnet.ibm.com

gcc:
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): New function.

gcc/testsuite:
* gcc.dg/atomic/c11-atomic-exec-5.c (test_main_long_double_add_overflow):
Define and run only for LDBL_MANT_DIG != 106.
(test_main_complex_long_double_add_overflow): Likewise.
(test_main_long_double_sub_overflow): Likewise.
(test_main_complex_long_double_sub_overflow): Likewise.

---
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index bf67e72..75a2a45 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1621,6 +1621,9 @@ static const struct attribute_spec rs6000_attribute_table[] =
 #undef TARGET_CAN_USE_DOLOOP_P
 #define TARGET_CAN_USE_DOLOOP_P can_use_doloop_if_innermost
+
+#undef TARGET_ATOMIC_ASSIGN_EXPAND_FENV
+#define TARGET_ATOMIC_ASSIGN_EXPAND_FENV rs6000_atomic_assign_expand_fenv

 /* Processor table.  */
@@ -32991,6 +32994,105 @@ emit_fusion_gpr_load (rtx *operands)
   return "";
 }

+/* Implement TARGET_ATOMIC_ASSIGN_EXPAND_FENV hook.  */
+
+static void
+rs6000_atomic_assign_expand_fenv (tree *hold, tree *clear, tree *update)
+{
+  if (!TARGET_HARD_FLOAT || !TARGET_FPRS)
+    return;
+
+  tree mffs = rs6000_builtin_decls[RS6000_BUILTIN_MFFS];
+  tree mtfsf = rs6000_builtin_decls[RS6000_BUILTIN_MTFSF];
+  tree call_mffs = build_call_expr (mffs, 0);
+
+  /* Generates the equivalent of feholdexcept (&fenv_var)
+
+     *fenv_var = __builtin_mffs ();
+     double fenv_hold;
+     *(uint64_t*)&fenv_hold = *(uint64_t*)&fenv_var & 0xffffffff00000007LL;
+     __builtin_mtfsf (0xff, fenv_hold);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT hold_exception_mask =
+    HOST_WIDE_INT_C (0xffffffff00000007);
+
+  tree fenv_var = create_tmp_var (double_type_node, NULL);
+
+  tree hold_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_var, call_mffs);
+
+  tree fenv_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, hold_exception_mask));
+
+  tree fenv_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node, fenv_llu_and);
+
+  tree hold_mtfsf = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_mtfsf);
+
+  *hold = build2 (COMPOUND_EXPR, void_type_node, hold_mffs, hold_mtfsf);
+
+  /* Generates the equivalent of feclearexcept (FE_ALL_EXCEPT):
+
+     double fenv_clear = __builtin_mffs ();
+     *(uint64_t)&fenv_clear &= 0xffffffff00000000LL;
+     __builtin_mtfsf (0xff, fenv_clear);  */
+
+  /* Mask to clear everything except for the rounding modes and non-IEEE
+     arithmetic flag.  */
+  const unsigned HOST_WIDE_INT clear_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff00000000);
+
+  tree fenv_clear = create_tmp_var (double_type_node, NULL);
+
+  tree clear_mffs = build2 (MODIFY_EXPR, void_type_node, fenv_clear, call_mffs);
+
+  tree fenv_clean_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, fenv_var);
+  tree fenv_clear_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+    fenv_clean_llu, build_int_cst (uint64_type_node, clear_exception_mask));
+
+  tree fenv_clear_mtfsf = build1 (VIEW_CONVERT_EXPR, double_type_node,
+    fenv_clear_llu_and);
+
+  tree clear_mtfsf = build_call_expr (mtfsf, 2,
+    build_int_cst (unsigned_type_node, 0xff), fenv_clear_mtfsf);
+
+  *clear = build2 (COMPOUND_EXPR, void_type_node, clear_mffs, clear_mtfsf);
+
+  /* Generates the equivalent of feupdateenv (&fenv_var)
+
+     double old_fenv = __builtin_mffs ();
+     double fenv_update;
+     *(uint64_t*)&fenv_update = (*(uint64_t*)&old & 0xffffffff1fffff00LL) |
+       (*(uint64_t*)&fenv_var & 0x1ff80fff);
+     __builtin_mtfsf (0xff, fenv_update);  */
+
+  const unsigned HOST_WIDE_INT update_exception_mask =
+    HOST_WIDE_INT_UC (0xffffffff1fffff00);
+  const unsigned HOST_WIDE_INT new_exception_mask =
+    HOST_WIDE_INT_UC (0x1ff80fff);
+
+  tree old_fenv = create_tmp_var (double_type_node, NULL);
+  tree update_mffs = build2 (MODIFY_EXPR, void_type_node, old_fenv, call_mffs);
+
+  tree old_llu = build1 (VIEW_CONVERT_EXPR, uint64_type_node, update_mffs);
+  tree old_llu_and = build2 (BIT_AND_EXPR, uint64_type_node,
+    old_llu, build_int_cst (uint64_type_node, update_exception_mask));
+
+  tree new_llu_and = build2 (BIT_AND_EXPR, uint64_type_node, fenv_llu,
+    build_int_cst (uint64_type_node, new_exception_mask));
+
+  tree
Re: [PATCH, PR 57363] IBM long double: adding qNaN and number raises inexact exception
Hi,

This is a reformatted patch after Ulrich's review:

---
libgcc/ChangeLog:

2013-11-20 Adhemerval Zanella azane...@linux.vnet.ibm.com

* config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal number
and qNaN to not raise an inexact exception.

gcc/testsuite/ChangeLog:

2013-11-20 Adhemerval Zanella azane...@linux.vnet.ibm.com

* gcc.target/powerpc/pr57363.c: New test.

--
diff --git a/gcc/testsuite/gcc.target/powerpc/pr57363.c b/gcc/testsuite/gcc.target/powerpc/pr57363.c
new file mode 100644
index 000..45ea3f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr57363.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-options "-mlong-double-128" } */
+
+/* Check if adding a qNAN and a normal long double does not generate a
+   inexact exception.  */
+
+#define _GNU_SOURCE
+#include <fenv.h>
+
+int main(void)
+{
+  double x = __builtin_nan ("");
+  long double y = 1.1L;
+
+  feenableexcept (FE_INEXACT);
+  feclearexcept (FE_ALL_EXCEPT);
+  x = x + y;
+  return fetestexcept(FE_INEXACT);
+}
diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index 28e02e9..7ca900c 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -104,6 +104,8 @@ __gcc_qadd (double a, double aa, double c, double cc)

   if (nonfinite (z))
     {
+      if (fabs (z) != inf())
+        return z;
       z = cc + aa + c + a;
       if (nonfinite (z))
         return z;
Re: [PATCH, PR 57363] IBM long double: adding qNaN and number raises inexact exception
On 20-11-2013 14:23, David Edelsohn wrote: On Wed, Nov 20, 2013 at 9:11 AM, Adhemerval Zanella azane...@linux.vnet.ibm.com wrote:

libgcc/ChangeLog:

2013-11-20 Adhemerval Zanella azane...@linux.vnet.ibm.com

* config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal number
and qNaN to not raise an inexact exception.

gcc/testsuite/ChangeLog:

2013-11-20 Adhemerval Zanella azane...@linux.vnet.ibm.com

* gcc.target/powerpc/pr57363.c: New test.

This is okay. And the patch is small enough that it does not need a copyright assignment for GCC. Do you need someone to commit the patch for you? Thanks, David

Yes, I do need that, thanks.
Re: [PATCH, PR 57363] IBM long double: adding qNaN and number raises inexact exception
On 15-11-2013 19:54, Ulrich Weigand wrote: Should this be qNaN instead of sNaN here?

Yes, indeed.

Also, since you already have a test case, I think it would be good to add it to the GCC test suite ... Otherwise, this looks reasonable to me (but I cannot approve the patch): Bye, Ulrich

Here it is with an updated ChangeLog. Tested on PPC32 and PPC64. The testcase fails at runtime without the fix with a Floating-Point Exception.

---

2013-11-15 Adhemerval Zanella azane...@linux.vnet.ibm.com

* libgcc/config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal
number and qNaN to not raise an inexact exception.
* gcc/testsuite/gcc.target/powerpc/pr57363.c: New file: check for
PR#57363.

--
diff --git a/gcc/testsuite/gcc.target/powerpc/pr57363.c b/gcc/testsuite/gcc.target/powerpc/pr57363.c
new file mode 100644
index 000..cc9b1d7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr57363.c
@@ -0,0 +1,19 @@
+/* { dg-do run { target { powerpc*-*-linux* } } } */
+/* { dg-options "-mlong-double-128" } */
+
+/* Check if adding a sNAN and a normal long double does not generate a
+   inexact exception.  */
+
+#define _GNU_SOURCE
+#include <fenv.h>
+
+int main(void)
+{
+  double x = __builtin_nan ("");
+  long double y = 1.1L;
+
+  feenableexcept (FE_INEXACT);
+  feclearexcept (FE_ALL_EXCEPT);
+  x = x + y;
+  return fetestexcept(FE_INEXACT);
+}
diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index 28e02e9..7ca900c 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -104,6 +104,8 @@ __gcc_qadd (double a, double aa, double c, double cc)

   if (nonfinite (z))
     {
+      if (fabs (z) != inf())
+        return z;
       z = cc + aa + c + a;
       if (nonfinite (z))
         return z;
Re: [PATCH, PR 57363] IBM long double: adding qNaN and number raises inexact exception
On 18-11-2013 18:10, Ulrich Weigand wrote: Adhemerval Zanella wrote:

2013-11-15 Adhemerval Zanella azane...@linux.vnet.ibm.com

* libgcc/config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal
number and qNaN to not raise an inexact exception.
* gcc/testsuite/gcc.target/powerpc/pr57363.c: New file: check for
PR#57363.

This should go into two separate ChangeLog files, one in libgcc/ChangeLog:

* config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal number
and qNaN to not raise an inexact exception.

and one in gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr57363.c: New test.

Ok, thanks for spotting it.

+/* Check if adding a sNAN and a normal long double does not generate a
+   inexact exception.  */

qNaN again :-)

Yeah... :(

Thanks, Ulrich
[PATCH, PR 57363] IBM long double: adding qNaN and number raises inexact exception
Hi all,

For IBM long double, adding a normal number to a qNaN raises an inexact exception. Adding any number to a qNaN should not raise any exception. The following testcase triggers the issue (the testcase is meant to run against a GNU-compatible libc):

--
$ cat gcc_testcase.c
#include <math.h>
#include <fenv.h>
#include <stdio.h>

double sum (double x, long double y)
{
  return x + y;
}

int main ()
{
  feenableexcept (FE_INEXACT);

  double x = __builtin_nan ("");
  long double y = 1.1L;

  printf ("%e\n", sum (x, y));
  return 0;
}

$ gcc -O3 -m64 -fno-inline gcc_testcase.c -o gcc_testcase -lm
$ ./gcc_testcase
Floating point exception (core dumped)
--

The issue is in the __gcc_qadd implementation at libgcc/config/rs6000/ibm-ldouble.c: if the number is non-finite, there is no check whether it is a NaN before actually summing all the components. A possible solution would be to add an extra test and return the first sum if the number is not infinity.

This also fixes two GLIBC math issues I'm seeing:

TEST_ff_f (nexttoward, qnan_value, 1.1L, qnan_value, NO_INEXACT_EXCEPTION),
TEST_ff_f (nexttoward, 1.1L, qnan_value, qnan_value, NO_INEXACT_EXCEPTION),

Fix tested on PPC32 and PPC64.

---

2013-11-15 Adhemerval Zanella azane...@linux.vnet.ibm.com

* libgcc/config/rs6000/ibm-ldouble.c (__gcc_qadd): Fix add of normal
number and sNaN to not raise an inexact exception.

---
diff --git a/libgcc/config/rs6000/ibm-ldouble.c b/libgcc/config/rs6000/ibm-ldouble.c
index 28e02e9..7ca900c 100644
--- a/libgcc/config/rs6000/ibm-ldouble.c
+++ b/libgcc/config/rs6000/ibm-ldouble.c
@@ -104,6 +104,8 @@ __gcc_qadd (double a, double aa, double c, double cc)

   if (nonfinite (z))
     {
+      if (fabs (z) != inf())
+        return z;
       z = cc + aa + c + a;
       if (nonfinite (z))
         return z;