Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t

2022-03-09 Thread Rich Felker
On Wed, Mar 09, 2022 at 08:26:11AM +0100, Sören Tempel wrote:
> Ian Lance Taylor  wrote:
> > Have you tested this in 32-bit mode?  It does not look correct based
> > on the glibc definitions.  Looking at glibc it seems that it ought to
> > be
> 
> As stated in the commit message, I have only tested this on Alpine Linux
> ppc64le (which uses musl libc). Unfortunately, I don't have access to a
> 32-bit PowerPC machine and hence haven't performed any tests with it.
> 
> > reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];
> 
> While this should work with glibc, it doesn't work with musl. In order
> to support both (musl and glibc) on 32-bit PowerPC, we would have to do
> something along the lines of:
> 
>   #ifdef __PPC__
>   #if defined(__PPC64__)   /* ppc64 glibc & musl */
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32]
>   #elif defined(__GLIBC__) /* ppc32 glibc */
>   reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];
>   #else/* ppc32 musl */
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
>   #endif /* __PPC64__ */
>   #endif /* __PPC__ */
> 
> In light of these observations, maybe using asm/ptrace.h and .regs (as
> proposed in the v1 patch) is the "better" (i.e. more readable) solution
> for now? I agree with Rich that using .regs is certainly a "code smell",
> but this gigantic ifdef block also looks pretty smelly to me. That being
> said, I can also send a v4 which uses this ifdef block.

I still prefer the #ifdef chain here, because it's at least using the
intended API. The problem is just that we can't have a matching API
because the only API glibc offers on ppc32 contradicts the POSIX
requirements.

I'm also not understanding how the .regs approach in the v1 patch was
ever supposed to work with musl anyway. The relevant line was:

ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;

and musl has no uc_mcontext.regs member because uc_mcontext is
mcontext_t, not the glibc ppc32 pointer union thing. The only .regs in
musl's ppc32 signal.h/ucontext.h is in struct sigcontext, not anywhere
in ucontext_t.

Rich


Re: [PATCH v3] libgo: Don't use pt_regs member in mcontext_t

2022-03-08 Thread Rich Felker
On Mon, Mar 07, 2022 at 02:59:02PM -0800, Ian Lance Taylor wrote:
> On Sun, Mar 6, 2022 at 11:11 PM  wrote:
> >
> > +#ifdef __PPC64__
> > +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
> > +#else
> > +   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
> > +#endif
> 
> Have you tested this in 32-bit mode?  It does not look correct based
> on the glibc definitions.  Looking at glibc it seems that it ought to
> be
> 
> reg.sigpc = ((ucontext_t*)(context))->uc_mcontext.uc_regs->gregs[32];

Indeed, I think it has to use that conditional on __GLIBC__. I was
thinking the union glibc has was an anon union, but no, it's named
uc_mcontext despite not having type mcontext_t.

Ideally glibc could fix this by doing:

union {
union __ctx(uc_regs_ptr) {
struct __ctx(pt_regs) *__ctx(regs);
mcontext_t *__ctx(uc_regs);
} uc_mcontext;
mcontext_t *__ctx(uc_regs);
};

so that there would be a common API for accessing it that doesn't
conflict with the properties the standard mandates.

Rich


Re: [PATCH v2] libgo: Don't use pt_regs member in mcontext_t

2022-03-06 Thread Rich Felker
On Sun, Mar 06, 2022 at 07:59:24PM +0100, soe...@soeren-tempel.net wrote:
> From: Sören Tempel 
> 
> The .regs member is primarily intended to be used in conjunction with
> ptrace. Since this code is not using ptrace, using .regs is a bad idea.
> Furthermore, the code currently fails to compile on musl since the
> pt_regs type (used by .regs) is in an incomplete type which has to be
> completed by inclusion of the asm/ptrace.h Kernel header. Contrary to
> glibc, this header is not indirectly included by musl through other
> header files.
> 
> This patch fixes compilation of this code with musl libc by accessing
> the register values via .gp_regs/.gregs (depending on 32-bit or 64-bit
> PowerPC) instead of using .regs. For more details, see
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591261.html
> 
> For the offsets in gp_regs refer to the Kernel asm/ptrace.h header.
> 
> This patch has been tested on Alpine Linux ppc64le (uses musl libc).
> 
> Signed-off-by: Sören Tempel 
> 
> ChangeLog:
> 
>   * libgo/runtime/go-signal.c (defined): Use .gp_regs/.gregs
> to access ppc64/ppc32 registers.
>   (dumpregs): Ditto.
> ---
> Changes since v1: Use .gp_regs/.gregs instead of .regs based on
> feedback by Rich Felker, thereby avoiding the need to include
> asm/ptrace.h for struct pt_regs.
> 
>  libgo/runtime/go-signal.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
> index d30d1603adc..647ad606019 100644
> --- a/libgo/runtime/go-signal.c
> +++ b/libgo/runtime/go-signal.c
> @@ -224,7 +224,11 @@ getSiginfo(siginfo_t *info, void *context 
> __attribute__((unused)))
>  #elif defined(__alpha__) && defined(__linux__)
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.sc_pc;
>  #elif defined(__PPC__) && defined(__linux__)
> - ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.regs->nip;
> +#ifdef __PPC64__
> + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gp_regs[32];
> +#else
> + ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.gregs[32];
> +#endif
>  #elif defined(__PPC__) && defined(_AIX)
>   ret.sigpc = ((ucontext_t*)(context))->uc_mcontext.jmp_context.iar;
>  #elif defined(__aarch64__) && defined(__linux__)
> @@ -338,16 +342,21 @@ dumpregs(siginfo_t *info __attribute__((unused)), void 
> *context __attribute__((u
>  #elif defined(__PPC__) && defined(__LITTLE_ENDIAN__) && defined(__linux__)
> {
>   mcontext_t *m = &((ucontext_t*)(context))->uc_mcontext;
> +#ifdef __PPC64__
> + greg_t *gp = &m->gp_regs;
> +#else
> + greg_t *gp = &m->gregs;
> +#endif

Have you tried compiling this? It looks like it's a constraint
violation because gregset_t has array type and & will produce a
pointer to the array type rather than a pointer to the first element.
You should drop the & to get the pointer to the first element.

Rich


Re: [gofrontend-dev] Re: [PATCH] libgo: include asm/ptrace.h for pt_regs definition on PowerPC

2022-03-06 Thread Rich Felker
On Sun, Mar 06, 2022 at 10:22:56AM -0500, Rich Felker wrote:
> On Mon, Feb 21, 2022 at 09:25:43AM -0800, Ian Lance Taylor via Gcc-patches 
> wrote:
> > Note for gofrontend-dev: on gcc-patches only Andreas Schwab suggested
> > using uc_regs instead of regs, which does look correct to me.
> 
> Yes, this is absolutely the correct fix. Having pt_regs appear at all
> in code not using ptrace is a serious code smell.
> 
> The root of this problem is twofold: (1) ancient Linux (2.0.x?) had a
> bad definition of powerpc32 ucontext_t that lacked any mcontext_t,
> instead having a regs member pointing to the storage for the register
> state (as pt_regs). This was ostensibly done for extensibility
> reasons, but was non-POSIX-conforming and broken, and was later fixed.
> 
> And (2) glibc's definition of ucontext_t is also non-conforming,
> making the uc_mcontext member have type anon-union rather than type
> mcontext_t.
> 
> musl does not follow this but puts the uc_mcontext member in the place
> later kernel ABI assigned to it after the kernel mistake was fixed.
> 
> Ideally you would access uc_mcontext.gregs[32] (32==NIP) and be done
> with it, but this won't work on glibc because of (2). However musl
> also supports the old uc_regs pointer (it's in the reserved namespace
> anyway so not a conformance error), making it so uc_regs->gregs[32]
> works on either.

I mistakenly thought this was ppc32 because I wasn't remembering that
a lesser mess was still present on ppc64. The above applies to ppc32.
On ppc64 it would be uc_mcontext.gp_regs[32].

I'm not sure if the code is intended to also work on ppc32, but even
if that's not supported now, when fixing this it should probably
condition the use of gregs/gp_regs name on __WORDSIZE or whatever so
that, if anyone ever does try to add ppc32 support, they don't get
bogged down in this again and get it wrong again...

Rich



> > On Mon, Feb 21, 2022 at 8:47 AM Sören Tempel  
> > wrote:
> > >
> > > Ping.
> > >
> > > Summary: Fix build of libgo on PPC with musl libc and libucontext by
> > > explicitly including the Linux header defining `struct pt_regs` instead of
> > > relying on other libc headers to include it implicitly.
> > >
> > > See: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587520.html
> > >
> > > If the patch needs to be revised further please let me know. This patch 
> > > has
> > > been applied at Alpine Linux downstream (which uses musl libc) for a 
> > > while, I
> > > have not tested it on other systems.
> > >
> > > Greetings,
> > > Sören
> > >
> > > Sören Tempel  wrote:
> > > > Both glibc and musl libc declare pt_regs as an incomplete type. This
> > > > type has to be completed by inclusion of another header. On Linux, the
> > > > asm/ptrace.h header file provides this type definition. Without
> > > > including this header file, it is not possible to access the regs member
> > > > of the mcontext_t struct as done in libgo/runtime/go-signal.c. On glibc,
> > > > other headers (e.g. sys/user.h) include asm/ptrace.h but on musl
> > > > asm/ptrace.h is not included by other headers and thus the
> > > > aforementioned files do not compile without an explicit include of
> > > > asm/ptrace.h:
> > > >
> > > >   libgo/runtime/go-signal.c: In function 'getSiginfo':
> > > >   libgo/runtime/go-signal.c:227:63: error: invalid use of undefined 
> > > > type 'struct pt_regs'
> > > > 227 | ret.sigpc = 
> > > > ((ucontext_t*)(context))->uc_mcontext.regs->nip;
> > > > |
> > > >
> > > > See also:
> > > >
> > > > * 
> > > > https://git.musl-libc.org/cgit/musl/commit/?id=c2518a8efb6507f1b41c3b12e03b06f8f2317a1f
> > > > * https://github.com/kaniini/libucontext/issues/36
> > > >
> > > > Signed-off-by: Sören Tempel 
> > > >
> > > > ChangeLog:
> > > >
> > > >   * libgo/runtime/go-signal.c: Include asm/ptrace.h for the
> > > > definition of pt_regs (used by mcontext_t) on PowerPC.
> > > > ---
> > > >  libgo/runtime/go-signal.c | 6 ++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
> > > > index d30d1603adc..fc01e04e4a1 100644
> > > > --- a/libgo/runtime/go-signal.c
> > > > +++ b/lib

Re: [gofrontend-dev] Re: [PATCH] libgo: include asm/ptrace.h for pt_regs definition on PowerPC

2022-03-06 Thread Rich Felker
On Mon, Feb 21, 2022 at 09:25:43AM -0800, Ian Lance Taylor via Gcc-patches 
wrote:
> Note for gofrontend-dev: on gcc-patches only Andreas Schwab suggested
> using uc_regs instead of regs, which does look correct to me.

Yes, this is absolutely the correct fix. Having pt_regs appear at all
in code not using ptrace is a serious code smell.

The root of this problem is twofold: (1) ancient Linux (2.0.x?) had a
bad definition of powerpc32 ucontext_t that lacked any mcontext_t,
instead having a regs member pointing to the storage for the register
state (as pt_regs). This was ostensibly done for extensibility
reasons, but was non-POSIX-conforming and broken, and was later fixed.

And (2) glibc's definition of ucontext_t is also non-conforming,
making the uc_mcontext member have type anon-union rather than type
mcontext_t.

musl does not follow this but puts the uc_mcontext member in the place
later kernel ABI assigned to it after the kernel mistake was fixed.

Ideally you would access uc_mcontext.gregs[32] (32==NIP) and be done
with it, but this won't work on glibc because of (2). However musl
also supports the old uc_regs pointer (it's in the reserved namespace
anyway so not a conformance error), making it so uc_regs->gregs[32]
works on either.

Rich



> On Mon, Feb 21, 2022 at 8:47 AM Sören Tempel  wrote:
> >
> > Ping.
> >
> > Summary: Fix build of libgo on PPC with musl libc and libucontext by
> > explicitly including the Linux header defining `struct pt_regs` instead of
> > relying on other libc headers to include it implicitly.
> >
> > See: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587520.html
> >
> > If the patch needs to be revised further please let me know. This patch has
> > been applied at Alpine Linux downstream (which uses musl libc) for a while, 
> > I
> > have not tested it on other systems.
> >
> > Greetings,
> > Sören
> >
> > Sören Tempel  wrote:
> > > Both glibc and musl libc declare pt_regs as an incomplete type. This
> > > type has to be completed by inclusion of another header. On Linux, the
> > > asm/ptrace.h header file provides this type definition. Without
> > > including this header file, it is not possible to access the regs member
> > > of the mcontext_t struct as done in libgo/runtime/go-signal.c. On glibc,
> > > other headers (e.g. sys/user.h) include asm/ptrace.h but on musl
> > > asm/ptrace.h is not included by other headers and thus the
> > > aforementioned files do not compile without an explicit include of
> > > asm/ptrace.h:
> > >
> > >   libgo/runtime/go-signal.c: In function 'getSiginfo':
> > >   libgo/runtime/go-signal.c:227:63: error: invalid use of undefined 
> > > type 'struct pt_regs'
> > > 227 | ret.sigpc = 
> > > ((ucontext_t*)(context))->uc_mcontext.regs->nip;
> > > |
> > >
> > > See also:
> > >
> > > * 
> > > https://git.musl-libc.org/cgit/musl/commit/?id=c2518a8efb6507f1b41c3b12e03b06f8f2317a1f
> > > * https://github.com/kaniini/libucontext/issues/36
> > >
> > > Signed-off-by: Sören Tempel 
> > >
> > > ChangeLog:
> > >
> > >   * libgo/runtime/go-signal.c: Include asm/ptrace.h for the
> > > definition of pt_regs (used by mcontext_t) on PowerPC.
> > > ---
> > >  libgo/runtime/go-signal.c | 6 ++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/libgo/runtime/go-signal.c b/libgo/runtime/go-signal.c
> > > index d30d1603adc..fc01e04e4a1 100644
> > > --- a/libgo/runtime/go-signal.c
> > > +++ b/libgo/runtime/go-signal.c
> > > @@ -10,6 +10,12 @@
> > >  #include 
> > >  #include 
> > >
> > > +// On PowerPC, ucontext.h uses a pt_regs struct as an incomplete
> > > +// type. This type must be completed by including asm/ptrace.h.
> > > +#ifdef __PPC__
> > > +#include 
> > > +#endif
> > > +
> > >  #include "runtime.h"
> > >
> > >  #ifndef SA_RESTART
> >
> > --
> > You received this message because you are subscribed to the Google Groups 
> > "gofrontend-dev" group.
> > To unsubscribe from this group and stop receiving emails from it, send an 
> > email to gofrontend-dev+unsubscr...@googlegroups.com.
> > To view this discussion on the web visit 
> > https://groups.google.com/d/msgid/gofrontend-dev/2FICMX0ORZ6O1.3DYXRTDHNGXCN%408pit.net.


Re: [musl] Re: [PATCH v2] configure: define TARGET_LIBC_GNUSTACK on musl

2021-11-16 Thread Rich Felker
On Tue, Nov 16, 2021 at 03:40:00PM +0100, Dragan Mladjenovic wrote:
> Hi,
> 
> Looks fine to me. If possible, maybe it should even be back-ported
> to stable branches.
> 
> Not sure if MIPS assembly sources (if any) in musl would need
> explicit ..note.GNU-stack
> 
> to complement this?

What are the actual consequences of making this change, and what is
the goal? I'm concerned that it might produce object files which don't
include annotation that they don't need executable stack, in which
case the final executable file will be marked as executable-stack and
the kernel will load it as such. That would be very bad.

Rich


> On 16-Nov-21 06:13, Ilya Lipnitskiy wrote:
> >musl only uses PT_GNU_STACK to set default thread stack size and has no
> >executable stack support[0], so there is no reason not to emit the
> >.note.GNU-stack section on musl builds.
> >
> >[0]: 
> >https://lore.kernel.org/all/20190423192534.gn23...@brightrain.aerifal.cx/T/#u
> >
> >gcc/ChangeLog:
> >
> > * configure: Regenerate.
> > * configure.ac: define TARGET_LIBC_GNUSTACK on musl
> >
> >Signed-off-by: Ilya Lipnitskiy 
> >---
> >  gcc/configure| 3 +++
> >  gcc/configure.ac | 3 +++
> >  2 files changed, 6 insertions(+)
> >
> >diff --git a/gcc/configure b/gcc/configure
> >index 74b9d9be4c85..7091a838aefa 100755
> >--- a/gcc/configure
> >+++ b/gcc/configure
> >@@ -31275,6 +31275,9 @@ fi
> >  # Check if the target LIBC handles PT_GNU_STACK.
> >  gcc_cv_libc_gnustack=unknown
> >  case "$target" in
> >+  mips*-*-linux-musl*)
> >+gcc_cv_libc_gnustack=yes
> >+;;
> >mips*-*-linux*)
> >  if test $glibc_version_major -gt 2 \
> >diff --git a/gcc/configure.ac b/gcc/configure.ac
> >index c9ee1fb8919e..8a2d34179a75 100644
> >--- a/gcc/configure.ac
> >+++ b/gcc/configure.ac
> >@@ -6961,6 +6961,9 @@ fi
> >  # Check if the target LIBC handles PT_GNU_STACK.
> >  gcc_cv_libc_gnustack=unknown
> >  case "$target" in
> >+  mips*-*-linux-musl*)
> >+gcc_cv_libc_gnustack=yes
> >+;;
> >mips*-*-linux*)
> >  GCC_GLIBC_VERSION_GTE_IFELSE([2], [31], [gcc_cv_libc_gnustack=yes], )
> >  ;;


Re: [PATCH] musl: Don't use gthr weak refs in libgcc PR91737

2019-11-17 Thread Rich Felker
On Sun, Nov 17, 2019 at 11:31:02AM -0700, Jeff Law wrote:
> On 11/15/19 3:00 AM, Szabolcs Nagy wrote:
> > The gthr weak reference based single thread detection is unsafe with
> > static linking and in case of dynamic linking it's ineffective on musl
> > since pthread symbols are defined in libc.so.
> > 
> > Ideally this should be fixed for all targets, since glibc plans to move
> > libpthread.so into libc.so too and users want to static link to pthread
> > without --whole-archive: PR87189.
> > 
> > For now we have to explicitly opt out from the broken behaviour in the
> > config machinery of each target lib and libgcc was previously missed.
> > 
> > libgcc/ChangeLog:
> > 
> > 2019-11-15  Szabolcs Nagy  
> > 
> > * config.host: Add t-gthr-noweak on *-*-musl*.
> > * config/t-gthr-noweak: New file.
> > 
> Given the patch is constrained to musl, it's obviously OK.
> 
> WRT the bigger question, even if glibc gets those bits moved into
> libc.so it's likely going to be a long time before the split between
> libc and libpthreads disappears from the wild :(

The right thing for GCC to do on the glibc side is just having the
affected target libs depend on libpthread until the symbols are moved.
With the current invalid use of weak refs, static linking of
multithreaded programs is completely broken on glibc, and distros are
resorting to hacks of using ld -r or similar to move all of
libpthread.a into a monolithic object file to work around it.

In any case, I'll be happy to have it fixed just for musl now, so we
can drop these patches on our side and have upstream GCC working
pretty much out of the box.

Rich


Re: [PATCH] musl: use correct long double abi by default

2019-11-15 Thread Rich Felker
On Fri, Nov 15, 2019 at 01:22:20PM -0600, Segher Boessenkool wrote:
> On Fri, Nov 15, 2019 at 06:58:24PM +, Szabolcs Nagy wrote:
> > 2019-11-15  Szabolcs Nagy  
> > 
> > * configure.ac (gcc_cv_target_ldbl128): Set for *-musl* targets.
> 
> That is not what the patch does.  It sets it to yes for s390*-linux-musl*,
> it sets it to no for powerpc*-*-linux-musl*, and it doesn't do anything for
> other *-musl* configurations.
> 
> The powerpc part is okay for trunk, if this is what musl wants to do.
> (musl has no OS port maintainer listed in MAINTAINERS, maybe that should
> be fixed?)

This patch is in line with the ABIs presently supported by musl for
powerpc[64] and s390x: ld is ieee-double for powerpc*, ieee-quad for
s390x.

Rich


Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-07-16 Thread Rich Felker
On Tue, Jul 16, 2019 at 11:34:06AM +0100, Richard Sandiford wrote:
> > diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
> > index 66ec0ea..d7cc923 100644
> > --- a/gcc/config/arm/linux-eabi.h
> > +++ b/gcc/config/arm/linux-eabi.h
> > @@ -89,7 +89,7 @@
> >  #define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
> >  #endif
> >  #define MUSL_DYNAMIC_LINKER \
> > -  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
> > +  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E 
> > "%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"
> >  
> >  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
> > use the GNU/Linux version, not the generic BPABI version.  */
> 
> Rich, could you confirm that this is (going to be?) the correct name?

Yes, as I understand the above logic, which should yield:

/lib/ld-musl-arm-fdpic.so.1
/lib/ld-musl-armhf-fdpic.so.1
/lib/ld-musl-armeb-fdpic.so.1
/lib/ld-musl-armebhf-fdpic.so.1

Rich


Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-05-21 Thread Rich Felker
On Tue, May 21, 2019 at 05:28:51PM +0200, Christophe Lyon wrote:
> On Wed, 15 May 2019 at 18:06, Rich Felker  wrote:
> >
> > On Wed, May 15, 2019 at 03:59:39PM +, Szabolcs Nagy wrote:
> > > On 15/05/2019 16:37, Rich Felker wrote:
> > > > On Wed, May 15, 2019 at 05:12:11PM +0200, Christophe Lyon wrote:
> > > >> On Wed, 15 May 2019 at 16:37, Rich Felker  wrote:
> > > >>> On Wed, May 15, 2019 at 01:55:30PM +, Szabolcs Nagy wrote:
> > > >>>> can support both normal elf and fdpic elf so you can test/use
> > > >>>> an fdpic toolchain on a system with mmu, but this requires
> > > >>>> different dynamic linker name ..otherwise one has to run
> > > >>>> executables in a chroot or separate mount namespace to change
> > > >>>> the dynamic linker)
> > > >>>
> > > >>> Indeed, it's a bad idea to make them clash.
> > > >>>
> > > >>
> > > >> Not sure to understand your point: indeed FDPIC binaries work
> > > >> on a system with mmu, provided you have the right dynamic
> > > >> linker in the right place, as well as the needed runtime libs (libc, 
> > > >> etc)
> > > >>
> > > >> Do you want me to change anything here?
> > > >
> > > > I think the concern is that if the PT_INTERP name is the same for
> > > > binaries with different ABIs, you wouldn't be able to have both
> > > > present in the same root fs, and this would make it more of a pain to
> > > > debug fdpic binaries on a full (with-mmu) host.
> > > >
> > > > musl always uses a different PT_INTERP name for each ABI combination,
> > > > so I guess the question is whether uclibc or whatever other libc
> > > > you're intending people to use would also want to do this.
> > >
> > > glibc uses different names now for new abis, so i was expecting
> > > some *_DYNAMIC_LINKER update, but it seems uclibc always uses
> > > the same fixed name
> > >
> > > /lib/ld-uClibc.so.0
> > >
> > > i guess it makes sense for them since iirc uclibc can change
> > > its runtime abi based on lot of build time config so having
> > > different name for each abi variant may be impractical.
> >
> > Yes, this "feature" of uclibc was was of the key motivations behind
> > the creation of musl... :-)
> >
> 
> Hi,
> 
> I discussed a bit further with Szabolcs on irc, and tried to get some
> feedback from uclibc-ng community (none so far)
> 
> I propose the following 2 patches on top of this one to address part
> of the concerns:
> diff --git a/gcc/config/arm/linux-eabi.h b/gcc/config/arm/linux-eabi.h
> index 67edb42..d7cc923 100644
> --- a/gcc/config/arm/linux-eabi.h
> +++ b/gcc/config/arm/linux-eabi.h
> @@ -89,7 +89,7 @@
>  #define MUSL_DYNAMIC_LINKER_E "%{mbig-endian:eb}"
>  #endif
>  #define MUSL_DYNAMIC_LINKER \
> -  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E "%{mfloat-abi=hard:hf}.so.1"
> +  "/lib/ld-musl-arm" MUSL_DYNAMIC_LINKER_E
> "%{mfloat-abi=hard:hf}%{mfdpic:-fdpic}.so.1"
> 
>  /* At this point, bpabi.h will have clobbered LINK_SPEC.  We want to
> use the GNU/Linux version, not the generic BPABI version.  */
> 
> diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
> index c38b3f4..92bca69 100644
> --- a/libsanitizer/configure.tgt
> +++ b/libsanitizer/configure.tgt
> @@ -45,7 +45,7 @@ case "${target}" in
> ;;
>sparc*-*-solaris2.11*)
> ;;
> -  arm*-*-uclinuxfdpiceabi)
> +  arm*-*-fdpiceabi)
> UNSUPPORTED=1
> ;;
>arm*-*-linux*)
> 
> However, regarding -staic/-static-pie, it seems I have several options:
> (a) add support for static-pie to uclibc-ng. This means creating a new
> rcrt1.o or similar, which would embed parts of the dynamic linker into
> static-pie executables. This seems to involve quite a bit of work
> 
> (b) add support for FDPIC on arm to musl, which I'm not familiar with
> 
> (c) declare -static not supported on arm-FDPIC
> 
> (d) gather consensus that -static with pt_interp is ok (my preference,
> since that's what the current patches do :-)

musl definitely does not support static with pt_interp, and won't. If
it works, it's by chance, and not a good idea to try relying on it. If
you want to follow this path in upstream for now that's fine but it
means musl users will need to apply patches. This is already done
anyway, so it's not a *new* burden, but it's still annoying.

> At this point, I'd prefer to stick with (d), or (c), to avoid further delaying
> inclusion of FDPIC support for arm in GCC, and address improvements
> later, so that it's not a constantly moving target.

I'd find (c) mildly better as long as it's still easy for us to patch.
Providing a -static that's not actually static is not useful and will
be harder to fix later.

Rich


Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-05-15 Thread Rich Felker
On Wed, May 15, 2019 at 03:59:39PM +, Szabolcs Nagy wrote:
> On 15/05/2019 16:37, Rich Felker wrote:
> > On Wed, May 15, 2019 at 05:12:11PM +0200, Christophe Lyon wrote:
> >> On Wed, 15 May 2019 at 16:37, Rich Felker  wrote:
> >>> On Wed, May 15, 2019 at 01:55:30PM +, Szabolcs Nagy wrote:
> >>>> can support both normal elf and fdpic elf so you can test/use
> >>>> an fdpic toolchain on a system with mmu, but this requires
> >>>> different dynamic linker name ..otherwise one has to run
> >>>> executables in a chroot or separate mount namespace to change
> >>>> the dynamic linker)
> >>>
> >>> Indeed, it's a bad idea to make them clash.
> >>>
> >>
> >> Not sure to understand your point: indeed FDPIC binaries work
> >> on a system with mmu, provided you have the right dynamic
> >> linker in the right place, as well as the needed runtime libs (libc, 
> >> etc)
> >>
> >> Do you want me to change anything here?
> > 
> > I think the concern is that if the PT_INTERP name is the same for
> > binaries with different ABIs, you wouldn't be able to have both
> > present in the same root fs, and this would make it more of a pain to
> > debug fdpic binaries on a full (with-mmu) host.
> > 
> > musl always uses a different PT_INTERP name for each ABI combination,
> > so I guess the question is whether uclibc or whatever other libc
> > you're intending people to use would also want to do this.
> 
> glibc uses different names now for new abis, so i was expecting
> some *_DYNAMIC_LINKER update, but it seems uclibc always uses
> the same fixed name
> 
> /lib/ld-uClibc.so.0
> 
> i guess it makes sense for them since iirc uclibc can change
> its runtime abi based on lot of build time config so having
> different name for each abi variant may be impractical.

Yes, this "feature" of uclibc was was of the key motivations behind
the creation of musl... :-)

Rich


Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-05-15 Thread Rich Felker
On Wed, May 15, 2019 at 05:12:11PM +0200, Christophe Lyon wrote:
> On Wed, 15 May 2019 at 16:37, Rich Felker  wrote:
> >
> > On Wed, May 15, 2019 at 01:55:30PM +, Szabolcs Nagy wrote:
> > > On 15/05/2019 13:39, Christophe Lyon wrote:
> > > > In FDPIC mode, we set -fPIE unless the user provides -fno-PIE, -fpie,
> > > > -fPIC or -fpic: indeed FDPIC code is PIC, but we want to generate code
> > > > for executables rather than shared libraries by default.
> > > >
> > > > We also make sure to use the --fdpic assembler option, and select the
> > > > appropriate linker emulation.
> > > >
> > > > At link time, we also default to -pie, unless we are generating a
> > > > shared library or a relocatable file (-r). Note that even for static
> > > > link, we must specify the dynamic linker because the executable still
> > > > has to relocate itself at startup.
> > > >
> > > > We also force 'now' binding since lazy binding is not supported.
> > > >
> > > > We should also apply the same behavior for -Wl,-Ur as for -r, but I
> > > > couldn't find how to describe that in the specs fragment.
> > > ...
> > > > +/* Unless we generate a shared library or a relocatable object, we
> > > > +   force -pie.  */
> > > > +/* Even with -static, we have to define the dynamic-linker, as we
> > > > +   have some relocations to resolve at load time.  */
> > > > +#undef  SUBTARGET_EXTRA_LINK_SPEC
> > > > +#define SUBTARGET_EXTRA_LINK_SPEC  \
> > > > +  "%{!mno-fdpic: -m " TARGET_FDPIC_LINKER_EMULATION\
> > > > +   "%{!shared:%{!r: -pie}} \
> > > > +%{static:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}" \
> > > > +  "%{mno-fdpic: -m " TARGET_LINKER_EMULATION "}"   \
> > > > +  "%{!r:%{!mno-fdpic: -z now}}"
> > >
> > > i think -dynamic-linker can be avoided for -static using
> > > -static-pie linking with rcrt0.o
> >
> > Yes, -dynamic-linker should never be used with -static.
> 
> So, do you mean dropping completely the line:
> +%{static:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}" \
> and thus make -static unsupported, forcing users to use -static-pie instead?

Rather I would have -static behave the same as -static-pie. The intent
on musl when we added static pie (before glibc had it) was that
-static plus -pie (including default-pie, if built as default-pie)
would yield static pie output, and our patches still do this. When
static-pie was upstreamed in gcc, it was done differently for
compatibility with legacy versions of glibc. That's not a
consideration for fdpic.

> > > but more importantly: does the abi spec require the sysv dynamic
> > > linker name? that sounds suboptimal (in principle the same os
> >
> > ABI specs typically do this and we just ignore it. BFD contains
> > default dynamic linker strings for all sorts of ABIs, and they're all
> > wrong -- things like /lib/ld64.so.1, etc. I don't think it's worth
> > bothering with fighting the desire of folks writing ABI specs to do
> > this again and again. GCC overrides them all with the actually-correct
> > values when !static.
> >
> > > can support both normal elf and fdpic elf so you can test/use
> > > an fdpic toolchain on a system with mmu, but this requires
> > > different dynamic linker name ..otherwise one has to run
> > > executables in a chroot or separate mount namespace to change
> > > the dynamic linker)
> >
> > Indeed, it's a bad idea to make them clash.
> >
> 
> Not sure to understand your point: indeed FDPIC binaries work
> on a system with mmu, provided you have the right dynamic
> linker in the right place, as well as the needed runtime libs (libc, etc)
> 
> Do you want me to change anything here?

I think the concern is that if the PT_INTERP name is the same for
binaries with different ABIs, you wouldn't be able to have both
present in the same root fs, and this would make it more of a pain to
debug fdpic binaries on a full (with-mmu) host.

musl always uses a different PT_INTERP name for each ABI combination,
so I guess the question is whether uclibc or whatever other libc
you're intending people to use would also want to do this.

> > > > +#undef STARTFILE_SPEC
> > > > +#define STARTFILE_SPEC "%{!mno-fdpic:%{!shared:crtreloc.o%s}} " \
> > > > +  LINUX_OR_ANDROID_

Re: [ARM/FDPIC v5 03/21] [ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided

2019-05-15 Thread Rich Felker
On Wed, May 15, 2019 at 01:55:30PM +, Szabolcs Nagy wrote:
> On 15/05/2019 13:39, Christophe Lyon wrote:
> > In FDPIC mode, we set -fPIE unless the user provides -fno-PIE, -fpie,
> > -fPIC or -fpic: indeed FDPIC code is PIC, but we want to generate code
> > for executables rather than shared libraries by default.
> > 
> > We also make sure to use the --fdpic assembler option, and select the
> > appropriate linker emulation.
> > 
> > At link time, we also default to -pie, unless we are generating a
> > shared library or a relocatable file (-r). Note that even for static
> > link, we must specify the dynamic linker because the executable still
> > has to relocate itself at startup.
> > 
> > We also force 'now' binding since lazy binding is not supported.
> > 
> > We should also apply the same behavior for -Wl,-Ur as for -r, but I
> > couldn't find how to describe that in the specs fragment.
> ...
> > +/* Unless we generate a shared library or a relocatable object, we
> > +   force -pie.  */
> > +/* Even with -static, we have to define the dynamic-linker, as we
> > +   have some relocations to resolve at load time.  */
> > +#undef  SUBTARGET_EXTRA_LINK_SPEC
> > +#define SUBTARGET_EXTRA_LINK_SPEC  \
> > +  "%{!mno-fdpic: -m " TARGET_FDPIC_LINKER_EMULATION\
> > +   "%{!shared:%{!r: -pie}} \
> > +%{static:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}" \
> > +  "%{mno-fdpic: -m " TARGET_LINKER_EMULATION "}"   \
> > +  "%{!r:%{!mno-fdpic: -z now}}"
> 
> i think -dynamic-linker can be avoided for -static using
> -static-pie linking with rcrt0.o

Yes, -dynamic-linker should never be used with -static.

> but more importantly: does the abi spec require the sysv dynamic
> linker name? that sounds suboptimal (in principle the same os

ABI specs typically do this and we just ignore it. BFD contains
default dynamic linker strings for all sorts of ABIs, and they're all
wrong -- things like /lib/ld64.so.1, etc. I don't think it's worth
bothering with fighting the desire of folks writing ABI specs to do
this again and again. GCC overrides them all with the actually-correct
values when !static.

> can support both normal elf and fdpic elf so you can test/use
> an fdpic toolchain on a system with mmu, but this requires
> different dynamic linker name ..otherwise one has to run
> executables in a chroot or separate mount namespace to change
> the dynamic linker)

Indeed, it's a bad idea to make them clash.

> > +
> > +#undef STARTFILE_SPEC
> > +#define STARTFILE_SPEC "%{!mno-fdpic:%{!shared:crtreloc.o%s}} " \
> > +  LINUX_OR_ANDROID_LD (GNU_USER_TARGET_STARTFILE_SPEC, 
> > ANDROID_STARTFILE_SPEC)
> > diff --git a/libsanitizer/configure.tgt b/libsanitizer/configure.tgt
> > index b241ddb..c38b3f4 100644
> > --- a/libsanitizer/configure.tgt
> > +++ b/libsanitizer/configure.tgt
> > @@ -45,6 +45,9 @@ case "${target}" in
> > ;;
> >sparc*-*-solaris2.11*)
> > ;;
> > +  arm*-*-uclinuxfdpiceabi)
> > +   UNSUPPORTED=1
> > +   ;;
> 
> musl libc has fdpic support on sh (e.g. with sh2eb-linux-muslfdpic
> target and --enable-fdpic), it can probably support fdpic on arm
> too with minimal changes, i assume the target name for that would
> be arm-linux-muslfdpiceabi.

I plan to add ARM FDPIC support as soon as there is (1) published ABI
for relocation types, entry point contract, etc., and (2) there's
tooling to support it that's either upstream or can be applied as
clean patches to recent gcc (as opposed to some fork of gcc4 or
whatever it was this got started as). I think those conditions are
mostly met now.

> so i think it is better to check arm*-*fdpiceabi where the libc
> does not matter (so we dont have to patch the same files when
> *-muslfdpiceabi support is added).

Yes, that would be appreciated. Maybe we could get musl ldso names
added at the same time, too? I think they should just be the same as
the existing musl ldso names but with "-fdpic" appended before
".so.1".

Rich


Re: [PATCH v3 3/3] or1k: gcc: initial support for openrisc

2018-11-05 Thread Rich Felker
On Mon, Nov 05, 2018 at 11:13:53AM +, Szabolcs Nagy wrote:
> On 04/11/18 09:05, Stafford Horne wrote:
> > On Mon, Oct 29, 2018 at 02:28:11PM +, Szabolcs Nagy wrote:
> >> On 27/10/18 05:37, Stafford Horne wrote:
> ...
> >>> +#undef LINK_SPEC
> >>> +#define LINK_SPEC "%{h*} \
> >>> +   %{static:-Bstatic}\
> >>> +   %{shared:-shared} \
> >>> +   %{symbolic:-Bsymbolic}\
> >>> +   %{!static:\
> >>> + %{rdynamic:-export-dynamic} \
> >>> + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
> >>> +
> >>> +#endif /* GCC_OR1K_LINUX_H */
> >>
> >> note that because of the -static-pie mess each
> >> target needs a more complicated LINK_SPEC now.
> > 
> > Hello,
> > 
> > Does something like this look better?
> > 
> > --- a/gcc/config/or1k/linux.h
> > +++ b/gcc/config/or1k/linux.h
> > @@ -37,8 +37,9 @@
> > %{static:-Bstatic}  \
> > %{shared:-shared}   \
> > %{symbolic:-Bsymbolic}  \
> > -   %{!static:  \
> > +   %{!static:%{!static-pie:\
> >   %{rdynamic:-export-dynamic}   \
> > - %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}"
> > + %{!shared:-dynamic-linker " GNU_USER_DYNAMIC_LINKER "}}} \
> > +   %{static-pie:-Bstatic -pie --no-dynamic-linker -z text}"
> >  
> >  #endif /* GCC_OR1K_LINUX_H */
> 
> looks ok.
> 
> > I have tested this out with or1k-linux-musl, but I get some LD complaints 
> > i.e.
> > 
> > .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic 
> > relocation against symbol __fini_array_end
> > .../or1k-linux-musl/bin/ld: .../or1k-linux-musl/lib/libc.a(exit.o): non-pic 
> > relocation against symbol __fini_array_start
> > 
> > Those are some warnings we recently added to LD, perhaps I need to rebuild 
> > the
> > libc.a with PIE as well.  I will try it out, but if anyone has some 
> > suggestions
> > that would be helpful.
> 
> yes, musl does not build libc.a with pic by default,
> either use a gcc configured with --enable-default-pie
> or CC='gcc -fPIC' when building musl.
> 
> after that -static-pie linking should work.
> 
> (maybe musl should have an --enable-static-pie config
> option to make this simpler)

For practical purposes, if you want to use static pie, you need a
default-pie toolchain. This is because _every_ static lib you might
link needs to be built with -fPIE (or -fPIC), and ensuring that
happens on a package-by-package basis is largely impractical; at least
it's on the same order of magnitude of difficulty as other systems
integration/packaging tasks.

However from the musl side it might make sense to produce a libc_pic.a
as part of the build process. This would make it easy to replace
libc.a with libc_pic.a if desired, and could also be used as the basis
for linking libc.so and to allow production of a stripped-down libc.so
that only includes symbols a fixed set of binaries depend on. We could
discuss something like this on the musl list.

Rich


Re: RFC: [PATCH] x86: Add -mzero-caller-saved-regs=[skip|used|all]

2018-09-27 Thread Rich Felker
On Wed, Sep 26, 2018 at 11:10:29AM -0700, H.J. Lu wrote:
> Add -mzero-caller-saved-regs=[skip|used|all] command-line option and
> zero_caller_saved_regs("skip|used|all") function attribue:

Minor nit, but could this be named -mzero-call-clobbered-regs?
"Caller-saved" is a misnomer and inconsistent with other gcc usage.
For example -fcall-used-[reg] documents it as "an allocable register
that is clobbered by function calls". This is a piece of terminology
I've been trying to get fixed in educational materials so it would be
nice to avoid a new instance of it in gcc.

Rich


Re: [PATCH] libgcc: m68k: avoid TEXTRELs in shared library (PR 86224)

2018-07-28 Thread Rich Felker
On Sat, Jul 28, 2018 at 08:47:33PM +0200, Andreas Schwab wrote:
> On Jul 28 2018, sly...@inbox.ru wrote:
> 
> > From: Sergei Trofimovich 
> >
> > Cc: Ian Lance Taylor 
> > Cc: Jeff Law 
> > Cc: Andreas Schwab 
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  libgcc/config/m68k/lb1sf68.S | 19 ++-
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/libgcc/config/m68k/lb1sf68.S b/libgcc/config/m68k/lb1sf68.S
> > index 325a7c17d9b..16c6dc3f5a7 100644
> > --- a/libgcc/config/m68k/lb1sf68.S
> > +++ b/libgcc/config/m68k/lb1sf68.S
> > @@ -435,7 +435,10 @@ $_exception_handler:
> > .text
> > FUNC(__mulsi3)
> > .globl  SYM (__mulsi3)
> > +   .globl  SYM (__mulsi3_internal)
> > +   .hidden SYM (__mulsi3_internal)
> 
> No need for extra entry symbols, just mark the real entry point as
> hidden, like in the static library.

That's clearly not correct or valid, as these are public interfaces.
If you make them hidden they'll be dropped from the dynamic symbol
table of libgcc_s.so.

Of course for libgcc.a they need to be hidden (it's an ABI bug if
they're not hidden there already but I think there's a separate layer
of the build system that forces them to be hidden).

Rich


Re: [RFA][PATCH] Stack clash protection 07/08 -- V4 (aarch64 bits)

2017-11-28 Thread Rich Felker
On Mon, Nov 27, 2017 at 03:48:41PM +, Szabolcs Nagy wrote:
> On 28/10/17 05:08, Jeff Law wrote:
> > On 10/13/2017 02:26 PM, Wilco Dijkstra wrote:
> >> For larger frames the first oddity is that there are now 2 separate params
> >> controlling how probes are generated:
> >>
> >> stack-clash-protection-guard-size (default 12, but set to 16 on AArch64)
> >> stack-clash-protection-probe-interval (default 12)
> >>
> >> I don't see how this makes sense. These values are closely related, so if
> >> one is different from the other, probing becomes ineffective/incorrect. 
> >> For example we generate code that trivially bypasses the guard despite
> >> all the probing:
> > My hope would be that we simply don't ever use the params.  They were
> > done as much for *you* to experiment with as anything.  I'd happy just
> > delete them as there's essentially no guard rails to ensure their values
> > are sane.
> 
> so is there a consensus now that 64k guard size is what
> gcc stack probing will assume?

I would very much prefer the compiler never assume more than 1 guard
page is present. I understand the performance argument, but I think
it's mostly in artificial scenarios where it makes a difference:

If a function has 4k and actually uses it, you're almost certainly
looking at >4k cycles, and a few cycles to probe the guard page are
irrelevant (dominated by the actual work).

If a function has >4k of local data and code paths that can easily be
determined don't use it (e.g. the big data is in a conditional block),
the compiler should shrink-wrap them anyway in order to avoid
pathological blowing away of the stack like what LLVM does all over
the place (hoisting stack allocations up as far as it can). The probe
can then likewise be moved with the shrinkwrapping.

The only remaining case is functions which have >4k of local data
that's mostly unused, but no easy way to determine that it's not used,
or where which small part is actually used is data-dependant. This is
the case where the probe is mildly costly, but it seems very unlikely
to be a common case in real-world usage.

> if so i can propose a patch to glibc to actually have
> that much guard by default in threads.. (i think it
> makes sense on all 64bit targets to have bigger guard
> and a consensus here would help making that change)

I don't object to making musl libc default to >64k (I'd prefer 68k to
avoid preserving alignment mod a large power of 2) guard size on
64-bit archs (on 32-bit, including ILP32, though, it's prohibitive
because it exhausts virtual address space; this may affect aarch64's
ILP32 model). It doesn't have any significant cost and it's useful
hardening. But I think it would be unfortunate if smaller guard sizes,
which applications can request/set, were left unsafe due to the
compiler making a tradeoff for performance that doesn't actually get
you any measureable real-world performance benefits.

Rich


Re: [PATCH] Use Pcrt1.o%s/gPcrt1.o%s for -static-pie

2017-11-01 Thread Rich Felker
On Sun, Oct 15, 2017 at 06:16:57AM -0700, H.J. Lu wrote:
> crt1.o is used to create dynamic and non-PIE static executables.  Static
> PIE needs to link with Pcrt1.o, instead of crt1.o, to relocate static PIE
> at run-time.  When -pg is used with -static-pie, gPcrt1.o should be used.
> 
> Tested on x86-64.  OK for master?

Is there a reason you didn't follow the existing naming practice here?
Openbsd and musl libc have both had static pie for a long time now and
have used rcrt1.o as the name.

Rich


> ---
>   * config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): Use
>   Pcrt1.o%s/gPcrt1.o%s for -static-pie.
> ---
>  gcc/config/gnu-user.h | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/config/gnu-user.h b/gcc/config/gnu-user.h
> index a967b69a350..c1cfdc3da90 100644
> --- a/gcc/config/gnu-user.h
> +++ b/gcc/config/gnu-user.h
> @@ -51,9 +51,10 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  
> If not, see
>  #if defined HAVE_LD_PIE
>  #define GNU_USER_TARGET_STARTFILE_SPEC \
>"%{shared:; \
> - pg|p|profile:gcrt1.o%s; \
> + pg|p|profile:%{static-pie:gPcrt1.o%s;:gcrt1.o%s}; \
>   static:crt1.o%s; \
> - static-pie|" PIE_SPEC ":Scrt1.o%s; \
> + static-pie:Pcrt1.o%s; \
> + " PIE_SPEC ":Scrt1.o%s; \
>   :crt1.o%s} \
> crti.o%s \
> %{static:crtbeginT.o%s; \
> -- 
> 2.13.5


Re: [PATCH][i386][musl] Add cpuinfo to static libgcc only on *-musl*

2016-11-11 Thread Rich Felker
On Fri, Nov 11, 2016 at 05:40:04PM +0100, Uros Bizjak wrote:
> On Fri, Nov 11, 2016 at 4:50 PM, Szabolcs Nagy  wrote:
> > The __cpu_indicator_init and __cpu_model symbols are not safe to use
> > from shared libgcc_s.so from ifunc resolvers, so since gcc-6, only
> > the definitions from static libgcc.a are used, however the symbols
> > are kept in libgcc_s as well for backward compatibility (with
> > appropriate symbol version).  On targets without such backward
> > compatibility concern add cpuinfo to the static library only (this
> > avoids running the ctor, reduces libgcc_s size and elf abi concerns
> > because of gnu symbol versions).
> >
> > build tested on x86_64-linux-gnu and x86_64-linux-musl.
> >
> > ok to commit?
> >
> > i'd like to back port this to gcc-6 because musl dynamic linker
> > cannot load the libgcc_s.so.1 with the versioned symbols
> > (not an abi break: those symbols were never used on musl and
> > the current code does not work).
> >
> > libgcc/ChangeLog:
> >
> > 2016-11-11  Szabolcs Nagy  
> >
> > * config.host (i[3456]86-*-musl*, x86_64-*-musl*): Use
> > i386/t-cpuinfo-static instead of i386/t-cpuinfo.
> > * config/i386/t-cpuinfo-static: New.
> 
> LGTM, it is musl specific, but you know the musl part best ;)
> 
> Rubber-stamped OK for mainline and backports.

I would have preferred the previously-rejected fix getting rid of the
gratuitous dependency on binding to a non-default symver, that works
the same on all targets, but I'm fine with whatever fixes the problem
without introducing new ABI mess.

Rich


Re: [PATCH v4] SH FDPIC backend support

2015-11-15 Thread Rich Felker
On Sun, Nov 15, 2015 at 02:08:34PM +0900, Oleg Endo wrote:
> On Wed, 2015-11-11 at 09:56 -0500, Rich Felker wrote:
> 
> > Sorry, I don't really understand RTL well enough to make a code
> > snippet. What I want to express is that an insn "uses" (in the (use
> > ...) sense) a register (r12) conditionally depending on a runtime
> > option (TARGET_FDPIC).
> 
> As far as I know this is not possible.  It would require two variants
> of the same pattern, one with the use and another without the use.  

OK. That's exactly what we've got now.

> > Is this possible in the sh backend or does it need changes to
> > higher-level gcc code? (i.e. is it presently possible to make an insn
> > that conditionally clobbers different things rather than having to
> > make tons of different insns for each possible set of clobbers?)
> 
> This is basically the same as above ... it's not possible to
> conditionally construct/modify pattern descriptions in the .md. 
>  However, it's possible to modify the CALL_INSN_FUNCTION_USAGE field of
> call insns -- for some examples see 'grep -r CALL_INSN_FUNCTION_USAGE
> gcc/config/*'.  Also, it seems the SH backend doesn't make use of some
> existing libcall related parameters and target hooks/macros.  Maybe
> those could be helpful.

I'll take a look at this. Let me know if you turn up anything
interesting.

> > Another issue I've started looking at is how r12 is put in 
> > fixed_regs, which is conceptually wrong. Preliminary tests show that 
> > removing it from fixed_regs doesn't break and produces much better 
> > code -- r12 gets used as a temp register in functions that don't need 
> > it, and in one function that made multiple calls, the saving of 
> > initial r12 to a call-saved register even happened in the delay slot 
> > of the call. I've been discussing it with Alexander Monakov on IRC 
> > (#musl) and based on my understanding so far of how gcc works (which 
> > admittedly may be wrong) the current FDPIC code looks like it's 
> > written not to depend on r12 being 'fixed'. Also I think I'm pretty 
> > close to understanding how we could make the same improvements for 
> > non-FDPIC PIC codegen: instead loading r12 in the prologue, load a 
> > pseudo, then use that pseudo for GOT access and force it into r12 the 
> > same way FDPIC call code does for PLT calls. Does this sound correct?
> 
> Maybe TARGET_USE_PSEUDO_PIC_REG could be useful?

Yes. Is there any documentation on using it? I came across that but
couldn't figure out how it compares to just doing the pseudo yourself
in the target files. Is non-target-specific code affected by this?

Rich


Re: [PATCH] Fix SH/FDPIC bad codegen with ssp enabled

2015-11-14 Thread Rich Felker
On Sat, Nov 14, 2015 at 09:24:32AM +0900, Kaz Kojima wrote:
> Rich Felker  wrote:
> > The "chk_guard_add" pattern used for loading the GOT slot address for
> > __stack_chk_guard hard-codes use of r12 as a fixed GOT register and
> > thus is not suitable for FDPIC, where the saved initial value of r12
> > from function entry is what we need.
> 
> The patch is OK.  Committed as revision 230366.

Thanks!

> > I would actually prefer removing this hack entirely if possible. I
> > tried non-FDPIC with it disabled and did not experience any problems;
> > I suspect it was written to work around a bug that no longer exists.
> 
> Even we don't see the problem without that, it'll be a latent
> issue with the old reload, I think.  When SH switches to LRA
> completely, this should be revisit.

I thought about trying a patch to eliminate the use of the hard-coded
PIC register and instead generate a pseudo for the GOT address. Then
some FDPIC and non-FDPIC code could potentially be unified to use this
pseudo, just with different ways of defining it (r12-at-entry vs
PC-relative symbolic load of _GLOBAL_OFFSET_TABLE_). Would this be
acceptable?

I've already done some experiments removing r12 from fixed_regs for
FDPIC and nothing visibly broke; gcc correctly used r12 as a temp in
functions which didn't need the GOT address, and codegen was improved
even in functions which do use it. These results were observed on musl
libc.so.

Rich


Re: [PATCH, x86] Fix posix_memalign declaration in mm_malloc.h

2015-11-13 Thread Rich Felker
On Fri, Nov 13, 2015 at 09:58:30PM +0100, Bernd Schmidt wrote:
> On 11/13/2015 06:11 PM, Szabolcs Nagy wrote:
> >Followup to https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01433.html
> >
> >The posix_memalign declaration is incompatible with musl libc in C++,
> >because of the exception specification (matters with -std=c++11
> >-pedantic-errors).  It also pollutes the namespace and lacks protection
> >against a potential macro definition that is allowed by POSIX.  The
> >fix avoids source level namespace pollution but retains the dependency
> >on the posix_memalign extern libc symbol.
> 
> >  #ifndef __cplusplus
> >-extern int posix_memalign (void **, size_t, size_t);
> >+extern int _mm_posix_memalign (void **, size_t, size_t)
> >  #else
> >-extern "C" int posix_memalign (void **, size_t, size_t) throw ();
> >+extern "C" int _mm_posix_memalign (void **, size_t, size_t) throw ()
> >  #endif
> >+__asm__("posix_memalign");
> 
> Can't say I like the __asm__ after the #if/#else/#endif block.

It could trivially be moved inside.

> >  static __inline void *
> >  _mm_malloc (size_t size, size_t alignment)
> >@@ -42,7 +43,7 @@ _mm_malloc (size_t size, size_t alignment)
> >  return malloc (size);
> >if (alignment == 2 || (sizeof (void *) == 8 && alignment == 4))
> >  alignment = sizeof (void *);
> >-  if (posix_memalign (&ptr, alignment, size) == 0)
> >+  if (_mm_posix_memalign (&ptr, alignment, size) == 0)
> >  return ptr;
> >else
> >  return NULL;
> 
> Random observation: this seems overly conservative as malloc is
> defined to return something aligned to more than one byte.

You can assume it returns memory aligned to _Alignof(max_align_t),
nothing more. But on some broken library implementations (windows?)
you might not even get that.

> Couldn't this bug be fixed by either
>  - just overallocating and aligning manually (eliminating the dependence
>entirely)

I don't think so; then the memory is not freeable, at least not
without extra hacks.

>  - or moving _mm_malloc into libgcc.a?

I wouldn't oppose that, but it seems like a more invasive change than
is necessary to fix this bug.

Rich


[PATCH] Fix SH/FDPIC bad codegen with ssp enabled

2015-11-13 Thread Rich Felker
The "chk_guard_add" pattern used for loading the GOT slot address for
__stack_chk_guard hard-codes use of r12 as a fixed GOT register and
thus is not suitable for FDPIC, where the saved initial value of r12
from function entry is what we need.

I would actually prefer removing this hack entirely if possible. I
tried non-FDPIC with it disabled and did not experience any problems;
I suspect it was written to work around a bug that no longer exists.

2015-11-13  Rich Felker 

gcc/
* config/sh/sh.md (symGOT_load): Suppress __stack_chk_guard
address loading hack for FDPIC targets.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 7a40d0f..45c9995 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -11082,7 +11082,7 @@ (define_expand "symGOT_load"
   operands[2] = !can_create_pseudo_p () ? operands[0] : gen_reg_rtx (Pmode);
   operands[3] = !can_create_pseudo_p () ? operands[0] : gen_reg_rtx (Pmode);
 
-  if (!TARGET_SHMEDIA
+  if (!TARGET_SHMEDIA && !TARGET_FDPIC
   && flag_stack_protect
   && GET_CODE (operands[1]) == CONST
   && GET_CODE (XEXP (operands[1], 0)) == UNSPEC


Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker
On Wed, Nov 11, 2015 at 09:56:42AM -0500, Rich Felker wrote:
> > > I'm actually
> > > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > > interested in that's not so invasive, and for now I'm just having
> > > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> > > to
> > > avoid needing all the label stuff, but it would be nice to find a way
> > > to reuse the existing framework.
> > 
> > Do you know how this affects code size (and inherently performance)?
> 
> I suspect it makes very little difference, but to compare I'd need to
> do the same hack on 5.2.0 or trunk. The only difference should be one
> additional load per call, and one additional GOT slot per function
> called this way (but just once per executable/library).

Actually I think this is not quite right: if the call takes place via
the GOT, this also requires the initial r12 to be preserved somewhere
in order to load the function address, whereas for SFUNC_STATIC, the
initial r12 can be completely discarded, right? (SFUNC functions are
not permitted to use the GOT themselves as far as I can tell, and thus
do not receive the hidden GOT argument in r12.)

Rich


Re: [PATCH v4] SH FDPIC backend support

2015-11-11 Thread Rich Felker
On Wed, Nov 11, 2015 at 11:36:26PM +0900, Oleg Endo wrote:
> On Tue, 2015-11-10 at 15:07 -0500, Rich Felker wrote:
> 
> > > The way libcalls are now emitted is a bit unhandy.  If more special
> > > -ABI
> > > libcalls are to be added in the future, they all have to do the jsr
> > > vs.
> > > bsrf handling (some potential candidates for new libcalls are
> > > optimized
> > > soft FP routines).  Then we still have PR 65374 and PR 54019. In
> > > the
> > > future maybe we should come up with something that allows emitting
> > > libcalls in a more transparent way...
> > 
> > I'd like to look into improving this at some point in the near
> > future.
> > On further reading of the changes made, I think there's a lot of code
> > we could reduce or simplify.
> > 
> > In all the places where new RTL patterns were added for *call*_fdpic,
> > the main constraint change vs the non-fdpic version is using REG_PIC.
> > Is it possible to make a REG_GOT_ARG macro or similar that's defined
> > as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?
> 
> I'm not sure I understand what you mean by that.  Do you have a small
> code snippet example?

Sorry, I don't really understand RTL well enough to make a code
snippet. What I want to express is that an insn "uses" (in the (use
...) sense) a register (r12) conditionally depending on a runtime
option (TARGET_FDPIC).

> > As for the call site stuff, I wonder why the existing call site stuff
> > used by "call_pcrel" can't be used for SFUNC_STATIC. 
> 
> "call_pcrel" is a real call insn.  The libcalls are not expanded as
> real call insns to avoid the regular register save/restores etc which
> is needed to do a normal function call.

Yes, I see that. What I was really wondering though is why the new
call site generation code and constraint was added when the call_pcrel
code already has mechanisms for this, rather than just duplicating the
internals that call_pcrel uses. It seems like we're doing things in a
gratuitously different way here.

> I guess the generic fix for this issue would be some mechanism to
> specify which regs are clobbered/preserved and then provide the right
> settings for the libcall functions.

Is this possible in the sh backend or does it need changes to
higher-level gcc code? (i.e. is it presently possible to make an insn
that conditionally clobbers different things rather than having to
make tons of different insns for each possible set of clobbers?)

> > I'm actually
> > trying to prepare a simpler FDPIC patch for other gcc versions we're
> > interested in that's not so invasive, and for now I'm just having
> > function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC
> > to
> > avoid needing all the label stuff, but it would be nice to find a way
> > to reuse the existing framework.
> 
> Do you know how this affects code size (and inherently performance)?

I suspect it makes very little difference, but to compare I'd need to
do the same hack on 5.2.0 or trunk. The only difference should be one
additional load per call, and one additional GOT slot per function
called this way (but just once per executable/library).

Another issue I've started looking at is how r12 is put in fixed_regs,
which is conceptually wrong. Preliminary tests show that removing it
from fixed_regs doesn't break and produces much better code -- r12
gets used as a temp register in functions that don't need it, and in
one function that made multiple calls, the saving of initial r12 to a
call-saved register even happened in the delay slot of the call. I've
been discussing it with Alexander Monakov on IRC (#musl) and based on
my understanding so far of how gcc works (which admittedly may be
wrong) the current FDPIC code looks like it's written not to depend on
r12 being 'fixed'. Also I think I'm pretty close to understanding how
we could make the same improvements for non-FDPIC PIC codegen: instead
of loading r12 in the prologue, load a pseudo, then use that pseudo
for GOT access and force it into r12 the same way FDPIC call code does
for PLT calls. Does this sound correct?

Rich


Re: [PATCH v4] SH FDPIC backend support

2015-11-10 Thread Rich Felker
On Tue, Oct 27, 2015 at 11:01:39PM +0900, Oleg Endo wrote:
> On Mon, 2015-10-26 at 22:47 -0400, Rich Felker wrote:
> > On Sun, Oct 25, 2015 at 11:28:51PM +0900, Oleg Endo wrote:
> > > On Fri, 2015-10-23 at 02:32 -0400, Rich Felker wrote:
> > > > Here's my updated version of the FDPIC patch with all requested
> > > > changes made and Changelog added. I've included all the original
> > > > authors. This is my first time writing such an extensive
> > > > Changelog
> > > > entry so please let me know if there are things I got wrong.
> > > 
> > > I took the liberty and fixed some minor formatting trivia and
> > > extracted
> > > functions sh_emit_storesi and sh_emit_storehi which are used in
> > >  sh_trampoline_init to effectively memcpy code into the trampoline
> > > area.  Can you please check it?  If it's OK I'll commit the
> > > attached
> > > patch to trunk.
> > 
> > Is there anything in particular you'd like me to check? It builds
> > fine
> > for fdpic target, successfully compiles musl libc.so, and busybox
> > runs
> > with the resulting libc.so. I did a quick visual inspection of the
> > diff between my version and yours too and didn't see anything that
> > looked suspicious to me.
> 
> Thanks.  I have committed it as r229438 after a sanity check with "make
> all" on sh-elf.
> 
> The way libcalls are now emitted is a bit unhandy.  If more special-ABI
> libcalls are to be added in the future, they all have to do the jsr vs.
> bsrf handling (some potential candidates for new libcalls are optimized
> soft FP routines).  Then we still have PR 65374 and PR 54019. In the
> future maybe we should come up with something that allows emitting
> libcalls in a more transparent way...

I'd like to look into improving this at some point in the near future.
On further reading of the changes made, I think there's a lot of code
we could reduce or simplify.

In all the places where new RTL patterns were added for *call*_fdpic,
the main constraint change vs the non-fdpic version is using REG_PIC.
Is it possible to make a REG_GOT_ARG macro or similar that's defined
as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?

As for the call site stuff, I wonder why the existing call site stuff
used by "call_pcrel" can't be used for SFUNC_STATIC. I'm actually
trying to prepare a simpler FDPIC patch for other gcc versions we're
interested in that's not so invasive, and for now I'm just having
function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC to
avoid needing all the label stuff, but it would be nice to find a way
to reuse the existing framework.

Rich


Re: [PATCH v4] SH FDPIC backend support

2015-10-26 Thread Rich Felker
On Sun, Oct 25, 2015 at 11:28:51PM +0900, Oleg Endo wrote:
> On Fri, 2015-10-23 at 02:32 -0400, Rich Felker wrote:
> > Here's my updated version of the FDPIC patch with all requested
> > changes made and Changelog added. I've included all the original
> > authors. This is my first time writing such an extensive Changelog
> > entry so please let me know if there are things I got wrong.
> 
> I took the liberty and fixed some minor formatting trivia and extracted
> functions sh_emit_storesi and sh_emit_storehi which are used in
>  sh_trampoline_init to effectively memcpy code into the trampoline
> area.  Can you please check it?  If it's OK I'll commit the attached
> patch to trunk.

Is there anything in particular you'd like me to check? It builds fine
for fdpic target, successfully compiles musl libc.so, and busybox runs
with the resulting libc.so. I did a quick visual inspection of the
diff between my version and yours too and didn't see anything that
looked suspicious to me.

Rich


Re: [PATCH] Add missing INCLUDE_DEFAULTS_MUSL_LOCAL

2015-10-26 Thread Rich Felker
On Tue, Oct 27, 2015 at 12:16:16AM +, Joseph Myers wrote:
> On Mon, 26 Oct 2015, Rich Felker wrote:
> 
> > On Mon, Oct 26, 2015 at 11:42:37PM +, Joseph Myers wrote:
> > > On Mon, 26 Oct 2015, Rich Felker wrote:
> > > 
> > > > musl explicitly does not support using a mix of libc headers and
> > > > compiler-provided freestanding headers. While there may be
> > > 
> > > In that case the GCC ports for musl should define USER_H = 
> > > $(EXTRA_HEADERS) like t-openbsd does.  (Of course that won't work for 
> > > multilib builds supporting different C libraries with different 
> > > multilibs.)
> > 
> > This sounds interesting. Are there practical ways it's a better
> > solution than what linux.h is doing now for musl? Inability to support
> 
> Well, it ensures the installed compiler can't find the freestanding 
> headers at all, because they're not installed (other than 
> architecture-specific intrinsics headers).

Oh. I think that breaks Linux kernel builds, which use -nostdinc then
add back the gcc include directory or something like that. I don't
remember the details.

Rich


Re: [PATCH] Add missing INCLUDE_DEFAULTS_MUSL_LOCAL

2015-10-26 Thread Rich Felker
On Mon, Oct 26, 2015 at 11:42:37PM +, Joseph Myers wrote:
> On Mon, 26 Oct 2015, Rich Felker wrote:
> 
> > musl explicitly does not support using a mix of libc headers and
> > compiler-provided freestanding headers. While there may be
> 
> In that case the GCC ports for musl should define USER_H = 
> $(EXTRA_HEADERS) like t-openbsd does.  (Of course that won't work for 
> multilib builds supporting different C libraries with different 
> multilibs.)

This sounds interesting. Are there practical ways it's a better
solution than what linux.h is doing now for musl? Inability to support
multilib well is something I've wondered if we could improve on, but
from what you said it sounds like this wouldn't help.

Rich


Re: [PATCH] Add missing INCLUDE_DEFAULTS_MUSL_LOCAL

2015-10-26 Thread Rich Felker
On Mon, Oct 26, 2015 at 12:32:01PM +, Szabolcs Nagy wrote:
> On 23/10/15 21:20, Joseph Myers wrote:
> >On Fri, 23 Oct 2015, Szabolcs Nagy wrote:
> >
> >>i think bsd libcs do the same, compiler headers interfering
> >>with libc headers is problematic (e.g. FLT_ROUNDS is wrong
> >>in gcc float.h, applications shouldn't see that), i'm not sure
> >
> >FLT_ROUNDS is an ordinary compiler bug (bug 30569), should be fixable
> >reasonably straightforwardly as outlined at
> >, probably within say a
> >week's work if most architecture-specific changes are left to architecture
> >maintainers.
> 
> musl tries to support old compilers in general (it can be built
> with gcc 3.x, and it should be possible to use with a wider range
> of compilers with reasonably consistent semantics, so fixing that
> bug in gcc does not help much.)
> 
> a better example would be stddef.h (it has incompatible definition
> of NULL, max_align_t etc, the ifdefs in gcc are fragile and none
> of the __need_FOO patterns match the ones musl use).
> 
> i think in general the higher level layer should come first
> (c++ first, then libc, then compiler include paths), so the one
> closer to the user gets a chance to override the ones after it,
> stdc-predef.h was a good step toward that.

musl explicitly does not support using a mix of libc headers and
compiler-provided freestanding headers. While there may be
circumstances under which no effective breakage occurs, this is merely
by chance and is not a supported usage. No effort is made by musl to
interact with the gcc headers (e.g. defining the macros they use to
prevent multiple definitions or control multiple inclusion, or testing
for whether they have already been included are made). It is necessary
to use the BSD-style header order, so that the libc headers get used
instead of the compiler-provided ones, to have a reliable musl target.

Rich


[PATCH v4] SH FDPIC backend support

2015-10-22 Thread Rich Felker
Here's my updated version of the FDPIC patch with all requested
changes made and Changelog added. I've included all the original
authors. This is my first time writing such an extensive Changelog
entry so please let me know if there are things I got wrong.

Rich


2010-08-19  Daniel Jacobowitz  
Joseph Myers  
Mark Shinwell  
Andrew Stubbs  
    Rich Felker 

gcc/
* config.gcc: Handle --enable-fdpic.
* config/sh/constraints.md: Add Ccl constraint.
* config/sh/linux.h (SUBTARGET_LINK_EMUL_SUFFIX): Handle -mfdpic.
* config/sh/sh-c.c: (sh_cpu_cpp_builtins): Add __FDPIC__ and
__SH_FDPIC__.
* config/sh/sh-mem.cc (expand_block_move): Support FDPIC
for calls to library functions.
* config/sh/sh-protos.h (function_symbol): Adapt for FDPIC
support.
(sh_get_fdpic_reg_initial_val, sh_load_function_descriptor):
Add functions for FDPIC support.
* config/sh/sh.c (sh_assemble_integer,
sh_cannot_force_const_mem_p, sh_reloc_rw_mask,
TARGET_ASM_INTEGER, TARGET_CANNOT_FORCE_CONST_MEM,
TARGET_ASM_RELOC_RW_MASK): New for FDPIC.
(sh_option_override): Force -fPIC and -fno-function-cse for
FDPIC.
(sh_asm_output_addr_const_extra): Add function descriptor
reference outputs.
(prepare_move_operands): Use FDPIC initial GOT register for
TLS-related GOT access; inhibit cross-section address offset
constants for FDPIC.
(sh_assemble_integer): Produce function descriptor addresses
for FDPIC function address values.
(sh_cannot_copy_insn_p): Inhibit copying instructions that are
FDPIC PC-relative call sites.
(expand_ashiftrt): Support FDPIC for call to library function.
(sh_expand_prologue): Inhibit PC-relative GOT address load for
FDPIC; for FDPIC, GOT address is a hidden argument in r12.
(nonpic_symbol_mentioned_p): Add cases for UNSPEC_GOTFUNCDESC
and UNSPEC_GOTOFFFUNCDESC.
(legitimize_pic_address): Resolve function symbols to function
descriptors for FDPIC; do not use GOT-relative addressing for
local data that may be read-only on FDPIC.
(sh_trampoline_init): Generate FDPIC trampolines.
(sh_expand_sym_label2reg): Don't assume sibcalls are local;
this need not be true for FDPIC.
(sh_output_mi_thunk): Generate FDPIC call.
(function_symbol): For SFUNC_STATIC on FDPIC, use PC-relative
addressing rather than GOT-relative addressing; generate call
site labels to make this possible.
(sh_conditional_register_usage): Mark GOT register usage by
FDPIC.
(sh_legitimate_constant_p): Impose FDPIC constant constraints.
(sh_cannot_force_const_mem_p, sh_load_function_descriptor,
sh_get_fdpic_reg_initial_val, sh_reloc_rw_mask): New for
FDPIC.
* config/sh/sh.h (SUBTARGET_ASM_SPEC,
SUBTARGET_LINK_EMUL_SUFFIX): Handle -mfdpic.
(FDPIC_SELF_SPECS): New self specs to insert -mfdpic by
default if configured with --enable-fdpic.
(TRAMPOLINE_SIZE): Select trampoline size for FDPIC.
(PIC_OFFSET_TABLE_REG_CALL_CLOBBERED,
SH_OFFSETS_MUST_BE_WITHIN_SECTIONS_P): New for FDPIC.
(ASM_PREFERRED_EH_DATA_FORMAT): Add EH format constraints for
FDPIC.
* config/sh/sh.md (R12_REG, UNSPEC_GOTFUNCDESC,
UNSPEC_GOTOFFFUNCDESC, calli_fdpic, call_valuei_fdpic,
sibcalli_fdpic, sibcalli_pcrel_fdpic, sibcall_pcrel_fdpic,
sibcall_valuei_fdpic, sibcall_valuei_pcrel_fdpic,
sibcall_value_pcrel_fdpic, sym2GOTFUNCDESC,
symGOTFUNCDESC2reg, sym2GOTOFFFUNCDESC,
symGOTOFFFUNCDESC2reg): New.
(udivsi3_i1, udivsi3_i4, udivsi3_i4_single, udivsi3,
divsi_inv_call_combine, divsi3_i4, divsi3_i4_single, divsi3,
ashlsi3, ashlsi3_d_call, ashrsi3_n, lshrsi3, lshrsi3_d_call,
calli, call_valuei, call, call_value, sibcalli,
sibcalli_pcrel, sibcall_pcrel, sibcall, sibcall_valuei,
sibcall_valuei_pcrel, sibcall_value_pcrel, sibcall_value, 
GOTaddr2picreg, symGOT_load, symGOTOFF2reg, block_move_real,
block_lump_real, block_move_real_i4, block_lump_real_i4): Add
support for FDPIC.
(mulsi3, ic_invalidate_line, initialize_trampoline, call_pop,
call_value_pop): Adjust for new function_symbol signature.
* config/sh/sh.opt (-mfdpic): New option.
* doc/install.texi (Options specification): Add --enable-fdpic.
* doc/invoke.texi (SH Options): Add -mfdpic.

include/
* longlong.h (udiv_qrnnd): Add FDPIC compatible version.

libitm/
* config/sh/sjlj.S (_ITM_beginTransaction): Bypass PLT calling
GTM_begin_transaction for compatibility with FDPIC.


diff --git a/gcc/config.gcc b/gcc/config.gcc
index bf26776..ed118f3 1

Re: [PATCH v3] SH FDPIC backend support

2015-10-21 Thread Rich Felker
On Wed, Oct 21, 2015 at 10:17:51PM +0900, Oleg Endo wrote:
> Rich,
> 
> Thanks for the updated patch.
> Please do not start new threads for a continuation of an existing
> thread.  This makes it difficult to track in the archives.
> 
> On Tue, 2015-10-20 at 23:41 -0400, Rich Felker wrote:
> > Attached is a hopefully near-ready-for-commit version of the SH/FDPIC
> > patch. I believe I've addressed all comments by Oleg and Kaz on the
> > previous versions of the patch. I'm still working on drafting the
> > Changelog entry (there's a lot to go in it, and I might very well be
> > going into more detail than is needed).
> 
> Other than the missing ChangeLog: How did you test the patch?

I've tested the new functionality (FDPIC) building musl, busybox,
dropbear, and other software. I believe Kaz tested that sh4 had no
regressions in the gcc tests with the patch applied.

> > One thing I've considered doing, since TARGET_FDPIC implies flag_pic
> > now, is removing all parts of the patch that just replace checks for
> > flag_pic with (flag_pic || TARGET_FDPIC). Would doing this be
> > desirable? It shrinks the patch a bit but of course more strongly
> > codes the assumption that TARGET_FDPIC implies flag_pic.
> 
> If FDPIC only ever will make sense in combination with flag_pic != 0,
> then I guess this could be done.

The original patch did not force flag_pic, but I was getting broken
codegen and/or ICE without it and couldn't track down the source. In
any case, FDPIC _is_ position-independent code (an even more
constrained variant of it) so it makes sense for flag_pic to be set, I
think.

> If you do that, please add a comment
> above this hunk:
> 
> > +  if (TARGET_FDPIC && !flag_pic)
> > +flag_pic = 2;

OK, if I make this change I'll do that.

> Some other nits:
> 
> >   rtx lab = function_symbol (func_addr_rtx, "__movmemSI12_i4", 
> > SFUNC_STATIC).lab;
> 
> Break overlong lines to fit into 80 columns.  E.g.
> 
> rtx lab = function_symbol (func_addr_rtx, "__movmemSI12_i4",
>SFUNC_STATIC).lab;

OK. For some of these I wasn't sure of the proper style for wrapping
them so I figured I'd just wait to see what you say.

> >   if (TARGET_FDPIC
> >   && (TARGET_SHMEDIA || TARGET_SHCOMPACT || !TARGET_SH2))
> > sorry ("non-SH2 FDPIC");
> 
> Drop SH5 stuff.

Without this passing -mfdpic on sh5 will probably lead to ICE or very
bogus codegen since there are lots of places that assume FDPIC and sh5
are mutually exclusive. If you're ok with that I can drop it, though.

> >   if (TARGET_FDPIC)
> > {
> >   emit_move_insn (gen_rtx_REG (Pmode, PIC_REG),
> >   sh_get_fdpic_reg_initial_val ());
> > }
> > 
> 
> Remove braces around single statements.

OK. I prefer that too but I wasn't sure what style GCC favors.

> >   return (GET_CODE (x) != CONST_DOUBLE
> >   || mode == DFmode || mode == SFmode
> >   || mode == DImode || GET_MODE (x) == VOIDmode);
> 
> Remove unnecessary parens around return statements.

OK,

> When applying the patch I'm getting:
> patching file gcc/config/sh/sh-protos.h
> (Stripping trailing CRs from patch; use --binary to disable.)
> 
> Maybe something with your editor settings?

I have my editor (Emacs) setup to always force unix text mode, so any
CR's show up as an editable/deletable ^M in the buffer, and I can't
find any. Grepping for literal CRs also fails. Is it possible the mail
system botched this?

Rich


[PATCH v3] SH FDPIC backend support

2015-10-20 Thread Rich Felker
Attached is a hopefully near-ready-for-commit version of the SH/FDPIC
patch. I believe I've addressed all comments by Oleg and Kaz on the
previous versions of the patch. I'm still working on drafting the
Changelog entry (there's a lot to go in it, and I might very well be
going into more detail than is needed).

One thing I've considered doing, since TARGET_FDPIC implies flag_pic
now, is removing all parts of the patch that just replace checks for
flag_pic with (flag_pic || TARGET_FDPIC). Would doing this be
desirable? It shrinks the patch a bit but of course more strongly
codes the assumption that TARGET_FDPIC implies flag_pic.

Rich
diff --git a/gcc/config.gcc b/gcc/config.gcc
index bf26776..ed118f3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2621,6 +2621,9 @@ sh-*-elf* | sh[12346l]*-*-elf* | \
tm_file="${tm_file} dbxelf.h elfos.h sh/elf.h"
case ${target} in
sh*-*-linux*)   tmake_file="${tmake_file} sh/t-linux"
+   if test x$enable_fdpic = xyes; then
+   tm_defines="$tm_defines FDPIC_DEFAULT=1"
+   fi
tm_file="${tm_file} gnu-user.h linux.h glibc-stdint.h 
sh/linux.h" ;;
sh*-*-netbsd*)
tm_file="${tm_file} netbsd.h netbsd-elf.h 
sh/netbsd-elf.h"
diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md
index 4d1eb2d..41c88a2 100644
--- a/gcc/config/sh/constraints.md
+++ b/gcc/config/sh/constraints.md
@@ -25,6 +25,7 @@
 ;;  Bsc: SCRATCH - for the scratch register in movsi_ie in the
 ;;   fldi0 / fldi0 cases
 ;; Cxx: Constants other than only CONST_INT
+;;  Ccl: call site label
 ;;  Css: signed 16-bit constant, literal or symbolic
 ;;  Csu: unsigned 16-bit constant, literal or symbolic
 ;;  Csy: label or symbol
@@ -233,6 +234,11 @@
hence mova is being used, hence do not select this pattern."
   (match_code "scratch"))
 
+(define_constraint "Ccl"
+  "A call site label, for bsrf."
+  (and (match_code "unspec")
+   (match_test "XINT (op, 1) == UNSPEC_CALLER")))
+
 (define_constraint "Css"
   "A signed 16-bit constant, literal or symbolic."
   (and (match_code "const")
diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h
index a9dd43a..5d4dd1f 100644
--- a/gcc/config/sh/linux.h
+++ b/gcc/config/sh/linux.h
@@ -69,7 +69,8 @@ along with GCC; see the file COPYING3.  If not see
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX
-#define SUBTARGET_LINK_EMUL_SUFFIX "_linux"
+#define SUBTARGET_LINK_EMUL_SUFFIX "%{mfdpic:_fd;:_linux}"
+
 #undef SUBTARGET_LINK_SPEC
 #define SUBTARGET_LINK_SPEC \
   "%{shared:-shared} \
diff --git a/gcc/config/sh/sh-c.c b/gcc/config/sh/sh-c.c
index a98c148..01a12e6 100644
--- a/gcc/config/sh/sh-c.c
+++ b/gcc/config/sh/sh-c.c
@@ -141,6 +141,11 @@ sh_cpu_cpp_builtins (cpp_reader* pfile)
 builtin_define ("__HITACHI__");
   if (TARGET_FMOVD)
 builtin_define ("__FMOVD_ENABLED__");
+  if (TARGET_FDPIC)
+{
+  builtin_define ("__SH_FDPIC__");
+  builtin_define ("__FDPIC__");
+}
   builtin_define (TARGET_LITTLE_ENDIAN
  ? "__LITTLE_ENDIAN__" : "__BIG_ENDIAN__");
 
diff --git a/gcc/config/sh/sh-mem.cc b/gcc/config/sh/sh-mem.cc
index 23a7287..6e521ba 100644
--- a/gcc/config/sh/sh-mem.cc
+++ b/gcc/config/sh/sh-mem.cc
@@ -123,10 +123,10 @@ expand_block_move (rtx *operands)
  rtx r4 = gen_rtx_REG (SImode, 4);
  rtx r5 = gen_rtx_REG (SImode, 5);
 
- function_symbol (func_addr_rtx, "__movmemSI12_i4", SFUNC_STATIC);
+ rtx lab = function_symbol (func_addr_rtx, "__movmemSI12_i4", 
SFUNC_STATIC).lab;
  force_into (XEXP (operands[0], 0), r4);
  force_into (XEXP (operands[1], 0), r5);
- emit_insn (gen_block_move_real_i4 (func_addr_rtx));
+ emit_insn (gen_block_move_real_i4 (func_addr_rtx, lab));
  return true;
}
   else if (! optimize_size)
@@ -139,13 +139,13 @@ expand_block_move (rtx *operands)
  rtx r6 = gen_rtx_REG (SImode, 6);
 
  entry_name = (bytes & 4 ? "__movmem_i4_odd" : "__movmem_i4_even");
- function_symbol (func_addr_rtx, entry_name, SFUNC_STATIC);
+ rtx lab = function_symbol (func_addr_rtx, entry_name, 
SFUNC_STATIC).lab;
  force_into (XEXP (operands[0], 0), r4);
  force_into (XEXP (operands[1], 0), r5);
 
  dwords = bytes >> 3;
  emit_insn (gen_move_insn (r6, GEN_INT (dwords - 1)));
- emit_insn (gen_block_lump_real_i4 (func_addr_rtx));
+ emit_insn (gen_block_lump_real_i4 (func_addr_rtx, lab));
  return true;
}
   else
@@ -159,10 +159,10 @@ expand_block_move (rtx *operands)
   rtx r5 = gen_rtx_REG (SImode, 5);
 
   sprintf (entry, "__movmemSI%d", bytes);
-  function_symbol (func_addr_rtx, entry, SFUNC_STATIC);
+  rtx lab = function_symbol (func_addr_rtx, entry, SFUNC_STATIC).lab;
   force_into (XEXP (operands[0], 0), r4);
   force_i

Re: [PATCH v2] SH FDPIC backend support

2015-10-06 Thread Rich Felker
On Wed, Oct 07, 2015 at 07:22:59AM +0900, Oleg Endo wrote:
> On Tue, 2015-10-06 at 12:52 -0400, Rich Felker wrote:
> > > > +  if (TARGET_FDPIC)
> > > > +{
> > > > +  rtx a = force_reg (Pmode, plus_constant (Pmode, XEXP (tramp_mem, 
> > > > 0), 8));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 0), a);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4), 
> > > > OUR_FDPIC_REG);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd203d302 : 
> > > > 0xd302d203,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x5c216122 : 
> > > > 0x61225c21,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 16),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009412b : 
> > > > 0x412b0009,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 20), cxt);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 24), fnaddr);
> > > > +}
> > > > +  else
> > > > +{
> > > > +  emit_move_insn (change_address (tramp_mem, SImode, NULL_RTX),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd301d202 : 
> > > > 0xd202d301,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4),
> > > > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009422b : 
> > > > 0x422b0009,
> > > > +   SImode));
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8), cxt);
> > > > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12), fnaddr);
> > > > +}
> > > 
> > > I think this hunk really needs a comment.  It copies machine code from
> > > somewhere to somewhere via constant loads... but what exactly are the
> > > instructions ...
> > 
> > This is generating trampolines for nested functions. This portion of
> > the patch applied without modification from the old patch, so I didn't
> > read into it in any more detail; it seems to be the following, which
> > makes sense:
> > 
> > 0:  .long 1f
> > .long gotval
> > 1:  mov.l 3f,r3
> > mov.l 2f,r2
> > mov.l @r2,r1
> > mov.l @(4,r2),r12
> > jmp @r1
> > nop
> > 3:  .long cxt
> > 2:  .long fnaddr
> > 
> > The corresponding non-FDPIC version is:
> > 
> > mov.l 3f,r3
> > mov.l 2f,r2
> > jmp @r2
> > nop
> > 3:  .long cxt
> > 2:  .long fnaddr
> > 
> > Should these go into the source as comments?
> 
> Yes, please.  And of course some of the descriptive text as above.

OK.

> > I would think it does, but I've found in the RTL files sometimes extra
> > escaping is silently accepted, and I'm not sure if omitting it would
> > visibly break. Can I rely on it producing a visible error right away
> > if removing it is wrong, or do I need to search the gccint
> > documentation to figure out what the right way is?
> 
> Just compile some code and look at the generated asm.

OK, I'll try this.

> > It can't generate the same code either way because, with the patch as
> > submitted, there's an extra load inside the asm. I would prefer
> > switching to an approach that avoids that (mainly to avoid the ugly
> > near-duplication of the asm block, but also to save a couple
> > instructions) but short of feedback on acceptable ways to do the
> > punning in the C++ I'll just leave it in the asm for now.
> 
> Do you have some alternatives to what's currently in the patch?  It's
> difficult to judge without seeing them...

Perhaps something like the following:

#ifdef __SH_FDPIC__
typedef __attribute__((__may_alias__)) uintptr_t sh_aliased_uintptr_t;
#define SH_CODE_ADDR(x) (*(sh_aliased_uintptr_t *)(x))
#else
#define SH_CODE_ADDR(x) x
#endif

And then just passing SH_CODE_ADDR(__udiv_qrnnd_16) rather than just
__udiv_qrnnd_16 as the input to the asm.

Rich


Re: [PATCH v2] SH FDPIC backend support

2015-10-06 Thread Rich Felker
On Tue, Oct 06, 2015 at 09:39:20PM +0900, Oleg Endo wrote:
> On Mon, 2015-10-05 at 23:15 -0400, Rich Felker wrote:
> > Attached is the initial version of the patch against trunk. I've fixed
> > the functional issues I'm aware of from the previous version: ICE in
> > generating the plain-SH2 libgcc-based shifts, missing
> > sh_legitimate_constant_p changes, and bad asm spec that broke
> > non-FDPIC. Cosmetic/style changes have not been made yet.
> 
> OK, I've got a few other points.

Thanks!

> > +  if (TARGET_FDPIC)
> > +{
> > +  rtx a = force_reg (Pmode, plus_constant (Pmode, XEXP (tramp_mem, 0), 
> > 8));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 0), a);
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4), 
> > OUR_FDPIC_REG);
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8),
> > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd203d302 : 
> > 0xd302d203,
> > +   SImode));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12),
> > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x5c216122 : 
> > 0x61225c21,
> > +   SImode));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 16),
> > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009412b : 
> > 0x412b0009,
> > +   SImode));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 20), cxt);
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 24), fnaddr);
> > +}
> > +  else
> > +{
> > +  emit_move_insn (change_address (tramp_mem, SImode, NULL_RTX),
> > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0xd301d202 : 
> > 0xd202d301,
> > +   SImode));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 4),
> > + gen_int_mode (TARGET_LITTLE_ENDIAN ? 0x0009422b : 
> > 0x422b0009,
> > +   SImode));
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 8), cxt);
> > +  emit_move_insn (adjust_address (tramp_mem, SImode, 12), fnaddr);
> > +}
> 
> I think this hunk really needs a comment.  It copies machine code from
> somewhere to somewhere via constant loads... but what exactly are the
> instructions ...

This is generating trampolines for nested functions. This portion of
the patch applied without modification from the old patch, so I didn't
read into it in any more detail; it seems to be the following, which
makes sense:

0:  .long 1f
.long gotval
1:  mov.l 3f,r3
mov.l 2f,r2
mov.l @r2,r1
mov.l @(4,r2),r12
jmp @r1
nop
3:  .long cxt
2:  .long fnaddr

The corresponding non-FDPIC version is:

mov.l 3f,r3
mov.l 2f,r2
jmp @r2
nop
3:  .long cxt
2:  .long fnaddr

Should these go into the source as comments?

> In the multiple alternative insn patterns ...
> 
> > -  "jsr @%1%#"
> > +  "@
> > +   jsr @%1%#
> > +   bsrf%1\\n%O2:%#"
> 
> Please use the formatting as in the other parts of sh.md:
> "@
>   jsr @%1%#
>   bsrf%1\n%O2:%#"
> 
> (use tabs and I don't think the embedded newline needs double backslash,
> but please check)

I would think it does, but I've found in the RTL files sometimes extra
escaping is silently accepted, and I'm not sure if omitting it would
visibly break. Can I rely on it producing a visible error right away
if removing it is wrong, or do I need to search the gccint
documentation to figure out what the right way is?

> > +@item --enable-fdpic
> > +On SH Linux systems, generate ELF FDPIC code.
> 
> It should be "GNU/Linux" as far as I know.

I don't want to turn this into a political battle so we can go with
whatever is appropriate for upstream gcc. Note however that, at
present, the only targets this code is useful on are completely
non-GNU Linux (musl-based and not using any GNU userspace on the
target). uClibc may also work if someone digs up the old (untouched
since 2011) superh_fdpic branch.

The original patch said "On SH uClinux systems"; I just changed it to
"Linux" because there's no longer anything uClinux-specific about it.

> > A couple specific questions I have:
> > 
> > - Is the use of self specs (see DRIVER_SELF_SPECS in sh.h) an
> >   acceptable way to set the default? I brought this up before but
> >   don't think anyone answered. I find this method more clear and less
> >   invasive (doesn't requir

[PATCH v2] SH FDPIC backend support

2015-10-05 Thread Rich Felker
On Fri, Oct 02, 2015 at 07:36:27AM +0900, Oleg Endo wrote:
> On Thu, 2015-10-01 at 17:35 -0400, Rich Felker wrote:
> > This is a forward-port of the abandoned SH FDPIC patch from 2010:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01536.html
> > 
> > I'm submitting it at this point for initial review, not to be applied
> > right away; I would not be surprised if some changes are needed. It
> > applies on top of gcc 5.2.0 with the patch for pr 66609 applied. With
> > one trivial change it also applies to the current development version
> > of gcc, but I have not tested that setup.
> 
> Thanks for working on this.  Please submit a patch against trunk.

Attached is the initial version of the patch against trunk. I've fixed
the functional issues I'm aware of from the previous version: ICE in
generating the plain-SH2 libgcc-based shifts, missing
sh_legitimate_constant_p changes, and bad asm spec that broke
non-FDPIC. Cosmetic/style changes have not been made yet.

A couple specific questions I have:

- Is the use of self specs (see DRIVER_SELF_SPECS in sh.h) an
  acceptable way to set the default? I brought this up before but
  don't think anyone answered. I find this method more clear and less
  invasive (doesn't require #ifdef FDPIC_DEFAULT all over the place)
  but if there's a policy reason this can't be done I can rework it.

- For the udiv_qrnnd inline asm, the current patch duplicates the asm
  with a minor change to dereference the function descriptor and get a
  code address. This could be done outside the asm (via type punning
  the function pointer) to slightly improve the resulting code and
  avoid duplicating the asm (a macro would be used to load the code
  address from the function pointer; this is identity macro on
  non-FDPIC and would do the type punning on FDPIC) but if this
  approach would be preferable I need some advice on the form of type
  punning that would be acceptable in GCC.

- For the Changelog, should I just edit the one from the original
  patch (https://gcc.gnu.org/ml/gcc-patches/2010-08/txt00148.txt)
  submitted against 4.5 and add myself to the list of patch authors?

If there are no other functional issues to address I'll go ahead and
switch to the cosmetics and try to make a version that's closer to
ready for commit.

Rich
diff --git a/gcc/config.gcc b/gcc/config.gcc
index bf26776..ed118f3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2621,6 +2621,9 @@ sh-*-elf* | sh[12346l]*-*-elf* | \
tm_file="${tm_file} dbxelf.h elfos.h sh/elf.h"
case ${target} in
sh*-*-linux*)   tmake_file="${tmake_file} sh/t-linux"
+   if test x$enable_fdpic = xyes; then
+   tm_defines="$tm_defines FDPIC_DEFAULT=1"
+   fi
tm_file="${tm_file} gnu-user.h linux.h glibc-stdint.h 
sh/linux.h" ;;
sh*-*-netbsd*)
tm_file="${tm_file} netbsd.h netbsd-elf.h 
sh/netbsd-elf.h"
diff --git a/gcc/config/sh/constraints.md b/gcc/config/sh/constraints.md
index 4d1eb2d..41c88a2 100644
--- a/gcc/config/sh/constraints.md
+++ b/gcc/config/sh/constraints.md
@@ -25,6 +25,7 @@
 ;;  Bsc: SCRATCH - for the scratch register in movsi_ie in the
 ;;   fldi0 / fldi0 cases
 ;; Cxx: Constants other than only CONST_INT
+;;  Ccl: call site label
 ;;  Css: signed 16-bit constant, literal or symbolic
 ;;  Csu: unsigned 16-bit constant, literal or symbolic
 ;;  Csy: label or symbol
@@ -233,6 +234,11 @@
hence mova is being used, hence do not select this pattern."
   (match_code "scratch"))
 
+(define_constraint "Ccl"
+  "A call site label, for bsrf."
+  (and (match_code "unspec")
+   (match_test "XINT (op, 1) == UNSPEC_CALLER")))
+
 (define_constraint "Css"
   "A signed 16-bit constant, literal or symbolic."
   (and (match_code "const")
diff --git a/gcc/config/sh/linux.h b/gcc/config/sh/linux.h
index a9dd43a..5d4dd1f 100644
--- a/gcc/config/sh/linux.h
+++ b/gcc/config/sh/linux.h
@@ -69,7 +69,8 @@ along with GCC; see the file COPYING3.  If not see
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX
-#define SUBTARGET_LINK_EMUL_SUFFIX "_linux"
+#define SUBTARGET_LINK_EMUL_SUFFIX "%{mfdpic:_fd;:_linux}"
+
 #undef SUBTARGET_LINK_SPEC
 #define SUBTARGET_LINK_SPEC \
   "%{shared:-shared} \
diff --git a/gcc/config/sh/sh-c.c b/gcc/config/sh/sh-c.c
index a98c148..01a12e6 100644
--- a/gcc/config/sh/sh-c.c
+++ b/gcc/config/sh/sh-c.c
@@ -141,6 +141,11 @@ sh_cpu_cpp_builtins (cpp_reader* pfile)
 builtin_define ("__HITACHI__");
   if (TARGET_FMOVD)
 builtin_define ("__FMOVD_ENABLED__");
+  if (TARGET_FDPIC)
+{
+  builtin_define ("__

Re: [PATCH] SH FDPIC backend support

2015-10-04 Thread Rich Felker
On Sun, Oct 04, 2015 at 02:10:42PM +0900, Oleg Endo wrote:
> On Sat, 2015-10-03 at 18:34 -0400, Rich Felker wrote:
> > > 
> > > I found and fixed the problem, but I have a new concern: calls to the
> > > new shift instructions are using the following address forms:
> > > 
> > > -mno-fdpic -fPIC:
> > >   .long   __ashlsi3_r0@GOTOFF
> > > 
> > > -mfdpic:
> > >   .long   __ashlsi3_r0-(.LPCS1+2)
> > > 
> > > Neither of these seems valid. Both assume __ashlsi3_r0 will be defined
> > > in the same DSO, which is not true in general; shared libgcc_s.so
> > > might be in use. In this case the call would need to go through the
> > > PLT, which (for PIC or FDPIC) requires r12 to be loaded with the GOT
> > > address. In the non-FDPIC case, r12 _happens_ to contain the GOT
> > > address just because it was used as an addend to get the function
> > > address from the @GOTOFF address, but this does not seem
> > > safe/reliable. In the FDPIC case there's nothing to cause r12 to
> > > contain the GOT address, and in fact if the function has already made
> > > another function call (which uses and clobbers r12), no code is
> > > generated to save and restore r12 for the libgcc call.
> 
> I might be missing something, but usually R12 is preserved across
> function calls.

This is FDPIC-specific. Because there is fundamentally no way for a
function to find its own GOT (it has one GOT for each process using
the code containing the function), its GOT address has to be a
(hidden) argument to the function which arrives in r12.

For calls via the PLT, r12 contains the PLT entry's (i.e. the calling
module's) GOT pointer at the time of the call, and the PLT thunk
replaces it with the callee's GOT pointer (loaded from the function
descriptor) before jumping to the callee code. There is fundamentally
nowhere the PLT thunk could store the old value of r12 and arrange for
it to be restored at return time, so using a PLT forces r12 to be
call-clobbered.

(Note that in the special case where the PLT is bypassed because the
callee is defined in the same module and bound at link-time, the GOT
value loaded by the caller is the right GOT value for the callee
automatically.)

If we didn't care about being able to do PLT calls, there's no
fundamental reason r12 has to be call-clobbered, but it still makes a
lot more sense. Getting back the value of r12 you passed when making a
function call is rarely useful except in the case where the caller
knows the function is defined in the same module (so it can keep using
r12 as its own GOT pointer after the call).

BTW the reason I'm spending time explaining this now is that it's
something we should optimize after the FDPIC patch goes in: I think
the r12-related spills/reload could be made a lot more efficient.

> The special functions in libgcc tell the compiler
> exactly which things they clobber and which not.  R12 is not clobbered
> by the shift functions.

For FDPIC, that implies an assumption that the definition is local to
the calling module (i.e. static-linked) but I think that assumption
already existed for non-FDPIC since r12 was not explicitly set for the
call.

> > > Calls to other functions lib libgcc (e.g. division) seem to work fine
> > > and either go through the PLT or bypass it and load from the GOT
> > > directly. It's only these new special-calling-convention ones that are
> > > broken, and I can't figure out why...
> 
> Sorry, I wasn't paying attention to dynamic linking or *PIC when
> changing the shift patterns back then, so maybe I've screwed up
> something there.
> To me it looks like they do the same thing as expanders for division or
> the SH1 multiplication ("mulsi3" pattern).  Each of the libgcc support
> functions have a different "ABI", so "__ashlsi3_r0" or "__lshrsi3_r0"
> doesn't introduce a new special ABI, it already is as per definition.
> These function calls are not expanded like regular function calls, via
> e.g. (define_expand "call" ... ).  The function call is hidden from the
> regular function call machinery and everything thinks it's a regular
> instruction that just has some special register constraints and
> clobbers.
> 
> I've just tried compiling the following with -m2 -ml -fPIC
> 
> unsigned int test_2 (unsigned int x, unsigned int y)
> {
>   return x << y;
> }
> 
> unsigned int test_3 (unsigned int x, unsigned int y)
> {
>   return x / y;
> }
> 
> And the compiled code is basically identically for both.  For the labels
> I get:
> 
> ..L4: .long   _GLOBAL_OFFSET_TABLE_
> ..L5: .long   

Re: [PATCH] SH FDPIC backend support

2015-10-03 Thread Rich Felker
On Fri, Oct 02, 2015 at 11:18:32AM -0400, Rich Felker wrote:
> > > +#ifdef __FDPIC__
> > > +#define udiv_qrnnd(q, r, n1, n0, d) \
> > > +  do {   
> > > \
> > > +extern UWtype __udiv_qrnnd_16 (UWtype, UWtype)   
> > > \
> > 
> > It's really difficult to spot the subtle difference of the FDPIC version
> > and the non-FDPIC version.  At least there should be a comment.
> 
> OK, I can add a comment; this is appropriate anyway since the way it's
> making the FDPIC call is unconventional.

Before I add comments, can we discuss whether the approach I took is
appropriate? The udiv_qrnnd asm block takes as an operand a function
pointer for __udiv_qrnnd_16 which it calls from asm. The
__udiv_qrnnd_16 function is itself written in asm has a special
contract for register clobbers, and it doesn't need a GOT register.
The non-FDPIC asm calls it via jsr @%5 (%5 is the function pointer)
but on FDPIC the function pointer points to a function descriptor, not
code, so an extra level of indirection is needed. This is actually
inefficient to do in asm because we have to repeat it twice. Normally
an FDPIC call would also require loading the GOT pointer from the
function descriptor, but since this call is local, that can be
skipped.

Another option would be to pass (essentially) *(void**)__udiv_qrnnd_16
instead of __udiv_qrnnd_16 to the asm block; then the existing inline
asm can be used as-is. This could be done via passing
SH_CODE_ADDRESS(__udiv_qrnnd_16) instead of __udiv_qrnnd_16, where
SH_CODE_ADDRESS would be a macro defined to pass through its argument
for non-FDPIC and to extract the code address from the function
descriptor for FDPIC. However I'm not convinced it's clean/safe to do
the above punning. At the very least a may_alias attribute probably
belongs in there somewhere. But an approach like this would reduce
code duplication and slightly improve the size/performance of the
resulting code.

Opinions?

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-03 Thread Rich Felker
On Sat, Oct 03, 2015 at 03:12:16PM -0400, Rich Felker wrote:
> On Thu, Oct 01, 2015 at 09:30:17PM -0400, Rich Felker wrote:
> > But trying the patch on vanilla GCC trunk without my usual J2 target
> > setup revealed some additional issues I need to address. I'm getting
> > ICE in the code that generates the libgcc bitshift calls, which
> > weren't used on J2. This is my fault for failing to extend the changes
> > made to other parts of sh.md to the patterns for the new shifts (the
> > same ones that broke the kernel) and perhaps also some other things.
> > I'm going to go back and review that code and get it done right before
> > resubmitting the patch against trunk.
> 
> I found and fixed the problem, but I have a new concern: calls to the
> new shift instructions are using the following address forms:
> 
> -mno-fdpic -fPIC:
>   .long   __ashlsi3_r0@GOTOFF
> 
> -mfdpic:
>   .long   __ashlsi3_r0-(.LPCS1+2)
> 
> Neither of these seems valid. Both assume __ashlsi3_r0 will be defined
> in the same DSO, which is not true in general; shared libgcc_s.so
> might be in use. In this case the call would need to go through the
> PLT, which (for PIC or FDPIC) requires r12 to be loaded with the GOT
> address. In the non-FDPIC case, r12 _happens_ to contain the GOT
> address just because it was used as an addend to get the function
> address from the @GOTOFF address, but this does not seem
> safe/reliable. In the FDPIC case there's nothing to cause r12 to
> contain the GOT address, and in fact if the function has already made
> another function call (which uses and clobbers r12), no code is
> generated to save and restore r12 for the libgcc call.
> 
> Calls to other functions lib libgcc (e.g. division) seem to work fine
> and either go through the PLT or bypass it and load from the GOT
> directly. It's only these new special-calling-convention ones that are
> broken, and I can't figure out why...

Hmm, according to sh-protos.h:

  /* A special function that should be linked statically.  These are typically
 smaller or not much larger than a PLT entry.
 Some also have a non-standard ABI which precludes dynamic linking.  */
  SFUNC_STATIC

So apparently the strange behavior I observed is intended. Presumably
there is some mechanism to ensure that these functions are always
static-linked? But I don't see it. The libgcc spec I see is:

*libgcc:
%{static|static-libgcc:-lgcc
-lgcc_eh}%{!static:%{!static-libgcc:%{!shared-libgcc:-lgcc --as-needed
-lgcc_s --no-as-needed}%{shared-libgcc:-lgcc_s%{!shared: -lgcc

This explicitly omits -lgcc when -shared-libgcc is used with -shared.
Thankfully __ashlsi3_r0 is not exported from libgcc.so.1 (as far as I
can tell), so this will just be a link error rather than horribly
wrong behavior, but it still seems like there's a bug here unless I'm
misunderstanding something. I think the final %{!shared: -lgcc} in the
spec is an error and should be replaced by simply -lgcc if there are
targets where libgcc.a contains necessary symbols that are not/cannot
be defined in libgcc_s.so.1.

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-03 Thread Rich Felker
On Thu, Oct 01, 2015 at 09:30:17PM -0400, Rich Felker wrote:
> But trying the patch on vanilla GCC trunk without my usual J2 target
> setup revealed some additional issues I need to address. I'm getting
> ICE in the code that generates the libgcc bitshift calls, which
> weren't used on J2. This is my fault for failing to extend the changes
> made to other parts of sh.md to the patterns for the new shifts (the
> same ones that broke the kernel) and perhaps also some other things.
> I'm going to go back and review that code and get it done right before
> resubmitting the patch against trunk.

I found and fixed the problem, but I have a new concern: calls to the
new shift instructions are using the following address forms:

-mno-fdpic -fPIC:
.long   __ashlsi3_r0@GOTOFF

-mfdpic:
.long   __ashlsi3_r0-(.LPCS1+2)

Neither of these seems valid. Both assume __ashlsi3_r0 will be defined
in the same DSO, which is not true in general; shared libgcc_s.so
might be in use. In this case the call would need to go through the
PLT, which (for PIC or FDPIC) requires r12 to be loaded with the GOT
address. In the non-FDPIC case, r12 _happens_ to contain the GOT
address just because it was used as an addend to get the function
address from the @GOTOFF address, but this does not seem
safe/reliable. In the FDPIC case there's nothing to cause r12 to
contain the GOT address, and in fact if the function has already made
another function call (which uses and clobbers r12), no code is
generated to save and restore r12 for the libgcc call.

Calls to other functions lib libgcc (e.g. division) seem to work fine
and either go through the PLT or bypass it and load from the GOT
directly. It's only these new special-calling-convention ones that are
broken, and I can't figure out why...

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-03 Thread Rich Felker
On Sat, Oct 03, 2015 at 05:17:53PM +0900, Oleg Endo wrote:
> On Sat, 2015-10-03 at 00:50 -0400, Rich Felker wrote:
> 
> > I have -mfdpic in the self-specs when FDPIC_DEFAULT is defined, so I
> > think only the positive form is needed. 
> 
> Having positive and negative forms for options makes sense.  It usually
> costs nothing because anyway the compiler internally supports both and
> it allows special-casing if one of them is the default, which can be
> useful for testing.

What I'm saying is that the self-specs approach to FDPIC_DEFAULT has
the compiler driver adding -mfdpic to its own command line (via
%{!mno-fdpic:-mfdpic}) when FDPIC_DEFAULT is defined. This allows
other specs simply to test %{mfdpic:...} rather than having complex
separate forms depending on whether FDPIC_DEFAULT is defined. The
negative form is of course supported too (and suppresses the self-spec
addition of -mfdpic).

I'm not sure if this approach is acceptable upstream in gcc. I like it
a lot better because it isolates this kind of logic as described above
rather than having it spread out all over the place in error-prone
ways. I had a several-line patch for default-pie support that worked
via self-specs too; the one that was actually committed to gcc was
hugely invasive across many files including most targets...

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-02 Thread Rich Felker
On Sat, Oct 03, 2015 at 06:57:56AM +0900, Kaz Kojima wrote:
> Rich Felker  wrote:
> > I worked around it and opened an issue for it:
> > 
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67812
> > 
> > But trying the patch on vanilla GCC trunk without my usual J2 target
> > setup revealed some additional issues I need to address. I'm getting
> > ICE in the code that generates the libgcc bitshift calls, which
> > weren't used on J2. This is my fault for failing to extend the changes
> > made to other parts of sh.md to the patterns for the new shifts (the
> > same ones that broke the kernel) and perhaps also some other things.
> > I'm going to go back and review that code and get it done right before
> > resubmitting the patch against trunk.
> > 
> > If you have any other general comments on the patch in the mean time
> > I'd be happy to hear them.
> 
> FYI, the patch can be applied to trunk almost as is.  I've tried
> to build/make -k check for cross sh4-unknown-linux-gnu.

Yes, there's just one trivial conflict.

> >  #ifndef SUBTARGET_ASM_SPEC
> > -#define SUBTARGET_ASM_SPEC ""
> > +#define SUBTARGET_ASM_SPEC "%{!mno-fdpic:--fdpic}"
> >  #endif
> 
> With it, plain sh4-unknown-linux-gnu compiler adds --fdpic
> to the AS command unless -mno-fdpic is specified and the build
> fails during linking the target libgcc.so.  I've changed it into

Oops.

>  #ifndef SUBTARGET_ASM_SPEC
> -#define SUBTARGET_ASM_SPEC ""
> +#ifdef FDPIC_DEFAULT
> +#define SUBTARGET_ASM_SPEC "%{!mno-fdpic:--fdpic}"
> +#else
> +#define SUBTARGET_ASM_SPEC "%{mfdpic:--fdpic}"
> +#endif
>  #endif

I have -mfdpic in the self-specs when FDPIC_DEFAULT is defined, so I
think only the positive form is needed. If having self specs is not
acceptable, several places need changing: at least the linker
emulation and (in the musl support patch; this is not yet upstream)
changing the logic for the dynamic linker name to have separate cases
for FDPIC_DEFAULT defined/undefined.

> There are no new failures with the top level "make -k check"
> on sh4-unknown-linux-gnu.

Thanks for checking this!

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-02 Thread Rich Felker
On Fri, Oct 02, 2015 at 10:51:03PM +0900, Oleg Endo wrote:
> On Thu, 2015-10-01 at 21:30 -0400, Rich Felker wrote:
> 
> > If you have any other general comments on the patch in the mean time
> > I'd be happy to hear them.
> 
> Below are some comments.  Might be a bit unstructured, I was hopping
> through the patch file.  Sorry about that.

Thanks! This is very helpful. gcc style has changed a lot since the
old patch was submitted so I think it makes sense to update it to
match current practices rather than just making it work. I'll try to
focus on any functional problems first though so as to keep a working
patch against 5.2 as well and ease backporting to earlier versions (if
anyone wants to do that on their own; certainly I don't expect it to
happen in upstream gcc).

> > +function_symbol (rtx target, const char *name, enum sh_function_kind kind, 
> > rtx *lab)
>  ^
> 
> Please do not add unnecessary 'enum', 'struct', 'typedef' etc.  In this
> case it was already here, but since this is is touching the line, please
> remove it.

OK.

> I'd rather make the function 'function_symbol' returning a
> std::pair or something like
> 
> struct function_symbol_result
> {
>   function_symbol_result (void) : symbol (NULL), label (NULL) { }
>   function_symbol_result (rtx s, rtx l) : symbol (s), label (l) { }
> 
>   rtx symbol;
>   rtx label;
> };
> 
> instead of doing return values by pointer-args.  On the caller sites,
> you can then do something like
> 
> rtx lab = function_symbol (func_addr_rtx, "...", SFUNC_STATIC).label;
> 
> This will make the the patch also a few hunks shorter.

There are a few call sites where the symbol returned is actually used.
Would you want me to just do something like:

struct function_symbol_result funcsym = function_symbol(...);

then use funcsym.symbol and funcsym.label?

Would you object to shorter member names .sym and .lab?

> > +extern bool sh_legitimate_constant_p (rtx);
> 
> There is already a target hook/callback function:
> 
> static bool
> sh_legitimate_constant_p (machine_mode mode, rtx x)
> 
> You newly added function is an overload it and I'm not sure who invokes it.

Uhg, not sure how I missed that. (Well, yes I am -- it's C++'s
fault;-) I'll try to figure out what's going on.

> > +extern rtx sh_our_fdpic_reg (void);
> 
> Please rename this to 'sh_get_fdpic_reg_initial_val'.  There's a similar
> function 'sh_get_pr_initial_val' which also uses
> 'get_hard_reg_initial_val'.

OK.

> > +/* An rtx holding the initial value of the FDPIC register (the FDPIC
> > +   pointer passed in from the caller).  */
> > +#define OUR_FDPIC_REG  sh_our_fdpic_reg ()
> > +
> 
> Please remove this macro and add 'sh_get_fdpic_reg_initial_val' to
> sh-protos.h and use that function instead.

OK.

> >  void
> >  prepare_move_operands (rtx operands[], machine_mode mode)
> >  {
> > +  rtx tmp, base, offset;
> > +
> 
> Please declare variables where they are used.

OK.

> > +  if (TARGET_FDPIC)
> > +{
> > +  rtx pic_reg = gen_rtx_REG (Pmode, PIC_REG);
> > +  emit_move_insn (pic_reg, OUR_FDPIC_REG);
> > +}
> > +
> 
> Make this a one-liner
> 
>   emit_move_insn (gen_rtx_REG (Pmode, PIC_REG),
> sh_get_fdpic_reg_initial_val ());

OK.

> > +(define_insn "sibcalli_fdpic"
> > +  [(call (mem:SI (match_operand:SI 0 "register_operand" "k"))
> > +(match_operand 1 "" ""))
> > +   (use (reg:SI FPSCR_MODES_REG))
> > +   (use (reg:SI PIC_REG))
> > +   (return)]
> > +  "TARGET_SH1 && TARGET_FDPIC"
> >^^^
> 
> This is maybe slightly impossible, because of ..

Because SH5 is deprecated?

> > +  if (TARGET_FDPIC
> > +  && (TARGET_SHMEDIA || TARGET_SHCOMPACT || !TARGET_SH2))
> > +sorry ("non-SH2 FDPIC");
> > +
> 
> 
> > +  [(match_operand 0 "" "") (match_operand 1 "" "")]
> > 
> 
> Please don't add empty predicate/constraint strings if not necessary.  In 
> this case
>   [(match_operand 0) (match_operand 1)]
> 
> will suffice.

OK, but I'm not really familiar with this part of the code; I just
adapted the patch by pattern. There are a lot of places with
(match_operand N "" ""); should the empty strings be dropped for all
of them?

> >   if (TARGET_FDPIC)
> > +picreg = OUR_FDPIC_REG;
> > +  else
>

Re: [PATCH] SH FDPIC backend support

2015-10-01 Thread Rich Felker
On Thu, Oct 01, 2015 at 07:39:10PM -0400, Rich Felker wrote:
> On Fri, Oct 02, 2015 at 07:36:27AM +0900, Oleg Endo wrote:
> > On Thu, 2015-10-01 at 17:35 -0400, Rich Felker wrote:
> > > This is a forward-port of the abandoned SH FDPIC patch from 2010:
> > > 
> > > https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01536.html
> > > 
> > > I'm submitting it at this point for initial review, not to be applied
> > > right away; I would not be surprised if some changes are needed. It
> > > applies on top of gcc 5.2.0 with the patch for pr 66609 applied. With
> > > one trivial change it also applies to the current development version
> > > of gcc, but I have not tested that setup.
> > 
> > Thanks for working on this.  Please submit a patch against trunk.
> 
> I'm going to go ahead and submit the patch adjusted for trunk, but I
> have not yet tested it yet -- I can't get trunk to build because of a
> regression. Apparently someone added -fno-PIE to the build process,
> which breaks when the host toolchain you're building with uses -pie by
> default. Is this a known issue?

I worked around it and opened an issue for it:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67812

But trying the patch on vanilla GCC trunk without my usual J2 target
setup revealed some additional issues I need to address. I'm getting
ICE in the code that generates the libgcc bitshift calls, which
weren't used on J2. This is my fault for failing to extend the changes
made to other parts of sh.md to the patterns for the new shifts (the
same ones that broke the kernel) and perhaps also some other things.
I'm going to go back and review that code and get it done right before
resubmitting the patch against trunk.

If you have any other general comments on the patch in the mean time
I'd be happy to hear them.

Rich


Re: [PATCH] SH FDPIC backend support

2015-10-01 Thread Rich Felker
On Fri, Oct 02, 2015 at 07:36:27AM +0900, Oleg Endo wrote:
> On Thu, 2015-10-01 at 17:35 -0400, Rich Felker wrote:
> > This is a forward-port of the abandoned SH FDPIC patch from 2010:
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01536.html
> > 
> > I'm submitting it at this point for initial review, not to be applied
> > right away; I would not be surprised if some changes are needed. It
> > applies on top of gcc 5.2.0 with the patch for pr 66609 applied. With
> > one trivial change it also applies to the current development version
> > of gcc, but I have not tested that setup.
> 
> Thanks for working on this.  Please submit a patch against trunk.

I'm going to go ahead and submit the patch adjusted for trunk, but I
have not yet tested it yet -- I can't get trunk to build because of a
regression. Apparently someone added -fno-PIE to the build process,
which breaks when the host toolchain you're building with uses -pie by
default. Is this a known issue?

Rich


[PATCH] SH FDPIC backend support

2015-10-01 Thread Rich Felker
This is a forward-port of the abandoned SH FDPIC patch from 2010:

https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01536.html

I'm submitting it at this point for initial review, not to be applied
right away; I would not be surprised if some changes are needed. It
applies on top of gcc 5.2.0 with the patch for pr 66609 applied. With
one trivial change it also applies to the current development version
of gcc, but I have not tested that setup.

Aside from direct forward-porting, the following changes from the
original patch have been made:

- The udiv_qrnnd asm fragment in the original patch was utterly
  broken; it clobbered its own registers as part of the FDPIC calls.
  I've taken a different approach (which should also perform better)
  to fixing it.

- Sibcalls are enabled for fdpic; they are valid since r12 is
  call-clobbered.

- flag_pic is always on for FDPIC code generation. Without flag_pic, I
  experienced ICE and codegen issues and was not able to track down
  the cause. Conceptually FDPIC should be treated as PIC anyway.

- Additions from the patch that duplicated code containing the issue
  reported as pr 66609 were fixed in the same manner as the
  corresponding changes in the patch for pr 66609.

- A fix was made to FDPIC-incompatible SH asm in libitm to use a
  PC-relative call rather than a call through the PLT so that loading
  r12 is not necessary.

- The uclinux-target-specific parts have been dropped; with musl I am
  using the regular *-linux-* target tuples. The uclinux tuple stuff
  could be re-added if desired (perhaps for use with uClibc if anyone
  revives the SH FDPIC patches for it?) but I think this should be a
  separate patch.

I'm not entirely happy with the codegen quality at this point;
treating r12 as a fixed register rather than a call-clobbered hidden
argument register seems wrong conceptually and forces more spills than
should be needed. But I would prefer to make improvements in this area
after the basic FDPIC support is polished and committed.

Rich
diff -urp ../baseline/gcc-5.2.0/gcc/config/sh/constraints.md 
gcc-5.2.0/gcc/config/sh/constraints.md
--- ../baseline/gcc-5.2.0/gcc/config/sh/constraints.md  2015-03-23 
18:57:58.0 +
+++ gcc-5.2.0/gcc/config/sh/constraints.md  2015-09-03 17:12:56.462760038 
+
@@ -25,6 +25,7 @@
 ;;  Bsc: SCRATCH - for the scratch register in movsi_ie in the
 ;;   fldi0 / fldi0 cases
 ;; Cxx: Constants other than only CONST_INT
+;;  Ccl: call site label
 ;;  Css: signed 16-bit constant, literal or symbolic
 ;;  Csu: unsigned 16-bit constant, literal or symbolic
 ;;  Csy: label or symbol
@@ -233,6 +234,11 @@
hence mova is being used, hence do not select this pattern."
   (match_code "scratch"))
 
+(define_constraint "Ccl"
+  "A call site label, for bsrf."
+  (and (match_code "unspec")
+   (match_test "XINT (op, 1) == UNSPEC_CALLER")))
+
 (define_constraint "Css"
   "A signed 16-bit constant, literal or symbolic."
   (and (match_code "const")
diff -urp ../baseline/gcc-5.2.0/gcc/config/sh/linux.h 
gcc-5.2.0/gcc/config/sh/linux.h
--- ../baseline/gcc-5.2.0/gcc/config/sh/linux.h 2015-09-04 20:23:46.714785579 
+
+++ gcc-5.2.0/gcc/config/sh/linux.h 2015-09-11 01:48:36.830264737 +
@@ -63,7 +63,8 @@ along with GCC; see the file COPYING3.
 #define GLIBC_DYNAMIC_LINKER "/lib/ld-linux.so.2"
 
 #undef SUBTARGET_LINK_EMUL_SUFFIX
-#define SUBTARGET_LINK_EMUL_SUFFIX "_linux"
+#define SUBTARGET_LINK_EMUL_SUFFIX "%{mfdpic:_fd;:_linux}"
+
 #undef SUBTARGET_LINK_SPEC
 #define SUBTARGET_LINK_SPEC \
   "%{shared:-shared} \
diff -urp ../baseline/gcc-5.2.0/gcc/config/sh/sh-c.c 
gcc-5.2.0/gcc/config/sh/sh-c.c
--- ../baseline/gcc-5.2.0/gcc/config/sh/sh-c.c  2015-01-09 20:18:42.0 
+
+++ gcc-5.2.0/gcc/config/sh/sh-c.c  2015-09-03 18:22:04.182507130 +
@@ -149,6 +149,11 @@ sh_cpu_cpp_builtins (cpp_reader* pfile)
 builtin_define ("__HITACHI__");
   if (TARGET_FMOVD)
 builtin_define ("__FMOVD_ENABLED__");
+  if (TARGET_FDPIC)
+{
+  builtin_define ("__SH_FDPIC__");
+  builtin_define ("__FDPIC__");
+}
   builtin_define (TARGET_LITTLE_ENDIAN
  ? "__LITTLE_ENDIAN__" : "__BIG_ENDIAN__");
 
diff -urp ../baseline/gcc-5.2.0/gcc/config/sh/sh-mem.cc 
gcc-5.2.0/gcc/config/sh/sh-mem.cc
--- ../baseline/gcc-5.2.0/gcc/config/sh/sh-mem.cc   2015-01-15 
13:28:42.0 +
+++ gcc-5.2.0/gcc/config/sh/sh-mem.cc   2015-09-03 17:37:09.436004777 +
@@ -136,11 +136,13 @@ expand_block_move (rtx *operands)
  rtx func_addr_rtx = gen_reg_rtx (Pmode);
  rtx r4 = gen_rtx_REG (SImode, 4);
  rtx r5 = gen_rtx_REG (SImode, 5);
+ rtx lab;
 
- function_symbol (func_addr_rtx, "__movmemSI12_i4", SFUNC_STATIC);
+ function_symbol (func_addr_rtx, "__movmemSI12_i4", SFUNC_STATIC,
+  &lab);
  force_into (XEXP (operands[0], 0), r4);
  force_into (XEXP (operands[1], 0), r5);
- emit_insn (gen_block_move_real_

Re: [PATCH] add static-linked PIE support

2015-09-29 Thread Rich Felker
On Tue, Sep 29, 2015 at 09:34:07PM -0400, Rich Felker wrote:
> This is the gcc side support of the static-linked PIE functionality
> added to binutils in commit 9b8b325a1f4cdaf235e7d803849dde6ededec865:

And unfortunately I wasn't aware of this:

https://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=e9abca4f4a48fa8b1fd9778f6a3cd748e099e3bb

Now I need to figure out the magic spec macros going on there and work
around the fact that default-pie mode has -static implicitly turning
default-pie off while we need it to stay on... Would a new
--enable-default-pie=always mode be appropriate? I don't think people
would be happy with changing the default, especially since glibc does
not have an rcrt1.o (yet)...

Rich


[PATCH] add static-linked PIE support

2015-09-29 Thread Rich Felker
This is the gcc side support of the static-linked PIE functionality
added to binutils in commit 9b8b325a1f4cdaf235e7d803849dde6ededec865:

https://sourceware.org/git/?p=binutils-gdb.git;a=commitdiff;h=9b8b325a1f4cdaf235e7d803849dde6ededec865

I've moved the linking spec change from SUBTARGET_LINK_SPEC, where I
had it in earlier sh-only versions of this patch, to LINK_PIE_SPEC; I
believe the new location is more appropriate and avoids widespread
per-target changes.

A special start file named rcrt1.o must be provided by the C
runtime/standard library to perform relative relocation processing as
part of the program's entry point, since there is no dynamic linker to
do relocation processing.

No attempt is made to suppress passing --no-dynamic-linker to older
versions of ld that don't support it, because use of -static and -pie
together was already broken (it silently produced binaries with an
incorrect PT_INTERP header). If this is a problem I can work on adding
checking to configure, but I think making it conditional would
actually give worse user experience, since upgrading binutils to
support static PIE would then also require rebuilding gcc with the new
binutils available so that support gets detected.

2015-09-14  Rich Felker  

* config/gnu-user.h (GNU_USER_TARGET_STARTFILE_SPEC): use
rcrt1.o for static-linked PIE.
* gcc.c (LINK_PIE_SPEC): pass --no-dynamic-linker to linker
for static-linked PIE.

--- gcc-5.2.0.orig/gcc/config/gnu-user.h2015-01-05 12:33:28.0 
+
+++ gcc-5.2.0/gcc/config/gnu-user.h 2015-08-25 08:15:18.354957759 +
@@ -42,8 +42,8 @@
 
 #if defined HAVE_LD_PIE
 #define GNU_USER_TARGET_STARTFILE_SPEC \
-  "%{!shared: %{pg|p|profile:gcrt1.o%s;pie:Scrt1.o%s;:crt1.o%s}} \
-   crti.o%s %{static:crtbeginT.o%s;shared|pie:crtbeginS.o%s;:crtbegin.o%s} \
+  "%{!shared: 
%{pg|p|profile:gcrt1.o%s;pie:%{static:rcrt1.o%s;:Scrt1.o%s};:crt1.o%s}} \
+   crti.o%s %{shared|pie:crtbeginS.o%s;static:crtbeginT.o%s;:crtbegin.o%s} \
%{fvtable-verify=none:%s; \
  fvtable-verify=preinit:vtv_start_preinit.o%s; \
  fvtable-verify=std:vtv_start.o%s}"
--- gcc-5.2.0.orig/gcc/gcc.c2015-03-10 09:37:41.0 +
+++ gcc-5.2.0/gcc/gcc.c 2015-09-30 00:25:33.225927941 +
@@ -739,7 +739,7 @@
 
 #ifndef LINK_PIE_SPEC
 #ifdef HAVE_LD_PIE
-#define LINK_PIE_SPEC "%{pie:-pie} "
+#define LINK_PIE_SPEC "%{pie:-pie %{static:--no-dynamic-linker}} "
 #else
 #define LINK_PIE_SPEC "%{pie:} "
 #endif


[PATCH v2] fix TLS support detection for sh targets

2015-09-14 Thread Rich Felker
2015-09-14  Rich Felker  

* gcc/configure.ac: Change target pattern for sh TLS support
test from "sh[34]-*-*" to "sh[123456789lbe]*-*-*".
* gcc/configure: Regenerate.

diff --git a/gcc/configure b/gcc/configure
index 846c996..6fb11a7 100755
--- a/gcc/configure
+++ b/gcc/configure
@@ -23977,7 +23977,7 @@ foo:.long   25
tls_first_minor=14
tls_as_opt="-m64 -Aesame --fatal-warnings"
;;
-  sh-*-* | sh[34]-*-*)
+  sh-*-* | sh[123456789lbe]*-*-*)
 conftest_s='
.section ".tdata","awT",@progbits
 foo:   .long   25
diff --git a/gcc/configure.ac b/gcc/configure.ac
index 34c43d5..a6e078a 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -3325,7 +3325,7 @@ foo:  .long   25
tls_first_minor=14
tls_as_opt="-m64 -Aesame --fatal-warnings"
;;
-  sh-*-* | sh[34]-*-*)
+  sh-*-* | sh[123456789lbe]*-*-*)
 conftest_s='
.section ".tdata","awT",@progbits
 foo:   .long   25


Re: [PATCH] fix TLS support detection for sh targets

2015-09-14 Thread Rich Felker
On Mon, Sep 14, 2015 at 06:06:02PM +0100, Szabolcs Nagy wrote:
> On 14/09/15 17:58, Rich Felker wrote:
> >trunk. For the ChangeLog message, do I need to list both configure and
> >configure.ac or just the latter? And should configure be included in
> >the patch like I did, or regenerated when the patch is applied?
> 
> list both
> 
> i think it's ok to fix configure manually

OK thanks! Sorry I missed the thing about configure: Regenerate in the
second link. I'll format the patch properly and send it soon.

Rich


Re: [PATCH] fix TLS support detection for sh targets

2015-09-14 Thread Rich Felker
On Mon, Sep 14, 2015 at 07:08:33AM +0900, Kaz Kojima wrote:
> Rich Felker  wrote:
> > I'm pretty sure this will still apply to trunk, but I can check that
> > and add the changelog entry. Is there something I should read on the
> > form or just follow the example from my last patch where you added it?
> 
> The latter would be enough for this, though
> 
> https://gcc.gnu.org/wiki/ChangeLog
> https://sourceware.org/gdb/wiki/ContributionChecklist#Properly_Formatted_GNU_ChangeLog
> 
> will be handy instructions.

Thanks. I confirmed that the patch as submitted applies cleanly to
trunk. For the ChangeLog message, do I need to list both configure and
configure.ac or just the latter? And should configure be included in
the patch like I did, or regenerated when the patch is applied?

Rich


Re: Reviving SH FDPIC target

2015-09-13 Thread Rich Felker
On Fri, Sep 04, 2015 at 06:04:15PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 04, 2015 at 04:16:40PM -0400, Rich Felker wrote:
> > One thing I've noticed that's odd is that gcc -mfdpic -fPIC produces
> > different (less efficient) code from just gcc -mfdpic, which seems
> > wrong, but agrees with sh.c which has a number of checks for flag_pic
> > not matched with a TARGET_FDPIC check.
> 
> Generic code tests flag_pic in important places as well.
> 
> > I'm thinking all of these
> > should either be flag_pic||TARGET_PIC or flag_pic&&!TARGET_FDPIC,
> > depending on whether the code applies to all PIC or is specific to the
> > non-FDPIC PIC model where r12 is call-saved. Does this sound correct?
> > I think we need spurious -fPIC to work (although it could be handled
> > with spec magic) and not pessimize code, since most library builds
> > will use -fPIC.
> 
> If you never want -fPIC (or -fpic) if fdpic is enabled, you can disable
> it (in sh_option_override)?

It turns out that with !flag_pic, gcc is generating broken code and/or
ICE, and this happens even after changing all the remaining flag_pic
tests in sh.c to flag_pic||TARGET_FDPIC. There are a few more in sh.md
I didn't try changing but they did not look relevant; the ICE came
via expand_binop in prepare_move_operands. Before I look at it
further, is there any reason to expect !flag_pic in the generic code
to break things when the target-specific code has PIC-like
constraints?

For now I just made sh_option_override force flag_pic when
TARGET_FDPIC is set. Note that flag_pic by itself is equivalent to
-fPIE; -fPIC also sets flag_shlib which affects other things like TLS
model and binds_locally interpretation. This seems like a viable
solution (and I got rid of the suboptimal codegen by fixing the
condition in sh_function_ok_for_sibcall so that flag_pic doesn't
preclude sibcalls when TARGET_FDPIC is set) but I'd still like to
figure out why gcc is breaking without flag_pic...

Rich


Re: [PATCH] fix TLS support detection for sh targets

2015-09-13 Thread Rich Felker
On Sun, Sep 13, 2015 at 06:55:56PM +0900, Kaz Kojima wrote:
> Rich Felker  wrote:
> > Bad patterns caused configure to always disable TLS for big-endian sh
> > targets and for anything other than sh 3/4.
> 
> Could you please give a patch for the trunk with an appropriate
> ChangeLog entry?

I'm pretty sure this will still apply to trunk, but I can check that
and add the changelog entry. Is there something I should read on the
form or just follow the example from my last patch where you added it?

Rich


[PATCH] fix TLS support detection for sh targets

2015-09-13 Thread Rich Felker
Bad patterns caused configure to always disable TLS for big-endian sh
targets and for anything other than sh 3/4.

Rich
--- gcc-5.2.0.base/gcc/configure.ac 2015-08-11 16:23:36.0 +
+++ gcc-5.2.0/gcc/configure.ac  2015-09-13 08:17:31.714972082 +
@@ -3300,7 +3300,7 @@
tls_first_minor=14
tls_as_opt="-m64 -Aesame --fatal-warnings"
;;
-  sh-*-* | sh[34]-*-*)
+  sh-*-* | sh[123456789lbe]*-*-*)
 conftest_s='
.section ".tdata","awT",@progbits
 foo:   .long   25
--- gcc-5.2.0.base/gcc/configure2015-08-11 16:23:35.0 +
+++ gcc-5.2.0/gcc/configure 2015-09-13 08:17:42.608304751 +
@@ -23754,7 +23754,7 @@
tls_first_minor=14
tls_as_opt="-m64 -Aesame --fatal-warnings"
;;
-  sh-*-* | sh[34]-*-*)
+  sh-*-* | sh[123456789lbe]*-*-*)
 conftest_s='
.section ".tdata","awT",@progbits
 foo:   .long   25


Re: Reviving SH FDPIC target

2015-09-10 Thread Rich Felker
On Thu, Sep 10, 2015 at 11:49:19PM -0400, Rich Felker wrote:
> On Fri, Sep 04, 2015 at 04:16:40PM -0400, Rich Felker wrote:
> > On Thu, Sep 03, 2015 at 11:53:45AM -0400, Rich Felker wrote:
> > > On Thu, Sep 03, 2015 at 02:58:39PM +, Joseph Myers wrote:
> > > > On Wed, 2 Sep 2015, Rich Felker wrote:
> > > > 
> > > > > So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> > > > > patchset, I think we can safely drop it. And patch 2/3 was already
> > > > > committed, so 3/3, the one I was originally looking at, seems to be
> > > > > all we need. It was approved at the time, so I'll proceed with merging
> > > > > it with 5.2.0.
> > > > 
> > > > Well, obviously if trying dropping patch 1/3 you need to remove 
> > > > everything 
> > > > related to use_initial_val (the feature added in patch 1/3) from patch 
> > > > 3/3.
> > > 
> > > As far as I can tell, the only "use" of use_initial_val is defining
> > > the pseudo-instruction in the md file, which causes the code in patch
> > > 1/3 to use it. I see no other references to it. As I understand, the
> > > breakage from not having it (in the original 4.5-era patch) would be
> > > when introducing references to __fpscr_values later, and no longer
> > > having the GOT pointer, but that code is gone now.
> > 
> > I have this basically working -- obviously no heavy testing yet, and
> > the specs glue is not sufficient to make it practical for others to
> > try it yet, so it'll be a little longer til I have something to post.
> 
> I'm running into a problem with crtstuff.c: the references to
> __TMC_END__, etc. end up using @GOTOFF relocations which the linker
> then rejects. Is this a linker bug or something I need to fix in the
> target code to avoid generating @GOTOFF relocations? Since crtbegin.o
> has no information about what section __TMC_END__ will be in when it's
> defined, I don't see a lot of options except to avoid using @GOTOFF
> entirely for symbols that aren't defined in the same TU.

Well, that was easier than I thought, at least for a quick hack: I
added:

  SYMBOL_REF_EXTERNAL_P(orig) || DECL_SECTION_NAME(SYMBOL_REF_DECL(orig))

to the list of conditions for using @GOT instead of @GOTOFF. This may
actually be optimal under the constraints we have...

Rich


Re: Reviving SH FDPIC target

2015-09-10 Thread Rich Felker
On Fri, Sep 04, 2015 at 04:16:40PM -0400, Rich Felker wrote:
> On Thu, Sep 03, 2015 at 11:53:45AM -0400, Rich Felker wrote:
> > On Thu, Sep 03, 2015 at 02:58:39PM +, Joseph Myers wrote:
> > > On Wed, 2 Sep 2015, Rich Felker wrote:
> > > 
> > > > So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> > > > patchset, I think we can safely drop it. And patch 2/3 was already
> > > > committed, so 3/3, the one I was originally looking at, seems to be
> > > > all we need. It was approved at the time, so I'll proceed with merging
> > > > it with 5.2.0.
> > > 
> > > Well, obviously if trying dropping patch 1/3 you need to remove 
> > > everything 
> > > related to use_initial_val (the feature added in patch 1/3) from patch 
> > > 3/3.
> > 
> > As far as I can tell, the only "use" of use_initial_val is defining
> > the pseudo-instruction in the md file, which causes the code in patch
> > 1/3 to use it. I see no other references to it. As I understand, the
> > breakage from not having it (in the original 4.5-era patch) would be
> > when introducing references to __fpscr_values later, and no longer
> > having the GOT pointer, but that code is gone now.
> 
> I have this basically working -- obviously no heavy testing yet, and
> the specs glue is not sufficient to make it practical for others to
> try it yet, so it'll be a little longer til I have something to post.

I'm running into a problem with crtstuff.c: the references to
__TMC_END__, etc. end up using @GOTOFF relocations which the linker
then rejects. Is this a linker bug or something I need to fix in the
target code to avoid generating @GOTOFF relocations? Since crtbegin.o
has no information about what section __TMC_END__ will be in when it's
defined, I don't see a lot of options except to avoid using @GOTOFF
entirely for symbols that aren't defined in the same TU.

Rich


Re: Reviving SH FDPIC target

2015-09-04 Thread Rich Felker
On Fri, Sep 04, 2015 at 06:04:15PM -0500, Segher Boessenkool wrote:
> On Fri, Sep 04, 2015 at 04:16:40PM -0400, Rich Felker wrote:
> > One thing I've noticed that's odd is that gcc -mfdpic -fPIC produces
> > different (less efficient) code from just gcc -mfdpic, which seems
> > wrong, but agrees with sh.c which has a number of checks for flag_pic
> > not matched with a TARGET_FDPIC check.
> 
> Generic code tests flag_pic in important places as well.

Hmm, these are probably correct: things like default TLS model,
whether external functions defined in the same TU are subject to
interposition, etc. So ignoring -fPIC would be incorrect, and in fact
the different/less-efficient codegen might be correct. But if -fPIE is
generating different code, that's probably a bug (or at least
unwanted).

> > I'm thinking all of these
> > should either be flag_pic||TARGET_PIC or flag_pic&&!TARGET_FDPIC,
> > depending on whether the code applies to all PIC or is specific to the
> > non-FDPIC PIC model where r12 is call-saved. Does this sound correct?
> > I think we need spurious -fPIC to work (although it could be handled
> > with spec magic) and not pessimize code, since most library builds
> > will use -fPIC.
> 
> If you never want -fPIC (or -fpic) if fdpic is enabled, you can disable
> it (in sh_option_override)?

Even if it weren't needed for the above reasons, having it generate an
error would be very problematic from a standpoint of being able to
build packages unmodified. So I probably just need to have all the
conditionals in sh.c right (checking either ||TARGET_FDPIC or
&&!TARGET_FDPIC).

Rich


Re: Reviving SH FDPIC target

2015-09-04 Thread Rich Felker
On Thu, Sep 03, 2015 at 11:53:45AM -0400, Rich Felker wrote:
> On Thu, Sep 03, 2015 at 02:58:39PM +, Joseph Myers wrote:
> > On Wed, 2 Sep 2015, Rich Felker wrote:
> > 
> > > So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> > > patchset, I think we can safely drop it. And patch 2/3 was already
> > > committed, so 3/3, the one I was originally looking at, seems to be
> > > all we need. It was approved at the time, so I'll proceed with merging
> > > it with 5.2.0.
> > 
> > Well, obviously if trying dropping patch 1/3 you need to remove everything 
> > related to use_initial_val (the feature added in patch 1/3) from patch 
> > 3/3.
> 
> As far as I can tell, the only "use" of use_initial_val is defining
> the pseudo-instruction in the md file, which causes the code in patch
> 1/3 to use it. I see no other references to it. As I understand, the
> breakage from not having it (in the original 4.5-era patch) would be
> when introducing references to __fpscr_values later, and no longer
> having the GOT pointer, but that code is gone now.

I have this basically working -- obviously no heavy testing yet, and
the specs glue is not sufficient to make it practical for others to
try it yet, so it'll be a little longer til I have something to post.

One thing I've noticed that's odd is that gcc -mfdpic -fPIC produces
different (less efficient) code from just gcc -mfdpic, which seems
wrong, but agrees with sh.c which has a number of checks for flag_pic
not matched with a TARGET_FDPIC check. I'm thinking all of these
should either be flag_pic||TARGET_PIC or flag_pic&&!TARGET_FDPIC,
depending on whether the code applies to all PIC or is specific to the
non-FDPIC PIC model where r12 is call-saved. Does this sound correct?
I think we need spurious -fPIC to work (although it could be handled
with spec magic) and not pessimize code, since most library builds
will use -fPIC.

Rich


Re: Reviving SH FDPIC target

2015-09-03 Thread Rich Felker
On Thu, Sep 03, 2015 at 02:58:39PM +, Joseph Myers wrote:
> On Wed, 2 Sep 2015, Rich Felker wrote:
> 
> > So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
> > patchset, I think we can safely drop it. And patch 2/3 was already
> > committed, so 3/3, the one I was originally looking at, seems to be
> > all we need. It was approved at the time, so I'll proceed with merging
> > it with 5.2.0.
> 
> Well, obviously if trying dropping patch 1/3 you need to remove everything 
> related to use_initial_val (the feature added in patch 1/3) from patch 
> 3/3.

As far as I can tell, the only "use" of use_initial_val is defining
the pseudo-instruction in the md file, which causes the code in patch
1/3 to use it. I see no other references to it. As I understand, the
breakage from not having it (in the original 4.5-era patch) would be
when introducing references to __fpscr_values later, and no longer
having the GOT pointer, but that code is gone now.

Rich


Re: Reviving SH FDPIC target

2015-09-02 Thread Rich Felker
On Wed, Sep 02, 2015 at 05:05:35PM -0400, Rich Felker wrote:
> On Wed, Sep 02, 2015 at 07:59:45PM +, Joseph Myers wrote:
> > On Wed, 2 Sep 2015, Rich Felker wrote:
> > 
> > > Also, according to Joseph Myers, there was some unresolved
> > > disagreement that stalled (and eventually sunk) the old patch, so if
> > > anyone's still around who has objections to it, could you speak up and
> > > let me know what's wrong? Kaz Kojima seems to have approved the patch
> > > at the time so I'm confused what the issue was/is.
> > 
> > It's patch 1/3 (architecture-independent) that had the disagreement (and 
> > patch 3/3 depends on patch 1/3).
> > 
> > https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01462.html
> 
> So this is only for __fpscr_values? In that case I think the right
> solution is just to follow up with getting rid of __fpscr_values, if
> it's not already done:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60138
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513
> 
> 53513 is marked fixed, but I didn't follow up to confirm that the
> actual problems I reported in 60138 are fixed; I'll do some more
> research on this. But if all goes well, we can just drop 1/3.

I've confirmed that gcc 5.2 does not produce references to
__fpscr_values; instead, it does:

mov.l   .L4,r3
...
sts fpscr,r1
xor r3,r1
lds r1,fpscr
...
.L4:
.long   524288

So if __fpscr_values was the only reason for patch 1/3 in the FDPIC
patchset, I think we can safely drop it. And patch 2/3 was already
committed, so 3/3, the one I was originally looking at, seems to be
all we need. It was approved at the time, so I'll proceed with merging
it with 5.2.0.

Rich


Re: Reviving SH FDPIC target

2015-09-02 Thread Rich Felker
On Wed, Sep 02, 2015 at 07:59:45PM +, Joseph Myers wrote:
> On Wed, 2 Sep 2015, Rich Felker wrote:
> 
> > Also, according to Joseph Myers, there was some unresolved
> > disagreement that stalled (and eventually sunk) the old patch, so if
> > anyone's still around who has objections to it, could you speak up and
> > let me know what's wrong? Kaz Kojima seems to have approved the patch
> > at the time so I'm confused what the issue was/is.
> 
> It's patch 1/3 (architecture-independent) that had the disagreement (and 
> patch 3/3 depends on patch 1/3).
> 
> https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01462.html

So this is only for __fpscr_values? In that case I think the right
solution is just to follow up with getting rid of __fpscr_values, if
it's not already done:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60138
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53513

53513 is marked fixed, but I didn't follow up to confirm that the
actual problems I reported in 60138 are fixed; I'll do some more
research on this. But if all goes well, we can just drop 1/3.

Thanks for the quick reply!

Rich


Reviving SH FDPIC target

2015-09-02 Thread Rich Felker
I've started work on reviving the FDPIC support patch for the SH
target, which was proposed upstream in 2010 then abandoned:

https://gcc.gnu.org/ml/gcc-patches/2010-08/msg01464.html

Right now I'm in the process of determining what parts can be applied
as-is to current gcc, and what parts need to be adapted or rewritten
to account for changes in gcc between the 4.5 era and now (5.2/6.0).

The original patch as posted contained a significant amount of changes
that were unrelated to FDPIC code generation but rather for adding
sh*-uclinux tuples, and some things that even look like they're
associated with the old bFLT format. I am omitting these parts for now
since I'm unfamiliar with the old uClinux stuff and it's unmaintained.
If anyone else wants to use it, I think it would make more sense
factored as a separate patch anyway.

The target I have in mind is SH-2/J2 with musl libc, but uClibc or
even glibc could be made to work with it. I will submit the patches
for musl support (basically, just hooking up the dynamic linker name
for fdpic) separately; I believe they'll need merging with the
already-pending musl support patch from Szabolcs Nagy.

One question I'd like to ask now in case it's a problem that takes a
while to work out -- is copyright assignment already handled for the
old patch? The contributors listed in it are (all codesourcery):

- Daniel Jacobowitz
- Joseph Myers
- Mark Shinwell
- Andrew Stubbs

Also, according to Joseph Myers, there was some unresolved
disagreement that stalled (and eventually sunk) the old patch, so if
anyone's still around who has objections to it, could you speak up and
let me know what's wrong? Kaz Kojima seems to have approved the patch
at the time so I'm confused what the issue was/is.

Rich


Re: [PATCH] add initial support for J2 core to sh target

2015-09-01 Thread Rich Felker
On Wed, Sep 02, 2015 at 01:24:55AM +0900, Oleg Endo wrote:
> > I'm not sure what the best way to achieve multiple goals is, but the
> > current behavior makes it so you need --isa=any (and a final binary
> > with weird ABI tag) to have a binary that supports atomic operations
> > on any SH model. musl libc already has such support (except the new J2
> > CAS instruction) and I would like to eventually provide a libatomic
> > approach for GCC too so that it's possible to use __sync/C11 atomics
> > and have the binary be safe to run on any model that supports the
> > baseline ISA & ABI you built for (e.g. all >=SH2 if you used -m2).
> 
> I don't know the details of your implementation. The compiler
> generated atomic sequences are not really compatible. The safest
> thing is not to enable any atomic model in GCC and let it emit
> function calls to __atomic*.

Exactly -- but then, libatomic.a needs to contain J2-specific cas.l
opcodes and SH4A-specific movli.l/movco.l opcodes and code that
selects at runtime which to use (or whether to use imask or gusa)
based on hwcap, etc. The point is that a mix of opcodes for different
ISA levels end up being in the final binary, which might otherwise be
targeted for SH-2 baseline so it can run on any of them.

> > I have a patch for that part, just not expanding the
> > already-very-complex SH "family-tree" of instruction support. However
> > it's likely that encoding details will change (the draft encoding
> > overlaps with something used by SH2A IIRC, and the intent was not to
> > have such overlap)
> 
> Yeah, it overlaps with the first 16 bit word of the 32 bit SH2A
> load/store insns.
> 
> > so I'm holding off on submitting it until the
> > hardware side works out this issue.
> 
> Sounds reasonable.

In the mean time, do you have any suggestsions on how the ISA level
stuff should be done to add J2 on the binutils side?

Thanks for reviewing.

Rich


Re: [PATCH] add initial support for J2 core to sh target

2015-09-01 Thread Rich Felker
On Tue, Sep 01, 2015 at 10:45:10PM +0900, Oleg Endo wrote:
> Hi Rich,
> 
> On 01 Sep 2015, at 02:49, Rich Felker  wrote:
> 
> > The J2 Core is an open hardware cpu implementing the SH-2 instruction
> > set, with the addition of barrel shift instructions and an atomic
> > compare-and-swap instruction. This patch adds a cpu model option -mj2
> > to the sh target. Presently all it does is enable use of the barrel
> > shift instructions (and turns off assembler checking of the ISA via
> > --isa=any) but I will eventually add support for the new CAS
> > instruction as a new -matomic-model for use by the __sync and __atomic
> > builtins.
> 
> It seems that this J2 atomic instruction(s ?) is not available to
> the public. I've skimmed through the currently available J2 hardware
> sources, but couldn't find anything about it. So it's just
> speculation, but probably you'll require a copyright assignment for
> the atomics parts.

It's still under development and I'm not closely following the
hardware side of things. I'll hold off on submitting the atomic
support until it's in the public release and tested. My hope was to
get the basic support upstream first (having -mj2 is very helpful for
building the kernel since it makes the libgcc issue we were dealing
with go away) and add the atomic part later.

>From the copyright side, though, IMO it's not a matter of whether the
feature is public but the non-triviality of the patch that would make
it require an assignment. My work on this is under contract with SE
Instruments and our arrangement is such that they're responsible for
copyright/assignment on the work I do on FSF projects. AFAIK they
don't yet have any assignment paperwork on file for this so I'll need
some guidance from whoever handles that for GCC.

> > I've used the the name "J2" and "-mj2" rather than treating it as a
> > submodel variant of "sh2" ("SH2J" and "-m2j") because the official
> > name of this cpu model is "J2", with the intent of not misrepresenting
> > it as a Renesas product.
> 
> But then, why do you add "shj2" in config.gcc?

Because the code it's added to just did s/m/sh/ as part of its text
processing. ;-) It's slightly ugly shell script but I'm just working
with what's there.

> I guess having an SH variant which doesn't match sh*-*-* would be
> odd. I'd rather name it "SHJ2" everywhere (except the -m options
> which don't have the "sh" prefix). Technically it really is just
> another SH variant so I'd suggest to treat it like that. Might be
> simpler and less confusing for other things than GCC, too.
> 
> Also you might want to add the new option to that block in config.gcc
> 
>   echo "Unknown CPU used in --with-cpu=$with_cpu, known values:"  1>&2
>   echo "m1 m2 m2e m3 m3e m4 m4-single m4-single-only m4-nofpu" 1>&2
>   echo "m4a m4a-single m4a-single-only m4a-nofpu m4al" 1>&2
>   echo "m2a m2a-single m2a-single-only m2a-nofpu" 1>&2
>   exit 1
>   ;;

Yes, I agree it could be odd from the GCC side. That's actually why I
omitted tuples for now and just wanted to use -mj2 or --with-cpu: I
was concerned configure scripts would blow up on seeing a cpu family
name they don't know.

If you want to just follow the existing naming pattern for tuples, I
think "sh2j" might make more sense than "shj2". It would both be
easier to match (there's probably a lot of software that accepts
sh[:digit:] but not sh[:alnum:]) and it's sufficiently different from
the actual name of the cpu so as not to give the impression that "J2"
is short for "SH J2" or something.

> > The --isa=any passed to the assembler probably needs to be fixed
> > before this patch is ready for upstream (although I'd really just
> > prefer _always_ passing --isa=any to the assembler, since the current
> > behavior breaks runtime-switching implementations, which I need).
> 
> I've added the --isa thing because LD also checks (or at least
> tries) compatibility of modules being linked together. Also, it
> makes it easier to detect wrong/unexpected instructions in the
> output, which can happen sometimes by a bug in GCC or with
> inline-asm. If you have other suggestions, please share.

I'm not sure what the best way to achieve multiple goals is, but the
current behavior makes it so you need --isa=any (and a final binary
with weird ABI tag) to have a binary that supports atomic operations
on any SH model. musl libc already has such support (except the new J2
CAS instruction) and I would 

[PATCH] add initial support for J2 core to sh target

2015-08-31 Thread Rich Felker
The J2 Core is an open hardware cpu implementing the SH-2 instruction
set, with the addition of barrel shift instructions and an atomic
compare-and-swap instruction. This patch adds a cpu model option -mj2
to the sh target. Presently all it does is enable use of the barrel
shift instructions (and turns off assembler checking of the ISA via
--isa=any) but I will eventually add support for the new CAS
instruction as a new -matomic-model for use by the __sync and __atomic
builtins.

I've used the the name "J2" and "-mj2" rather than treating it as a
submodel variant of "sh2" ("SH2J" and "-m2j") because the official
name of this cpu model is "J2", with the intent of not misrepresenting
it as a Renesas product. However I'd like feedback from GCC's side on
how GCC wants to identify J2 in cpu model options, tuples, and
internally; that part is not set in stone.

The --isa=any passed to the assembler probably needs to be fixed
before this patch is ready for upstream (although I'd really just
prefer _always_ passing --isa=any to the assembler, since the current
behavior breaks runtime-switching implementations, which I need). One
complication is that passing --isa=j2 will fail with old binutils; I'm
not sure if this should be detected and handled or if we should just
require up-to-date binutils to use -mj2. In any case, I don't yet have
an assembler-side patch to add the --isa level.

I know this isn't ready for upstream yet but I'd appreciate feedback
on the patch so far and anything that I need to change in order for it
to be acceptable.

Rich
--- gcc-5.2.0.orig/gcc/config.gcc
+++ gcc-5.2.0/gcc/config.gcc
@@ -2668,7 +2671,7 @@
  sh2a-single-only | sh2a-single | sh2a-nofpu | sh2a | \
  sh4a-single-only | sh4a-single | sh4a-nofpu | sh4a | sh4al | \
  sh4-single-only | sh4-single | sh4-nofpu | sh4 | sh4-300 | \
- sh3e | sh3 | sh2e | sh2 | sh1) ;;
+ sh3e | sh3 | sh2e | sh2 | sh1 | shj2 ) ;;
"") sh_cpu_default=${sh_cpu_target} ;;
*)  echo "with_cpu=$with_cpu not supported"; exit 1 ;;
esac
@@ -2687,9 +2690,9 @@
sh_multilibs="`echo $sh_multilibs|sed -e 
s/m4/sh4-nofpu/ -e s/,m4-[^,]*//g -e s/,m[23]e// -e s/m2a,m2a-single/m2a-nofpu/ 
-e s/m5-..m,//g`"
fi
fi
-   target_cpu_default=SELECT_`echo ${sh_cpu_default}|tr 
abcdefghijklmnopqrstuvwxyz- ABCDEFGHIJKLMNOPQRSTUVWXYZ_`
+   target_cpu_default=SELECT_`echo ${sh_cpu_default}|sed 's/^shj/j/'|tr 
abcdefghijklmnopqrstuvwxyz- ABCDEFGHIJKLMNOPQRSTUVWXYZ_`
tm_defines=${tm_defines}' SH_MULTILIB_CPU_DEFAULT=\"'`echo 
$sh_cpu_default|sed s/sh/m/`'\"'
-   tm_defines="$tm_defines SUPPORT_`echo $sh_cpu_default | sed 's/^m/sh/' 
| tr abcdefghijklmnopqrstuvwxyz- ABCDEFGHIJKLMNOPQRSTUVWXYZ_`=1"
+   tm_defines="$tm_defines SUPPORT_`echo $sh_cpu_default | sed -e 
's/^m/sh/' -e 's/^shj/j/' | tr abcdefghijklmnopqrstuvwxyz- 
ABCDEFGHIJKLMNOPQRSTUVWXYZ_`=1"
sh_multilibs=`echo $sh_multilibs | sed -e 's/,/ /g' -e 's/^[Ss][Hh]/m/' 
-e 's/ [Ss][Hh]/ m/g' | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ_ 
abcdefghijklmnopqrstuvwxyz-`
for sh_multilib in ${sh_multilibs}; do
case ${sh_multilib} in
@@ -4106,6 +4109,8 @@
;;
m4a | m4a-single | m4a-single-only | m4a-nofpu | m4al)
;;
+   mj2)
+   ;;
*)
echo "Unknown CPU used in --with-cpu=$with_cpu, known 
values:"  1>&2
echo "m1 m2 m2e m3 m3e m4 m4-single m4-single-only 
m4-nofpu" 1>&2
--- gcc-5.2.0.orig/gcc/config/sh/sh.h
+++ gcc-5.2.0/gcc/config/sh/sh.h
@@ -139,6 +139,7 @@
 #define SELECT_SH2A_SINGLE  (MASK_SH_E | MASK_HARD_SH2A \
  | MASK_FPU_SINGLE | MASK_HARD_SH2A_DOUBLE \
  | MASK_SH2 | MASK_SH1)
+#define SELECT_J2   (MASK_SH2 | MASK_J2 | SELECT_SH1)
 #define SELECT_SH3  (MASK_SH3 | SELECT_SH2)
 #define SELECT_SH3E (MASK_SH_E | MASK_FPU_SINGLE | SELECT_SH3)
 #define SELECT_SH4_NOFPU(MASK_HARD_SH4 | SELECT_SH3)
@@ -162,6 +163,9 @@
 #define SUPPORT_SH2 1
 #endif
 #if SUPPORT_SH2
+#define SUPPORT_J2 1
+#endif
+#ifdef SUPPORT_J2
 #define SUPPORT_SH3 1
 #define SUPPORT_SH2A_NOFPU 1
 #endif
@@ -211,7 +215,7 @@
 #define MASK_ARCH (MASK_SH1 | MASK_SH2 | MASK_SH3 | MASK_SH_E | MASK_SH4 \
   | MASK_HARD_SH2A | MASK_HARD_SH2A_DOUBLE | MASK_SH4A \
   | MASK_HARD_SH4 | MASK_FPU_SINGLE | MASK_SH5 \
-  | MASK_FPU_SINGLE_ONLY)
+  | MASK_FPU_SINGLE_ONLY | MASK_J2)
 
 /* This defaults us to big-endian.  */
 #ifndef TARGET_ENDIAN_DEFAULT
@@ -271,6 +275,7 @@
 %(subtarget_asm_isa_spec) %(subtarget_asm_spec) \
 %{m1:--isa=sh} \
 %{m2:--isa=sh2} \
+%{mj2:--isa=any} \
 %{m2e:--isa=sh2e} \
 %{m3:--isa=sh3} \
 %{m3e:--isa=sh3e} \
@@ -1834,7 +1839,7 @@
 
 /* Nonzero if the target suppor

[PATCH] fix --with-cpu for sh targets

2015-08-26 Thread Rich Felker
A missing * in the pattern for sh targets prevents the --with-cpu
configure option from being accepted for certain targets (e.g. ones
with explicit endianness, like sh2eb).

The latest config.sub should also be pulled from upstream since it has
a fix for related issues.

Rich
--- gcc-5.2.0.orig/gcc/config.gcc
+++ gcc-5.2.0/gcc/config.gcc
@@ -4096,7 +4099,7 @@
esac
;;
 
-   sh[123456ble]-*-* | sh-*-*)
+   sh[123456ble]*-*-* | sh-*-*)
supported_defaults="cpu"
case "`echo $with_cpu | tr ABCDEFGHIJKLMNOPQRSTUVWXYZ_ 
abcdefghijklmnopqrstuvwxyz- | sed s/sh/m/`" in
"" | m1 | m2 | m2e | m3 | m3e | m4 | m4-single | m4-single-only 
| m4-nofpu )


Re: [PATCH 7/13] powerpc musl support

2015-08-24 Thread Rich Felker
On Mon, Aug 24, 2015 at 06:25:14PM +0200, Szabolcs Nagy wrote:
> * David Edelsohn  [2015-08-24 10:21:05 -0400]:
> > Patch v2.
> > 
> > Powerpc does not include the top level linux.h, so I had to
> > repeat the include order fixes from there in rs6000/sysv4.h.
> > 
> > I corrected the endianness handling (the "le" suffix should
> > be added correctly now).
> > 
> > gcc/Changelog:
> > 
> > 2015-04-24  Gregor Richards  
> >Szabolcs Nagy  
> > 
> > * config.gcc (secure_plt): Add *-linux*-musl*.
> > * config/rs6000/linux64.h (MUSL_DYNAMIC_LINKER32): Define.
> > (MUSL_DYNAMIC_LINKER64): Define.
> > (GNU_USER_DYNAMIC_LINKER32): Update.
> > (GNU_USER_DYNAMIC_LINKER64): Update.
> > (CHOOSE_DYNAMIC_LINKER): Update.
> > 
> > * config/rs6000/secureplt.h (LINK_SECURE_PLT_DEFAULT_SPEC): Define.
> > * config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Update.
> > (MUSL_DYNAMIC_LINKER, MUSL_DYNAMIC_LINKER_E,)
> > (INCLUDE_DEFAULTS_MUSL_GPP, INCLUDE_DEFAULTS_MUSL_LOCAL,)
> > (INCLUDE_DEFAULTS_MUSL_PREFIX, INCLUDE_DEFAULTS_MUSL_CROSS,)
> > (INCLUDE_DEFAULTS_MUSL_TOOL, INCLUDE_DEFAULTS_MUSL_NATIVE): Define.
> > (LINK_SECURE_PLT_DEFAULT_SPEC): Define.
> > (CHOOSE_DYNAMIC_LINKER, LINK_TARGET_SPEC, LINK_OS_LINUX_SPEC): Update.
> > 
> > * config/rs6000/sysv4le.h (MUSL_DYNAMIC_LINKER_E): Define.
> > 
> > This patch is generally okay, but why all of the secure PLT changes?
> > 
> 
> i'm not sure.. i kept that from a previous version of the patch
> 
> musl dynamic linker only supports secure plt and it seems
> gcc (configured with --enable-secureplt) does not pass
> --secure-plt to ld, the ld manual says this flag is only
> needed for non-pic code otherwise it will figure out the
> plt abi, so it is probably not needed.  i will check.

Dynamic-linked code can be non-PIC (in the main program) so maybe
that's the problem? I would make sure to test all cases before
removing this.

> >  /* Override the default target of the linker.  */
> >  #define LINK_TARGET_SPEC \
> > -  ENDIAN_SELECT("", " --oformat elf32-powerpcle", "")
> > +  ENDIAN_SELECT("", " --oformat elf32-powerpcle", "") \
> > +  "%{!mbss-plt: %{!msecure-plt: %(link_secure_plt_default)}}"
> > 
> > elf32-powerpcle is not defined and does not exist.  The original
> > change in sysv4.h seems to be an errant search-and-replace error.
> > 
> 
> ok so that line should be removed.
> 
> meanwhile i found another bug in my patch:
> 
> soft float abi is different so it needs a different dynamic
> linker name for musl.  this is problematic because i don't
> see an easy way to determine if the target is soft-float.
> (i will probably have to have a long list of cpu types and
> handle -msoft-float, -mhard-float too which will be ugly in
> the link spec syntax).

Yes. We don't have soft-float ABI support yet, but it's an open
request on the musl mailing list and not having GCC logic for choosing
the right dynamic linker is what's blocking my acceptance of the
patch. Once it looks like the path for doing that is clear on the GCC
side I'll go ahead and add it in musl.

Rich


Re: confirm subscribe to gcc-patches@gcc.gnu.org

2015-06-01 Thread Rich Felker


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-20 Thread Rich Felker
On Wed, May 20, 2015 at 02:10:41PM +0200, Michael Matz wrote:
> Hi,
> 
> On Tue, 19 May 2015, Richard Henderson wrote:
> 
> > It is.  The relaxation that HJ is working on requires that the reads 
> > from the got not be hoisted.  I'm not especially convinced that what 
> > he's working on is a win.
> > 
> > With LTO, the compiler can do the same job that he's attempting in the 
> > linker, without an extra nop.  Without LTO, leaving it to the linker 
> > means that you can't hoist the load and hide the memory latency.
> 
> Well, hoisting always needs a register, and if hoisted out of a loop 
> (which you all seem to be after) that register is live through the whole 
> loop body.  You need a register for each different called function in such 
> loop, trading the one GOT pointer with N other registers.  For 
> register-starved machines this is a real problem, even x86-64 doesn't have 
> that many.  I.e. I'm not convinced that this hoisting will really be much 
> of a win that often, outside toy examples.  Sure, the compiler can hoist 
> function addresses trivially, but I think it will lead to spilling more 
> often than not, or alternatively the hoisting will be undone by the 
> register allocators rematerialization.  Of course, this would have to be 
> measured for real not hand-waved, but, well, I'd be surprised if it's not 
> so.

The obvious example where it's useful on x86_64 is a major class:
anything where the majority of the callee's data is floating point and
thus kept in xmm registers. In that case register pressure is a lot
lower, and there's also an obvious class of cross-DSO functions calls
you'd be making over and over again: anything from libm.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 05:10:11PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:54 PM, Rich Felker  wrote:
> > On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> >> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  
> >> >> wrote:
> >> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson 
> >> >> >>  wrote:
> >> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >> >>>> I'm still mildly worried that concerns for supporting
> >> >> >>>> relaxation might lead to decisions not to optimize code in ways 
> >> >> >>>> that
> >> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >> >>>
> >> >> >>> It is.  The relaxation that HJ is working on requires that the 
> >> >> >>> reads from the
> >> >> >>> got not be hoisted.  I'm not especially convinced that what he's 
> >> >> >>> working on is
> >> >> >>> a win.
> >> >> >>>
> >> >> >>> With LTO, the compiler can do the same job that he's attempting in 
> >> >> >>> the linker,
> >> >> >>> without an extra nop.  Without LTO, leaving it to the linker means 
> >> >> >>> that you
> >> >> >>> can't hoist the load and hide the memory latency.
> >> >> >>>
> >> >> >>
> >> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> >> It simply turns indirect branch into direct branch with a nop prefix 
> >> >> >> at
> >> >> >> link-time.  I am having a hard time to understand why we shouldn't 
> >> >> >> do it.
> >> >> >
> >> >> > I well understand what you're doing.
> >> >> >
> >> >> > But my point is that the only time the compiler should present you 
> >> >> > with the
> >> >> > form of indirect branch you're looking for is when there's no place 
> >> >> > to hoist
> >> >> > the load.
> >> >> >
> >> >> > At which point, is it really worth adding a new relocation to the 
> >> >> > ABI?  Is it
> >> >> > really worth adding new code to the linker that won't be exercised 
> >> >> > often?
> >> >>
> >> >> I believe there are plenty of indirect branches via GOT when compiling
> >> >> PIE/PIC with -fno-plt:
> >> >>
> >> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> >> extern void foo (void);
> >> >>
> >> >> void
> >> >> bar (void)
> >> >> {
> >> >>   foo ();
> >> >> }
> >> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> >> [hjl@gnu-6 gcc]$ cat x.s
> >> >> ..file "x.c"
> >> >> ..section .text.unlikely,"ax",@progbits
> >> >> ..LCOLDB0:
> >> >> ..text
> >> >> ..LHOTB0:
> >> >> ..p2align 4,,15
> >> >> ..globl bar
> >> >> ..type bar, @function
> >> >> bar:
> >> >> ..LFB0:
> >> >> ..cfi_startproc
> >> >> jmp *foo@GOTPCREL(%rip)
> >> >> ..cfi_endproc
> >> >> ..LFE0:
> >> >> ..size bar, .-bar
> >> >
> >> > I agree these exist. What I question is whether the savings from the
> >> > linker being able to relax this to a direct call in the case where the
> >> > programmer failed to let the compiler make it a direct call to begin
> >> > with (by using hidden or protected visibility) are worth the cost of
> >> > not being able to hoist the load out of loops or schedule it earlier
> >> > in cases where relaxation is not possible because the call target is
> >> > not defined in the same DSO.
> >>
> >> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> >>
> >> [hjl@gnu-mic-2 gas]$ file as-new
> >> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> >> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> >> stripped
> >> [hjl@gnu-mic-2 gas]$
> >>
> >> There are 43:
> >>
> >> ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)# 3d5f58 
> >> <_DYNAMIC+0x1e8>
> >>
> >> and 1983
> >>
> >> ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)# 3d60e0 
> >> <_DYNAMIC+0x370>
> >
> > How many of those would be relaxed? I suspect it depends a lot on
> > whether libbfd is static or shared.
> 
> When shared libraries are enabled, there are 177 indirect branches
> to locally defined functions.  Call to any locally defined functions,
> which aren't compiled with LTO, is indirect.

And are the above indirect calls/jumps (1983+43) candidates for
scheduling/hoisting the address load (that's not being done yet), or
are they the ones the compiler opted not to schedule/hoist? The win
from relaxation seems small here, but as long as you're not going to
block optimizations that would preclude relaxing, I don't see any
disadvantages to doing it.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 01:27:06PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 1:15 PM, Rich Felker  wrote:
> > On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  
> >> wrote:
> >> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
> >> >> wrote:
> >> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >> >>>> I'm still mildly worried that concerns for supporting
> >> >>>> relaxation might lead to decisions not to optimize code in ways that
> >> >>>> would be difficult to relax (e.g. certain types of address load
> >> >>>> reordering or hoisting) but I don't understand GCC internals
> >> >>>> sufficiently to know if this concern is warranted or not.
> >> >>>
> >> >>> It is.  The relaxation that HJ is working on requires that the reads 
> >> >>> from the
> >> >>> got not be hoisted.  I'm not especially convinced that what he's 
> >> >>> working on is
> >> >>> a win.
> >> >>>
> >> >>> With LTO, the compiler can do the same job that he's attempting in the 
> >> >>> linker,
> >> >>> without an extra nop.  Without LTO, leaving it to the linker means 
> >> >>> that you
> >> >>> can't hoist the load and hide the memory latency.
> >> >>>
> >> >>
> >> >> My relax approach won't take away any optimization done by compiler.
> >> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> >> link-time.  I am having a hard time to understand why we shouldn't do 
> >> >> it.
> >> >
> >> > I well understand what you're doing.
> >> >
> >> > But my point is that the only time the compiler should present you with 
> >> > the
> >> > form of indirect branch you're looking for is when there's no place to 
> >> > hoist
> >> > the load.
> >> >
> >> > At which point, is it really worth adding a new relocation to the ABI?  
> >> > Is it
> >> > really worth adding new code to the linker that won't be exercised often?
> >>
> >> I believe there are plenty of indirect branches via GOT when compiling
> >> PIE/PIC with -fno-plt:
> >>
> >> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> >> extern void foo (void);
> >>
> >> void
> >> bar (void)
> >> {
> >>   foo ();
> >> }
> >> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> >> [hjl@gnu-6 gcc]$ cat x.s
> >> ..file "x.c"
> >> ..section .text.unlikely,"ax",@progbits
> >> ..LCOLDB0:
> >> ..text
> >> ..LHOTB0:
> >> ..p2align 4,,15
> >> ..globl bar
> >> ..type bar, @function
> >> bar:
> >> ..LFB0:
> >> ..cfi_startproc
> >> jmp *foo@GOTPCREL(%rip)
> >> ..cfi_endproc
> >> ..LFE0:
> >> ..size bar, .-bar
> >
> > I agree these exist. What I question is whether the savings from the
> > linker being able to relax this to a direct call in the case where the
> > programmer failed to let the compiler make it a direct call to begin
> > with (by using hidden or protected visibility) are worth the cost of
> > not being able to hoist the load out of loops or schedule it earlier
> > in cases where relaxation is not possible because the call target is
> > not defined in the same DSO.
> 
> Just for fun.  I compiled binutils as PIE with -fno-plt -flto:
> 
> [hjl@gnu-mic-2 gas]$ file as-new
> as-new: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV),
> dynamically linked (uses shared libs), for GNU/Linux 2.6.32, not
> stripped
> [hjl@gnu-mic-2 gas]$
> 
> There are 43:
> 
> ff 25 21 93 2d 00 jmpq   *0x2d9321(%rip)# 3d5f58 <_DYNAMIC+0x1e8>
> 
> and 1983
> 
> ff 15 eb f4 38 00 callq  *0x38f4eb(%rip)# 3d60e0 <_DYNAMIC+0x370>

How many of those would be relaxed? I suspect it depends a lot on
whether libbfd is static or shared.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 12:17:18PM -0700, H.J. Lu wrote:
> On Tue, May 19, 2015 at 12:11 PM, Richard Henderson  wrote:
> > On 05/19/2015 12:06 PM, H.J. Lu wrote:
> >> On Tue, May 19, 2015 at 11:59 AM, Richard Henderson  
> >> wrote:
> >>> On 05/19/2015 11:06 AM, Rich Felker wrote:
> >>>> I'm still mildly worried that concerns for supporting
> >>>> relaxation might lead to decisions not to optimize code in ways that
> >>>> would be difficult to relax (e.g. certain types of address load
> >>>> reordering or hoisting) but I don't understand GCC internals
> >>>> sufficiently to know if this concern is warranted or not.
> >>>
> >>> It is.  The relaxation that HJ is working on requires that the reads from 
> >>> the
> >>> got not be hoisted.  I'm not especially convinced that what he's working 
> >>> on is
> >>> a win.
> >>>
> >>> With LTO, the compiler can do the same job that he's attempting in the 
> >>> linker,
> >>> without an extra nop.  Without LTO, leaving it to the linker means that 
> >>> you
> >>> can't hoist the load and hide the memory latency.
> >>>
> >>
> >> My relax approach won't take away any optimization done by compiler.
> >> It simply turns indirect branch into direct branch with a nop prefix at
> >> link-time.  I am having a hard time to understand why we shouldn't do it.
> >
> > I well understand what you're doing.
> >
> > But my point is that the only time the compiler should present you with the
> > form of indirect branch you're looking for is when there's no place to hoist
> > the load.
> >
> > At which point, is it really worth adding a new relocation to the ABI?  Is 
> > it
> > really worth adding new code to the linker that won't be exercised often?
> 
> I believe there are plenty of indirect branches via GOT when compiling
> PIE/PIC with -fno-plt:
> 
> [hjl@gnu-6 gcc]$ cat /tmp/x.c
> extern void foo (void);
> 
> void
> bar (void)
> {
>   foo ();
> }
> [hjl@gnu-6 gcc]$ ./xgcc -B./ -fPIC -O3 -S /tmp/x.c -fno-plt
> [hjl@gnu-6 gcc]$ cat x.s
> ..file "x.c"
> ..section .text.unlikely,"ax",@progbits
> ..LCOLDB0:
> ..text
> ..LHOTB0:
> ..p2align 4,,15
> ..globl bar
> ..type bar, @function
> bar:
> ..LFB0:
> ..cfi_startproc
> jmp *foo@GOTPCREL(%rip)
> ..cfi_endproc
> ..LFE0:
> ..size bar, .-bar

I agree these exist. What I question is whether the savings from the
linker being able to relax this to a direct call in the case where the
programmer failed to let the compiler make it a direct call to begin
with (by using hidden or protected visibility) are worth the cost of
not being able to hoist the load out of loops or schedule it earlier
in cases where relaxation is not possible because the call target is
not defined in the same DSO.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 11:59:00AM -0700, Richard Henderson wrote:
> On 05/19/2015 11:06 AM, Rich Felker wrote:
> > I'm still mildly worried that concerns for supporting
> > relaxation might lead to decisions not to optimize code in ways that
> > would be difficult to relax (e.g. certain types of address load
> > reordering or hoisting) but I don't understand GCC internals
> > sufficiently to know if this concern is warranted or not.
> 
> It is.  The relaxation that HJ is working on requires that the reads from the
> got not be hoisted.  I'm not especially convinced that what he's working on is
> a win.

Well as long as -fno-plt actually generates a load from the GOT like
what would be done for data access, and does not go out of its way to
produce something compatible with relaxation, my hope is that it would
not affected by the pessimization. I'm not sure if that's the case
though.

> With LTO, the compiler can do the same job that he's attempting in the linker,
> without an extra nop.  Without LTO, leaving it to the linker means that you
> can't hoist the load and hide the memory latency.

Yes, this is my feeling too. Alexander Monakov have been discussing it
on #musl a bit and I think the conclusion we reached is that
relaxation is possibly a significant real-world win for non-PIC main
executables, where it's very likely that addresses will be resolved at
ld-time and for the programmer not to specifically annotate this with
protected visibility. In such a case, you get either a direct call or
a direct address load and indirect call, rather than hitting an extra
cache line in the PLT thunk to do the address load and indirect call.
Note that, being non-PIC, there is no GOT register involved here.

> > I would still like to see the @GOTPCREL stuff added and used instead
> > of @GOT, as I mentioned earlier in the thread, but I agree that's
> > independent of relaxation support and shouldn't block it.
> 
> I don't think that @GOTPCREL for 32-bit is a good idea.  This is the scheme
> that Darwin uses, so we do have some experience with it.
> 
> In order for it to work you've got to have a pointer to a random address in 
> the
> function.  It means that you can only "easily" compute the address once.  If
> you need the value again you wind up with the same "extra" addl insn that we
> have with the current GOT pointer.

Why would you recompute it (this requires a fairly expensive call that
reads or pops its own return address) rather than simply spilling the
already-computed value and reloading it from the stack?

The only example I can think of where it might make sense is when you
don't want to load the address unconditionally because there are
shrink-wrappable code paths that don't need it, but multple code paths
that do, in which case they would each load different values. Is this
the concern you have in mind?

> We've just started to do inter-function register allocation.  The next step
> along those lines is to share the computation of GOT between multiple
> functions.  At which point it really helps to have one global base address to
> talk about.

I see -- that would be another case where it simplifies things.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 06:01:07PM +0200, Michael Matz wrote:
> Hi,
> 
> On Tue, 19 May 2015, Jeff Law wrote:
> 
> > > > Forget lazy binding. It's dead anyway because serious distros want 
> > > > PIE+relro+bindnow+...
> > > 
> > > You keep saying this, but I can't help the feeling it's mostly because 
> > > musl doesn't support it ;-)
> > 
> > FWIW, Red Hat is pushing PIE & partial RELRO deeper and deeper into the 
> > distribution.
> 
> Yeah, us as well, though I don't necessarily see the point for most 
> packages; feels a bit like a checkmark item :)

These days it's fairly rare to have software which does not interact
at all with untrusted data. Consider how much user-facing application
software that was not previously considered security-critical is
making network connections using complex protocols (e.g. anything with
TLS, IM protocols, ...), opening image files from random sources
(attachments, files that happen to be in a browsed-to directory, on
USB sticks, etc.), and so on. I think it's smart to be hardening
everything, at least for distros providing all sorts of random
unvetted software.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-19 Thread Rich Felker
On Tue, May 19, 2015 at 04:43:53PM +0200, Michael Matz wrote:
> Hi,
> 
> On Fri, 15 May 2015, Rich Felker wrote:
> 
> > Forget lazy binding. It's dead anyway because serious distros want
> > PIE+relro+bindnow+...
> 
> You keep saying this, but I can't help the feeling it's mostly because 
> musl doesn't support it ;-)

Well the reasons musl doesn't support it are partly the above, and
partly that it's been a continuous source of subtle bugs in glibc --
things like clobbering new vector registers, missing synchronization,
failures to be async-signal-safe, etc. So it's not that I think lazy
binding is bad because musl doesn't support it, but rather that musl
doesn't support lazy binding because I think it's bad. :-)

> No, you don't have to use bindnow to get the effects of relro.  Sure 
> there's more parts of the GOT protected with it, but if that's really that 
> much more hardened is up for debate.

Normally it's function addresses that you care about protecting --
they're the easy vector for arbitrary code execution -- and they're
unprotected without bindnow. Addresses of global data could also be an
attack vector, but a more difficult one to exploit.

> > If people really want lazy binding, they can use options which support 
> > it, but I don't want to keep suffering the codegen cost of lazy binding 
> > despite never using it.
> 
> > There should be an option to generate optimal code equivalent to what 
> > you get with Alexander Monakov's patches for those of us who aren't 
> > trying to support this legacy feature that precludes good performance 
> > and precludes hardening.
> 
> H.J.'s branch is for _improving_ code on top of the no-plt code, it's not 
> replacing it or an alternative for it.

Thanks for the clarification -- this was the part I was failing to
understand. I'm still mildly worried that concerns for supporting
relaxation might lead to decisions not to optimize code in ways that
would be difficult to relax (e.g. certain types of address load
reordering or hoisting) but I don't understand GCC internals
sufficiently to know if this concern is warranted or not. As long as
his work isn't interfering with the ability of -fno-plt to generate
optimal code, I agree it's both inappropriate and counter-productive
for me to be objecting to part or all of it.

I would still like to see the @GOTPCREL stuff added and used instead
of @GOT, as I mentioned earlier in the thread, but I agree that's
independent of relaxation support and shouldn't block it.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-16 Thread Rich Felker
On Sat, May 16, 2015 at 11:59:56AM -0700, H.J. Lu wrote:
> On Sat, May 16, 2015 at 7:19 AM, H.J. Lu  wrote:
> > On Fri, May 15, 2015 at 4:49 PM, Rich Felker  wrote:
> >> On Fri, May 15, 2015 at 04:34:57PM -0700, H.J. Lu wrote:
> >>> On Fri, May 15, 2015 at 4:30 PM, H.J. Lu  wrote:
> >>> > On Fri, May 15, 2015 at 4:14 PM, H.J. Lu  wrote:
> >>> >> My relax branch proposal works even without LTO.
> >>> >>
> >>> >
> >>> > I will borrow GOTPCREL from x86-64 and do
> >>> >
> >>> > [hjl@gnu-6 relax-4]$ cat b.S
> >>> > call *foo@GOTPCREL(%eax)
> >>>
> >>> call *foo@GOTPLT(%eax)
> >>>
> >>> is a better choice.
> >>
> >> foo@GOTPCREL is preferable (but does not yet exist for ia32, so the
> >> reloc type would have to be added) since it saves a useless add.
> >> Instead of:
> >>
> >> call __x86.get_pc_thunk.ax
> >> addl $_GLOBAL_OFFSET_TABLE_, %eax
> >> call *foo@GOTPLT(%eax)
> >>
> >> you can just do:
> >>
> >> call __x86.get_pc_thunk.ax
> >> call *foo@GOTPCREL(%eax)
> >>
> >> Note that it also works to have extra instructions between:
> >>
> >> call __x86.get_pc_thunk.ax
> >> 1:  ...
> >> call *foo@GOTPCREL+(1b-.)(%eax)
> >>
> >> I may not have gotten the syntax quite right, but hopefully yoy get
> >> the idea. This same approach (with GOTPCREL) can be used for _all_ GOT
> >> accesses, including global data, to eliminate the useless add.
> >>
> >
> > This is a good idea.  But I'd like to use something for both i386 and
> > x86-64.  I am proposing
> >
> > call/jmp *foo@GOTPCRELAX+addend(%reg)
> >
> > It is similar to @GOTPCREL, but with a new relax relocation.  Before
> > I can do that, I need to fix
> 
> It doesn't work.  REG must hold GOT base for other GOT relocations.
> We need to keep
> 
> addl $_GLOBAL_OFFSET_TABLE_, %eax

Like I just said, all foo@GOT(%gotreg) can be replaced with
foo@GOTPCREL+[label-.](%labelreg) where %labelreg is a register
pointing to the referenced label (the point at which the program
counter was saved). This is a minor but useful optimization that can
be made for all GOT accesses, not just ones for [relaxable] function
calls.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-15 Thread Rich Felker
On Fri, May 15, 2015 at 04:34:57PM -0700, H.J. Lu wrote:
> On Fri, May 15, 2015 at 4:30 PM, H.J. Lu  wrote:
> > On Fri, May 15, 2015 at 4:14 PM, H.J. Lu  wrote:
> >> My relax branch proposal works even without LTO.
> >>
> >
> > I will borrow GOTPCREL from x86-64 and do
> >
> > [hjl@gnu-6 relax-4]$ cat b.S
> > call *foo@GOTPCREL(%eax)
> 
> call *foo@GOTPLT(%eax)
> 
> is a better choice.

foo@GOTPCREL is preferable (but does not yet exist for ia32, so the
reloc type would have to be added) since it saves a useless add.
Instead of:

call __x86.get_pc_thunk.ax
addl $_GLOBAL_OFFSET_TABLE_, %eax
call *foo@GOTPLT(%eax)

you can just do:

call __x86.get_pc_thunk.ax
call *foo@GOTPCREL(%eax)

Note that it also works to have extra instructions between:

call __x86.get_pc_thunk.ax
1:  ...
call *foo@GOTPCREL+(1b-.)(%eax)

I may not have gotten the syntax quite right, but hopefully yoy get
the idea. This same approach (with GOTPCREL) can be used for _all_ GOT
accesses, including global data, to eliminate the useless add.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-15 Thread Rich Felker
On Fri, May 15, 2015 at 04:14:07PM -0700, H.J. Lu wrote:
> On Fri, May 15, 2015 at 4:08 PM, Jan Hubicka  wrote:
> > Hello,
> >>
> >> There are codes like
> >>
> >> extern void foo (void);
> >>
> >> void
> >> bar (void)
> >> {
> >>   foo ();
> >> }
> >>
> >> Even with LTO, compiler may have to assume foo is external
> >> when foo is compiled with LTO.
> >
> > This is not exactly true if FOO is defined in other translation unit
> > compiled with LTO and hidden visibility.
> 
> I was meant to say " when foo is compiled without LTO.".
> 
> > OK, so as I get it, we get the following cases:
> >
> >  1) compiler knows it is generating call to a local symbol a current
> > unit (binds_to_current_def_p returns true).
> >
> > We handle this correctly by doing IP relative call.
> >
> >  2) compiler knows it is generating call to a local symbol in DSO
> > (binds_local_p return true)
> > Currently I think this is only the -fno-pic case or case of explicit
> > hidden visibility and in this case we do IP relative call.
> >
> > We may want to propose plugin API update adding PREVAILING_DEF_EXP.
> > So copiler would be able to default to this case for PREVAILING_DEF
> > and we will also catch cases where the symbol is defined in current
> > DSO as weak symbol, but the definition is not LTO.
> > This would be also way to communicate -Bsymbolic[-functions] across
> > the plugin API.
> >
> >  3) compiler knows there is going to be definition in the current DSO
> > (by seeing a COMDAT function body or resolution info) that is 
> > interposable
> > but because the function is inline or -fno-semantic-interposition 
> > happens,
> > the semantics will not change.
> >
> > In this case it would be nice to arrange IP relative call to the
> > hidden alias.  This may require an extension both on compiler and linker
> > side.
> >
> > I was thinking of doing so for comdats by adding hidden alias with
> > fixed mangling, like __gnu_.hiddenalias, and referring it.
> > But I think it is not safe as linker may throw away section that
> > is produced by GCC and prevail section that is not leaving to an 
> > undefined
> > symbol?
> >
> > I think this is rather common case in C++ (never made any stats) because
> > uninlined comdats are quite common.
> >
> >  4) compiler has no clue but linker may know better
> >
> > Here we traditionally always produce a PLT call.  In cases the call
> > is known to be hot in the program it makes sense to trade lazy binding
> > for performance and produce call via GOT reference (-fno-plt).
> > I also see that H.J.'s branch helps us to actually avoid the GOT
> > reference in cases the symbol ends up binding locally. How the lazy
> > binding with relaxation works?
> 
> If there is no GOT slot allocated for symbol foo, linker should resolve
> foo@GOTPLT(%ebx) to to its PLT slot address + 6, which is the push
> instruction, to support  lazy binding.  Otherwise, linker should resolve it
> to its GOT slot address.

Forget lazy binding. It's dead anyway because serious distros want
PIE+relro+bindnow+... If people really want lazy binding, they can use
options which support it, but I don't want to keep suffering the
codegen cost of lazy binding despite never using it. There should be
an option to generate optimal code equivalent to what you get with
Alexander Monakov's patches for those of us who aren't trying to
support this legacy feature that precludes good performance and
precludes hardening.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-15 Thread Rich Felker
On Fri, May 15, 2015 at 01:35:14PM -0700, H.J. Lu wrote:
> On Fri, May 15, 2015 at 1:23 PM, Rich Felker  wrote:
> > On Fri, May 15, 2015 at 01:08:15PM -0700, H.J. Lu wrote:
> >> With relax branch in 32-bit, there are 2 cases:
> >>
> >> 1. PIC or PIE:  We generate
> >>
> >> set up EBX
> >> relax call foo@PLT
> >>
> >> It is almost the same as we do now, except for the relax prefix.
> >> If foo is defined in another shared library or may be preempted,
> >> linker will generate
> >>
> >> call *foo@GOTPLT(%ebx)
> >>
> >> If foo turns out local, linker will output
> >>
> >> relax call foo
> >
> > This does not address the initial and primary motivation for no-plt on
> > 32-bit: eliminating the awful codegen constraint costs of the
> > GOT-register (ebx, and equivalent on other targets) ABI for calling
> > PLT entries. If instead you generated code that sets up an expression
> > for the GOT slot using arbitrary registers, and relaxed it to a direct
> > call (possibly rendering the register setup useless), it would be
> > comparable to the no-plt approach. So for example:
> >
> > set up ecx (or whatever register)
> > relax call *foo@GOT(%ecx)
> >
> > and relax to:
> >
> > set up ecx (or whatever register; now useless)
> > relax call foo
> >
> > But the no-plt approach is still superior in that the address load
> > from the GOT can be hoisted out of loops, etc., resulting in something
> > like:
> >
> > call *%esi
> >
> > This could be valuable in loops calling a math function repeatedly,
> > for example.
> >
> > Overall I'm still not a fan of the relaxation approach. There are very
> > few places it would actually help that couldn't already be improved
> > better with use of visibility, and it can't give codegen as good as
> > no-plt option.
> 
> With no-plt option, compiler has to know if a function is external
> or may be preempted.

I still don't see significant practical cases where the linker would
know this but the compiler can't. If you use visibility properly, the
compiler knows, and if you do LTO and -Bsymbolic[-functions], the
compiler should have that information available at LTO time (this is
an enhancement that needs to be made, though).

> If compiler guessed wrong, the generated
> DSO or executable will always go through indirect branch even
> though the target is local.

The only way this is avoided now is with -Bsymbolic[-functions] which
is not widely used. Otherwise interposition is always allowed for
default-visibility functions, so I don't see how the indirect branch
here is suboptimal.

> With relax branch, the decision is left
> to linker.  Of course, EBX must be used unless we add a new PLT
> relocation for each register used to to hold GOT base, like
> 
> relax call foo@PLT_ECX
> relax call foo@PLT_EDX

No, that's not needed. If the linker doesn't make the relaxation, the
instruction the compiler generated remains in place, and has the
effective address expression using whichever register it wanted:

relax call *foo@GOT(%ecx)
relax call *foo@GOT(%edx)
etc.

If the linker chooses to relax it to a direct call, no register at all
is needed, so the linker can just throw this away and use:

call foo

for all of them.

Rich


Re: [PATCH i386] Allow sibcalls in no-PLT PIC

2015-05-15 Thread Rich Felker
On Fri, May 15, 2015 at 01:08:15PM -0700, H.J. Lu wrote:
> With relax branch in 32-bit, there are 2 cases:
> 
> 1. PIC or PIE:  We generate
> 
> set up EBX
> relax call foo@PLT
> 
> It is almost the same as we do now, except for the relax prefix.
> If foo is defined in another shared library or may be preempted,
> linker will generate
> 
> call *foo@GOTPLT(%ebx)
> 
> If foo turns out local, linker will output
> 
> relax call foo

This does not address the initial and primary motivation for no-plt on
32-bit: eliminating the awful codegen constraint costs of the
GOT-register (ebx, and equivalent on other targets) ABI for calling
PLT entries. If instead you generated code that sets up an expression
for the GOT slot using arbitrary registers, and relaxed it to a direct
call (possibly rendering the register setup useless), it would be
comparable to the no-plt approach. So for example:

set up ecx (or whatever register)
relax call *foo@GOT(%ecx)

and relax to:

set up ecx (or whatever register; now useless)
relax call foo

But the no-plt approach is still superior in that the address load
from the GOT can be hoisted out of loops, etc., resulting in something
like:

call *%esi

This could be valuable in loops calling a math function repeatedly,
for example.

Overall I'm still not a fan of the relaxation approach. There are very
few places it would actually help that couldn't already be improved
better with use of visibility, and it can't give codegen as good as
no-plt option.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-11 Thread Rich Felker
On Mon, May 11, 2015 at 01:48:03PM +0200, Michael Matz wrote:
> Hi,
> 
> On Wed, 6 May 2015, Rich Felker wrote:
> 
> > I don't see how this case is improved unless GCC is failing to consider 
> > strong definitions in the same TU as locally-binding.
> 
> Interposition of non-static non-inline non-weak symbols is supported 
> independend of if they are defined in the same TU or not (if you're 
> producing a shared lib, that is).  I.e. no, they are not considered 
> locally-binding (for instance, they aren't automatically inlined).
>
> > If this is the case, is there a reason for that behavior?
> 
> Because IMHO interposition is orthogonal to TU placement, and hence 
> shouldn't be influenced by it.  There's visibility, inline hints or 
> static-ness to achieve different effects.  (perhaps the real reason is: 
> because it always worked like that :) )
> 
> > IMO it's wrong.
> 
> Why?  I think it's right.

I see it as an unnecessary pessimization. The ELF shared library
semantics for allowing interposition were designed to avoid behavioral
regressions versus static linking, and this is not such a case. With
an archive-type library, it's possible to cause whole TUs to be
omitted when linking as long as whatever symbol(s) may have been
needed from them are already defined elsewhere; interposition makes
the same possible with dynamic linking. But if symbols A and B were
both in the same TU, having A defined prior to searching an archive
but B undefined will cause the TU that defines both to be pulled in,
and is such a linking error (multiple definitions). So I'm not sure
why it's desirable to support this.

The "it always worked like that" argument may suffice if people are
depending on this behavior now (OTOH I'd rather see it break so they
fix their breakage of static linking) but I suspect the historical
reason it worked like that is that compilers were not smart enough to
process whole TUs at a time but just worked with one function at a
time and did not know that a referenced symbol was in the same TU.

BTW visibility can't really address the issue except with hacks
(hidden aliases) or protected visibility (which is hard to use because
it's broken on lots of toolchain versions).

Rich


Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-05-11 Thread Rich Felker
On Mon, May 11, 2015 at 12:31:51PM +0200, Jakub Jelinek wrote:
> On Mon, May 11, 2015 at 11:20:15AM +0100, Szabolcs Nagy wrote:
> > 
> > 
> > On 09/05/15 19:57, Szabolcs Nagy wrote:
> > > * H.J. Lu  [2015-05-09 10:41:41 -0700]:
> > >> There are
> > >>
> > >>  4: 2b70   806 FUNCGLOBAL DEFAULT   12
> > >> __cpu_indicator_init@GCC_4.8.0
> > >> 38: 002153d016 OBJECT  GLOBAL DEFAULT   25 
> > >> __cpu_model@GCC_4.8.0
> > >>
> > >> and
> > >>
> > >> 0215000  00040001 R_X86_64_64
> > >> 2b70 __cpu_indicator_init@GCC_4.8.0 + 0
> > >> 00215220  00260006 R_X86_64_GLOB_DAT
> > >> 002153d0 __cpu_model@GCC_4.8.0 + 0
> > >>
> > >> in libgcc_s.so.1.  Musl ld.so must be fixed to handle it.
> > >>
> > 
> > Rich is looking at how to do this non-intrusively, but
> > it seems non-trivial (some users of musl prefer not to
> > resolve such versioned symbols).
> > 
> > > 
> > > i think it might be enough to add __cpu_indicator_init_local
> > > as an alias to __cpu_indicator_init in libgcc.a and then use
> > > the *_local symbol from the ifunc resolver, that way no new
> > > dependency is added to libgcc_s.so handling.
> > 
> > i tried this approach and it seems to work: passes all
> > multiversioning tests on x86_64.
> > 
> > i think it's no worse than the symver approach.
> > 
> > is it ok to change the current fix to this?
> 
> No.  Instead of piling hacks like this just fix it in musl.

I wouldn't call it piling hacks; it's an improvement as far as I can
tell since it remove symbolic relocations and replaces them with
relative ones.

> libgcc certainly isn't the only library that uses @ symbol versions,
> e.g. libstdc++ does as well, as well as many other shared libraries.

We haven't encountered such issues there.

Rich


Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-05-09 Thread Rich Felker
On Sat, May 09, 2015 at 10:41:41AM -0700, H.J. Lu wrote:
> On Sat, May 9, 2015 at 7:31 AM, Szabolcs Nagy  wrote:
> > * H.J. Lu  [2015-04-17 05:36:30 -0700]:
> >> On Fri, Apr 17, 2015 at 4:59 AM, Jakub Jelinek  wrote:
> >> > On Fri, Apr 17, 2015 at 04:48:48AM -0700, H.J. Lu wrote:
> >> >> > I don't like it.  Nonshared libgcc is libgcc.a, period.  No sense in
> >> >> > creating yet another library for that.
> >> >> > So, IMHO beyond making the __cpu* entrypoints compat symbols only (@ 
> >> >> > instead
> >> >> > of @@ symbol versions) the right fix is simply tweak init_gcc_spec, 
> >> >> > so that
> >> >> > static_name is always linked in, in the switch combinations that it 
> >> >> > isn't
> >> >> > right now of course after shared_name rather than before that.
> >> >> > I thought we've fixed that years ago...
> >> >> >
> >
> > I think the patch committed for this is suboptimal.
> > (it breaks with musl libc on x86 if libgcc_s is linked into a binary)
> >
> > the patch:
> > http://gcc.gnu.org/ml/gcc-patches/2015-04/msg00878.html
> > original thread:
> > http://gcc.gnu.org/ml/gcc-patches/2015-03/msg01520.html
> >
> > The symbol versioning hack for __cpu_model and __cpu_indicator_init
> > makes them invisible to the musl dynamic linker so their relocation
> > fails with 'symbol not found' error.
> > (affects anything linked with -lgcc_s)
> 
> There are
> 
>  4: 2b70   806 FUNCGLOBAL DEFAULT   12
> __cpu_indicator_init@GCC_4.8.0
> 38: 002153d016 OBJECT  GLOBAL DEFAULT   25 
> __cpu_model@GCC_4.8.0
> 
> and
> 
> 0215000  00040001 R_X86_64_64
> 2b70 __cpu_indicator_init@GCC_4.8.0 + 0
> 00215220  00260006 R_X86_64_GLOB_DAT
> 002153d0 __cpu_model@GCC_4.8.0 + 0
> 
> in libgcc_s.so.1.  Musl ld.so must be fixed to handle it.

The intent in musl was always not to support symbol versioning. There
are various reasons for this, which I could go into, but I'd rather
not turn this thread into an argument about the merits of symbol
versioning.

Originally, musl ignored the version data completely, and would
happily have resolved the above symbol, but this also led to problems
where third-party libraries used symbol versioning and at runtime we
got the oldest-versioned symbol instead of the desired current one. So
I changed the lookup to use the same logic as ld, rejecting all
symbols with the hidden bit set, and getting us the newest one (which
is the only one we intended to support).

We can't just do the same version processing as glibc because we want
symbols in libc itself to resolve regardless of the version in the
reference (this is needed for the glibc ABI compat we provide). It
might work to honor symbol versions only when the library being
searched has versions, and otherwise allow symbols to match any
version requested, but I haven't thought this through entirely yet.

In any case I'd like the decision for what musl does about symbol
versions (which are not intended to be supported, anyway) to be
independent of libgcc's solution of this problem. We can solve it with
a hack on our side (just providing dummy symbols by those names) but
that's ugly too and I'd rather not do it.

Could you clarify the reasoning for why libgcc is using this hack with
a reference to an 'obsolete' symbol version rather than just
visibility?

Rich


Re: PATCH] PR target/65612: Multiversioning doesn't work with DSO nor PIE

2015-05-08 Thread Rich Felker
On Fri, Apr 17, 2015 at 05:36:30AM -0700, H.J. Lu wrote:
> On Fri, Apr 17, 2015@4:59 AM, Jakub Jelinek  wrote:
> > On Fri, Apr 17, 2015 at 04:48:48AM -0700, H.J. Lu wrote:
> >> > I don't like it.  Nonshared libgcc is libgcc.a, period.  No sense in
> >> > creating yet another library for that.
> >> > So, IMHO beyond making the __cpu* entrypoints compat symbols only (@ 
> >> > instead
> >> > of @@ symbol versions) the right fix is simply tweak init_gcc_spec, so 
> >> > that
> >> > static_name is always linked in, in the switch combinations that it isn't
> >> > right now of course after shared_name rather than before that.
> >> > I thought we've fixed that years ago...
> >> >
> >>
> >> We never pass -lgcc to linker when building C++ DSO:
> >>
> >>  /usr/libexec/gcc/x86_64-redhat-linux/4.9.2/collect2 -plugin
> >> /usr/libexec/gcc/x86_64-redhat-linux/4.9.2/liblto_plugin.so
> >> -plugin-opt=/usr/libexec/gcc/x86_64-redhat-linux/4.9.2/lto-wrapper
> >> -plugin-opt=-fresolution=/tmp/ccZC7iqy.res
> >> -plugin-opt=-pass-through=-lgcc_s -plugin-opt=-pass-through=-lc
> >> -plugin-opt=-pass-through=-lgcc_s --build-id --no-add-needed
> >> --eh-frame-hdr --hash-style=gnu -m elf_x86_64 -shared
> >> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64/crti.o
> >> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/crtbeginS.o
> >> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2
> >> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64
> >> -L/lib/../lib64 -L/usr/lib/../lib64
> >> -L/usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../.. x.o -lstdc++ -lm
> >> -lgcc_s -lc -lgcc_s /usr/lib/gcc/x86_64-redhat-linux/4.9.2/crtendS.o
> >> /usr/lib/gcc/x86_64-redhat-linux/4.9.2/../../../../lib64/crtn.o
> >> [hjl@gnu-32 tmp]$
> >>
> >> That is why libgcc_nonshared.a is needed.
> >
> > See what I wrote.  I think it is a bug that we don't do that, in your case
> > we should pass -lgcc_s -lgcc -lc -lgcc_s -lgcc.
> > Or, if you don't want to change that, as the multi-versioning change is
> > i386/x86_64 only change, just ensure that those targets have
> > t-slibgcc-libgcc in libgcc/config.host and thus behave like most other linux
> > targets where -lgcc is linked in always after -lgcc_s.
> >
> > Jakub
> 
> This patch works for me.  OK for trunk?
> 
> gcc/testsuite/
> 
> PR target/65612
> * g++.dg/ext/mv18.C: New test.
> * g++.dg/ext/mv19.C: Likewise.
> * g++.dg/ext/mv20.C: Likewise.
> * g++.dg/ext/mv21.C: Likewise.
> * g++.dg/ext/mv22.C: Likewise.
> * g++.dg/ext/mv23.C: Likewise.
> 
> libgcc/
> 
> PR target/65612
> * config.host (tmake_file): Add t-slibgcc-libgcc for Linux/x86.
> * config/i386/cpuinfo.c (__cpu_model): Initialize.
> (__cpu_indicator_init@GCC_4.8.0): New.
> (__cpu_model@GCC_4.8.0): Likewise.
> * config/i386/t-linux (HOST_LIBGCC2_CFLAGS): Add
> -DUSE_ELF_SYMVER.
> 
> Thanks.

This patch seems to be making some trouble for musl's dynamic linker,
which I could go into details on if necessary, but I'm wondering if
there's any reason why simple visibility wasn't used if you want to
hide the symbols from the linker. Could you explain the motivation for
doing it this way? Is the intent for libgcc_s.so to refer to its own
internal copy of __cpu_model and __cpu_indicator_init? If so, it seems
wrong to have a relocation referring to them symbolically at all; this
should just be a relative relocation.

Rich


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 04:50:28PM +, Joseph Myers wrote:
> On Fri, 8 May 2015, Rich Felker wrote:
> 
> > On Fri, May 08, 2015 at 03:41:31PM +0100, Szabolcs Nagy wrote:
> > > > I.e. as it stands this patch is not OK for backporting to GCC 5
> > > > without further discussion.
> > > > 
> > > > There is also the perspective that we should be able to aim for
> > > > an ABI variant agnostic dynamic linker at some point over the next
> > > > year by working towards a build that truly uses no float and is
> > > > hence compatible with all the ABI variants.
> > > 
> > > i'm not sure what you mean by 'a build that truly uses no float'
> > > 
> > > i thought the direction is to have a potentially hard float abi
> > > with kernel emulation when the fpu is not present.
> > 
> > I think Matthew's idea was that the dynamic linker could be agnostic
> > since it doesn't need floating point arithmetic itself, then load
> 
> Note that however the dynamic linker does properly need to save and 
> restore call-clobbered registers used for argument passing (because of 
> IFUNCs, user-provided malloc, audit hooks etc. that might affect them even 
> if the dynamic linker itself doesn't); see 
> <https://sourceware.org/ml/libc-alpha/2014-01/msg00673.html>.  So any 
> floating-point-agnostic dynamic linker would, if fixing the bugs around 
> not saving / restoring such registers, need to have runtime-conditional 
> code to save and restore them rather than simple compile-time 
> conditionals.

FWIW, this also doesn't apply to musl; we don't do lazy binding and
there's no resolver function. The dynamic linker never calls into code
provided by the application except for executing init/fini functions.
IFUNC may be provided at some point but it wouldn't be lazy, so
call-clobbered registers aren't relevant; right now the lack of any
specification for what an IFUNC callback is permitted to do (in the
form of "if you do anything else, the behavior is undefined") is
what's blocking support.

Rich


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 03:41:31PM +0100, Szabolcs Nagy wrote:
> > I.e. as it stands this patch is not OK for backporting to GCC 5
> > without further discussion.
> > 
> > There is also the perspective that we should be able to aim for
> > an ABI variant agnostic dynamic linker at some point over the next
> > year by working towards a build that truly uses no float and is
> > hence compatible with all the ABI variants.
> 
> i'm not sure what you mean by 'a build that truly uses no float'
> 
> i thought the direction is to have a potentially hard float abi
> with kernel emulation when the fpu is not present.

I think Matthew's idea was that the dynamic linker could be agnostic
since it doesn't need floating point arithmetic itself, then load
appropriate libraries depending on the ABI of the application
(presumably determined by some flags in _DYNAMIC or perhaps the main
ELF header). Of course with some familiarity with musl it becomes
clear why this is not an option, but to answer things like this we
need to think from a standpoint of non-familiarity with musl. :-)

Rich


Re: [PATCH 6/13] mips musl support

2015-05-08 Thread Rich Felker
On Fri, May 08, 2015 at 02:25:11PM +, Matthew Fortune wrote:
> H.J. Lu  writes:
> > On Mon, Apr 27, 2015 at 7:40 AM, Szabolcs Nagy 
> > wrote:
> > >
> > >
> > > On 21/04/15 15:59, Matthew Fortune wrote:
> > >> Rich Felker  writes:
> > >>> On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> > >>>> There does however appear to be both soft and hard float variants
> > >
> > > Patch v2.
> > >
> > > Now all the ABI variants musl plans to support are represented.
> > >
> > > gcc/Changelog:
> > >
> > > 2015-04-27  Gregor Richards  
> > > Szabolcs Nagy  
> > >
> > > * config/mips/linux.h (MUSL_DYNAMIC_LINKER32): Define.
> > > (MUSL_DYNAMIC_LINKER64, MUSL_DYNAMIC_LINKERN32): Define.
> > > (GNU_USER_DYNAMIC_LINKERN32): Update.
> > 
> > You checked in config/linux.h CHOOSE_DYNAMIC_LINKER change without
> > config/mips/linux.h change.  Now linux-mips is broken.
> 
> The MIPS patch is OK. I am concerned that you are aiming for one
> dynamic linker per ABI variant in musl but are not accounting for
> soft-float up front in n32/n64. There is time to reconsider this
> before any of this code gets to a versioned GCC release though.

I'm not aware of whether there are mips64 chips for which softfloat
would be desirable, so I don't know if it's an ABI we'll ever have,
but I'm not opposed to adding it here just to be safe (in case we need
it).

> I.e. as it stands this patch is not OK for backporting to GCC 5
> without further discussion.
> 
> There is also the perspective that we should be able to aim for
> an ABI variant agnostic dynamic linker at some point over the next
> year by working towards a build that truly uses no float and is
> hence compatible with all the ABI variants.

For musl that's not going to happen. The dynamic linker and shared
libc are one file, which therefore has lots of public interfaces that
depend on the argument passing ABI.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-06 Thread Rich Felker
On Wed, May 06, 2015 at 12:05:20PM -0700, H.J. Lu wrote:
> >> -Bsymbolic will bind all references to local definitions in shared 
> >> libraries,
> >> with and without visibility, weak or non-weak.  Compiler can use it
> >> in binds_tls_local_p and we can generate much better codes in shared
> >> libraries.
> >
> > Yes, I'm aware of what it does. But at compile-time the compiler can't
> > know whether the referenced symbol will be defined in the same DSO
> > unless this is visibility annotation telling it. Even when linking a
> > shared library using -Bsymbolic, the library code can still make calls
> > (or data references) to symbols in other DSOs.
> 
> Even without LTO, -fsymbolic -fPIC will generate better codes for
> 
> ---
> int glob_a = 1;
> 
> int foo ()
> {
>   return glob_a;
> }
> ---

I see how this case is improved, but it depends on the dubious (and
undocumented?) behavior of -Bsymbolic breaking copy relocations.

> and
> 
> ---
> int glob_a (void)
> {
>   return -1;
> }
> 
> int foo ()
> {
>   return glob_a ();
> }
> ---

I don't see how this case is improved unless GCC is failing to
consider strong definitions in the same TU as locally-binding. If this
is the case, is there a reason for that behavior? IMO it's wrong.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-06 Thread Rich Felker
On Wed, May 06, 2015 at 11:44:57AM -0700, H.J. Lu wrote:
> On Wed, May 6, 2015 at 11:37 AM, Rich Felker  wrote:
> > On Wed, May 06, 2015 at 11:26:29AM -0700, H.J. Lu wrote:
> >> On Wed, May 6, 2015 at 10:35 AM, Rich Felker  wrote:
> >> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
> >> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
> >> >> > The linker would know very well what kind of relocations are used for
> >> >> > particular PLT slot, and for the new relocations which would resolve 
> >> >> > to the
> >> >> > address of the .got.plt slot it could just tweak corresponding 3rd 
> >> >> > insn
> >> >> > in the slot, to not jump to first plt slot - 16, but a few bytes 
> >> >> > before that
> >> >> > that would just load the address of _G_O_T_ into %ebx and then 
> >> >> > fallthru
> >> >> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few 
> >> >> > ticks
> >> >> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
> >> >>
> >> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT 
> >> >> stub.
> >> >
> >> > Indeed. And the situation is the same on almost all targets. The only
> >> > exceptions are those with direct PC-relative addressing (like x86_64)
> >> > and those with reserved inter-procedural linkage registers and
> >> > efficient PC-relative address loading via them (like ARM and AArch64).
> >> > MIPS (o32) is also an interesting exception in that the normal ABI is
> >> > already PLT-free, and while callees need a PIC register loaded, it's a
> >> > call-clobbered register, not a call-saved one, so it doesn't make the
> >> > same kind of trouble,
> >> >
> >> > I really don't see a need to make no-PLT code gen support lazy binding
> >> > when it's necessarily going to be costly to do so, and precludes most
> >> > of the benefits of the no-PLT approach. Anyone still wanting/needing
> >> > lazy binding semantics can use PLT, and can even choose on a per-TU
> >> > basis (or maybe even more fine-grained with pragmas/attributes?).
> >> > Those of us who are suffering the cost of PLT with no benefits
> >> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
> >> > adding -fno-plt) and enjoy something like a 10% performance boost in
> >> > PIC/PIE.
> >> >
> >>
> >> There are things compiler can do for performance and correctness
> >> if it is told what options will be passed to linker.  -z now is one and
> >> -Bsymbolic is another one:
> >>
> >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
> >>
> >> I think we should add -fnow and -fsymbolic.  Together with LTO,
> >> we can generate faster executables as well as shared libraries.
> >
> > I don't see how knowing about -Bsymbolic can help the compiler
> > optimize. Without visibility, it can't know whether the symbols will
> > be defined in the same DSO. With visibility, it can already do the
> > equivalent hints. Perhaps it helps in the case where the symbol is
> > already defined (and non-weak) in the same TU, but I think in this
> > case it should already be optimizing the reference. Symbol
> > interposition over top of a non-weak symbol from the same TU is always
> > invalid and the compiler should not be pessimizing code to make it
> > work.
> 
> -Bsymbolic will bind all references to local definitions in shared libraries,
> with and without visibility, weak or non-weak.  Compiler can use it
> in binds_tls_local_p and we can generate much better codes in shared
> libraries.

Yes, I'm aware of what it does. But at compile-time the compiler can't
know whether the referenced symbol will be defined in the same DSO
unless this is visibility annotation telling it. Even when linking a
shared library using -Bsymbolic, the library code can still make calls
(or data references) to symbols in other DSOs.

> > As for -fnow, I haven't thought about it much but I also don't see
> > many places where it could help. The only benefit that comes to mind
> > is on targets with weak memory order, where it would eliminate some of
> > the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
> > work on AArch64). It might also benefit PLT calls on such targets, but
> > you would get a lot more benefit from -fno-plt, and in that case -fnow
> > would not allow any further optimization.
> 
> -fno-plt doesn't work with lazy binding.  -fnow tells compiler that
> lazy binding is not used and it can optimize without PLT.  With
> -flto -fnow, compiler can make much better choices.

Ah, I see now you had LTO in mind. In that case the compiler does know
when the symbol is defined in the same DSO for -Bsymbolic. So that
clears up the usefulness of your proposed -fsymbolic. I still don't
see how -fnow would have a lot of practical usefulness, but I'm
certainly not opposed to it.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-06 Thread Rich Felker
On Wed, May 06, 2015 at 11:26:29AM -0700, H.J. Lu wrote:
> On Wed, May 6, 2015 at 10:35 AM, Rich Felker  wrote:
> > On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
> >> On Wed, 6 May 2015, Jakub Jelinek wrote:
> >> > The linker would know very well what kind of relocations are used for
> >> > particular PLT slot, and for the new relocations which would resolve to 
> >> > the
> >> > address of the .got.plt slot it could just tweak corresponding 3rd insn
> >> > in the slot, to not jump to first plt slot - 16, but a few bytes before 
> >> > that
> >> > that would just load the address of _G_O_T_ into %ebx and then fallthru
> >> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few ticks
> >> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
> >>
> >> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT 
> >> stub.
> >
> > Indeed. And the situation is the same on almost all targets. The only
> > exceptions are those with direct PC-relative addressing (like x86_64)
> > and those with reserved inter-procedural linkage registers and
> > efficient PC-relative address loading via them (like ARM and AArch64).
> > MIPS (o32) is also an interesting exception in that the normal ABI is
> > already PLT-free, and while callees need a PIC register loaded, it's a
> > call-clobbered register, not a call-saved one, so it doesn't make the
> > same kind of trouble,
> >
> > I really don't see a need to make no-PLT code gen support lazy binding
> > when it's necessarily going to be costly to do so, and precludes most
> > of the benefits of the no-PLT approach. Anyone still wanting/needing
> > lazy binding semantics can use PLT, and can even choose on a per-TU
> > basis (or maybe even more fine-grained with pragmas/attributes?).
> > Those of us who are suffering the cost of PLT with no benefits
> > (because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
> > adding -fno-plt) and enjoy something like a 10% performance boost in
> > PIC/PIE.
> >
> 
> There are things compiler can do for performance and correctness
> if it is told what options will be passed to linker.  -z now is one and
> -Bsymbolic is another one:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=65886
> 
> I think we should add -fnow and -fsymbolic.  Together with LTO,
> we can generate faster executables as well as shared libraries.

I don't see how knowing about -Bsymbolic can help the compiler
optimize. Without visibility, it can't know whether the symbols will
be defined in the same DSO. With visibility, it can already do the
equivalent hints. Perhaps it helps in the case where the symbol is
already defined (and non-weak) in the same TU, but I think in this
case it should already be optimizing the reference. Symbol
interposition over top of a non-weak symbol from the same TU is always
invalid and the compiler should not be pessimizing code to make it
work.

As for -fnow, I haven't thought about it much but I also don't see
many places where it could help. The only benefit that comes to mind
is on targets with weak memory order, where it would eliminate some of
the cost of synchronizing TLSDESC lazy bindings (see Szabolcs Nagy's
work on AArch64). It might also benefit PLT calls on such targets, but
you would get a lot more benefit from -fno-plt, and in that case -fnow
would not allow any further optimization.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-06 Thread Rich Felker
On Wed, May 06, 2015 at 07:43:58PM +0300, Alexander Monakov wrote:
> On Wed, 6 May 2015, Jakub Jelinek wrote:
> > The linker would know very well what kind of relocations are used for
> > particular PLT slot, and for the new relocations which would resolve to the
> > address of the .got.plt slot it could just tweak corresponding 3rd insn
> > in the slot, to not jump to first plt slot - 16, but a few bytes before that
> > that would just load the address of _G_O_T_ into %ebx and then fallthru
> > into the 0x4c2b7310 snippet above.  The lazy binding would be a few ticks
> > slower in that case, but no requirement on %ebx to contain _G_O_T_.
> 
> No, %ebx is callee-saved, so you can't outright overwrite it in the PLT stub.

Indeed. And the situation is the same on almost all targets. The only
exceptions are those with direct PC-relative addressing (like x86_64)
and those with reserved inter-procedural linkage registers and
efficient PC-relative address loading via them (like ARM and AArch64).
MIPS (o32) is also an interesting exception in that the normal ABI is
already PLT-free, and while callees need a PIC register loaded, it's a
call-clobbered register, not a call-saved one, so it doesn't make the
same kind of trouble,

I really don't see a need to make no-PLT code gen support lazy binding
when it's necessarily going to be costly to do so, and precludes most
of the benefits of the no-PLT approach. Anyone still wanting/needing
lazy binding semantics can use PLT, and can even choose on a per-TU
basis (or maybe even more fine-grained with pragmas/attributes?).
Those of us who are suffering the cost of PLT with no benefits
(because we use -Wl,-z,relro -Wl,-z,now) can just be rid of it (by
adding -fno-plt) and enjoy something like a 10% performance boost in
PIC/PIE.

Rich


Re: [PATCH] Expand PIC calls without PLT with -fno-plt

2015-05-05 Thread Rich Felker
On Mon, May 04, 2015 at 11:42:20AM -0600, Jeff Law wrote:
> On 05/04/2015 11:39 AM, Jakub Jelinek wrote:
> >On Mon, May 04, 2015 at 11:34:05AM -0600, Jeff Law wrote:
> >>On 05/04/2015 10:37 AM, Alexander Monakov wrote:
> >>>This patch introduces option -fno-plt that allows to expand calls that 
> >>>would
> >>>go via PLT to load the address of the function immediately at call site 
> >>>(which
> >>>introduces a GOT load).  Cover letter explains the motivation for this 
> >>>patch.
> >>>
> >>>New option documentation for invoke.texi is missing from the patch; if 
> >>>this is
> >>>accepted I'll be happy to send a v2 with documentation added.
> >>>
> >>>   * calls.c (prepare_call_address): Transform PLT call to GOT lookup and
> >>>   indirect call by forcing address into a pseudo with -fno-plt.
> >>>   * common.opt (flag_plt): New option.
> >>OK once you cobble together the invoke.texi changes.
> >
> >Isn't what Michael/Alan suggested better?  I mean as/ld/compiler changes to
> >inline the plt slot's first part, then lazy binding will work fine.
> I must have missed Alan/Michael's message.
> 
> ISTM the win here is that by going through the GOT, you can CSE the
> GOT reference and possibly get some more register allocation
> freedom.  Is that still the case with Alan/Michael's approach?

There are many advantages to 'going through the GOT'. CSE'ing the
reference is just one. The biggest (IMO) is that you can avoid the bad
PLT ABI that most targets have, where making a call to a PLT slot
requires the GOT address to be pre-loaded into a fixed, call-saved
register. This precludes sibcalls and forces many functions which
otherwise would not need their own stack frames to create one for
saving the old value of the GOT register. See my blog entry on the
topic here: http://ewontfix.com/18/

Anyone who really wants lazy binding can use -fplt (which is
presumably still the default; I didn't check) but lazy binding should
largely be considered deprecated anyway since effective use of relro
protection requires -z now too, in which case you're paying all the
costs (which are considerable!) for lazy binding support even though
you won't get it.

Rich


Re: [PATCH 6/13] mips musl support

2015-04-21 Thread Rich Felker
On Tue, Apr 21, 2015 at 01:58:02PM +, Matthew Fortune wrote:
> Szabolcs Nagy  writes:
> > Set up dynamic linker name for mips.
> > 
> > gcc/Changelog:
> > 
> > 2015-04-16  Gregor Richards  
> > 
> > * config/mips/linux.h (MUSL_DYNAMIC_LINKER): Define.
> 
> I understand that mips musl is o32 only currently is that correct?

This is correct. Other ABIs if/when we support them will have
different names.

> There does however appear to be both soft and hard float variants
> listed in the musl docs. Do you plan on using the same dynamic linker
> name for both float variants? No problem if so but someone must have
> decided to have unique names for big and little endian so I thought
> it worth checking.

No, it's supposed to be different (-sf suffix for soft float; see
arch/mips/reloc.h in musl source). If this didn't make it into the
patches it's an omission, probably because we didn't officially
support the sf ABI at all for a long time.

> Also, are you aware of the two nan encoding formats that MIPS has
> and the support present in glibc's dynamic linker to deal with it?

I am aware but somewhat skeptical of treating it as yet another
dimension to ABI and the resulting ABI combinatorics. The vast
majority of programs couldn't care less which is which and whether a
NAN is quiet or signaling. Officially we just use the classic mips ABI
(with qnan/snan swapped vs other archs) but there's no harm in
somebody doing the opposite if they really know what they're doing.

> I wonder if it would be wise to refuse to target musl unless the
> ABI is known to be supported so that we can avoid compatibility
> issues when different ABI variants are added in musl.

Possibly, though this might make bootstrapping new ABIs harder.

Rich


Re: RFC: PATCHES: Properly handle reference to protected data on x86

2015-03-05 Thread Rich Felker
On Thu, Mar 05, 2015 at 06:39:10AM -0800, H.J. Lu wrote:
> On Wed, Mar 4, 2015 at 3:26 PM, H.J. Lu  wrote:
> > Protected symbol means that it can't be pre-emptied.  It
> > doesn't mean its address won't be external.  This is true
> > for pointer to protected function.  With copy relocation,
> > address of protected data defined in the shared library may
> > also be external.  We only know that for sure at run-time.
> > Here are patches for glibc, binutils and GCC to handle it
> > properly.
> >
> > Any comments?
> 
> This is the binutils patch I checked in.  It basically reverted
> the change for
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=15228
> 
> on x86.  Copy relocations against protected symbols should
> work.

Does it actually work now though? Last I checked gcc was generating
wrong code too -- GOT-relative accesses rather than accessing them
through the GOT. If that's the case, ld has no way to fix the problem.

Rich


Re: [PATCH] proposed fix for bug # 61144

2014-07-22 Thread Rich Felker
On Tue, Jul 22, 2014 at 09:17:12PM +0400, Alexander Monakov wrote:
> On Tue, 22 Jul 2014, Alexander Monakov wrote:
> > I'd like to push this topic forward a bit.  I've bootstrapped and regtested 
> > a
> > version of the patch based on the initial proposal to check DECL_WEAK.  The
> > approach with decl_replaceable_p looks not that easy; I'll expand in a
> > followup email.
> 
> The problem with the patch below using decl_replaceable_p is that it regresses
> the following C++ testcase:
> 
> struct z {
>   static const int aaa = 1;
> };
> 
> //const int z::aaa;
> 
> int foo(int x)
> {
>   return x ? z::aaa : x;
> }
> 
> Here decl_replaceable_p is 'true' for z::aaa.  With the patch the reference to
> z::aaa is not folded, but its definition is not emitted either, so a undefined
> reference error is produced at link time.  But naturally
> varpool_ctor_useable_for_folding_p for z::aaa must stay true in the first 
> place.
> 
> In a way z::aaa is "replaceable" in the sense that the compiler is not going
> to emit a definition, so if anything references z::aaa in the current
> translation unit (if the address is taken), the definition must come from
> another TU.  Nevertheless, references to the value can be folded.  I'm
> unsure what the correct test would look like.  Any advice?

Why is this considered "replaceable"? That sounds like a bug to me.

Anyway, in looking for a solution, wouldn't it be helpful to look at
how this was treated in 4.8 and earlier, before bug 61144 was
introduced?

Rich


Re: [PATCH] proposed fix for bug # 61144

2014-06-16 Thread Rich Felker
On Mon, Jun 16, 2014 at 06:05:19PM +0200, Jan Hubicka wrote:
> > > This needs your decl_replaceable change to not be optimized to if (0),
> > > because of the explicit const modifier.
> > 
> > The case I care about actually has "dummy" as const (with the intent
> > that it be allocated in a read-only section if the dummy definition is
> > used). So for me it's important that this regression be fixed too.
> 
> Yep, GCC since 90's was optimizing reads from weak const attributes, but it
> because worse because I added code walking through aliases.

Ah, yes. BTW the reason I've avoided weak references and straight-out
weak definitions in favor of using weak aliases (aside from aliases
saving space in some cases) is that weak-alias seems to be the
baseline feature that's "always worked safely" whereas the others have
had bugs on and off in the past. And I really hope we can avoid having
more than a single release where it's broken. Some distros are already
pulling in 4.9.0 and I'm getting more bug reports.

> > > I did not change ctor_for_folding to reject variables above as I was not 
> > > quite
> > > sure we want to support this kind of interposition and I am still not 
> > > quite certain.
> > > C++ is quite clear about the transformation replacing initialized const 
> > > by its value.
> > 
> > My concern is about C, not C++. This kind of interposition has always
> > been supported in unix C, even prior to GCC, going back to sysv or
> > earlier, as a documented feature (historically #pragma weak). It
> > should not regress. If fixing it results in an regression with regards
> > to optimizability of C++, perhaps this could be made
> > language-specific, or (better) the C++ front-end could add an
> > additional internal-use-only attribute to weak definitions it
> > generates internally that permits constant-folding them, while not
> > breaking the semantics for weak definitions provided by the user at
> > the source level.
> 
> Yes, I see your point and clearly we should not optimize with explicit weak 
> attribute.
> I wonder if decl_replaceable_p is however correct check here or if we want 
> explicit check
> for weak visibility.

Isn't anything weak inherently replacable? It seems like using
decl_replacable_p, if that predicate is correctly implemented, would
be completely safe, since it should be true for a superset of weak.

> I am concerned about const variables w/o weak attribute with -fPIC (because 
> for
> those decl_replaceable_p returns true, too). Consider following testcase:
> struct t
> {
> static const int dummy=0;
> const int *m();
> } t;
> int
> main()
> {
>   return *t.m();
> }
> int
> main2()
> {
>   return t.dummy;
> }
> const int *
> t::m()
> {
>   return &dummy;
> }
> 
> Here main2 is optimized by C++ FE to return 0, while backend is affraid to 
> optimize
> main() after inlining anticipating that dummy may be interposed. However 
> moving t::m
> inside of the unit will make dummy comdat and it will get optimizing.
> Adding another method and keying the t into other unit will make it 
> optimized, too.
> 
> This is not very consistent. But perhaps we need a flag from C++ FE to tell us
> what variables may not be interposed, because perhaps the c variant with -fPIC
> const int dummy=0;
> int
> main()
> {
>   return t;
> }
> 
> Jason?
> 
> A C variant of the testcase:
> 
> const int dummy=0;
> 
> const static int * d=&dummy;
> int
> main()
> {
>   return dummy;
> }
> int
> main2()
> {
>   return *d;
> }
> 
> seems optimized to return 0 (with -fPIC) for ages, too, but here at least
> frontend won't substitute first dummy for 0.

I see the issue of whether interposition should be supported in
non-weak situations like this as completely separate from pr61144.
Strictly speaking, it's undefined behavior to have multiple
definitions of the same symbol. ELF semantics allow such interposition
for shared library use mainly because creating a shared library hides
translation-unit boundaries whereby symbol replacement with the
equivalent static library may have had well-defined behavior due to
the conflicting translation units not getting pulled in by the linker.
But there's no fundamental reason to need to support interposition
against symbols defined in the same translation unit, since in the
corresponding static-library usage it would result in a link-time
error.

With the weak symbols issue (pr61144), on the other hand, the bug is
that the well-established semantics of weak binding are not being
honored by the compiler.

Of course it may still be _desirable_ to support the sort of
interposition you described for definitions in the same translation
unit, but whether you want to support that should be treated as a
separate issue IMO, for the above reasons.

Rich


Re: [PATCH] proposed fix for bug # 61144

2014-06-16 Thread Rich Felker
On Mon, Jun 16, 2014 at 11:06:04AM +0200, Jan Hubicka wrote:
> > 
> > Are the attached files acceptable?
> 
> The testcase looks OK to me, but it already should be fixed on mainline
> by patch https://gcc.gnu.org/ml/gcc-patches/2014-05/msg01315.html that
> prevents dummy to be marked as constant. 
> 
> You can however modify the testcase to have
> __attribute__ ((weak)) const int foo=0;

And the same for weak alias rather than straight weak definition like
the above?

> This needs your decl_replaceable change to not be optimized to if (0),
> because of the explicit const modifier.

The case I care about actually has "dummy" as const (with the intent
that it be allocated in a read-only section if the dummy definition is
used). So for me it's important that this regression be fixed too.

> I did not change ctor_for_folding to reject variables above as I was not quite
> sure we want to support this kind of interposition and I am still not quite 
> certain.
> C++ is quite clear about the transformation replacing initialized const by 
> its value.

My concern is about C, not C++. This kind of interposition has always
been supported in unix C, even prior to GCC, going back to sysv or
earlier, as a documented feature (historically #pragma weak). It
should not regress. If fixing it results in an regression with regards
to optimizability of C++, perhaps this could be made
language-specific, or (better) the C++ front-end could add an
additional internal-use-only attribute to weak definitions it
generates internally that permits constant-folding them, while not
breaking the semantics for weak definitions provided by the user at
the source level.

Rich


Re: [PATCH] proposed fix for bug # 61144

2014-06-14 Thread Rich Felker
Ping. Do you have any feedback on my tests? What is the next step?

Rich

On Mon, Jun 09, 2014 at 03:40:44PM +0400, Alexander Monakov wrote:
> 
> 
> On Fri, 6 Jun 2014, Rich Felker wrote:
> 
> > On Fri, May 23, 2014 at 12:26:18PM -0600, Jeff Law wrote:
> > > On 05/21/14 21:59, Rich Felker wrote:
> > > >On Wed, May 21, 2014 at 11:17:53AM +0200, Richard Biener wrote:
> > > >>On Wed, May 21, 2014 at 3:59 AM, Rich Felker  wrote:
> > > >>>Bug # 61144 is a regression in 4.9.0 that breaks building of musl libc
> > > >>>due to aggressive and semantically-incorrect constant folding of weak
> > > >>>aliases. The attached patch seems to fix the issue. A weak alias
> > > >>>should never be a candidate for constant folding because it may always
> > > >>>be replaced by a strong definition from another translation unit.
> > > >>>
> > > >>>For details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144
> > > >>>
> > > >>>I do not have a copyright assignment on file but this patch should be
> > > >>>sufficiently trivial not to require it.
> > > >>
> > > >>Please add a testcase.  Also I wonder why it isn't better to generalize
> > > >
> > > >How should a testcase be done? On the PR there's a testcase that shows
> > > >the problem in the generated code, but no automated check for it.
> > > >Testing this is actually a bit of a pain unless you're allowed to run
> > > >the generated program.
> > > You can run the test program.  Have it exit (0) on success, abort ()
> > > on failure if at all possible.  Then drop the test source file into
> > > gcc/testsuite/gcc.c-torture/execute/pr61144.c
> > 
> > The test needs to be two translation units linked together: one with
> > a weak definition of the object as 0, and the other with a strong
> > definition. The test should show the weak value being used rather than
> > the strong one. But I'm not sure how this should be integrated with
> > the build process.
> 
> Please have a look at gcc/testsuite/gcc.dg/special/wkali-2{,a,b}.c.  This is a
> three-TU test for weak aliases, so you should be able to very easily adjust it
> for this bug.
> 
> Thanks.
> Alexander


Re: [PATCH] proposed fix for bug # 61144

2014-06-09 Thread Rich Felker
On Mon, Jun 09, 2014 at 03:40:44PM +0400, Alexander Monakov wrote:
> 
> 
> On Fri, 6 Jun 2014, Rich Felker wrote:
> 
> > On Fri, May 23, 2014 at 12:26:18PM -0600, Jeff Law wrote:
> > > On 05/21/14 21:59, Rich Felker wrote:
> > > >On Wed, May 21, 2014 at 11:17:53AM +0200, Richard Biener wrote:
> > > >>On Wed, May 21, 2014 at 3:59 AM, Rich Felker  wrote:
> > > >>>Bug # 61144 is a regression in 4.9.0 that breaks building of musl libc
> > > >>>due to aggressive and semantically-incorrect constant folding of weak
> > > >>>aliases. The attached patch seems to fix the issue. A weak alias
> > > >>>should never be a candidate for constant folding because it may always
> > > >>>be replaced by a strong definition from another translation unit.
> > > >>>
> > > >>>For details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144
> > > >>>
> > > >>>I do not have a copyright assignment on file but this patch should be
> > > >>>sufficiently trivial not to require it.
> > > >>
> > > >>Please add a testcase.  Also I wonder why it isn't better to generalize
> > > >
> > > >How should a testcase be done? On the PR there's a testcase that shows
> > > >the problem in the generated code, but no automated check for it.
> > > >Testing this is actually a bit of a pain unless you're allowed to run
> > > >the generated program.
> > > You can run the test program.  Have it exit (0) on success, abort ()
> > > on failure if at all possible.  Then drop the test source file into
> > > gcc/testsuite/gcc.c-torture/execute/pr61144.c
> > 
> > The test needs to be two translation units linked together: one with
> > a weak definition of the object as 0, and the other with a strong
> > definition. The test should show the weak value being used rather than
> > the strong one. But I'm not sure how this should be integrated with
> > the build process.
> 
> Please have a look at gcc/testsuite/gcc.dg/special/wkali-2{,a,b}.c.  This is a
> three-TU test for weak aliases, so you should be able to very easily adjust it
> for this bug.

Are the attached files acceptable?

Rich
/* { dg-do run } */
/* { dg-require-weak "" } */
/* { dg-require-alias "" } */
/* { dg-additional-sources "wkali-3a.c" } */

#include 

static int dummy = 0;
extern int foo __attribute__((__weak__, __alias__("dummy")));

int main(void) {

if (foo)
exit(0);
else
abort();
}
/* { dg-do run } */

int foo = 1;


Re: [PATCH] proposed fix for bug # 61144

2014-06-06 Thread Rich Felker
On Fri, May 23, 2014 at 12:26:18PM -0600, Jeff Law wrote:
> On 05/21/14 21:59, Rich Felker wrote:
> >On Wed, May 21, 2014 at 11:17:53AM +0200, Richard Biener wrote:
> >>On Wed, May 21, 2014 at 3:59 AM, Rich Felker  wrote:
> >>>Bug # 61144 is a regression in 4.9.0 that breaks building of musl libc
> >>>due to aggressive and semantically-incorrect constant folding of weak
> >>>aliases. The attached patch seems to fix the issue. A weak alias
> >>>should never be a candidate for constant folding because it may always
> >>>be replaced by a strong definition from another translation unit.
> >>>
> >>>For details see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61144
> >>>
> >>>I do not have a copyright assignment on file but this patch should be
> >>>sufficiently trivial not to require it.
> >>
> >>Please add a testcase.  Also I wonder why it isn't better to generalize
> >
> >How should a testcase be done? On the PR there's a testcase that shows
> >the problem in the generated code, but no automated check for it.
> >Testing this is actually a bit of a pain unless you're allowed to run
> >the generated program.
> You can run the test program.  Have it exit (0) on success, abort ()
> on failure if at all possible.  Then drop the test source file into
> gcc/testsuite/gcc.c-torture/execute/pr61144.c

The test needs to be two translation units linked together: one with
a weak definition of the object as 0, and the other with a strong
definition. The test should show the weak value being used rather than
the strong one. But I'm not sure how this should be integrated with
the build process.

Is anyone from the gcc side willing to help on this? I'd really like
to get it fixed before the bug propagates into more releases and makes
longstanding weak alias semantics unreliable, but I'm not familiar
with gcc internals well enough to know if my existing patch is the
preferred fix or not, nor how to make a good test case.

Rich


  1   2   >