Re: [VERY RFC PATCH 2/2] hurd: Make it possible to call memcpy very early

Sergey Bugaev Fri, 21 Apr 2023 04:52:16 -0700

Hello,

On Thu, Apr 20, 2023 at 11:26 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> Doesn't it disable IFUNC for memcpy and stpncpy?


I was hoping you'd tell me whether it does :|

I *think* on i386 it does indeed (so I'd need to rework that part of
the patch), but not on x86_64. This is based on the following
observations:

1. in glibc source, sysdeps/i386/dl-irel.h does this:

static inline void
__attribute ((always_inline))
elf_irel (const Elf32_Rel *reloc)
{
  Elf32_Addr *const reloc_addr = (void *) reloc->r_offset;
  const unsigned long int r_type = ELF32_R_TYPE (reloc->r_info);

  if (__glibc_likely (r_type == R_386_IRELATIVE))
    {
      Elf32_Addr value = elf_ifunc_invoke(*reloc_addr);
      *reloc_addr = value;
    }
  else
    __libc_fatal ("Unexpected reloc type in static binary.\n");
}

i.e. reading the ifunc ptr from the relocated address itself, whereas
on x86_64 it's:

static inline void
__attribute ((always_inline))
elf_irela (const ElfW(Rela) *reloc)
{
  ElfW(Addr) *const reloc_addr = (void *) reloc->r_offset;
  const unsigned long int r_type = ELFW(R_TYPE) (reloc->r_info);

  if (__glibc_likely (r_type == R_X86_64_IRELATIVE))
    {
      ElfW(Addr) value = elf_ifunc_invoke(reloc->r_addend);
      *reloc_addr = value;
    }
  else
    __libc_fatal ("Unexpected reloc type in static binary.\n");
}

i.e. the ifunc resolver is stored in the addend, and the initial value
of *reloc_addr is ignored.

Checking arm and aarch64, I see that arm uses *reloc_addr like i386,
and aarch64 uses r_addend like x86_64. But (unlike i386 and like
x86_64) arm also has an ifunc relocation for memcpy, so (if someone
were to work on an arm-gnu port) we would still have the same issue
there, and this approach wouldn't work -- but see below.

2. When dumping relocations with readelf --wide --relocs, for the
   x86_64-gnu build I see the addends vary, but for i386-gnu they're
   just empty. That means readelf considers R_X86_64_IRELATIVE to
   be a rela, and R_386_IRELATIVE to be a rel.

3. When looking at the initial values of the GOT entries, on i386
   they do point to ifunc resolvers; on x86_64 they don't seem to.

4. I've now tried asking qemu for a better CPU, and sure enough, I
   get the GOT entry pointing to __memcpy_avx_unaligned_erms.

Here's a little demo:

(gdb) bt
#0  __memcpy_avx_unaligned_erms () at
../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:256
#1  0x00000000004b48d8 in __device_write_inband
(device=device@entry=6, mode=mode@entry=0, recnum=recnum@entry=0,
data=data@entry=0x4db0f2 <crlf> "\r\n",
    dataCnt=dataCnt@entry=2,
bytes_written=bytes_written@entry=0xbffffcdc) at
/home/sergey/dev/crosshurd64/src/glibc/build/mach/RPC_device_write_inband.c:158
#2  0x0000000000400d6b in write_some (to_write=2, p=0x4db0f2 <crlf>
"\r\n") at devstream.c:45
#3  write_crlf () at devstream.c:58
#4  devstream_write (cookie=<optimized out>, buffer=0x20000d30 "Well
hello friends!\n", n=20) at devstream.c:70
#5  0x0000000000424841 in _IO_cookie_write (fp=0x20000c10,
buf=<optimized out>, size=20) at iofopncook.c:59
#6  0x0000000000425234 in new_do_write (fp=0x20000c10, data=0x20000d30
"Well hello friends!\n", to_do=to_do@entry=20)
    at /home/sergey/dev/crosshurd64/src/glibc/libio/libioP.h:1031
#7  0x0000000000425959 in _IO_new_do_write (fp=<optimized out>,
data=<optimized out>, to_do=20) at fileops.c:425
#8  0x00000000004266e0 in _IO_new_file_sync (fp=0x20000c10) at fileops.c:798
#9  0x0000000000424542 in _IO_fflush (fp=0x20000c10) at
/home/sergey/dev/crosshurd64/src/glibc/libio/libioP.h:1031
#10 0x0000000000400bea in main (argc=2, argv=0xbfffffa8) at
/home/sergey/dev/mach-bootstrap-hello.c:69
#11 0x000000000040de73 in __libc_start_call_main (argv=0xbfffffa8,
argc=2, main=0x400ad2 <main>) at
../sysdeps/generic/libc_start_call_main.h:23
#12 __libc_start_main_impl (main=<optimized out>, argc=2,
argv=0xbfffffa8, init=<optimized out>, fini=<optimized out>,
rtld_fini=<optimized out>,
    stack_end=<optimized out>) at ../csu/libc-start.c:360
#13 0x0000000000400961 in _start1 () at ../sysdeps/x86_64/start.S:115

Actually maybe we could make this hack work for the architectures that
do use *reloc_addr too: instead of just rewriting the GOT entry and
leaving it at that, we'd restore the original pointer (i.e.
__memcpy_ifunc) after we've done the early Hurd-specific setup, right
before jumping to _start1.

Maybe this awesome/horrible trick could even be used to enable
ifunc-selected memcpy for i386 -- and not only on the Hurd, but for
i386-linux-gnu as well? To reiterate: set the GOT entry to a
known-good baseline version very early, then call memcpy the usual way
all you like, then before doing _dl_relocate_static_pie reset the GOT
entry back to the ifunc resolver.

Please tell me if what I'm saying makes sense. I may sound confident,
but I'm really not. This really needs someone way more experienced
than me to look into it and judge.

Sergey
