Hello,

On Thu, Apr 20, 2023 at 11:26 PM H.J. Lu <hjl.to...@gmail.com> wrote:
> Doesn't it disable IFUNC for memcpy and stpncpy?

I was hoping you'd tell me whether it does :|

I *think* on i386 it does indeed (so I'd need to rework that part of the
patch), but not on x86_64. This is based on the following observations:

1. In glibc source, sysdeps/i386/dl-irel.h does this:

    static inline void
    __attribute ((always_inline))
    elf_irel (const Elf32_Rel *reloc)
    {
      Elf32_Addr *const reloc_addr = (void *) reloc->r_offset;
      const unsigned long int r_type = ELF32_R_TYPE (reloc->r_info);

      if (__glibc_likely (r_type == R_386_IRELATIVE))
        {
          Elf32_Addr value = elf_ifunc_invoke(*reloc_addr);
          *reloc_addr = value;
        }
      else
        __libc_fatal ("Unexpected reloc type in static binary.\n");
    }

   i.e. it reads the ifunc resolver pointer from the relocated address
   itself, whereas on x86_64 it's:

    static inline void
    __attribute ((always_inline))
    elf_irela (const ElfW(Rela) *reloc)
    {
      ElfW(Addr) *const reloc_addr = (void *) reloc->r_offset;
      const unsigned long int r_type = ELFW(R_TYPE) (reloc->r_info);

      if (__glibc_likely (r_type == R_X86_64_IRELATIVE))
        {
          ElfW(Addr) value = elf_ifunc_invoke(reloc->r_addend);
          *reloc_addr = value;
        }
      else
        __libc_fatal ("Unexpected reloc type in static binary.\n");
    }

   i.e. the ifunc resolver is stored in the addend, and the initial value
   of *reloc_addr is ignored.

   Checking arm and aarch64, I see that arm uses *reloc_addr like i386,
   and aarch64 uses r_addend like x86_64. But (unlike i386 and like
   x86_64) arm also has an ifunc relocation for memcpy, so (if someone
   were to work on an arm-gnu port) we would still have the same issue
   there, and this approach wouldn't work -- but see below.

2. When dumping relocations with readelf --wide --relocs, for the
   x86_64-gnu build I see the addends vary, but for i386-gnu they're just
   empty. That means readelf considers R_X86_64_IRELATIVE to be a rela,
   and R_386_IRELATIVE to be a rel.

3. When looking at the initial values of the GOT entries, on i386 they do
   point to ifunc resolvers; on x86_64 they don't seem to.

4. I've now tried asking qemu for a better CPU, and sure enough, I get
   the GOT entry pointing to __memcpy_avx_unaligned_erms.

Here's a little demo:

(gdb) bt
#0  __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:256
#1  0x00000000004b48d8 in __device_write_inband (device=device@entry=6, mode=mode@entry=0, recnum=recnum@entry=0, data=data@entry=0x4db0f2 <crlf> "\r\n", dataCnt=dataCnt@entry=2, bytes_written=bytes_written@entry=0xbffffcdc) at /home/sergey/dev/crosshurd64/src/glibc/build/mach/RPC_device_write_inband.c:158
#2  0x0000000000400d6b in write_some (to_write=2, p=0x4db0f2 <crlf> "\r\n") at devstream.c:45
#3  write_crlf () at devstream.c:58
#4  devstream_write (cookie=<optimized out>, buffer=0x20000d30 "Well hello friends!\n", n=20) at devstream.c:70
#5  0x0000000000424841 in _IO_cookie_write (fp=0x20000c10, buf=<optimized out>, size=20) at iofopncook.c:59
#6  0x0000000000425234 in new_do_write (fp=0x20000c10, data=0x20000d30 "Well hello friends!\n", to_do=to_do@entry=20) at /home/sergey/dev/crosshurd64/src/glibc/libio/libioP.h:1031
#7  0x0000000000425959 in _IO_new_do_write (fp=<optimized out>, data=<optimized out>, to_do=20) at fileops.c:425
#8  0x00000000004266e0 in _IO_new_file_sync (fp=0x20000c10) at fileops.c:798
#9  0x0000000000424542 in _IO_fflush (fp=0x20000c10) at /home/sergey/dev/crosshurd64/src/glibc/libio/libioP.h:1031
#10 0x0000000000400bea in main (argc=2, argv=0xbfffffa8) at /home/sergey/dev/mach-bootstrap-hello.c:69
#11 0x000000000040de73 in __libc_start_call_main (argv=0xbfffffa8, argc=2, main=0x400ad2 <main>) at ../sysdeps/generic/libc_start_call_main.h:23
#12 __libc_start_main_impl (main=<optimized out>, argc=2, argv=0xbfffffa8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:360
#13 0x0000000000400961 in _start1 () at ../sysdeps/x86_64/start.S:115

Actually maybe we could make this hack work for the architectures that do
use *reloc_addr too:
instead of just rewriting the GOT entry and leaving it at that, we'd
restore the original pointer (i.e. __memcpy_ifunc) after we've done the
early Hurd-specific setup, right before jumping to _start1.

Maybe this awesome/horrible trick could even be used to enable
ifunc-selected memcpy for i386 -- and not only on the Hurd, but for
i386-linux-gnu as well?

To reiterate: set the GOT entry to a known-good baseline version very
early, then call memcpy the usual way all you like, then before doing
_dl_relocate_static_pie reset the GOT entry back to the ifunc resolver.

Please tell me if what I'm saying makes sense; I may sound confident, but
I'm really not. This really needs someone way more experienced than me to
look into it and judge.

Sergey
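
For illustration, the save/override/restore sequence described above can be modelled in plain C. This is only a toy sketch, not glibc code: the single got_slot variable, toy_resolver, impl_baseline, impl_avx and the cpu_has_avx flag are all hypothetical stand-ins, and apply_irel / apply_irela merely mimic how elf_irel and elf_irela find the resolver (from the slot contents versus from the addend). It assumes a platform where function pointers round-trip through uintptr_t, as on the usual GNU targets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for memcpy variants; each just returns a tag. */
typedef int (*impl_fn) (void);
typedef impl_fn (*resolver_fn) (void);

static bool cpu_has_avx;                /* pretend CPU feature bit */

static int impl_baseline (void) { return 1; }
static int impl_avx (void) { return 2; }

/* Toy ifunc resolver: picks an implementation from the CPU features. */
static impl_fn
toy_resolver (void)
{
  return cpu_has_avx ? impl_avx : impl_baseline;
}

/* One simulated GOT slot, stored as an integer like a real GOT entry. */
static uintptr_t got_slot;

/* REL style (as on i386, arm): the resolver is read from the slot itself. */
static void
apply_irel (uintptr_t *slot)
{
  resolver_fn resolver = (resolver_fn) *slot;
  *slot = (uintptr_t) resolver ();
}

/* RELA style (as on x86_64, aarch64): the resolver comes from the addend,
   so whatever the slot held before is ignored.  */
static void
apply_irela (uintptr_t *slot, uintptr_t addend)
{
  resolver_fn resolver = (resolver_fn) addend;
  *slot = (uintptr_t) resolver ();
}

static int
call_through_got (void)
{
  return ((impl_fn) got_slot) ();
}

/* REL case: override early, then restore the resolver before relocating.
   Returns the tag of the implementation selected in the end.  */
int
demo_rel_with_restore (bool avx)
{
  cpu_has_avx = avx;
  got_slot = (uintptr_t) toy_resolver;      /* link-time initial value */

  uintptr_t saved = got_slot;               /* remember the resolver */
  got_slot = (uintptr_t) impl_baseline;     /* early override */
  assert (call_through_got () == 1);        /* early calls hit the baseline */

  got_slot = saved;                         /* restore before relocating */
  apply_irel (&got_slot);
  return call_through_got ();
}

/* RELA case: no restore needed.  Returns early_tag * 10 + final_tag.  */
int
demo_rela_no_restore (bool avx)
{
  cpu_has_avx = avx;
  got_slot = (uintptr_t) impl_baseline;     /* early override */
  int early = call_through_got ();

  apply_irela (&got_slot, (uintptr_t) toy_resolver);
  return early * 10 + call_through_got ();
}
```

In this model, skipping the restore step in the REL case would make apply_irel treat impl_baseline as a resolver, which is exactly the problem the restore is meant to avoid; the RELA case never looks at the old slot value, so the override is harmless there.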