On Fri, Jun 16, 2017 at 10:44 AM, H.J. Lu <[email protected]> wrote:
> On Fri, Jun 16, 2017 at 9:38 AM, Andy Lutomirski <[email protected]> wrote:
>> On Fri, Jun 16, 2017 at 9:17 AM, H.J. Lu <[email protected]> wrote:
>>> On Fri, Jun 16, 2017 at 9:01 AM, Andy Lutomirski <[email protected]> wrote:
>>>> On Thu, Jun 15, 2017 at 9:34 PM, H.J. Lu <[email protected]> wrote:
>>>>> On Thu, Jun 15, 2017 at 8:05 PM, Andy Lutomirski <[email protected]> wrote:
>>>>>> On Thu, Jun 15, 2017 at 7:17 PM, H.J. Lu <[email protected]> wrote:
>>>>>>> On Thu, Jun 15, 2017 at 4:28 PM, Andy Lutomirski <[email protected]>
>>>>>>> wrote:
>>>>>>>> On Thu, Jun 15, 2017 at 4:11 PM, H.J. Lu <[email protected]> wrote:
>>>>>>>>> It is used for lazy binding the first time when an external function
>>>>>>>>> is called.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Maybe I'm just being dense, but why? What does ld.so need to do to
>>>>>>>> resolve a symbol and update the GOT that requires using extended
>>>>>>>> state?
>>>>>>>
>>>>>>> Since the first 8 vector registers are used to pass function parameters
>>>>>>> and ld.so uses vector registers, _dl_runtime_resolve needs to preserve
>>>>>>> the first 8 vector registers when transferring control to ld.so.
>>>>>>>
>>>>>>
>>>>>> Wouldn't it be faster and more future-proof to recompile the relevant
>>>>>> parts of ld.so to avoid using extended state?
>>>>>>
>>>>>
>>>>> Are you suggesting not to use vector in ld.so?
>>>>
>>>> Yes, exactly.
>>>>
>>>>> We used to do that
>>>>> several years ago, which leads to some subtle bugs, like
>>>>>
>>>>> https://sourceware.org/bugzilla/show_bug.cgi?id=15128
>>>>
>>>> I don't think x86_64 has the issue that ARM has there. The Linux
>>>> kernel, for example, has always been compiled to not use vector or
>>>> floating point registers on x86 (32 and 64), and it works fine. Linux
>>>> doesn't save extended regs on kernel entry and it doesn't restore them
>>>> on exit.
>>>>
>>>> I would suggest that ld.so be compiled without use of vector
>>>> registers, that the normal lazy binding path not try to save any extra
>>>> regs, and that ifuncs be called through a thunk that saves whatever
>>>> registers need saving, possibly just using XSAVEOPT. After all, ifunc
>>>> is used for only a tiny fraction of symbols.
>>>
>>> x86-64 was the only target which used FOREIGN_CALL macros
>>> in ld.so, FOREIGN_CALL macros were the cause of race condition
>>> in ld.so:
>>>
>>> https://sourceware.org/bugzilla/show_bug.cgi?id=11214
>>>
>>> Not to save and restore the first 8 vector registers means that
>>> FOREIGN_CALL macros have to be used. We don't want to
>>> do that on x86-64.
>>>
>>>
>>
>> You're talking about this, right:
>>
>> commit f3dcae82d54e5097e18e1d6ef4ff55c2ea4e621e
>> Author: H.J. Lu <[email protected]>
>> Date: Tue Aug 25 04:33:54 2015 -0700
>>
>>     Save and restore vector registers in x86-64 ld.so
>>
>> It seems to me that the problem wasn't that the save/restore happened
>> on some of the time -- it was that the save and restore code used a
>> TLS variable to track its own state. Shouldn't it have been a stack
>> variable or even just implicit in the control flow?
>
> No, it can't use stack variable since _dl_runtime_resolve never
> returns.

I haven't dug all the way through the source, but surely ifuncs are
CALLed, not JMPed to. That means you have a stack somewhere. This
stuff is mostly written in C, and local variables should work just
fine.

>
>> In any case, glibc is effectively doing a foreign call anyway, right?
>
> No.
>
>> It's doing the foreign call to itself on every lazy binding
>> resolution, though, which seems quite expensive. I'm saying that it
>> seems like it would be more sensible to do the complicated foreign
>> call logic only when doing the dangerous case, which is when lazy
>> binding calls an ifunc.
>>
>> If I were to rewrite this, I would do it like this:
>>
>> void *call_runtime_ifunc(void (*ifunc)()); // or whatever the
>> signature needs to be
>
> It is unrelated to IFUNC. This is how external function call works.

External function call to what external function? Are there any calls
to any non-IFUNC external functions that are triggered by runtime
resolution?

In any event, I still don't understand the issue. The code does this,
effectively:

PLT -> GOT
GOT points to a stub that transfers control to ld.so
ld.so resolves the symbol (_dl_fixup, I think)
ld.so patches the GOT
ld.so jumps to the resolved function

As far as I can tell, the only part of the whole process that might
touch vector registers at all is elf_ifunc_invoke(). Couldn't all the
register saving and restoring be moved to elf_ifunc_invoke()?

>
>> call_runtime_ifunc would be implemented in asm (or maybe even C!) and
>> would use XSAVEOPT or similar to save the state to a buffer on the
>> stack. Then it would call the ifunc and restore the state. No TLS
>> needed, so there wouldn't be any races. In fact, it would work very
>> much like your current save/restore code, except that it wouldn't need
>> to be as highly optimized because it would be called much less
>> frequently. This should improve performance and could be quite a bit
>> simpler.
>>
>> As an aside, why is saving the first eight registers enough?
>> I don't think there's any particular guarantee that a call through the GOT
>> uses the psABI, is there? Compilers can and do produce custom calling
>> conventions, and ISTM that some day a compiler might do that between
>> DSOs. Or those DSOs might not be written in C in the first place.
>
> The result is undefined if psABI isn't followed.

That's unfortunate. Does that mean that, if you use a custom ABI
across DSO boundaries, you have to use -z now?

>
> --
> H.J.
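
The lazy-binding flow discussed in this message (PLT -> GOT, stub,
resolve, patch the GOT, jump to the target) can be modelled in plain C
as a rough sketch. This is a toy, not glibc code: got_slot, lazy_stub,
toy_fixup, call_via_plt, real_target and resolve_count are all
hypothetical names made up for illustration, and a function pointer
stands in for the GOT entry. Because it is ordinary C, the compiler
takes care of argument passing, so the sketch deliberately does not
show the register save/restore question being argued about. It only
shows that the resolver runs on the first call and is bypassed
afterwards.

```c
/* Toy model of lazy binding. Nothing here is a real glibc symbol. */

typedef int (*fn_t)(int);

/* Counts how many times the toy "resolver" has run. */
static int resolve_count = 0;

/* The function the symbol finally resolves to. */
static int real_target(int x) { return x + 1; }

static int lazy_stub(int x);

/* The "GOT slot": initially points at the resolver stub. */
static fn_t got_slot = lazy_stub;

/* Stand-in for symbol lookup (the role attributed to _dl_fixup above). */
static fn_t toy_fixup(void)
{
    resolve_count++;
    return real_target;
}

/* Stand-in for the stub: resolve, patch the slot, then forward the
 * original argument to the resolved function. */
static int lazy_stub(int x)
{
    fn_t target = toy_fixup();
    got_slot = target;      /* later calls skip the stub entirely */
    return target(x);
}

/* Stand-in for a PLT entry: an indirect call through the slot. */
static int call_via_plt(int x)
{
    return got_slot(x);
}
```

Calling call_via_plt(1) and then call_via_plt(2) returns 2 and 3, and
resolve_count stays at 1 after the second call, since the slot was
patched on the first one.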

