On 07/10/2021 at 16:32, Alex Bennée wrote:
> Hi,
>
> I came across a use-case this week for ARM although this may be also
> applicable to architectures where QEMU's emulation is ahead of the
> hardware currently widely available - for example if you want to
> exercise SVE code on AArch64. When the linux-user architecture is not
> the same as the host architecture then binfmt_misc works perfectly fine.
>
> However in the case you are running same-on-same you can't use
> binfmt_misc to redirect execution to using QEMU because any attempt to
> trap native binaries will cause your userspace to hang as binfmt_misc
> will be invoked to run the QEMU binary needed to run your application
> and a deadlock ensues.
>
> There are some hacks you can apply at a local level like tweaking the
> ELF header of the binaries you want to run under emulation and adjusting
> the binfmt_mask appropriately. This works but is messy and a faff to
> set-up.
>
> An ideal setup would be for the kernel to catch a SIGILL from a
> failing user space program and then to re-launch the process using QEMU
> with the old processes maps and execution state so it could continue.
> However I suspect there are enough moving parts to make this very
> fragile (e.g. what happens to the results of library feature probing
> code). So two approaches I can think of are:
>
> Trap execve in QEMU linux-user
> ------------------------------
>
> We could add a flag to QEMU so at the point of execve it manually
> invokes the new process with QEMU, passing on the flag to persist this
> behaviour.
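
To make the execve idea above concrete, here is a rough sketch (in
Python, only to illustrate the argument rewriting) of what the trap
could do before calling the host execve. The flag name
"--persist-execve" and the helper rewrite_execve() are hypothetical:
QEMU has no such option today, and the real change would live in the C
syscall emulation code of linux-user.

```python
import os

# Hypothetical flag name, used for illustration only; it is not an
# existing QEMU option.
PERSIST_FLAG = "--persist-execve"


def rewrite_execve(qemu_path, path, argv, persist_flag=PERSIST_FLAG):
    """Return (new_path, new_argv) so that `path` is run under QEMU.

    The persist flag is passed on so that the child QEMU keeps
    trapping execve for its own children.
    """
    # Leave the call untouched if the target is already the QEMU binary.
    if os.path.realpath(path) == os.path.realpath(qemu_path):
        return path, list(argv)
    # New argv: QEMU itself, the flag, the real program, then the
    # original arguments. The caller's argv[0] is dropped in this sketch.
    return qemu_path, [qemu_path, persist_flag, path] + list(argv[1:])
```

With this, an execve of "/bin/ls" with argv ["ls", "-l"] would turn into
an execve of the QEMU binary with the flag, "/bin/ls" and "-l" as
arguments. Preserving the original argv[0] is left out here to keep the
sketch short.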

Another approach would be to use ptrace(PTRACE_SYSEMU) to catch
syscalls. We would need a wrapper that loads the first target binary and
forks; it then attaches to the child with ptrace() and intercepts the
syscalls to emulate them, as is done in User-mode Linux.

I was thinking of this solution, for instance, to execute big-endian
programs (like ppc64) on a little-endian system (ppc64le).

But I'm not sure it fits what you need...

>
> Add path mask to binfmt_misc
> ----------------------------
>
> The other option would be to extend binfmt_misc to have a path mask so
> it only applies its alternative execution scheme to binaries in a
> particular section of the file-system (or maybe some sort of pattern?).
>
> Are there any other approaches you could take? Which do you think has
> the most merit?

I don't know if it applies to what you want, but years ago I wrote a
binfmt namespace that applies the binfmt configuration only inside a
container. I didn't finish the work (it seems there could be some
security issues in what I did):

https://lore.kernel.org/lkml/20191216091220.465626-2-laur...@vivier.eu/T/

Thanks,
Laurent