On 6/22/2026 9:34 AM, Ziyang Zhang wrote:
> Hi,
>
> This patch adds a single plugin, contrib/plugins/dlcall.c (~240 lines,
> no changes to QEMU core), that lets a linux-user guest call functions in the
> host's native shared libraries instead of emulating them.
>
> It is the natural next step on top of the vCPU syscall-filter callback that I
> contributed and that was merged earlier:
>
>   https://lore.kernel.org/qemu-devel/[email protected]/
>
> Why bother? Because it turns slow, instruction-by-instruction emulation of a
> library into a native host call. Some results, all on completely unmodified
> guest binaries:
>
>   * minizip (the stock zlib utility) compresses several times faster, because
>     the actual deflate runs natively on the host instead of being translated.
>   * Real OpenGL/Vulkan games run under qemu-user: SuperTuxKart and Hollow
>     Knight are playable, with their graphics calls going straight to the host
>     GPU.
>
> You can watch the demos build, run, and report timings in CI, without checking
> anything out:
>
>   https://github.com/rover2024/qemu-passthrough-test/actions/runs/27671747420
>
> How it works
> ============
>
> The guest makes a system call with a reserved number (4096 by default) that no
> real Linux ABI uses. Its first argument selects a pass-through operation; the
> rest carry operands:
>
>     syscall(4096, op, arg1, arg2, ...)
>              |     |   \............ operands (pointers / values)
>              |     \................. which pass-through operation
>              \....................... the reserved "magic" number
>
> The plugin registers a vCPU syscall filter: before QEMU forwards a syscall to
> the host kernel, the filter runs, sees 4096, performs the operation on the
> host, writes the result back, and tells QEMU the syscall is consumed, so the
> real kernel never sees it.
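
A minimal sketch of the filter logic described above, written as plain C
with no QEMU plugin API involved. The op codes, dlcall_perform() and
dlcall_filter() are hypothetical names that do not come from the patch;
only the default number 4096 is taken from the cover letter. The host-side
work is stubbed out, so this only models the "consume or pass through"
decision:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DLCALL_SYSCALL_NUM 4096 /* default magic number from the cover letter */

/* Hypothetical op codes: the plugin's real numbering is not shown here. */
enum dlcall_op {
    DLCALL_OP_QUERY = 0,
    DLCALL_OP_DLOPEN,
    DLCALL_OP_DLCLOSE,
    DLCALL_OP_DLSYM,
    DLCALL_OP_DLERROR,
    DLCALL_OP_INVOKE,
};

/* Stand-in for the host-side work; only the query op is modelled. */
static int64_t dlcall_perform(uint64_t op, uint64_t a1, uint64_t a2)
{
    (void)a1;
    (void)a2;
    switch (op) {
    case DLCALL_OP_QUERY:
        return 1; /* hypothetical "pass-through is available" answer */
    default:
        return -ENOSYS; /* ops not modelled in this sketch */
    }
}

/*
 * Model of the filter. Returns true when the syscall is consumed, in which
 * case *sysret holds the value handed back to the guest. Returns false to
 * let the syscall continue to the host kernel, leaving *sysret untouched.
 */
static bool dlcall_filter(int64_t num, uint64_t op, uint64_t a1, uint64_t a2,
                          int64_t *sysret)
{
    assert(sysret != NULL);
    if (num != DLCALL_SYSCALL_NUM) {
        return false; /* an ordinary syscall: not ours */
    }
    *sysret = dlcall_perform(op, a1, a2);
    return true;
}
```

With this model, dlcall_filter(4096, DLCALL_OP_QUERY, 0, 0, &ret) consumes
the call, while any other syscall number falls through unchanged.
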
>
> The whole interface is just a handful of primitives:
>
>   * query a host attribute
>   * dlopen / dlclose a host shared library
>   * dlsym a symbol, and read the last dlerror
>   * invoke a resolved host function with a void(void *, void *) signature
>
> That is all the plugin does. It knows nothing about zlib, X11 or OpenGL, or
> about any library's calling convention.
>
> The same machinery also runs in reverse: when a host function needs to call
> back into the guest (a qsort comparator, an allocator, a GUI or game
> callback), control re-enters the guest to run the callback and then resumes
> the suspended host call. This reentry is what lets stateful, callback-driven
> APIs work, not just leaf functions.
>
> Why the plugin belongs in QEMU, and the rest does not
> =====================================================
>
> Only the plugin lives in the tree. Everything else is ordinary userspace:
>
>     --- userspace (out of tree, not tied to any DBT) -------------
>       guest:  unmodified program  ->  guest runtime + thunk libs
>     --------------------------------------------------------------
>                  |  syscall(4096, op, args)   (only crossing point)
>                  v
>     === inside QEMU: THIS PATCH, ~240 lines ======================
>       dlcall plugin:  dlopen / dlsym / invoke a host fn
>     ==============================================================
>                  |
>                  v
>     --- userspace (out of tree) ----------------------------------
>       host:  host runtime + thunk libs  ->  real libz / libGL ...
>     --------------------------------------------------------------
>
> A complete reference implementation, with the minizip and OpenGL/X11 examples
> above, is here:
>
>   https://github.com/rover2024/qemu-passthrough-test
>
> The split is deliberate, and it is why only this one file is proposed for the
> tree:
>
>   * This plugin defines the most general interaction interface for native
>     pass-through: the magic-syscall ABI between an emulated guest and its
>     emulator. That contract is what every pass-through implementation builds
>     on, so it belongs in a stable, shared place.
>   * It is also the only piece that is inherently QEMU-specific: it plugs into
>     QEMU's syscall-filter hook and runs inside the QEMU process. The argument
>     marshalling, calling conventions, callbacks/reentry and per-library
>     coverage are not tied to any particular DBT and behave as ordinary
>     userspace, so they should stay out of tree rather than couple QEMU to
>     them.
>
> Background: we presented this approach at KVM Forum 2025, "Lorelei: Enable
> QEMU to Leverage Native Shared Libraries":
>
>   https://www.youtube.com/watch?v=_jioQFm7wyU
>
> A note on automation
> ====================
>
> The userspace thunks in that reference implementation are currently
> hand-written rather than generated by the LLVM-based toolchain from the talk.
> That is a deliberate choice for a demo: the automated toolchain pulls in a
> full LLVM installation, which adds substantial setup time, and the example
> projects are slow to build and cannot be reduced to a single Makefile.
> Hand-writing the thunks was the cheaper path to a self-contained,
> reproducible demo, and it was already enough to get minizip working end to
> end.
>
> For real, large-scale use I will rely on the automated toolchain. The key
> point is that hand-written vs. generated thunks is entirely decoupled from
> this plugin and from the magic-syscall interface it defines: the toolchain
> only emits out-of-tree userspace code and never touches the in-tree plugin.
> Automation becomes mandatory for complex, callback-heavy targets such as the
> OpenGL/Vulkan games, which is the direction this work is heading next.
>
> It is fully opt-in (loaded with -plugin) and targets linux-user, where the
> guest and host already share a trust domain. The test cases use x86_64 guests
> and run on x86_64, arm64 and riscv64 Linux hosts.
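
For illustration, a host-side thunk behind the void(void *, void *) invoke
primitive could look roughly like the sketch below. It assumes the first
pointer is a packed argument block and the second a return slot; the
cover letter does not spell out what the two pointers mean, so that
layout, struct strlen_args, thunk_strlen() and dlcall_invoke() are all
hypothetical. The function pointer is passed directly here instead of
being resolved with dlsym, and guest-to-host pointer translation is left
out:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The single signature the invoke primitive knows about. */
typedef void (*dlcall_thunk_fn)(void *args, void *ret);

/* Hypothetical argument block for a thunk that wraps strlen(). */
struct strlen_args {
    const char *s;
};

/* Unpack the argument block, call the real function, store the result. */
static void thunk_strlen(void *args, void *ret)
{
    const struct strlen_args *a = args;

    assert(a != NULL && ret != NULL);
    *(size_t *)ret = strlen(a->s);
}

/* What "invoke" amounts to once the symbol has been resolved. */
static void dlcall_invoke(dlcall_thunk_fn fn, void *args, void *ret)
{
    fn(args, ret);
}
```

The point of the single signature is that the invoke side never needs to
know the wrapped function's prototype: all per-library knowledge stays in
the thunk.
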
>
> The changes to the documentation (docs/about/emulation.rst) will be sent as a
> separate patch.
>
> Feedback on the plugin and on the pass-through approach is welcome.
>
> Changes since v2:
>
>   * Dropped the RFC tag; the approach was positively received on v2.
>   * Rebased on master and adjusted the syscall-filter callback to its updated
>     signature (int64_t sysret, added userdata, dropped the plugin id
>     argument).
>
> Changes since v1:
>
>   * Renamed the plugin from "passthrough" to "dlcall" (Pierrick Bouvier).
>     The old name was too generic; "dlcall" reflects what the plugin actually
>     does (dlopen/dlsym a host symbol and call it) and avoids confusion with
>     QEMU's existing plugin hostcall concept (QEMU_PLUGIN_*_HOSTCALL).
>   * Made the magic syscall number configurable at load time via the
>     "syscall_num=N" argument, defaulting to 4096 and rejecting values low
>     enough to clash with a real syscall (Pierrick Bouvier).
>
> v1:
>   https://lore.kernel.org/qemu-devel/[email protected]/
> v2:
>   https://lore.kernel.org/qemu-devel/[email protected]/
>
> Thanks,
> Ziyang Zhang
>
> Ziyang Zhang (1):
>   contrib/plugins: add a minimal dlcall plugin
>
>  contrib/plugins/dlcall.c    | 238 ++++++++++++++++++++++++++++++++++++
>  contrib/plugins/meson.build |   5 +
>  2 files changed, 243 insertions(+)
>  create mode 100644 contrib/plugins/dlcall.c
>
Thanks for the update.

At this stage, even though the plugin itself looks good, it's not something
we can really merge, as it relies on external behavior that is not
specified, and it is not a standalone tool either. For instance, if we had a
public Lorelei toolchain that implements this for libraries, it would
definitely have its place upstream IMHO.

Also, the same kind of contribution can be done for other instrumentation
frameworks, so we could end up with a single interface (and set of
libraries) shared among all of them.

What do you think?

Regards,
Pierrick
