On Fri, Jan 06, 2023 at 11:55:16AM +0800, Paul Wise wrote:
> The new io_uring_spawn mechanism for spawning processes without forking
> should be more efficient than fork+exec, especially when starting small
> processes from large processes. Also posix_spawn and vfork+exec exist.
>
> https://lwn.net/Articles/908268/
>
> I think the order of preference for spawning processes should be:
>
> * io_uring_spawn: this is Linux-only and only in new versions. Prefer
>   this over posix_spawn in case of an old glibc and new Linux kernel.

What I've heard of this sounds good, but as far as I can tell this is
not in upstream Linux, there are no patches anywhere to be found, no API
documentation, and the only references to the feature I can find
anywhere in a web search are all references to this one presentation
with no further detail.  It's of course possible that I've missed
something, but from what I can see it's far too early to even be able to
decide whether this would be usable, never mind being able to make use
of it on real systems.

> * posix_spawn: this uses the appropriate mechanisms on each platform,
>   glibc might be changing this to use io_uring_spawn where possible.

I can see a few limitations here:

 * The standard API offers no way to set the working directory of the
   child process, which would be needed for pipecmd_chdir and
   pipecmd_fchdir.  However, glibc 2.29 added
   posix_spawn_file_actions_addchdir_np and
   posix_spawn_file_actions_addfchdir_np as GNU extensions.

 * I'm not totally sure how to translate pipecmd_nice into
   posix_spawn-speak; the documentation is, uh, opaque.  It's probably
   possible.

 * This wouldn't be usable for pipeline commands created using
   pipecmd_new_sequence, as posix_spawn isn't guaranteed to be
   async-signal-safe so can't be called between fork and exec, unlike
   fork.

However, we could always just restrict the conditions under which
posix_spawn is used, much as GLib's g_spawn_* functions do.  None of the
above features are used on mandb's hot path, for instance.

> * vfork+exec: this is similar to what glibc does for posix_spawn.

If somebody were to present me with a patch for this then I suppose I
might at least consider it (though with a healthy amount of
scepticism!); but it's difficult, and I'm not sure I have the necessary
skills to review it properly.

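As an aside on the working-directory limitation listed above, here is a minimal sketch, assuming glibc 2.29 or later, of spawning a child in a given directory with the GNU extension. The helper name spawn_in_dir is hypothetical and is not part of libpipeline; it only illustrates the shape such a code path might take.

```c
#define _GNU_SOURCE   /* for posix_spawn_file_actions_addchdir_np */
#include <assert.h>
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical helper, not part of libpipeline: run argv[0] (looked up
 * on PATH) with its working directory set to dir, wait for it, and
 * return its exit status.  Returns -1 on failure; errno is set when a
 * library call failed. */
static int spawn_in_dir(const char *dir, char *const argv[])
{
    posix_spawn_file_actions_t actions;
    pid_t pid;
    int status, err;

    assert(dir != NULL && argv != NULL && argv[0] != NULL);

    err = posix_spawn_file_actions_init(&actions);
    if (err != 0) {
        errno = err;
        return -1;
    }

    /* GNU extension (glibc >= 2.29): chdir in the child before exec. */
    err = posix_spawn_file_actions_addchdir_np(&actions, dir);
    if (err == 0)
        err = posix_spawnp(&pid, argv[0], &actions, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&actions);
    if (err != 0) {
        errno = err;   /* posix_spawn* return the error number directly */
        return -1;
    }

    /* Reap the child, retrying if a signal interrupts the wait. */
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A call such as spawn_in_dir("/", argv) with argv = { "ls", NULL } would run ls in the root directory. On a libc without the extension this would not compile, so real code would presumably need a configure-time check and a fork+exec fallback.
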
glibc's posix_spawn implementation has this moderately fearsome comment
at the top:

  /* The Linux implementation of posix_spawn{p} uses the clone syscall directly
     with CLONE_VM and CLONE_VFORK flags and an allocated stack.  The new stack
     and start function solves most the vfork limitation (possible parent
     clobber due stack spilling).  The remaining issue are:

     1. That no signal handlers must run in child context, to avoid corrupting
        parent's state.
     2. The parent must ensure child's stack freeing.
     3. Child must synchronize with parent to enforce 2. and to possible
        return execv issues.

     The first issue is solved by blocking all signals in child, even
     the NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second and
     third issue is done by a stack allocation in parent, and by using a
     field in struct spawn_args where the child can write an error
     code.  CLONE_VFORK ensures that the parent does not run until the
     child has either exec'ed successfully or exited.  */

Do I really want that complexity in libpipeline?  I'm not sure that I
do.  It's certainly not close to being a drop-in replacement for fork.
posix_spawn, maybe with GNU extensions, looks like a more appealing
option.

-- 
Colin Watson (he/him)                              [cjwat...@debian.org]