New submission from Alexey Izbyshev <izbys...@ispras.ru>:

This issue is to propose a (complementary) alternative to the usage of 
posix_spawn() in subprocess (see bpo-35537).

As mentioned by Victor Stinner in msg332236, posix_spawn() has the potential to 
be faster and safer than the fork()/exec() approach. However, some of the 
currently available implementations of posix_spawn() have technical problems 
(this mostly summarizes discussions in bpo-35537):

* In glibc < 2.24 on Linux, posix_spawn() doesn't report errors to the parent 
properly, breaking existing subprocess behavior.

* In glibc >= 2.25 on Linux, posix_spawn() doesn't report errors to the parent 
in certain environments, such as QEMU user-mode emulation and Windows subsystem 
for Linux.

* In FreeBSD, as of this writing, posix_spawn() doesn't block signals in the 
child process, so a signal handler executed between vfork() and execve() may 
change memory shared with the parent [1].

Regardless of implementation, posix_spawn() is also unsuitable for some 
subprocess use cases:

* posix_spawnp() can't be used directly to implement file searching logic of 
subprocess because of different semantics, requiring workarounds.

* posix_spawn() has no standard way to specify the current working directory 
for the child.

* posix_spawn() has no way to close all file descriptors > 2 in the child, 
which is the *default* mode of operation of subprocess.Popen().
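
The last two points can be observed from the Python side. Below is a minimal 
sketch, assuming a POSIX system; the helper name child_view is hypothetical and 
only illustrates what subprocess already provides today (a per-child cwd and 
the close_fds=True default), which a posix_spawn()-based path would have to 
reproduce:

```python
import os
import subprocess
import sys
import tempfile

# Code run by the child: report its cwd and whether the given fd is open.
CHILD_CODE = (
    "import os, sys\n"
    "fd = int(sys.argv[1])\n"
    "try:\n"
    "    os.fstat(fd)\n"
    "    leaked = True\n"
    "except OSError:\n"
    "    leaked = False\n"
    "print(os.getcwd())\n"
    "print(leaked)\n"
)

def child_view(close_fds):
    """Return (cwd_was_applied, fd_leaked) as observed by a child process."""
    r, w = os.pipe()
    os.set_inheritable(w, True)  # inherited by the child unless it is closed
    try:
        with tempfile.TemporaryDirectory() as workdir:
            out = subprocess.run(
                [sys.executable, "-c", CHILD_CODE, str(w)],
                cwd=workdir,            # no standard posix_spawn() equivalent
                close_fds=close_fds,    # True is the Popen() default
                stdout=subprocess.PIPE,
                check=True,
                text=True,
            ).stdout.splitlines()
            same_cwd = os.path.realpath(out[0]) == os.path.realpath(workdir)
            return same_cwd, out[1] == "True"
    finally:
        os.close(r)
        os.close(w)
```

With close_fds=True the extra descriptor should not be visible in the child, 
while with close_fds=False an inheritable descriptor is passed through.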

Perhaps even more importantly, posix_spawn() will always be fundamentally less 
flexible than the fork()/exec() approach. Any additions will have to go through 
POSIX standardization or be unportable. Even if approved, a change will take 
years to reach actual users because it requires updating the C library, which 
may be more than a decade behind in enterprise Linux distros. This is in 
contrast to having an addition implemented in CPython. For example, a 
setrlimit() action for posix_spawn() is currently rejected in POSIX[2], 
despite being trivial to add.

I'm interested in avoiding posix_spawn() problems on Linux while still 
delivering comparable performance and safety. To that end I've studied the 
implementations of posix_spawn() in glibc[3] and musl[4], which use a 
vfork()/execve()-like approach, and investigated the challenges of using 
vfork() safely on Linux (e.g. [5]) -- all of that for the purpose of using 
vfork()/exec() instead of fork()/exec() or posix_spawn() in subprocess where 
possible.

The unique property of vfork() is that the child shares the address space 
(including heap and stack) as well as thread-local storage with the parent, 
which means that the child must be very careful not to surprise the parent by 
changing the shared resources under its feet. The parent is suspended until the 
child performs execve(), _exit() or dies in any other way.

The safest way to use vfork() is available when one has access to the C 
library internals and can do the following:

1) Disable thread cancellation before vfork() to ensure that the parent thread 
is not suddenly cancelled by another thread with pthread_cancel() while being 
in the middle of child creation.

2) Block all signals before vfork(). This ensures that no signal handlers are 
run in the child. But the signal mask is preserved by execve(), so the child 
must restore the original signal mask. To do that safely, it must reset 
dispositions of all non-ignored signals to the default, ensuring that no signal 
handlers are executed in the window between restoring the mask and execve().

Note that libc-internal signals should be blocked too, in particular, to avoid 
"setxid problem"[5].

3) Use a separate stack for the child via clone(CLONE_VM|CLONE_VFORK), which 
has exactly the same semantics as vfork(), but allows the caller to provide a 
separate stack. This way potential compiler bugs arising from the fact that 
vfork() returns twice to the same stack frame are avoided.

4) Call only async-signal-safe functions in the child.
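
The statement in (2) that the signal mask is preserved by execve() can be 
checked from Python. This is a minimal sketch, assuming a POSIX system where 
signal.pthread_sigmask() is available; the helper name child_blocked_signals 
is hypothetical:

```python
import signal
import subprocess
import sys

# Code run by the child: print the numbers of the signals blocked at startup.
CHILD_CODE = (
    "import signal\n"
    "mask = signal.pthread_sigmask(signal.SIG_BLOCK, [])\n"
    "print(*(int(s) for s in mask))\n"
)

def child_blocked_signals(block):
    """Block `block` in the calling thread, spawn a child, return its mask."""
    old_mask = signal.pthread_sigmask(signal.SIG_BLOCK, block)
    try:
        out = subprocess.run(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            check=True,
            text=True,
        ).stdout
    finally:
        # Restore the parent's original mask regardless of the outcome.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
    return {int(number) for number in out.split()}
```

A signal blocked in the parent before spawning shows up as blocked in the 
exec'ed child, which is why a child that blocks everything around vfork() has 
to restore the original mask itself before execve().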

In an application, only (1) and (4) can be done easily.

One can't disable internal libc signals for (2) without using syscall(), which 
requires knowledge of the kernel ABI for the particular architecture.

clone(CLONE_VM) can't be used at least before glibc 2.24 because it corrupts 
the glibc pid/tid cache in the parent process[6,7]. (As may be guessed, this 
problem was solved by glibc developers when they implemented posix_spawn() via 
clone()). Even now, the overall message seems to be that clone() is a low-level 
function not intended to be used by applications.

Even with the above, I still think that, in the context of subprocess/CPython, 
sufficient vfork() safety is provided by the following.

Despite being easy, (1) does not seem to be necessary: CPython never uses 
pthread_cancel() internally, so Python code can't do that. A non-Python thread 
in an embedding app could try, but cancellation, to my knowledge, is not 
supported by CPython in any case (there is no way for an app to clean up after 
the cancelled thread), so subprocess has no reason to care.

For (2), we don't have to worry about the internal signal used for thread 
cancellation because of the above. The only other internal signal is used for 
setxid synchronization[5]. The "setxid problem" is mitigated in Python because 
the spawning thread holds the GIL, so Python code can't call os.setuid() 
concurrently. Again, a non-Python thread could, but I argue that an application 
that spawns a child and calls setuid() in an unsynchronized manner is not worth 
supporting: a child will have "random" privileges depending on who wins the 
race, so this is hardly a good security practice. Even if such apps are 
considered worth supporting, we may limit the vfork()/exec() path to the 
non-embedded use case.

For (3), with production-quality compilers, using vfork() should be OK. Both 
GCC and Clang recognize it and handle it in a special way (similar to setjmp(), 
which also has "returning twice" semantics). The supporting evidence is that 
Java has been using vfork() for ages, Go has migrated to vfork(), and, 
coincidentally, dotnet is doing it right now[8].

(4) is already done in _posixsubprocess on Linux.

I've implemented a simple proof-of-concept that uses vfork() in subprocess on 
Linux by default in all cases except when preexec_fn is not None. It passes all 
tests on OpenSUSE (Linux 4.15, glibc 2.27) and Ubuntu 14.04 (Linux 4.4, glibc 
2.19), but triggers spurious GCC warnings, probably due to a long-standing GCC 
bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=21161

I've also run a variant of subprocess_bench.py (by Victor Stinner from 
bpo-35537) with the close_fds=False and restore_signals=False arguments removed 
on OpenSUSE:

$ env/bin/python -m perf compare_to fork.json vfork.json
Mean +- std dev: [fork] 154 ms +- 18 ms -> [vfork] 1.23 ms +- 0.04 ms: 125.52x 
faster (-99%)

Compared to posix_spawn, the results on the same machine are similar:

$ env/bin/python -m perf compare_to posix_spawn.json vfork.json
Mean +- std dev: [posix_spawn] 1.24 ms +- 0.04 ms -> [vfork] 1.22 ms +- 0.05 
ms: 1.02x faster (-2%)
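
For readers who want a rough feel for such numbers without the perf module, 
here is a minimal timing sketch. It is not subprocess_bench.py; the helper 
name mean_spawn_seconds and the ballast_mb parameter are hypothetical, and the 
command falls back to the Python interpreter if no "true" binary is found. It 
uses the default close_fds and restore_signals values, like the variant above:

```python
import shutil
import subprocess
import sys
import time

def mean_spawn_seconds(n=20, ballast_mb=0):
    """Return the mean wall-clock time of n spawn-and-wait cycles."""
    # Optional ballast: a large, touched heap makes fork() more expensive
    # because the parent's page tables have to be copied. The non-zero fill
    # ensures the pages are actually written.
    ballast = b"x" * (ballast_mb * 1024 * 1024)
    true_path = shutil.which("true")
    cmd = [true_path] if true_path else [sys.executable, "-c", "pass"]
    start = time.perf_counter()
    for _ in range(n):
        subprocess.run(cmd, check=True)
    elapsed = time.perf_counter() - start
    del ballast
    return elapsed / n
```

Comparing, for example, mean_spawn_seconds(ballast_mb=0) with a call using a 
large ballast_mb on an interpreter that uses fork() gives an idea of how the 
cost scales with the parent's memory; absolute values will differ from the 
figures quoted here.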

Note that my implementation should work even for QEMU user-mode (and probably 
WSL) because it doesn't rely on address space sharing.

Things to do:

* Decide whether pthread_setcancelstate() should be used. I'd be grateful for 
opinions from Python threading experts.

* Decide whether "setxid problem"[5] is important enough to worry about.

* Deal with GCC warnings.

* Test in user-mode QEMU and WSL.

[1] 
https://svnweb.freebsd.org/base/head/lib/libc/gen/posix_spawn.c?view=markup&pathrev=326193
[2] http://austingroupbugs.net/view.php?id=603
[3] 
https://sourceware.org/git/?p=glibc.git;a=history;f=sysdeps/unix/sysv/linux/spawni.c;h=353bcf5b333457d191320e358d35775a2e9b319b;hb=HEAD
[4] http://git.musl-libc.org/cgit/musl/log/src/process/posix_spawn.c
[5] https://ewontfix.com/7
[6] https://sourceware.org/bugzilla/show_bug.cgi?id=10311
[7] https://sourceware.org/bugzilla/show_bug.cgi?id=18862
[8] https://github.com/dotnet/corefx/pull/33289

----------
components: Extension Modules
messages: 334336
nosy: gregory.p.smith, izbyshev, pablogsal, vstinner
priority: normal
severity: normal
status: open
title: Use vfork() in subprocess on Linux
type: enhancement
versions: Python 3.8

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35823>
_______________________________________