[issue35657] multiprocessing.Process.join() ignores timeout if child process use os.exec*()

Josh Rosenberg Fri, 04 Jan 2019 07:20:35 -0800


Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment:


Looks like the cause of the change was when os.pipe was changed to create 
non-inheritable pipes by default (Python 3.4, PEP 446); if I monkey-patch 
multiprocessing.popen_fork.Popen._launch to use os.pipe2(0) instead of 
os.pipe() to get inheritable descriptors, or just clear FD_CLOEXEC in the child 
with fcntl.fcntl(child_w, fcntl.F_SETFD, 0), Python 2's behavior returns.
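Here is a minimal sketch of what that experiment toggles. It is POSIX only, since it needs the fcntl module. Starting with Python 3.4, os.pipe() returns descriptors with FD_CLOEXEC set. Clearing that flag, either through fcntl or through os.set_inheritable(), lets a descriptor survive exec. The helper clear_cloexec is a hypothetical name used only for illustration.

```python
import fcntl
import os


def clear_cloexec(fd: int) -> None:
    """Hypothetical helper: clear FD_CLOEXEC so `fd` stays open across exec."""
    flags = fcntl.fcntl(fd, fcntl.F_GETFD)
    fcntl.fcntl(fd, fcntl.F_SETFD, flags & ~fcntl.FD_CLOEXEC)


r, w = os.pipe()
# Since Python 3.4 both ends start out non-inheritable (FD_CLOEXEC set).
print(os.get_inheritable(r), os.get_inheritable(w))  # False False

# FD_CLOEXEC is the only fd flag POSIX defines, so this matches
# fcntl.fcntl(w, fcntl.F_SETFD, 0) in practice.
clear_cloexec(w)
print(os.get_inheritable(w))  # True

# os.set_inheritable is the higher-level spelling of the same change.
os.set_inheritable(r, True)
print(os.get_inheritable(r))  # True
```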

The problem is caused by the mismatch in lifetimes between the pipe fd and the 
child process itself; normally the pipe lives as long as the child process 
(it's never actually touched in the child process at all, so it just dies with 
the child), but when exec gets involved, the pipe is closed long before the 
child ends.
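The lifetime mismatch can be reproduced without multiprocessing at all. The sketch below assumes a POSIX system with a sleep executable on PATH, and eof_seen_before_exit is a hypothetical name. It forks a child that never touches the write end of a pipe and simply execs sleep. The write end carries FD_CLOEXEC, so the exec closes it, and the parent sees EOF on its read end almost immediately while the child is still running.

```python
import os
import select
import shutil
import time
from typing import Tuple


def eof_seen_before_exit(sleep_seconds: int = 2) -> Tuple[float, bool]:
    """Return (seconds until the parent saw EOF, child still running then)."""
    sleep_path = shutil.which("sleep")  # assumes a POSIX `sleep` on PATH
    assert sleep_path is not None, "needs a sleep executable"
    r, w = os.pipe()  # both ends FD_CLOEXEC since Python 3.4
    start = time.monotonic()
    pid = os.fork()
    if pid == 0:
        # Child: never touches w; exec closes it because of FD_CLOEXEC.
        os.close(r)
        try:
            os.execv(sleep_path, [sleep_path, str(sleep_seconds)])
        finally:
            os._exit(127)  # only reached if execv itself failed
    os.close(w)  # parent keeps only the read end, like a sentinel
    select.select([r], [], [])  # wakes once no process holds the write end
    elapsed = time.monotonic() - start
    os.close(r)
    done_pid, _ = os.waitpid(pid, os.WNOHANG)
    still_running = done_pid == 0
    if still_running:
        os.waitpid(pid, 0)  # reap the child before returning
    return elapsed, still_running


elapsed, still_running = eof_seen_before_exit()
print(f"EOF after {elapsed:.3f}s; child still running: {still_running}")
```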

The code in Popen.wait that is commented with "This shouldn't block if wait() 
returned successfully" is probably the issue. Popen.wait first calls 
multiprocessing.connection.wait on the parent side of the pipe fd, which 
returns immediately when the child execs and the pipe is closed. The code then 
assumes the poll on the process itself can be run in blocking mode (since the 
process should have ended already), but that assumption is wrong when the pipe 
was closed by an exec rather than by the process exiting.
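The failure mode can be modeled with a simplified function called naive_wait. This is a hypothetical stand-in for the pattern described above, not the actual multiprocessing source. It runs a select() on the sentinel with the caller's timeout, followed by a blocking os.waitpid(). With an exec-ing child, the select() returns early, and the blocking reap then waits out the child's whole lifetime. The sketch assumes a POSIX system with a sleep executable on PATH.

```python
import os
import select
import shutil
import time
from typing import Optional


def naive_wait(pid: int, sentinel: int, timeout: float) -> Optional[int]:
    # Hypothetical model of the pattern, NOT the real Popen.wait: wait on the
    # sentinel with a timeout, then reap with a blocking waitpid on the
    # assumption that the child has already exited.
    ready, _, _ = select.select([sentinel], [], [], timeout)
    if not ready:
        return None  # timed out; child treated as still alive
    _, status = os.waitpid(pid, 0)  # "shouldn't block"... but it does
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)


sleep_path = shutil.which("sleep")  # assumes a POSIX `sleep` on PATH
assert sleep_path is not None, "needs a sleep executable"
r, w = os.pipe()  # FD_CLOEXEC set on both ends (Python 3.4+)
pid = os.fork()
if pid == 0:
    # Child: exec closes its copy of w long before the program ends.
    os.close(r)
    try:
        os.execv(sleep_path, [sleep_path, "2"])
    finally:
        os._exit(127)  # only reached if execv itself failed
os.close(w)

start = time.monotonic()
code = naive_wait(pid, r, timeout=0.5)  # caller asks for at most 0.5 s...
elapsed = time.monotonic() - start  # ...but blocks for the full sleep
os.close(r)
print(f"exit code {code} after {elapsed:.1f}s")
```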

Possible solutions:

1. No code changes; document that exec in worker processes is unsupported (use 
subprocess, possibly with a preexec_fn, for this use case).

2. Precede the call to process_obj._bootstrap() in the child with 
fcntl.fcntl(child_w, fcntl.F_SETFD, 0) to clear the CLOEXEC flag on the child's 
descriptor, so the file descriptor remains open in the child post-exec. Using 
os.pipe2(0) instead of os.pipe() in _launch would also work and restore the 
precise 3.3 and earlier behavior, but it would reintroduce race conditions 
with parent threads, so it's better to limit the change to the child's copy 
of the fd alone.

3. Change multiprocessing.popen_fork.Popen.wait to use os.WNOHANG for all calls 
with a non-None timeout (not just timeout=0.0), rather than trusting 
multiprocessing.connection.wait's return value (which only says whether the 
pipe is closed, not whether the process has exited). Problem is, this would just 
change the behavior from waiting for the lifetime of the child no matter what 
to waiting until the exec and then returning immediately, even well before the 
timeout; it might also introduce race conditions if the fd registers as being 
closed before the process has fully exited. In short, this approach would likely 
require a lot of subtle tweaks to make it work.
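Option 2 can be checked in isolation with a sketch like the one below. It is POSIX only, and sentinel_after_exec and the sleep child are illustrative assumptions, not multiprocessing internals. Right before exec, the child clears FD_CLOEXEC on its own copy of the write end with the same fcntl.fcntl(..., fcntl.F_SETFD, 0) call suggested for child_w. The parent's select() with a timeout then genuinely times out while the exec'ed program runs, instead of waking at the exec.

```python
import fcntl
import os
import select
import shutil
import time
from typing import Tuple


def sentinel_after_exec(keep_fd: bool, timeout: float = 0.5) -> Tuple[bool, float]:
    """Fork a child that execs `sleep 2`, then select() on the parent's end
    of the pipe for up to `timeout` seconds.

    Returns (sentinel_ready, seconds_waited). Illustrative helper only.
    """
    sleep_path = shutil.which("sleep")  # assumes a POSIX `sleep` on PATH
    assert sleep_path is not None, "needs a sleep executable"
    r, w = os.pipe()  # both ends FD_CLOEXEC (Python 3.4+)
    pid = os.fork()
    if pid == 0:
        os.close(r)
        if keep_fd:
            # Option 2: clear FD_CLOEXEC on the child's copy only, so the
            # write end survives exec and closes when the program exits.
            fcntl.fcntl(w, fcntl.F_SETFD, 0)
        try:
            os.execv(sleep_path, [sleep_path, "2"])
        finally:
            os._exit(127)  # only reached if execv itself failed
    os.close(w)
    start = time.monotonic()
    ready, _, _ = select.select([r], [], [], timeout)
    waited = time.monotonic() - start
    os.close(r)
    os.waitpid(pid, 0)  # reap the child (blocks until sleep finishes)
    return bool(ready), waited


ready_default, _ = sentinel_after_exec(keep_fd=False)  # exec closed the pipe
ready_kept, waited = sentinel_after_exec(keep_fd=True)  # select() timed out
print(ready_default, ready_kept)  # True False
```

Only the child's copy loses FD_CLOEXEC. The parent's descriptors are untouched, so this avoids the parent-thread races that switching _launch to os.pipe2(0) would reintroduce.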

I'm in favor of either #1 or #2. On the surface, #2 feels like intentionally 
opening a resource leak, but I think it's actually fine, since we already
signed up for a file descriptor that would live for the life of the process; 
the fact that it's exec-ed seems sort of irrelevant.
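If #1 is chosen, the documented alternative would be subprocess. Its Popen.wait() accepts a timeout and raises subprocess.TimeoutExpired when the timeout expires, regardless of what the child execs. The minimal sketch below uses a hypothetical helper, run_with_timeout, and a Python child that just sleeps stands in for the exec'ed program.

```python
import subprocess
import sys
import time
from typing import Callable, List, Optional


def run_with_timeout(cmd: List[str], timeout: float,
                     preexec_fn: Optional[Callable[[], None]] = None) -> Optional[int]:
    """Hypothetical helper: start `cmd` and wait at most `timeout` seconds.

    Returns the exit code, or None if the timeout expired, in which case the
    child is killed and reaped. `preexec_fn`, if given, runs in the child
    between fork and exec (POSIX only).
    """
    proc = subprocess.Popen(cmd, preexec_fn=preexec_fn)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()
        proc.wait()  # reap the killed child
        return None


slow = [sys.executable, "-c", "import time; time.sleep(5)"]
start = time.monotonic()
result = run_with_timeout(slow, timeout=0.5)
elapsed = time.monotonic() - start
print(result, f"{elapsed:.1f}s")
```

Any per-child setup that would otherwise run in a multiprocessing worker before exec can go in preexec_fn. The subprocess documentation warns that preexec_fn is not safe to use when the parent process has threads.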

----------
keywords: +3.4regression

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35657>
_______________________________________