New submission from Richard Purdie <[email protected]>:
We're having some problems with multiprocessing.Queue where the parent process
ends up hanging with zombie children. The code is part of bitbake, the task
execution engine behind OpenEmbedded/Yocto Project.
I've cut down our code to the pieces in question in the attached file. It
doesn't give a runnable test case unfortunately but does at least show what
we're doing. Basically, we have a set of items to parse, we create a set of
multiprocessing.Process() processes to handle the parsing in parallel. Jobs are
queued in one queue and results are fed back to the parent via another. There
is a quit queue that takes sentinels to cause the subprocesses to quit.
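For readers without the attachment, the layout described above can be sketched roughly as follows. This is not the bitbake code: `parser`, `parse_all`, the `len(item)` stand-in for parsing and the explicit "fork" start method (POSIX only) are all assumptions made for illustration, and it only shows the clean path where every result is collected before the sentinels are sent.

```python
import multiprocessing
import queue


def parser(jobs, results, quit_q):
    """Hypothetical worker loop: handle jobs until a sentinel arrives."""
    while True:
        try:
            quit_q.get_nowait()          # any item on the quit queue means "stop"
            break
        except queue.Empty:
            pass
        try:
            item = jobs.get(timeout=0.1)
        except queue.Empty:
            continue                     # no work right now, re-check the quit queue
        results.put((item, len(item)))   # stand-in for the real parsing work


def parse_all(items, num_workers=2):
    """Run the workers over `items` and return {item: result}."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX host
    jobs, results, quit_q = ctx.Queue(), ctx.Queue(), ctx.Queue()

    workers = [
        ctx.Process(target=parser, args=(jobs, results, quit_q))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()

    for item in items:
        jobs.put(item)

    # Collect one result per job before asking the workers to quit.
    parsed = dict(results.get(timeout=10) for _ in items)

    for _ in workers:
        quit_q.put(None)                 # one sentinel per worker
    for w in workers:
        w.join(timeout=10)
    return parsed
```

Calling `parse_all(["a.bb", "b.bb"])` should hand back one entry per input item. The failure described in this report is on the other path, where shutdown happens before all results have been read.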
If something fails to parse, shutdown with clean=False is called and the
sentinels are sent. The Parser() process calls results.cancel_join_thread() on
the results queue. We do this since we don't care about the results any more;
we just want to ensure everything exits cleanly. This is where things go wrong. The
Parser processes and their queues all turn into zombies. The parent process
ends up stuck in self.result_queue.get(timeout=0.25) inside shutdown().
strace shows it has acquired the locks and is doing a read() on the os.pipe() it
created. Unfortunately since the parent still has a write channel open to the
same pipe, it hangs indefinitely.
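The underlying pipe behaviour can be shown without multiprocessing at all. The sketch below is an illustration under assumptions (a POSIX system, where select() works on pipes; `read_ready` and `w_dup` are made-up names). The duplicated descriptor stands in for the parent's own copy of the write end: the reader only sees EOF once every write end is closed.

```python
import os
import select


def read_ready(fd, timeout=0.25):
    """Return True if a read() on fd would return without blocking."""
    readable, _, _ = select.select([fd], [], [], timeout)
    return bool(readable)


r, w = os.pipe()
w_dup = os.dup(w)   # stands in for the parent's own copy of the write end

os.close(w)         # the "child" side closes its write end
# No data and one write end still open: a plain read() would block here.
blocked_while_writer_open = not read_ready(r)

os.close(w_dup)     # close the last remaining write end
# Now the read end reports EOF: read() returns b"" immediately.
eof_after_close = read_ready(r) and os.read(r, 1) == b""

os.close(r)
```

With the last write end closed, the reader gets EOF instead of blocking, which fits with the `_writer.close()` workaround avoiding the hang.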
If I change the code to do:
    self.result_queue._writer.close()
    while True:
        try:
            self.result_queue.get(timeout=0.25)
        except (queue.Empty, EOFError):
            break
i.e. close the writer side of the pipe by poking at the queue internals, we
don't see the hang. The .close() method would close both sides.
We create our own process pool since this code dates from python 2.x days and
multiprocessing pools had issues back when we started using this. I'm sure it
would be much better now but we're reluctant to change what has basically been
working. We drain the queues since in some cases we have clean shutdowns where
cancel_join_thread() hasn't been used and we don't want those cases to block.
My question is whether this is a known issue and whether there is some kind of
API to close just the write side of the Queue to avoid problems like this?
----------
components: Library (Lib)
files: simplified.py
messages: 376350
nosy: rpurdie
priority: normal
severity: normal
status: open
title: multiprocessing.Queue deadlock
type: crash
versions: Python 3.6
Added file: https://bugs.python.org/file49444/simplified.py
_______________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue41714>
_______________________________________