[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

Oooh, thanks. I'll use that.

> But really, this sounds rather fragile.

Absolutely. I concur there is no good way to do this.
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

> I don't see how using os.fork() would make things any easier. In either
> case you need to prepare a list of fds which the child process should
> close before it starts, or alternatively a list of fds *not* to close.

With fork() I control where the processes diverge much more readily. I could create the pipe in the main process, fork, close the unnecessary fds, then call into the class that represents the operation of the subprocess (i.e., do it the C way). This way the class never needs to know about pipes it doesn't care about, and I can ensure that unnecessary pipes get closed. So I get the clean, understandable semantics I was after, and my pipes get closed. The only thing I lose is Windows interoperability.

I could reimplement the close_all_fds_except() call (in straight Python, using os.closerange()). That seems like a reasonable solution, if a bit of a hack. However, given that pipes are exposed by multiprocessing, it might make sense to try to get this function incorporated into the main version of it?

I also think that with introspection it would be possible for the multiprocessing module to be aware of which file descriptors are still actively referenced (i.e., 0, 1, 2 are always referenced; introspect through the objects passed to the child to see if they have a fileno() method). However, I can't state this as a certainty without going off and actually implementing such a version. Additionally, I can make absolutely no promises as to the speed of this. Perhaps, if it functioned, it would be an option one could turn on for cases like mine.
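[A minimal sketch of what such a reimplementation might look like, assuming a POSIX platform. The spawn_child() helper, the target callable, and the MAXFD fallback are illustrative, not from this thread or any tracker patch:]

    import os

    # Assumed upper bound on fd numbers; SC_OPEN_MAX is the usual POSIX source.
    try:
        MAXFD = os.sysconf("SC_OPEN_MAX")
    except (AttributeError, ValueError):
        MAXFD = 1024

    def close_all_fds_except(fds):
        """Close every fd below MAXFD except those listed in fds."""
        keep = sorted(set(fds))
        if not keep:
            os.closerange(0, MAXFD)
            return
        # Close below the first kept fd, the gaps between kept fds, then the rest.
        os.closerange(0, keep[0])
        for low, high in zip(keep, keep[1:]):
            os.closerange(low + 1, high)
        os.closerange(keep[-1] + 1, MAXFD)

    def spawn_child(reader, writer, target):
        """Fork; in the child, drop unneeded fds, then call target(reader)."""
        pid = os.fork()
        if pid == 0:                                       # child
            writer.close()                                 # parent's write end, inherited via fork
            close_all_fds_except([0, 1, 2, reader.fileno()])
            target(reader)                                 # the class/function doing the child's work
            os._exit(0)
        return pid                                         # parent continues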
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

I'm actually a *nix programmer by trade, so I'm pretty familiar with that behavior =p However, I'm also used to inheriting some way to refer to these fds, so that I can close them. Perhaps I've just missed somewhere a call to ask the process for a list of open fds? This would, to me, be an acceptable workaround - I could close all the fds I didn't wish to inherit.

What's really bugging me is that it remains open and I can't fetch a reference. If I could do either of these, I'd be happy.

Maybe this is more an issue with the semantics of multiprocessing? In that this behavior is perfectly reasonable with os.fork() but makes some difficulty here. Perhaps I really want to be implementing with os.fork(). Sigh, I was trying to save myself some effort...
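[There is no portable stdlib call that returns the open fds, but on Linux they can be read out of /proc/self/fd — a sketch, assuming a Linux system; the open_fds() helper name is invented for illustration:]

    import os

    def open_fds():
        """Return the fds currently open in this process (Linux-only: /proc/self/fd)."""
        fds = []
        for name in os.listdir("/proc/self/fd"):
            try:
                fds.append(int(name))
            except ValueError:
                pass
        # Note: listing the directory itself briefly opens an fd,
        # which may therefore appear in the result.
        return sorted(fds)

    # A child could then close everything it did not mean to inherit, e.g.:
    # for fd in open_fds():
    #     if fd not in (0, 1, 2, reader.fileno()):
    #         os.close(fd)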
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

>> So you're telling me that when I spawn a new child process, I have to
>> deal with the entirety of my parent process's memory staying around
>> forever?
>
> With a copy-on-write implementation of fork() this is quite likely to use
> less memory than starting a fresh process for the child process. And
> it is certainly much faster.

Fair enough.

>> I would have expected this to call fork(), which gives the child
>> plenty of chance to clean up, then call exec() which loads the new
>> executable.
>
> There is an experimental branch (http://hg.python.org/sandbox/sbt)
> which optionally behaves like that. Note that "clean up" means close
> all fds not explicitly passed, and has nothing to do with garbage
> collection.

I appreciate the pointer, but I am writing code intended for distribution - using an experimental branch isn't useful.

What I'm still trying to grasp is why Python explicitly leaves the parent process's info around in the child. It seems like there is no benefit (besides, perhaps, speed) and that this choice leads to non-intuitive behavior - like this.
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

So you're telling me that when I spawn a new child process, I have to deal with the entirety of my parent process's memory staying around forever?

I would have expected this to call fork(), which gives the child plenty of chance to clean up, then call exec() which loads the new executable. Either that, or the same instance of the Python interpreter is used, just with the knowledge that it should execute the child function and then exit. Keeping all the state that will never be used in the second case seems sloppy on the part of Python.

The semantics in this case are much better if the pipe gets GC'd. I see no reason my child process should have to know about pipe ends it never uses in order to close them.
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

The difference is that nonfunctional.py does not pass the write end of the parent's pipe to the child. functional.py does, and closes it immediately after breaking into a new process. This is what you mentioned to me as a workaround. Corrected code (for indentation) attached.

Why SHOULDN'T I expect this pipe to be closed automatically in the child? Per the documentation for multiprocessing.Connection.close(): "This is called automatically when the connection is garbage collected." The write end of that pipe goes out of scope and has no references in the child process. Therefore, per my understanding, it should be garbage collected (in the child process). Where am I wrong about this?

----------
Added file: http://bugs.python.org/file30449/bugon.tar.gz
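[A sketch of the workaround as described — class and attribute names are reconstructed from the description above, not copied from the attached functional.py:]

    import multiprocessing

    class Reader(multiprocessing.Process):
        def __init__(self, reader, writer):
            super().__init__()
            self.reader = reader
            self.writer = writer      # carried along only so the child can close it

        def run(self):
            self.writer.close()       # explicitly drop the inherited write end
            try:
                while True:
                    print(self.reader.recv())
            except EOFError:          # now raised once the parent closes its end
                pass

    if __name__ == "__main__":
        r, w = multiprocessing.Pipe(False)
        child = Reader(r, w)
        child.start()
        w.send("hello")
        w.close()                     # parent's copy; the child closed its own in run()
        child.join()

[With the self.writer.close() line removed, recv() never raises EOFError under the fork start method — exactly the difference between functional.py and nonfunctional.py described above.]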
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
spresse1 added the comment:

Now also tested with source-built Python 3.3.2. The issue still exists, with the same example files.
[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
New submission from spresse1:

[Code demonstrating issue attached]

When subclassing multiprocessing.Process and using pipes, a reference to a pipe spawned in the parent is not properly garbage collected in the child. This causes the write end of the pipe to be held open with no reference to it in the child process, and therefore no way to close it. Therefore, it can never raise EOFError.

Expected behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass the read end to a class which subclasses multiprocessing.Process
3. Close the write end in the parent process
4. Receive EOFError from the read end

Actual behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass the read end to a class which subclasses multiprocessing.Process
3. Close the write end in the parent process
4. Never receive EOFError from the read end

Examining the processes in /proc/[pid]/fd/ indicates that a write pipe is still open in the child process, though none should be. Additionally, no write pipe is open in the parent process. It is my belief that this is the write pipe spawned in the parent, remaining open incorrectly in the child even though there are no references to it.

Tested on 2.7.3 and 3.2.3.

----------
components: Library (Lib)
files: bugon.tar.gz
messages: 190492
nosy: spresse1
priority: normal
severity: normal
status: open
title: multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
versions: Python 2.7, Python 3.2
Added file: http://bugs.python.org/file30448/bugon.tar.gz
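[A minimal reproduction following the steps above, assuming the Unix fork start method; this is reconstructed from the description, and the attached bugon.tar.gz remains the authoritative test case:]

    import multiprocessing

    class Child(multiprocessing.Process):
        def __init__(self, reader):
            super().__init__()
            self.reader = reader      # only the read end is handed over

        def run(self):
            try:
                while True:
                    self.reader.recv()      # expected to raise EOFError eventually
            except EOFError:
                print("got EOFError")       # never reached: the forked child still
                                            # holds an unreferenced copy of the write end

    if __name__ == "__main__":
        r, w = multiprocessing.Pipe(False)
        child = Child(r)
        child.start()
        w.close()                     # step 3: close the write end in the parent
        child.join()                  # hangs instead of finishing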