[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

Actually, you can use gc.get_referents(obj), which returns the direct children
of obj (and is presumably implemented using tp_traverse).
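
For illustration, here is a minimal sketch (my own, not from the thread) that
uses gc.get_referents() to walk everything reachable from an object and
collect anything exposing a fileno() method:

import gc

def fd_holders(root):
    # Transitive walk over gc.get_referents(); track ids because not
    # every object is hashable.
    seen, stack, found = set(), [root], []
    while stack:
        obj = stack.pop()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        if hasattr(obj, 'fileno'):
            found.append(obj)
        stack.extend(gc.get_referents(obj))
    return found

Note that gc.get_referents() only reports what an object's tp_traverse
visits, so objects not tracked by the collector can be missed.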

I will close.

--
resolution:  -> rejected
stage:  -> committed/rejected
status: open -> closed

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread spresse1

spresse1 added the comment:

Oooh, thanks.  I'll use that.

> But really, this sounds rather fragile.

Absolutely.  I concur there is no good way to do this.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 03/06/2013 3:07pm, spresse1 wrote:
> I could reimplement the close_all_fds_except() call (in straight Python,
> using os.closerange()).  That seems like a reasonable solution, if a bit of
> a hack.  However, given that pipes are exposed by multiprocessing, it might
> make sense to try to get this function incorporated into the main version
> of it?

close_all_fds_except() is already pure Python:

try:
    MAXFD = os.sysconf("SC_OPEN_MAX")
except:
    MAXFD = 256

def close_all_fds_except(fds):
    fds = list(fds) + [-1, MAXFD]
    fds.sort()
    for i in range(len(fds) - 1):
        os.closerange(fds[i]+1, fds[i+1])
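
(The -1 and MAXFD entries act as sentinels bracketing the sorted list, so the
loop closes every fd strictly between consecutive kept fds.  For example,
with fds=[3, 5] the sorted list is [-1, 3, 5, MAXFD], and the calls are
os.closerange(0, 3), os.closerange(4, 5) and os.closerange(6, MAXFD),
leaving only 3 and 5 open.)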

> I also think that with introspection it would be possible for the
> multiprocessing module to be aware of which file descriptors are still
> actively referenced (i.e., 0, 1, and 2 are always referenced; introspect
> through objects in the child to see if they have the file.fileno() method).
> However, I can't state this as a certainty without going off and actually
> implementing such a version.  Additionally, I can make absolutely no
> promises as to the speed of this.  Perhaps, if it functioned, it would be
> an option one could turn on for cases like mine.

So you want a way to visit all objects directly or indirectly referenced 
by the process object, so you can check whether they have a fileno() 
method?  At the C level all object types which support GC define a 
tp_traverse function, so maybe that could be made available from pure 
Python.

But really, this sounds rather fragile.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread spresse1

spresse1 added the comment:

> I don't see how using os.fork() would make things any easier.  In either 
> case you need to prepare a list of fds which the child process should 
> close before it starts, or alternatively a list of fds *not* to close.

With fork() I control where the processes diverge much more readily.  I could
create the pipe in the main process, fork, close unnecessary fds, and then
call into the class that represents the operation of the subprocess (i.e., do
it the C way).  This way the class never needs to know about pipes it doesn't
care about, and I can ensure that unnecessary pipes get closed.  So I get the
clean, understandable semantics I was after and my pipes get closed.  The
only thing I lose is Windows interoperability.
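
A rough sketch of that fork-based structure (my own illustration, with a
hypothetical Worker class standing in for the subprocess's operation):

import os
from multiprocessing import Pipe

class Worker:
    # Hypothetical class representing the subprocess's operation; it
    # never learns about pipe ends it does not use.
    def __init__(self, conn):
        self.conn = conn

    def run(self):
        try:
            while True:
                print(self.conn.recv())
        except EOFError:
            pass

r, w = Pipe(False)
pid = os.fork()
if pid == 0:            # child: close the end it doesn't need,
    w.close()           # then hand control to the worker class
    Worker(r).run()
    os._exit(0)
else:                   # parent
    r.close()
    w.send('hello')
    w.close()
    os.waitpid(pid, 0)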

I could reimplement the close_all_fds_except() call (in straight Python, using
os.closerange()).  That seems like a reasonable solution, if a bit of a hack.
However, given that pipes are exposed by multiprocessing, it might make sense
to try to get this function incorporated into the main version of it?

I also think that with introspection it would be possible for the
multiprocessing module to be aware of which file descriptors are still
actively referenced (i.e., 0, 1, and 2 are always referenced; introspect
through objects in the child to see if they have the file.fileno() method).
However, I can't state this as a certainty without going off and actually
implementing such a version.  Additionally, I can make absolutely no promises
as to the speed of this.  Perhaps, if it functioned, it would be an option
one could turn on for cases like mine.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-03 Thread Richard Oudkerk

Richard Oudkerk added the comment:

On 03/06/2013 1:02am, spresse1 wrote:
> What's really bugging me is that it remains open and I can't fetch a
> reference.  If I could do either of these, I'd be happy.
> ...
> Perhaps I really want to be implementing with os.fork().  Sigh, I was
> trying to save myself some effort...

I don't see how using os.fork() would make things any easier.  In either 
case you need to prepare a list of fds which the child process should 
close before it starts, or alternatively a list of fds *not* to close.

The real issue is that there is no way for multiprocessing (or 
os.fork()) to automatically infer which fds the child process is going 
to use: if you don't explicitly close unneeded ones then the child process 
will inherit all of them.

It might be helpful if multiprocessing exposed a function to close all 
fds except those specified -- see close_all_fds_except() at

http://hg.python.org/sandbox/sbt/file/5d4397a38445/Lib/multiprocessing/popen_spawn_posix.py#l81

Remembering not to close stdout (fd=1) and stderr (fd=2), you could use it
like this:

def foo(reader):
    close_all_fds_except([1, 2, reader.fileno()])
    ...

r, w = Pipe(False)
p = Process(target=foo, args=(r,))
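
Filling in the elided parts, a complete sketch of that pattern might look
like this (my own illustration, repeating the pure-Python
close_all_fds_except() from above to keep it self-contained):

import os
from multiprocessing import Pipe, Process

try:
    MAXFD = os.sysconf("SC_OPEN_MAX")
except (AttributeError, ValueError, OSError):
    MAXFD = 256

def close_all_fds_except(fds):
    fds = list(fds) + [-1, MAXFD]
    fds.sort()
    for i in range(len(fds) - 1):
        os.closerange(fds[i] + 1, fds[i + 1])

def foo(reader):
    # Close everything except stdout, stderr and the read end; this also
    # closes the write end inherited across fork().
    close_all_fds_except([1, 2, reader.fileno()])
    try:
        while True:
            print(reader.recv())
    except EOFError:
        print('pipe closed')

if __name__ == '__main__':
    r, w = Pipe(False)
    p = Process(target=foo, args=(r,))
    p.start()
    w.send('hello')
    w.close()   # with the child's copy gone, this was the last write end
    p.join()    # the child's recv() raises EOFError and it exits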

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

I'm actually a *nix programmer by trade, so I'm pretty familiar with that 
behavior =p  However, I'm also used to inheriting some way to refer to these 
fds so that I can close them.  Perhaps I've just missed a call somewhere that 
asks the process for a list of open fds?  That would, to me, be an acceptable 
workaround: I could close all the fds I didn't wish to inherit.

What's really bugging me is that it remains open and I can't fetch a 
reference.  If I could do either of these, I'd be happy.

Maybe this is more an issue with the semantics of multiprocessing?  This 
behavior is perfectly reasonable with os.fork() but causes some difficulty 
here.

Perhaps I really want to be implementing with os.fork().  Sigh, I was trying to 
save myself some effort...

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> What I'm still trying to grasp is why Python explicitly leaves the
> parent process's info around in the child.  It seems like there is
> no benefit (besides, perhaps, speed) and that this choice leads to
> non-intuitive behavior like this.

The Windows implementation does not use fork() but still exhibits the 
same behaviour in this respect (except in the experimental branch 
mentioned before).  The real issue is that fds/handles will get 
inherited by the child process unless you explicitly close them.  
(Actually, on Windows you need to find a way to inject specific handles 
from the parent into the child process.)

The behaviour you call non-intuitive is natural to someone used to using 
fork() and pipes on Unix.  multiprocessing really started as a 
cross-platform work-around for the lack of fork() on Windows.

Using fork() is also a lot more flexible: many things that work fine on 
Unix will not work correctly on Windows because of pickle issues.

The main problem with fork() is that forking a process with multiple 
threads can be problematic.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

>> So you're telling me that when I spawn a new child process, I have to 
>> deal with the entirety of my parent process's memory staying around 
>> forever?
>
> With a copy-on-write implementation of fork() this is quite likely to use
> less memory than starting a fresh process for the child process.  And
> it is certainly much faster.

Fair enough.

>> I would have expected this to call fork(), which gives the child
>> plenty of chance to clean up, then call exec() which loads the new
>> executable.
>
> There is an experimental branch (http://hg.python.org/sandbox/sbt)
> which optionally behaves like that.  Note that "clean up" means close
> all fds not explicitly passed, and has nothing to do with garbage
> collection.

I appreciate the pointer, but I am writing code intended for distribution - 
using an experimental branch isn't useful.

What I'm still trying to grasp is why Python explicitly leaves the parent 
process's info around in the child.  It seems like there is no benefit 
(besides, perhaps, speed) and that this choice leads to non-intuitive 
behavior like this.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> So you're telling me that when I spawn a new child process, I have to 
> deal with the entirety of my parent process's memory staying around 
> forever?

With a copy-on-write implementation of fork() this is quite likely to use 
less memory than starting a fresh process for the child process.  And it is 
certainly much faster.

> I would have expected this to call fork(), which gives the child
> plenty of chance to clean up, then call exec() which loads the new
> executable.

There is an experimental branch (http://hg.python.org/sandbox/sbt) which 
optionally behaves like that.  Note that "clean up" means close all fds not 
explicitly passed, and has nothing to do with garbage collection.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

So you're telling me that when I spawn a new child process, I have to deal 
with the entirety of my parent process's memory staying around forever?  I 
would have expected this to call fork(), which gives the child plenty of 
chance to clean up, then call exec() which loads the new executable.  Either 
that, or the same instance of the Python interpreter is used, just with the 
knowledge that it should execute the child function and then exit.  Keeping 
all the state that will never be used in the second case seems sloppy on the 
part of Python.

The semantics in this case are much better if the pipe gets GC'd.  I see no 
reason my child process should have to know about pipe ends it never uses in 
order to close them.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

> The write end of that pipe goes out of scope and has no references in the
> child process.  Therefore, per my understanding, it should be garbage
> collected (in the child process).  Where am I wrong about this?

The function which starts the child process by (indirectly) invoking os.fork() 
never gets a chance to finish in the child process, so nothing "goes out of 
scope".

Anyway, relying on garbage collection to close resources for you is always a 
bit dodgy.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

The difference is that nonfunctional.py does not pass the write end of the 
parent's pipe to the child.  functional.py does, and closes it immediately 
after breaking into a new process.  This is what you mentioned to me as a 
workaround.  Corrected code (for indentation) attached.

Why SHOULDN'T I expect this pipe to be closed automatically in the child?  Per 
the documentation for multiprocessing.Connection.close():
"This is called automatically when the connection is garbage collected."

The write end of that pipe goes out of scope and has no references in the 
child process.  Therefore, per my understanding, it should be garbage 
collected (in the child process).  Where am I wrong about this?

--
Added file: http://bugs.python.org/file30449/bugon.tar.gz

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Richard Oudkerk

Richard Oudkerk added the comment:

The way to deal with this is to pass the write end of the pipe to the child 
process so that the child process can explicitly close it -- there is no reason 
to expect garbage collection to make this happen automatically.
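
For example, a minimal sketch of that pattern (my own illustration, not the
attached functional.py):

from multiprocessing import Pipe, Process

def child(reader, writer):
    writer.close()   # explicitly close the inherited write end
    try:
        while True:
            print(reader.recv())
    except EOFError:
        print('writer closed and pipe drained')

if __name__ == '__main__':
    r, w = Pipe(False)
    p = Process(target=child, args=(r, w))
    p.start()
    w.send('hello')
    w.close()   # now no process holds an open write end
    p.join()    # the child sees EOFError and exits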

You don't explain the difference between functional.py and nonfunctional.py.  
The most obvious thing is that nonfunctional.py seems to have messed-up 
indentation: you have a while loop in the class declaration instead of in 
the run() method.

--
nosy: +sbt

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

spresse1 added the comment:

Now also tested with source-built Python 3.3.2.  The issue still exists, with 
the same example files.

--

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread Matthias Lee

Changes by Matthias Lee:


--
nosy: +madmaze

[issue18120] multiprocessing: garbage collector fails to GC Pipe() end when spawning child process

2013-06-02 Thread spresse1

New submission from spresse1:

[Code demonstrating issue attached]

When subclassing multiprocessing.Process and using pipes, a pipe end created 
in the parent is not properly garbage collected in the child.  This causes 
the write end of the pipe to be held open with no reference to it in the 
child process, and therefore no way to close it.  As a result, reading from 
the pipe can never raise EOFError.

Expected behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass read end to a class which subclasses multiprocessing.Process
3. Close write end in parent process
4. Receive EOFError from read end

Actual behavior:
1. Create a pipe with multiprocessing.Pipe(False)
2. Pass read end to a class which subclasses multiprocessing.Process
3. Close write end in parent process
4. Never receive EOFError from read end
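
A minimal sketch of those steps (my own approximation; the attached
bugon.tar.gz is the actual reproducer):

from multiprocessing import Pipe, Process

class Reader(Process):
    def __init__(self, conn):
        Process.__init__(self)
        self.conn = conn

    def run(self):
        try:
            while True:
                print(self.conn.recv())
        except EOFError:    # never raised: the child still holds an
            print('EOF')    # unreferenced copy of the write end

if __name__ == '__main__':
    r, w = Pipe(False)      # 1. create the pipe
    p = Reader(r)           # 2. pass read end to a Process subclass
    p.start()
    w.send('hello')
    w.close()               # 3. close write end in parent
    p.join()                # 4. hangs; EOFError never arrives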

Examining /proc/[pid]/fd/ indicates that a write pipe is still open in the 
child process, though none should be.  Additionally, no write pipe is open 
in the parent process.  I believe this is the write end created in the 
parent, incorrectly remaining open in the child even though there are no 
references to it.

Tested on 2.7.3 and 3.2.3

--
components: Library (Lib)
files: bugon.tar.gz
messages: 190492
nosy: spresse1
priority: normal
severity: normal
status: open
title: multiprocessing: garbage collector fails to GC Pipe() end when spawning child process
versions: Python 2.7, Python 3.2
Added file: http://bugs.python.org/file30448/bugon.tar.gz
