Bugs item #1183780, was opened at 2005-04-15 16:27
Message generated for change (Comment added) made by loewis
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1183780&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Submitted By: Taale Skogan (tskogan)
Assigned to: Neal Norwitz (nnorwitz)
Summary: Popen4 wait() fails sporadically with threads

Initial Comment:
Calling wait() on a popen2.Popen4 object fails 
intermittently with the error

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python2.3/popen2.py", line 90, in wait
    pid, sts = os.waitpid(self.pid, 0)
OSError: [Errno 10] No child processes

when using threads. 

The problem seems to be a race condition when a thread 
calls wait() on a popen2.Popen4 object. This also applies 
to Popen3 objects. 

The Popen4 constructor calls _cleanup(), which calls 
poll(), which calls the waitpid() system call for every 
active child process. If another thread calls poll() on 
this child (for example through _cleanup() in its own 
Popen4 constructor) after the child has terminated but 
before the current thread calls wait(), the child is 
reaped there and is no longer waitable, so the subsequent 
wait() fails with ECHILD.
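The failure itself does not need threads to reproduce: the 
kernel lets only one caller reap a given child. The sketch 
below (POSIX only, Python 3) uses os.spawnv() as a 
stand-in for starting the child, since popen2 no longer 
exists in Python 3. There the second waitpid() raises 
ChildProcessError, the OSError subclass for ECHILD 
(errno 10 on Linux, as in the traceback above).

```python
import errno
import os
import sys


def reap_twice():
    """Reap the same child twice and report what the second reaper sees.

    POSIX only. The first os.waitpid() stands in for the _cleanup()/poll()
    call made from another thread; the second stands in for the owning
    thread's wait().
    """
    # Start a child that exits immediately; P_NOWAIT returns its pid.
    pid = os.spawnv(os.P_NOWAIT, sys.executable, [sys.executable, "-c", "pass"])

    # First reaper collects the exit status and releases the zombie.
    _, status = os.waitpid(pid, 0)

    # Second reaper: the child is gone, so the kernel reports ECHILD.
    try:
        os.waitpid(pid, 0)
    except ChildProcessError as exc:
        return status, exc.errno
    return status, None


status, err = reap_twice()
print(status, errno.errorcode[err])
```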

Code to reproduce this behavior is in the attached file 
popen_bug.py.

Solution: Popen4 and Popen3 should be thread-safe.

Related modules: A seemingly related error occurs with 
Popen from the new subprocess module. Use the -s 
option in the popen_bug.py script to test this. 

Tested on Red Hat Enterprise Linux 3 with Python 2.3.3, 
Python 2.3.5, and Python 2.4.1, and on Solaris with Python 
2.4.1. The error did not occur on a Red Hat 7.3 machine 
with Python 2.3.5. See the attached file popen_bug.py for 
details on the platforms.



----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2006-03-24 01:37

Message:
Logged In: YES 
user_id=21627

I don't understand why you are setting self.sts to 0 if wait
fails: most likely, there was a simultaneous call to .poll,
which should have set self.sts to the real return value. So
we should return that instead.
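A minimal sketch of this suggestion, assuming a 
hypothetical Child class that mimics popen2.Popen3's sts 
bookkeeping (-1 until reaped) and uses os.spawnv() to 
start the child; it is not the attached patch. On ECHILD, 
wait() returns whatever status a concurrent poll() 
already stored, and re-raises only if none was stored yet.

```python
import os
import sys
import time


class Child:
    """Hypothetical, simplified stand-in for popen2.Popen3's poll()/wait().

    As in popen2, sts == -1 means "not yet reaped". On ECHILD, wait()
    returns a status recorded by a concurrent poll() instead of inventing 0.
    """

    def __init__(self, argv):
        self.pid = os.spawnv(os.P_NOWAIT, argv[0], argv)
        self.sts = -1

    def poll(self):
        if self.sts < 0:
            try:
                # WNOHANG: returns (0, 0) while the child is still running.
                pid, sts = os.waitpid(self.pid, os.WNOHANG)
            except ChildProcessError:
                return self.sts  # reaped elsewhere; nothing to record here
            if pid == self.pid:
                self.sts = sts
        return self.sts

    def wait(self):
        if self.sts < 0:
            try:
                pid, sts = os.waitpid(self.pid, 0)
            except ChildProcessError:
                # A simultaneous poll() reaped the child first. If it already
                # stored the real status, report that. Not airtight: the other
                # thread may not have assigned self.sts yet, so re-raise then.
                if self.sts >= 0:
                    return self.sts
                raise
            if pid == self.pid:
                self.sts = sts
        return self.sts


child = Child([sys.executable, "-c", "pass"])
while child.poll() < 0:  # poll until the child has exited
    time.sleep(0.01)
print(child.wait())  # the status poll() recorded
```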

I think the whole issue can be avoided if we use 
resurrection: if __del__, rather than __init__, put 
unwaited objects into _active, _cleanup would never poll a 
pid that a thread still intends to wait for. In fact, it 
would be sufficient to put only the pid into _active 
(avoiding the need for resurrection).

If a thread then calls poll explicitly while another calls
wait, they deserve to lose (with ECHILD). I would claim the
same error exists if part of the application calls
os.wait(), os.wait3(), or os.wait4(), and another part calls
.wait later; they also deserve the exception.

With that approach, I don't think further thread
synchronization would be needed.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-23 09:41

Message:
Logged In: YES 
user_id=33168

The attached patch fixes the problem for me.  It also
addresses another issue where wait could be called from
outside the popen2 module.  I'm not sure this is the best
solution.  I'm not sure there really is a good solution. 
Perhaps it's best to allow an exception to be raised?

----------------------------------------------------------------------

_______________________________________________
Python-bugs-list mailing list 
