[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

Looks like Mark's patch was accidentally submitted by Ben as part of r66115 (while reverting r66114). I can confirm this patch resolves the incref issue as-is: I've run test_multiprocessing in a loop for about an hour now with no failures. Marking closed/fixed.

--
resolution:  -> fixed
status: open -> closed
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

r66161 on py3k merges forward the fix for this.
[issue3419] multiprocessing module is racy
Changes by Mark Dickinson [EMAIL PROTECTED]:

--
keywords: +needs review
[issue3419] multiprocessing module is racy
Changes by Barry A. Warsaw [EMAIL PROTECTED]:

--
priority: deferred blocker -> release blocker
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

Here's a patch that fixes the incref problem for me. This *definitely* needs review from someone more familiar with the multiprocessing module than I am; I'm not at all confident that it won't break something else.

The original problem: the Server.create method creates shared objects with an effective reference count of 0, making them vulnerable to premature garbage collection.

The solution in the patch: create shared objects with a reference count of 1, and make it the responsibility of the caller of Server.create() to do a decref later, once a ProxyObject for the created shared object has been initialized. In effect, Server.create creates and returns a reference; the caller then owns that reference and is responsible for disposing of it.

Added file: http://bugs.python.org/file11138/issue3419.patch
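To make the ownership pattern concrete, here is a self-contained toy model of the idea (a sketch only -- not the attached patch and not the real Server class; RefcountModel and its method names are made up):

import threading

class RefcountModel:
    """Toy model of the create-owns-a-reference pattern described above."""

    def __init__(self):
        self.mutex = threading.Lock()
        self.id_to_refcount = {}

    def create(self, ident):
        # The shared object starts with refcount 1: the caller owns this
        # reference, so the object cannot be disposed of prematurely.
        with self.mutex:
            self.id_to_refcount[ident] = 1

    def incref(self, ident):
        with self.mutex:
            self.id_to_refcount[ident] += 1

    def decref(self, ident):
        with self.mutex:
            self.id_to_refcount[ident] -= 1
            if self.id_to_refcount[ident] == 0:
                del self.id_to_refcount[ident]   # dispose of the object

server = RefcountModel()
server.create('abc')    # refcount 1: owned by the creating caller
server.incref('abc')    # proxy initialization: 1 -> 2
server.decref('abc')    # caller releases its creation reference: 2 -> 1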
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

[jnoller]
> As for if this should block the release, it should be resolved before
> final, but should not block the next beta.

So the priority should be 'deferred blocker', no?

[cartman]
> With trunk when running test_multiprocessing in a tight loop I saw
> another problem:

I think this really belongs in a separate issue. See issue 3578.

--
priority:  -> deferred blocker
[issue3419] multiprocessing module is racy
Ismail Donmez [EMAIL PROTECTED] added the comment:

With trunk, when running test_multiprocessing in a tight loop, I saw another problem (output from three child processes is interleaved):

test_multiprocessing
Process Process-61:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
Process Process-60:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
Process Process-62:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 229, in _bootstrap
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    util._run_after_forkers()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/util.py", line 138, in _run_after_forkers
    items = list(_afterfork_registry.items())
    items = list(_afterfork_registry.items())
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
    items = list(_afterfork_registry.items())
  File "/Users/cartman/Sources/py3k/Lib/weakref.py", line 103, in items
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration
    for key, wr in self.data.items():
RuntimeError: dictionary changed size during iteration

The original problem itself seems to be fixed, so cheers for that!
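For reference, the RuntimeError above is the classic symptom of iterating a dict that another thread (here, an after-fork handler) is mutating. A minimal standalone illustration, plus the usual snapshot mitigation (a sketch of the failure mode only, not the weakref.py or issue 3578 fix):

import threading

d = dict.fromkeys(range(100000))

def mutator():
    for i in range(100000, 200000):
        d[i] = None              # grow the dict while it is being iterated

t = threading.Thread(target=mutator)
t.start()
try:
    for key, value in d.items():    # may raise RuntimeError: dictionary
        pass                        # changed size during iteration
except RuntimeError as e:
    print(e)
t.join()

# The usual mitigation: iterate over a snapshot instead. Building the
# snapshot can still race without a lock, but the window is much smaller.
for key, value in list(d.items()):
    pass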
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

Hey Mark - I took a look at the mp_nohang patch you added - and you're right, the connection refused error numbers are different from platform to platform. So rather than use a static list, I switched the code to use the errno module. Can you apply this and confirm it works for you?

Added file: http://bugs.python.org/file11100/mp_nohang_jnoller.patch
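The shape of the change, roughly (a sketch only, not the attached patch; the retry count and delay here are invented):

import errno
import socket
import time

# ECONNREFUSED is 61 on OS X and 111 on Linux; Windows sockets report
# WSAECONNREFUSED (10061) instead, hence the getattr fallback.
REFUSED = {errno.ECONNREFUSED, getattr(errno, 'WSAECONNREFUSED', -1)}

def connect_with_retry(address, retries=20, delay=0.01):
    for _ in range(retries):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            s.connect(address)
            return s
        except socket.error as e:
            s.close()
            if e.errno not in REFUSED:
                raise                # a different failure; don't mask it
            time.sleep(delay)        # listener backlogged; try again
    raise socket.error(errno.ECONNREFUSED, 'connection retries exhausted')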
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

> So rather than use a static list, I switched the code to use the
> errno module.

Yup. That's definitely a better solution. Your patch fixed the problem for me. Thanks!
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

Thanks Mark, I've checked in the connection refused fix in r65645 on trunk.
[issue3419] multiprocessing module is racy
Changes by Jesse Noller [EMAIL PROTECTED]:

Removed file: http://bugs.python.org/file11038/mp_nohang.patch
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

Hmmm. That last message wasn't too coherent; I really shouldn't try to post at 2:30am.

Summary: the refcounting logic in the Server class is flawed. In Server.create(), the initial refcount of a newly-created shared object is set to None. This is dangerous: the moment another thread gets a look-in, that refcount can be incremented to 1 and then decremented to 0, at which point the shared object is disposed of (by the code in Server.decref). All of this can happen before the Proxy object for the shared object gets initialized, at which point the KeyError occurs. (Not much better.)

Can anyone suggest a way to fix this? I can't see any easy fix. Nevertheless, I consider this a serious flaw that should be fixed before 2.6 and 3.0 are released.

I've attached a minimal file that produces the incref error (on about 1 run out of every 5) for me.

Added file: http://bugs.python.org/file11075/multibug3.py
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

Adding Richard Oudkerk to the nosy list in case he can shed any light on this.

Should this be considered a release-blocker?

--
nosy: +roudkerk
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

On Aug 7, 2008, at 5:26 PM, Mark Dickinson [EMAIL PROTECTED] wrote:

> Adding Richard Oudkerk to the nosy list in case he can shed any light
> on this. Should this be considered a release-blocker?

I believe Richard is on summer break - I've dropped him a line on the open issues I need his input on. As for if this should block the release, it should be resolved before final, but should not block the next beta.
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

> I believe the conn refused error is another race, the child processes
> are connecting to a manager which is shutting down/gone

After some research, I think it actually has to do with the value of the 'backlog' parameter used when creating a Listener instance: if there are too many queued connection requests to a socket, then further connection requests are refused; the 'backlog' parameter appears to set the maximum size of the queue. ...or something like that. I'm a novice here... :-)

Anyway, in the __init__ method of the Server class, in managers.py, the Server creates a Listener instance with the line

    self.listener = Listener(address=address, backlog=5)

When I change the backlog to 50 instead of 5, I don't see the connection refused error any more.

I'm not suggesting that the backlog value should be changed---it probably shouldn't. But this does at least explain why connections can be refused, and why they should be retried in that case. I do still think that the patch, or something like it, should be applied.
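A standalone illustration of the backlog effect, using plain sockets rather than multiprocessing (a sketch; the exact behaviour is OS-dependent -- some systems queue or drop the connection attempt instead of refusing it):

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(5)                     # backlog=5, as in managers.py
address = server.getsockname()

clients = []
try:
    for i in range(50):              # never call accept(), so the queue fills
        c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        c.settimeout(1.0)
        c.connect(address)           # may fail with ECONNREFUSED or time out
        clients.append(c)
    print('all 50 connection attempts queued')
except socket.error as e:
    c.close()
    print('connect #%d failed: %s' % (i, e))
finally:
    for c in clients:
        c.close()
    server.close()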
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

Here's what's going on with the incref error. Look in the Server class, in managers.py, and consider a shared object with id 'id'. When a reference to the shared object is created, its id is added to the id_to_refcount dictionary:

    {id: None}

and *shortly afterwards* the refcount is incremented, to get:

    {id: 1}

When the object is deleted or goes out of scope later on, the refcount is decref'd, and id is removed entirely from id_to_refcount.

The problem occurs when there are two processes simultaneously asking for access to the same object. If a second process happens to decref 'id' in between the first process creating the reference and incrementing it, then the incref from the first process will fail. This is exactly what's happening (some of the time) in test_remote in the test suite. The failing sequence of operations is:

    initial state:        id not in id_to_refcount

    (process 1): create   id_to_refcount[id] is None
    (process 1): incref   id_to_refcount[id] == 1
    (process 2): create   id_to_refcount[id] == 1
    (process 1): decref   id not in id_to_refcount
    (process 2): incref   KeyError!
    (process 2): decref   (never get here...)

I'm not really familiar enough with the whole multiprocessing module to know what the right fix for this is.
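A self-contained toy model that reproduces this interleaving (it mimics the bookkeeping described above; it is not the real Server code, and the helper functions are simplifications):

id_to_refcount = {}

def create(ident):
    # A newly created shared object gets a None placeholder: no references.
    if ident not in id_to_refcount:
        id_to_refcount[ident] = None

def incref(ident):
    # Raises KeyError if a concurrent decref has already removed ident.
    id_to_refcount[ident] = (id_to_refcount[ident] or 0) + 1

def decref(ident):
    id_to_refcount[ident] -= 1
    if id_to_refcount[ident] == 0:
        del id_to_refcount[ident]    # shared object disposed of

create('id')    # process 1: {'id': None}
incref('id')    # process 1: {'id': 1}
create('id')    # process 2: already present, still {'id': 1}
decref('id')    # process 1: refcount hits 0, 'id' removed
try:
    incref('id')                     # process 2: KeyError!
except KeyError:
    print("incref failed: 'id' was already disposed of")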
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

> Are you looking at the conn refused or the incref error?

The connection refused error. The attached patch fixes the problem, for me. On my machine, the connection refused error code was 61 rather than 10061. With this patch, I'm no longer seeing any hangs in test_multiprocessing.py (at least, not in the last 500 runs :-)). (Though I am still seeing the incref error occasionally.)

If anyone's prepared to answer a stupid question: I'm curious why failed socket connections occur at all. Is connecting to a socket generally considered an unreliable operation, or is there some aspect of the multiprocessing module that makes it potentially unreliable?

--
keywords: +patch
Added file: http://bugs.python.org/file11038/mp_nohang.patch
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

I'm also seeing the incref error occasionally, on OS X 10.5.4:

======================================================================
ERROR: test_remote (__main__.WithManagerTestRemoteManager)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "Lib/test/test_multiprocessing.py", line 1157, in test_remote
    queue = manager2.get_queue()
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 635, in temp
    authkey=self._authkey, exposed=exp
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 887, in AutoProxy
    incref=incref)
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 696, in __init__
    self._incref()
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 743, in _incref
    dispatch(conn, None, 'incref', (self._id,))
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 79, in dispatch
    raise convert_to_error(kind, result)
RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 181, in handle_request
    result = func(c, *args, **kwds)
  File "/Users/dickinsm/python_source/trunk/Lib/multiprocessing/managers.py", line 397, in incref
    self.id_to_refcount[ident] += 1
KeyError: '6fdb78'
---------------------------------------------------------------------------

--
nosy: +marketdickinson
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

For the connection refused error, the following script fails for me (again, on OS X 10.5.4) with the message:

    errno: [Errno 61] Connection refused

on most runs. (If xrange(10) is replaced by a smaller range like xrange(4), then the script usually runs without error.) Looks like something in semaphore.c needs fixing, but I've no idea what.

from multiprocessing import Process, Manager

sleeping = Manager().Semaphore(0)

def f():
    sleeping.release()

for _ in xrange(10):
    Process(target=f).start()

for _ in xrange(10):
    sleeping.acquire()
[issue3419] multiprocessing module is racy
Mark Dickinson [EMAIL PROTECTED] added the comment:

> Looks like something in semaphore.c needs fixing, but I've no idea
> what.

I take that back. It's nothing to do with semaphore.c. I'll keep trying.
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

Ben's incref error != the connection refused error. Here's some information from Scott Leerssen, who is using the original .52 code:

Hi Jesse,

I am having some trouble with the processing module (0.52) and thought you might be of some help, since you were so involved with it in the PEP process.

I have a queue of Process objects from which a runner thread pulls and starts processes. The producers for the queue are many, many threads. I'm seeing some very odd behavior when lots of processes try to start while others are being reaped at the same time. I see tracebacks that look like the one below, and also occasionally get hangs where the processing module seems to take over the GIL and never give it back.

I did find a race condition bug in process.py where a _cleaner function is run in start(), potentially joining on processes that other threads may also be joining at the same time, which leads to KeyError exceptions coming from _cleaner. I mutexed all my starts and joins to get around that for now, but am stumped on the issue below.

Does it look familiar to you? Thanks, and feel free to shrug and reply "beats me".

Scott

[dmbd:32451:-1236096096] 2008-07-31 16:52:10,609 SYSMESG Process Process-24:
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,612 SYSMESG
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,613 SYSMESG Traceback (most recent call last):
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,613 SYSMESG   File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/process.py", line 224, in _bootstrap
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,615 SYSMESG     _runAfterForkers()
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,616 SYSMESG   File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/process.py", line 300, in _runAfterForkers
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,617 SYSMESG     func(obj)
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,617 SYSMESG   File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/managers.py", line 806, in _afterFork
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,620 SYSMESG     self._incref()
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,621 SYSMESG   File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/managers.py", line 760, in _incref
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,621 SYSMESG     dispatch(connection, None, 'incref', (self._id,))
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,622 SYSMESG   File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/managers.py", line 92, in dispatch
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,623 SYSMESG     raise result
[dmbd:32451:-1236096096] 2008-07-31 16:52:10,624 SYSMESG RemoteError:
---------------------------------------------------------------------------
Remote Traceback (most recent call last):
  File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/managers.py", line 190, in handleRequest
    result = func(c, *args, **kwds)
  File "/opt/race/share/sw/os/Linux_2.6_i686/python/lib/python2.5/site-packages/processing/managers.py", line 404, in incref
    self.id_to_refcount[ident] += 1
KeyError: 2880341100L
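Scott's workaround -- serializing starts and joins behind one lock -- would look roughly like this (a sketch of the workaround only, not a fix for the underlying process.py race; the helper names are made up):

import threading
from multiprocessing import Process   # 'processing' in the standalone 0.52 package

_process_lock = threading.Lock()      # one global guard, as in the workaround

def locked_start(target, args=()):
    p = Process(target=target, args=args)
    with _process_lock:               # start() may run _cleaner, which joins
        p.start()                     # finished processes; keep it from racing
    return p                          # joins issued by other threads

def locked_join(process):
    with _process_lock:               # coarse-grained: a long join blocks new
        process.join()                # starts, so a join timeout may be wiser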
[issue3419] multiprocessing module is racy
New submission from Ismail Donmez [EMAIL PROTECTED]:

Tested on MacOSX 10.5.4, running test_multiprocessing in a tight loop:

[~/Sources/py3k] while true; do ./python ./Lib/test/regrtest.py test_multiprocessing; done
test_multiprocessing
1 test OK.
test_multiprocessing
1 test OK.
test_multiprocessing
1 test OK.
test_multiprocessing
Process Process-48:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 720, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/cartman/Sources/py3k/Lib/test/test_multiprocessing.py", line 601, in f
    cond.acquire()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 949, in acquire
    return self._callmethod('acquire', (blocking,))
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 724, in _callmethod
    self._connect()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 711, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/connection.py", line 133, in Client
    c = SocketClient(address)
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/connection.py", line 254, in SocketClient
    s.connect(address)
socket.error: [Errno 61] Connection refused

Exception in thread Thread-58:
Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 720, in _callmethod
    conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/cartman/Sources/py3k/Lib/threading.py", line 492, in _bootstrap_inner
    self.run()
  File "/Users/cartman/Sources/py3k/Lib/threading.py", line 447, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/cartman/Sources/py3k/Lib/test/test_multiprocessing.py", line 601, in f
    cond.acquire()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 949, in acquire
    return self._callmethod('acquire', (blocking,))
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 724, in _callmethod
    self._connect()
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/managers.py", line 711, in _connect
    conn = self._Client(self._token.address, authkey=self._authkey)
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/connection.py", line 133, in Client
    c = SocketClient(address)
  File "/Users/cartman/Sources/py3k/Lib/multiprocessing/connection.py", line 254, in SocketClient
    s.connect(address)
socket.error: [Errno 61] Connection refused

--
components: Tests
messages: 70053
nosy: cartman
severity: normal
status: open
title: multiprocessing module is racy
versions: Python 3.0
[issue3419] multiprocessing module is racy
Benjamin Peterson [EMAIL PROTECTED] added the comment:

When running test_multiprocessing in a loop, I see:

test test_multiprocessing failed -- Traceback (most recent call last):
  File "/temp/python/trunk/Lib/test/test_multiprocessing.py", line 1157, in test_remote
    queue = manager2.get_queue()
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 635, in temp
    authkey=self._authkey, exposed=exp
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 887, in AutoProxy
    incref=incref)
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 696, in __init__
    self._incref()
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 743, in _incref
    dispatch(conn, None, 'incref', (self._id,))
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 79, in dispatch
    raise convert_to_error(kind, result)
RemoteError:
---------------------------------------------------------------------------
Traceback (most recent call last):
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 181, in handle_request
    result = func(c, *args, **kwds)
  File "/temp/python/trunk/Lib/multiprocessing/managers.py", line 397, in incref
    self.id_to_refcount[ident] += 1
KeyError: '1033760'
---------------------------------------------------------------------------

--
assignee:  -> jnoller
nosy: +benjamin.peterson, jnoller
[issue3419] multiprocessing module is racy
Jesse Noller [EMAIL PROTECTED] added the comment:

This also happens under trunk.

--
versions: +Python 2.6