[ 
https://issues.apache.org/jira/browse/QPID-7317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15861827#comment-15861827
 ] 

Alan Conway edited comment on QPID-7317 at 2/10/17 8:48 PM:
------------------------------------------------------------

I hope this will address the pulp hang in the wild, I have been unable to 
reproduce it with the fix.

Note you can apply this patch by replacing 
/usr/lib/python2.7/site-packages/qpid/selector.py with 

https://git-wip-us.apache.org/repos/asf?p=qpid-python.git;a=blob_plain;f=qpid/selector.py;h=56b137d1a05da4875c45edcab60cde8084bfed8f;hb=fda9594
   

it is the only file modified and should work with and version of python-qpid 
released in the last year.

If you do see this hang again please report to this JIRA with the output of the 
following commands from the machine where the hung celery workers are:
{code}
> rpm -q python-qpid  # or attach a copy of 
> /usr/lib/python2.7/site-packages/qpid/selector.py
> journalctl         #  use --since and --until to get a few minutes 
> before/after the hang
> yum install -y gdb python-debug # needed for worker-stacks script
> worker-stacks # script attached to this JIRA
{code}
Here is log output showing that pulp does indeed use qpid.messaging in an 
illegal state that could have caused a hang prior to this fix. However it is 
not an exact match for reported stack traces so I'm not yet 100% sure the 
problem is solved. I am not able to reproduce the original hang or traces that 
look like it with the fix.

{code}
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
illegal use of qpid.messaging at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File 
"/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.__bootstrap_inner()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.run()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 
55, in run
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.monitor_events()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 
82, in monitor_events
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
recv.capture(limit=None, timeout=None, wakeup=True)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 715, in 
__exit__
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.release()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 330, in 
release
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 298, in _close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._do_close_self()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 288, in 
_do_close_self
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.maybe_close_channel(self._default_channel)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 280, in 
maybe_close_channel
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
channel.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 983, in 
close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._broker.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpidtoollibs/broker.py", line 48, in 
close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.sess.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 213, in log_raise
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
_check(exception, 1)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging was previously stopped at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File 
"/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
func(*targs, **kargs)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 184, in stop
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: pulp.server.async.scheduler:ERROR: 
qpid.messaging thread has been stopped
{code}


was (Author: aconway):
I hope this will address the pulp hang in the wild, I have been unable to 
reproduce it with the fix.

Note you can apply this patch by replacing 
/usr/lib/python2.7/site-packages/qpid/selector.py with the patched file, it is 
the only file modified and should work with and version of python-qpid released 
in the last year.

If you do see this hang again please report to this JIRA with the output of the 
following commands from the machine where the hung celery workers are:
{code}
> rpm -q python-qpid  # or attach a copy of 
> /usr/lib/python2.7/site-packages/qpid/selector.py
> journalctl         #  use --since and --until to get a few minutes 
> before/after the hang
> yum install -y gdb python-debug # needed for worker-stacks script
> worker-stacks # script attached to this JIRA
{code}
Here is log output showing that pulp does indeed use qpid.messaging in an 
illegal state that could have caused a hang prior to this fix. However it is 
not an exact match for reported stack traces so I'm not yet 100% sure the 
problem is solved. I am not able to reproduce the original hang or traces that 
look like it with the fix.

{code}
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
illegal use of qpid.messaging at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File 
"/usr/lib64/python2.7/threading.py", line 784, in __bootstrap
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.__bootstrap_inner()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib64/python2.7/threading.py", line 811, in __bootstrap_inner
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.run()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 
55, in run
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.monitor_events()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/pulp/server/async/scheduler.py", line 
82, in monitor_events
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
recv.capture(limit=None, timeout=None, wakeup=True)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 715, in 
__exit__
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.release()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 330, in 
release
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 298, in _close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._do_close_self()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 288, in 
_do_close_self
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.maybe_close_channel(self._default_channel)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/connection.py", line 280, in 
maybe_close_channel
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
channel.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/kombu/transport/qpid.py", line 983, in 
close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self._broker.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpidtoollibs/broker.py", line 48, in 
close
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.sess.close()
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 213, in log_raise
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
_check(exception, 1)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging was previously stopped at:
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) File 
"/usr/lib64/python2.7/atexit.py", line 24, in _run_exitfuncs
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
func(*targs, **kargs)
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)   
File "/usr/lib/python2.7/site-packages/qpid/selector.py", line 184, in stop
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416)     
self.stopped = SelectorStopped("qpid.messaging thread has been stopped")
Feb 10 14:50:13 pulp-server pulp[7427]: qpid.messaging:ERROR: (7427-28416) 
qpid.messaging thread has been stopped
Feb 10 14:50:13 pulp-server pulp[7427]: pulp.server.async.scheduler:ERROR: 
qpid.messaging thread has been stopped
{code}

> Deadlock on publish
> -------------------
>
>                 Key: QPID-7317
>                 URL: https://issues.apache.org/jira/browse/QPID-7317
>             Project: Qpid
>          Issue Type: Bug
>          Components: Python Client
>    Affects Versions: 0.32
>         Environment: python-qpid-0.32-13.fc23.noarch
>            Reporter: Brian Bouterse
>            Assignee: Alan Conway
>             Fix For: qpid-python-1.36.0
>
>         Attachments: bad_child.py, bad_child.py, bt.txt, lsof.txt, 
> pystack.17806, spout-hang.py, spout-hang-trace.txt, taabt.txt, worker-stacks
>
>
> When publishing a task with qpid.messaging it deadlocks and our application 
> cannot continue. This has not been a problem for several releases, but within 
> a few days recently, another Satellite developer and I both experienced the 
> issue on separate machines, different distros. He is using a MRG built 
> pacakge (not sure of version). I am using python-qpid-0.32-13.fc23.
> Both deadlocked machines had core dumps taken on the deadlocked processes and 
> only show only 1 Qpid thread when I expect there to be 2. There are other 
> mongo threads, but those are idle as expected and not related. The traces 
> show our application calling into qpid.messaging to publish a message to the 
> message bus.
> This problem happens intermittently, and in cases where message publish is 
> successful I've verified by core dump that there are the expected 2 threads 
> for Qpid.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@qpid.apache.org
For additional commands, e-mail: dev-h...@qpid.apache.org

Reply via email to