All,
As you all know, we have seen some issues with QPID over the past few months. I'd like to
summarize them and inform everyone were we stand. The primary symptom is that goferd
starts consuming ~100% CPU. I've also seen cases where httpd starts consuming ~60% CPU as
well. During my investigation, I've determined that there a (2) variations of this.
See attached pulp.log for error messages.
SUMMARY:
#1. python-qpid starts spewing error messages in the log which do not refer to a problem
with the broker. They instead suggest (to me) a bug in the python driver. Or, a state
mismatch between python-qpid and the broker. Currently, I cannot reproduce this on
demand. I've heard of this on both i386 and x86_64.
#2. python-qpid starts spewing error messages in the log which refer to a problem with the
broker (and/or contained queues). Basically, we're getting a "ConnectionError: Enqueue
capacity threshold exceeded on queue" on a temporary (durable) queue associated with a
topic subscription. Using qpid-stat, I determined that the queue had only 11 messages
queued. Then, using qpid-stat, I purged the queue. But, still having the same problem
which seems odd since the queue appears to be empty. I verified that even with 11
messages queued, the qpid consumer was getting an "Empty" exception trying to fetch
messages. None of this makes sense. See (attached) pulp.log. The cause of this issue
was thought to be (by qpid development team) that the root filesystem was full. However,
we're seeing this again on machines with FS not nearly full. A workaround is to uninstall
qpid-cpp-server-store. After doing this, the issue seems to be gone. Note,
qpid-cpp-server-store is required by pulp so in order to do this, you need to either run
from dev-env or hack & respin the rpm without the dep. I've heard of this on only x86_64.
NEXT STEPS:
=== #1 ===
A. I've sent an urgent email to Rafael Schloming who write the python-qpid driver asking
him to take a look at the stack traces and try to shed some light on this.
B. Considered that the driver could be getting corrupted by concurrency issues, I updated
the gofer stress test over the weekend to hammer gofer/qpid using large numbers of
threads. Each thread creates agent proxies and invokes a lot of methods. Inspection of
python-qpid code clearly shows that it /should/ be reentrant. All tests ran clean.
C. Replaced the goferd code profiler (cProfile) with YAPPI [1] in attempt to get better
profiling information beyond the main thread. cProfile & Profile suck for multi-threaded
applications and you have to figure this out the hard way. Any other suggestions
appreciated. Today, I'll reproduce condition #2 and at least try to see which
thread/function is eating the CPU.
=== #2 ===
A. So far, only know about this happening on Preethi's and Sayli's machines. I want to
recreate on another box.
B. Re-engage QPID development team to help resolve.
[1] https://bugzilla.redhat.com/show_bug.cgi?id=688662
[2] https://bugzilla.redhat.com/show_bug.cgi?id=689573
[3] http://code.google.com/p/yappi/
qpid-cpp-server-0.8-4.fc14.i686
python-qpid-0.8-3.fc14.noarch
qpid-tools-0.8-4.fc14.noarch
qpid-cpp-client-0.8-4.fc14.i686
qpid-cpp-server-ssl-0.8-4.fc14.i686
qpid-cpp-client-devel-0.8-4.fc14.i686
qpid-cpp-client-ssl-0.8-4.fc14.i686
ruby-qpid-0.8-1.fc14.i686
2011-03-25 13:39:44,135 [ERROR][heartbeat] run() @ consumer.py:69 - failed:
Message(ttl=9.999, durable=True, content='{\n "origin":
"c467bc1d-6533-4080-8a84-69bbccef7608", \n "heartbeat": {\n "uuid":
"jortel", \n "next": 10\n }, \n "version": "0.2", \n "sn":
"40bb4978-4ac1-4696-805a-cfacdae459ce"\n}')
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 63,
in run
try:
File "<string>", line 6, in fetch
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
997, in fetch
self._ecwait(lambda: self.linked)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50,
in _ecwait
result = self._ewait(lambda: self.closed or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
960, in _ewait
result = self.session._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
553, in _ewait
result = self.connection._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
196, in _ewait
self.check_error()
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
189, in check_error
raise self.error
InternalError: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 504,
in dispatch
self.engine.dispatch()
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 808,
in dispatch
self.process(ssn)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 1036,
in process
self.send(snd, msg)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 1209,
in send
rk = _snd._routing_key
AttributeError: Attachment instance has no attribute '_routing_key'
========================
2011-03-25 15:46:30,135 [DEBUG][heartbeat] run() @ consumer.py:68 - ready
2011-03-25 15:46:30,136 [ERROR][heartbeat] __fetch() @ consumer.py:91 - failed
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 86,
in __fetch
return self.receiver.fetch(timeout=self.WAIT)
File "<string>", line 6, in fetch
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
997, in fetch
self._ecwait(lambda: self.linked)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line 50,
in _ecwait
result = self._ewait(lambda: self.closed or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
960, in _ewait
result = self.session._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
553, in _ewait
result = self.connection._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
196, in _ewait
self.check_error()
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
189, in check_error
raise self.error
InternalError: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 504,
in dispatch
self.engine.dispatch()
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 807,
in dispatch
self.attach(ssn)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 850,
in attach
self.link(rcv, self._in, rcv.source)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 901,
in link
dir.do_unlink(sst, lnk, _lnk)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 269,
in do_unlink
sst.write_cmds(cmds, action)
AttributeError: 'NoneType' object has no attribute 'write_cmds'
========================
BK
2011-03-21 17:14:04,616 [ERROR][pulp-monitor] update() @ main.py:104 - plugin
pulp
Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/gofer/agent/main.py", line 100, in
update
plugin.attach(uuid)
File "/usr/lib/python2.7/site-packages/gofer/agent/plugin.py", line 214, in
attach
if not uuid:
File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line 90,
in __init__
Endpoint.__init__(self, **other)
File "/usr/lib/python2.7/site-packages/gofer/messaging/endpoint.py", line 53,
in __init__
self.open()
File "/usr/lib/python2.7/site-packages/gofer/messaging/consumer.py", line
115, in open
receiver = session.receiver(address)
File "<string>", line 6, in receiver
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
600, in receiver
if not self.closed and self.connection._connected:
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
957, in _ewait
return self.error
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
550, in _ewait
return self.error
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
194, in _ewait
def _ewait(self, predicate, timeout=None):
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
187, in check_error
if self.error:
InternalError: Traceback (most recent call last):
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 616,
in write
op.dispatch(self)
File "/usr/lib/python2.7/site-packages/qpid/ops.py", line 84, in dispatch
getattr(target, handler)(self, *args)
File "/usr/lib/python2.7/site-packages/qpid/messaging/driver.py", line 811,
in do_session_detached
sst = self._sessions.pop(dtc.channel)
KeyError: 'pop(): dictionary is empty'
========================
Preethi/sayli
File "/usr/lib/python2.7/site-packages/gofer/messaging/producer.py", line 57,
in send
sender = self.session().sender(address)
File "<string>", line 6, in sender
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
576, in sender
sender._ewait(lambda: sender.linked)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
783, in _ewait
result = self.session._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
550, in _ewait
result = self.connection._ewait(lambda: self.error or predicate(), timeout)
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
194, in _ewait
self.check_error()
File "/usr/lib/python2.7/site-packages/qpid/messaging/endpoints.py", line
187, in check_error
raise self.error
ConnectionError: Enqueue capacity threshold exceeded on queue
"49ba2548-098b-4784-834b-98323b2d113a:0.0". (JournalImpl.cpp:616)(501)
_______________________________________________
Pulp-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/pulp-list