Status: New
Owner: ----
New issue 203 by [email protected]: frequent crashes of ganeti-masterd and
frequent failures to restart it due to errors with the job queue.
http://code.google.com/p/ganeti/issues/detail?id=203
vhost1 queue # gnt-cluster --version
gnt-cluster (ganeti v2.4.4) 2.4.4
vhost1 queue # gnt-cluster version
Software version: 2.4.4
Internode protocol: 2040000
Configuration format: 2040000
OS api version: 20
Export interface: 0
vhost1 queue #
Running Gentoo and python2.7
The problem is intermittent, and it does not always fail in the same way.
The most common failure now is:
python2.7: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock ==
((ath_mutex_t) 0)' failed.
Often the following also happens, for which the workaround is to
clear/archive the queue:
2011-10-26 17:31:18,642: ganeti-masterd pid=32343/ClientReq13 INFO Received
instance query request for ['build0.internal']
python2.7: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock ==
((ath_mutex_t) 0)' failed.
2011-10-26 17:33:05,088: ganeti-masterd pid=839/MainThread INFO
ganeti-masterd daemon startup
2011-10-26 17:33:05,089: ganeti-masterd pid=839/MainThread INFO Using
PycURL libcurl/7.20.0 GnuTLS/2.10.4 zlib/1.2.5
2011-10-26 17:33:05,110: ganeti-masterd pid=839/MainThread INFO Inspecting
job queue
2011-10-26 17:33:05,112: ganeti-masterd pid=839/MainThread INFO Job queue
inspection: 0/2 (33.3 %)
2011-10-26 17:33:05,113: ganeti-masterd pid=839/MainThread INFO Job queue
inspection: 2/2 (100.0 %)
2011-10-26 17:33:05,114: ganeti-masterd pid=839/MainThread WARNING
Unfinished job 30502 found: <ganeti.jqueue._QueuedJob id=30502 ops=TAGS_SET
at 0x756ed0>
Traceback (most recent call last):
  File "/usr/sbin/ganeti-masterd", line 21, in <module>
    sys.exit(main.Main())
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 653, in Main
    ExecMasterd, multithreaded=True)
  File "/usr/lib64/python2.7/site-packages/ganeti/daemon.py", line 707, in GenericMain
    exec_fn(options, args, prep_results)
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 628, in ExecMasterd
    master.setup_queue()
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 165, in setup_queue
    self.context = GanetiContext()
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 404, in __init__
    self.jobqueue = jqueue.JobQueue(self)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1240, in __init__
    self._InspectQueue()
  File "/usr/lib64/python2.7/site-packages/ganeti/locking.py", line 72, in sync_function
    return fn(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1175, in wrapper
    return fn(self, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1293, in _InspectQueue
    self.UpdateJobUnlocked(job)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1175, in wrapper
    return fn(self, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1785, in UpdateJobUnlocked
    assert (finalized ^ (job.end_timestamp is None))
AssertionError
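
For anyone reading the traceback: the failing assert is an XOR, so it only passes when exactly one of "job is finalized" and "job has no end timestamp" is true. Below is a minimal sketch of that check, not Ganeti's code. SketchJob, FINALIZED_STATUSES, the status strings and invariant_holds are hypothetical names made up for illustration. Only the XOR expression itself is taken from the traceback.

```python
# Hypothetical stand-ins; the real job class and status constants live in
# ganeti.jqueue / ganeti.constants and may differ from this sketch.
FINALIZED_STATUSES = frozenset(["success", "error", "canceled"])


class SketchJob(object):
    """Tiny stand-in for a queued job (assumption, not _QueuedJob)."""

    def __init__(self, status, end_timestamp=None):
        self.status = status
        self.end_timestamp = end_timestamp


def invariant_holds(job):
    """Same shape as the assert in the traceback.

    True only when exactly one of these is true:
      - the job is in a finalized status
      - the job has no end timestamp
    """
    finalized = job.status in FINALIZED_STATUSES
    return finalized ^ (job.end_timestamp is None)


# The two consistent states pass the check.
running_ok = invariant_holds(SketchJob("running"))
finished_ok = invariant_holds(SketchJob("success", (1319643185, 0)))

# The two inconsistent states would trip the assert.
finalized_without_end = invariant_holds(SketchJob("error"))
running_with_end = invariant_holds(SketchJob("running", (1319643185, 0)))
```

Under these assumptions, the assert fires for a job that is finalized but has no end timestamp, or for a job that is not finalized but already carries one. The log above does not show which of the two applies to job 30502.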