Status: New
Owner: ----

New issue 203 by [email protected]: frequent crashes of ganeti-masterd and frequent failures to restart it due to errors with the job queue.
http://code.google.com/p/ganeti/issues/detail?id=203

vhost1 queue # gnt-cluster --version
gnt-cluster (ganeti v2.4.4) 2.4.4
vhost1 queue # gnt-cluster version
Software version: 2.4.4
Internode protocol: 2040000
Configuration format: 2040000
OS api version: 20
Export interface: 0
vhost1 queue #


Running Gentoo and python2.7

The problem is intermittent and does not always fail in the same way.
The most common failure now is:
python2.7: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock == ((ath_mutex_t) 0)' failed.

After such a crash, the following often happens on restart as well; the workaround is to clear/archive the queue:

2011-10-26 17:31:18,642: ganeti-masterd pid=32343/ClientReq13 INFO Received instance query request for ['build0.internal']
python2.7: ath.c:193: _gcry_ath_mutex_lock: Assertion `*lock == ((ath_mutex_t) 0)' failed.
2011-10-26 17:33:05,088: ganeti-masterd pid=839/MainThread INFO ganeti-masterd daemon startup
2011-10-26 17:33:05,089: ganeti-masterd pid=839/MainThread INFO Using PycURL libcurl/7.20.0 GnuTLS/2.10.4 zlib/1.2.5
2011-10-26 17:33:05,110: ganeti-masterd pid=839/MainThread INFO Inspecting job queue
2011-10-26 17:33:05,112: ganeti-masterd pid=839/MainThread INFO Job queue inspection: 0/2 (33.3 %)
2011-10-26 17:33:05,113: ganeti-masterd pid=839/MainThread INFO Job queue inspection: 2/2 (100.0 %)
2011-10-26 17:33:05,114: ganeti-masterd pid=839/MainThread WARNING Unfinished job 30502 found: <ganeti.jqueue._QueuedJob id=30502 ops=TAGS_SET at 0x756ed0>
Traceback (most recent call last):
  File "/usr/sbin/ganeti-masterd", line 21, in <module>
    sys.exit(main.Main())
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 653, in Main
    ExecMasterd, multithreaded=True)
  File "/usr/lib64/python2.7/site-packages/ganeti/daemon.py", line 707, in GenericMain
    exec_fn(options, args, prep_results)
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 628, in ExecMasterd
    master.setup_queue()
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 165, in setup_queue
    self.context = GanetiContext()
  File "/usr/lib64/python2.7/site-packages/ganeti/server/masterd.py", line 404, in __init__
    self.jobqueue = jqueue.JobQueue(self)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1240, in __init__
    self._InspectQueue()
  File "/usr/lib64/python2.7/site-packages/ganeti/locking.py", line 72, in sync_function
    return fn(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1175, in wrapper
    return fn(self, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1293, in _InspectQueue
    self.UpdateJobUnlocked(job)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1175, in wrapper
    return fn(self, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/ganeti/jqueue.py", line 1785, in UpdateJobUnlocked
    assert (finalized ^ (job.end_timestamp is None))
    assert (finalized ^ (job.end_timestamp is None))
AssertionError
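
For readers unfamiliar with the failing check, here is a minimal sketch of what the assertion in the traceback appears to enforce: a job should either be finalized and carry an end timestamp, or be unfinished and have none. The class `FakeJob`, the status names in `FINAL_STATUSES`, and the helper `invariant_holds` are hypothetical and exist only for this illustration; they are not Ganeti's actual code, and it is an assumption that `finalized` means "the job reached a terminal status".

```python
# Hypothetical stand-in for a queued job; NOT Ganeti's real _QueuedJob class.
class FakeJob(object):
    def __init__(self, status, end_timestamp):
        self.status = status
        self.end_timestamp = end_timestamp


# Assumed set of terminal statuses, chosen for illustration only.
FINAL_STATUSES = frozenset(["success", "error", "canceled"])


def invariant_holds(job):
    """Mirror the shape of: assert (finalized ^ (job.end_timestamp is None))."""
    finalized = job.status in FINAL_STATUSES
    # XOR is true only when exactly one operand is true, so this passes for
    # "finalized with a timestamp" and "unfinished without a timestamp".
    return finalized ^ (job.end_timestamp is None)


consistent_done = FakeJob("success", (1319650278, 0))
consistent_open = FakeJob("running", None)
stale_open = FakeJob("running", (1319650278, 0))   # unfinished, yet timestamped
stale_done = FakeJob("success", None)              # finalized, yet no timestamp
```

Under these assumptions, the two `stale_*` objects are the combinations that would trip the assertion. One possible reading of the log is that the crash left job 30502 on disk in such an inconsistent state, which would explain why clearing or archiving the queue lets the daemon start again.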


