Status: New
Owner: ----
New issue 1113 by [email protected]: ganeti-luxid fills the log
storage with millions lines like "INFO Successfully handled
WaitForJobChange"
https://code.google.com/p/ganeti/issues/detail?id=1113
Hi,
Our storage log partition got filled this night with
/var/log/luxi-daemon.log growing to several gigabytes withing a few minutes:
2015-06-29 01:20:06,651512000000 CEST: ganeti-luxid pid=34084 INFO
Scheduler notified of change of job 707033
2015-06-29 01:20:06,651603000000 CEST: ganeti-luxid pid=34084 INFO
Rereading job 707033
2015-06-29 01:20:06,753828000000 CEST: ganeti-luxid pid=34084 INFO
Scheduler notified of change of job 707033
2015-06-29 01:20:06,753968000000 CEST: ganeti-luxid pid=34084 INFO
Rereading job 707033
2015-06-29 01:20:06,859385000000 CEST: ganeti-luxid pid=34084 INFO
Scheduler notified of change of job 707033
2015-06-29 01:20:06,859552000000 CEST: ganeti-luxid pid=34084 INFO
Rereading job 707033
2015-06-29 01:20:08,159165000000 CEST: ganeti-luxid pid=34084 INFO
Scheduler notified of change of job 707033
2015-06-29 01:20:08,159408000000 CEST: ganeti-luxid pid=34084 INFO
Rereading job 707033
2015-06-29 01:20:08,159843000000 CEST: ganeti-luxid pid=34084 INFO Finished
jobs: (707033,JOB_STATUS_SUCCESS)
and then it looped over:
2015-06-29 01:20:27,325809000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,326939000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,327972000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,328843000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,329800000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,330766000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,331682000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,332570000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,333451000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,334242000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,335094000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,336053000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,336965000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,337754000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,338597000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,339507000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,340289000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,341122000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,341967000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,342745000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,343578000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,344423000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,345192000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,346008000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,346811000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,347679000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,348420000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,349222000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,350065000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,350800000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,351627000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,352408000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,353123000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
2015-06-29 01:20:27,353881000000 CEST: ganeti-luxid pid=34084 INFO
Successfully handled WaitForJobChange
(...)
It looped very fast as you can see for the log timestamp and we had a total
of 23 millions lines like this in the last 24 hours:
# grep -Fc 'INFO Successfully handled WaitForJobChange' luxi-daemon.log.1
23646165
It only stopped when the partition had no free storage left.
I've found a probable relevant part of the code (from 2.12.4):
def WaitForJobChange(self, job_id, fields, prev_job_info,
prev_log_serial):
job_id = Client._PrepareJobId(REQ_WAIT_FOR_JOB_CHANGE, job_id)
while True:
result = self.WaitForJobChangeOnce(job_id, fields,
prev_job_info, prev_log_serial)
if result != constants.JOB_NOTCHANGED:
break
return result
I don't know the internals but 'while True' looks dangerous to me. Maybe we
need a sleep() inside the 'while True' loop to avoid a race condition?
I'm not quite sure yet but I think this got triggered because one of the
nodes was running out of storage space before the 'cluster' got filled
itself too.
More information about the cluster:
We are running Ganeti 2.11.6-1~bpo70+1 on Debian GNU/Linux 7.8 (wheezy)
amd64.
# gnt-cluster --version
gnt-cluster (ganeti v2.11.6) 2.11.6
# gnt-cluster version
Software version: 2.11.6
Internode protocol: 2110000
Configuration format: 2110000
OS api version: 20
Export interface: 0
VCS version: (ganeti) version v2.11.6
# hspace --version
hspace (ganeti) version v2.11.6
compiled with ghc 7.4
running on linux x86_64
Please let me know if you need more information.
--
You received this message because this project is configured to send all
issue notifications to this address.
You may adjust your notification preferences at:
https://code.google.com/hosting/settings