#158: BibSched erroneously thinks that tasks failed to run
-----------------------+----------------------
Reporter: skaplun | Owner: skaplun
Type: defect | Status: assigned
Priority: critical | Milestone:
Component: BibSched | Version:
Resolution: | Keywords:
-----------------------+----------------------
Comment (by simko):
Here are logs from INSPIRE showing the task dance:
Case study 1: (note that bibindex seems to start 10 seconds before
bibrank actually finishes)
{{{
2011-04-05 14:39:12 -> StandardError: Process bibindex (task_id: 1)
was launched but seems not to be able to reach RUNNING status.
2011-04-05 14:35:42 --> Task #9335 (bibupload) started
2011-04-05 14:35:56 --> Task #9335 (bibupload) exited
2011-04-05 14:35:56 --> Task #2 (webcoll) started
2011-04-05 14:36:40 --> Task #2 (webcoll) exited
2011-04-05 14:36:41 --> Task #3 (bibreformat) started
2011-04-05 14:37:36 --> Task #4 (bibrank) started
2011-04-05 14:37:39 --> Task #3 (bibreformat) exited
2011-04-05 14:38:47 --> Task #1 (bibindex) started
2011-04-05 14:38:57 --> Task #4 (bibrank) exited
2011-04-05 14:39:56 --> Task #1 (bibindex) exited
}}}
Case study 2:
{{{
2011-04-03 21:29:32 -> StandardError: Process bibreformat (task_id: 3)
was launched but seems not to be able to reach RUNNING status.
2011-04-03 21:24:06 --> Task #4 (bibrank) started
2011-04-03 21:24:19 --> Task #4 (bibrank) exited
2011-04-03 21:24:21 --> Task #1 (bibindex) started
2011-04-03 21:24:36 --> Task #1 (bibindex) exited
2011-04-03 21:24:36 --> Task #2 (webcoll) started
2011-04-03 21:24:48 --> Task #2 (webcoll) exited
2011-04-03 21:29:06 --> Task #3 (bibreformat) started
2011-04-03 21:30:11 --> Task #3 (bibreformat) exited
}}}
So it looks like the tasks actually ran fine and that bibsched
wrongly reported a task launch failure.
After talking to Sam, it appears this was also happening for Benoit
on ADS, so it is probably related to the loading of big citation
dictionaries, which at times does not finish within
5 * `CFG_BIBSCHED_REFRESHTIME` seconds (= 25 seconds on INSPIRE).
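As a quick sanity check of that hypothesis, the gap between each
"started" line and the matching StandardError line can be computed
from the timestamps in the two case studies. This is only a sketch:
the helper name `seconds_between` is made up for illustration, and
the refresh time of 5 seconds is inferred from the "25 seconds on
INSPIRE" figure, not read from any configuration.

```python
from datetime import datetime

FMT = "%Y-%m-%d %H:%M:%S"


def seconds_between(started, error_logged):
    """Return whole seconds from the 'started' line to the error line."""
    delta = datetime.strptime(error_logged, FMT) - datetime.strptime(started, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the logs in case studies 1 and 2.
case_1_gap = seconds_between("2011-04-05 14:38:47", "2011-04-05 14:39:12")
case_2_gap = seconds_between("2011-04-03 21:29:06", "2011-04-03 21:29:32")

# Assumption: count = 5 polls, CFG_BIBSCHED_REFRESHTIME = 5 seconds
# (inferred from 5 * refresh time = 25 seconds on INSPIRE).
detection_window = 5 * 5

print(case_1_gap, case_2_gap, detection_window)  # prints: 25 26 25
```

In both cases the error is logged about 25 seconds after the task
start line, which is at least consistent with the detection window
expiring before the task reported RUNNING.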
Until the lazy loading of citation dictionaries branch is revived,
which should help here, I have doubled the task launch detection
interval on the INSPIRE production machine. (Basically, using
`count = 10` instead of `count = 5` in `bibsched.py`.)
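To illustrate why a larger `count` hides the problem, here is a
minimal simulation of a launch check that polls a task's status a
fixed number of times. It is not the actual `bibsched.py` code:
`wait_for_running`, `FakeTask`, the "SCHEDULED" placeholder status
and the 30-second startup time are all assumptions chosen for the
example; only `count`, the RUNNING status and the 5-second refresh
interval relate to what is described above.

```python
def wait_for_running(get_status, sleep, refresh_time, count=5):
    """Poll up to `count` times; True if the task is seen as RUNNING."""
    for _ in range(count):
        if get_status() == "RUNNING":
            return True
        sleep(refresh_time)
    return False


class FakeTask:
    """Hypothetical task that needs `startup_seconds` before RUNNING."""

    def __init__(self, startup_seconds):
        self.startup_seconds = startup_seconds
        self.elapsed = 0

    def status(self):
        # "SCHEDULED" is just a placeholder for "not RUNNING yet".
        return "RUNNING" if self.elapsed >= self.startup_seconds else "SCHEDULED"

    def sleep(self, seconds):
        # Simulated clock, so the example runs instantly.
        self.elapsed += seconds


def detected(count, startup_seconds=30, refresh_time=5):
    """Run the launch check against a fresh fake task."""
    task = FakeTask(startup_seconds)
    return wait_for_running(task.status, task.sleep, refresh_time, count)


print(detected(count=5))   # 25-second window: slow start is missed
print(detected(count=10))  # 50-second window: slow start is seen
```

With a simulated 30-second startup, the check gives up when
`count = 5` and succeeds when `count = 10`, which is the effect the
workaround relies on.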
--
Ticket URL: <http://invenio-software.org/ticket/158#comment:3>
Invenio <http://invenio-software.org>