Hi all,

I ran into an interesting situation yesterday:

I was doing some maintenance on the MySQL server that holds
our Bacula database. I had hoped to be finished just in time for
the next scheduled backup jobs. However, such was not the case,
and when the jobs started, the database was still down. My bad,
obviously...

However, I don't think Bacula handled the situation very gracefully
or resiliently. What happened was that Bacula immediately failed
the first job with a fatal error and continued with the next job,
which, of course, also failed, and the next, and the next, etc.,
leaving me with a whole bunch of failed jobs. To make things worse,
these jobs then vanished from bconsole entirely and were nowhere to
be found again. I assume this is because their status could not be
updated in the (still unreachable) database.

I would have hoped/expected that, in such a case, Bacula would
back off for a (configurable) short amount of time and then retry
the job. In the meantime, warnings could be sent to the administrator,
and eventually a permanent time-out could occur. Even then, there
would be no real use in discarding the queued jobs, would there?

Blindly continuing with the next queued job after a failed database
connection is practically useless and a bit silly, as there is only a
very slim chance that the exact same database will magically be up
and running mere seconds later.

Anyway, I don't (really) mean this as a rant, but I must say that I
was quite surprised by this chain of events, which threw me into a
whole new world of pain while trying to re-run the failed jobs. I
ultimately gave up on that, because Bacula kept asking me to mount
two tapes simultaneously (?).

Obviously, in this case, there are some things I (c|sh)ould have done
myself to prevent this, e.g. not doing maintenance right before/during
backups, shutting down Bacula during database maintenance, and/or
rescheduling the jobs to a later time beforehand. However, the same
thing would have happened if, for instance, the database had crashed
right before backup time and I had not had time to respond to the
resulting monitoring alerts.
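For the record, the monitoring I mean is nothing fancy; a minimal
sketch of the kind of watchdog you could run from cron (the host name
and the availability of mysqladmin are assumptions for illustration,
so adjust for your own setup):

```shell
#!/bin/sh
# Hypothetical watchdog: ping the Bacula catalog DB and report its
# state, so an admin can be alerted before Bacula ever notices.
check_catalog() {
    host="${1:-localhost}"
    if mysqladmin --host="$host" --connect-timeout=5 ping >/dev/null 2>&1; then
        echo "catalog db on $host: OK"
    else
        echo "catalog db on $host: DOWN"
        # e.g. pipe a message to mail(1) or your alerting system here
    fi
}

check_catalog localhost
```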

So, what's your view on this? Should I just STFU and make bloody
damned sure that the database is up and running whenever jobs are
scheduled to run? Is there perhaps some configuration option I
missed that does what I proposed (back off/retry)? Is it a bug, a
feature, a possible future improvement, supergrover?

Kind regards,

Leander





_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
