To polish my Bacula (2.2.4 on CentOS 5) configuration a little, I tried
adding the Max Wait Time directive to the job defaults.
More specifically:

  Max Start Delay = 7200
  Max Run Time = 7000
  Max Wait Time = 900

So, my goal was that
- a job won't start if it has been delayed by 2 hours
- no single job may run longer than approx. 1 hour 56 minutes
- a job would be canceled if it hangs for 15 minutes due to e.g. a missing tape
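
As a quick sanity check of the arithmetic behind those three goals, here is a
minimal Python sketch. The dictionary and the helper name as_hms are just for
illustration and are not anything Bacula provides; the values are the ones
from the job defaults above, in seconds.

```python
from datetime import timedelta

# Values copied from the job defaults above, in seconds.
limits = {
    "Max Start Delay": 7200,
    "Max Run Time": 7000,
    "Max Wait Time": 900,
}

def as_hms(seconds):
    """Format a number of seconds as H:MM:SS."""
    return str(timedelta(seconds=seconds))

for name, seconds in limits.items():
    print(f"{name} = {seconds} s = {as_hms(seconds)}")
```

So 7000 seconds is 1 hour 56 minutes 40 seconds, and 900 seconds is exactly
15 minutes.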


Last night there were two full backups scheduled to run at 01:05, both with
the same priority of 10. No concurrent jobs are allowed, so the second job
was expected to wait for the first one to finish.

This is what happened:
01-Oct 01:05 dogbert-dir: Start Backup JobId 163,
Job=Backup-Dogbert.2007-10-01_01.05.00
...
01-Oct 01:20 dogbert-dir: Backup-Dogbert.2007-10-01_01.05.00 Fatal error:
Max wait time exceeded. Job canceled.
01-Oct 01:20 dogbert-fd: Backup-Dogbert.2007-10-01_01.05.00 Fatal error:
backup.c:892 Network send error to SD. ERR=Input/output error
01-Oct 01:20 dogbert-fd: Backup-Dogbert.2007-10-01_01.05.00 Error:
bsock.c:311 Wrote 65536 bytes to Storage daemon:dogbert:9103, but only 16040
accepted.
01-Oct 01:20 dogbert-sd: Job Backup-Dogbert.2007-10-01_01.05.00 marked to be
canceled.
01-Oct 01:20 dogbert-sd: Job Backup-Dogbert.2007-10-01_01.05.00 marked to be
canceled.
01-Oct 01:20 dogbert-sd: Backup-Dogbert.2007-10-01_01.05.00 Fatal error:
append.c:259 Network error on data channel. ERR=Connection reset by peer
01-Oct 01:20 dogbert-sd: Job write elapsed time = 00:11:41, Transfer rate =
6.008 M bytes/second
01-Oct 01:20 dogbert-dir: Bacula dogbert-dir 2.2.4 (14Sep07): 01-Oct-2007
01:20:22
...
  FD Bytes Written:       4,192,251,934 (4.192 GB)
  SD Bytes Written:       4,211,990,004 (4.211 GB)
  Rate:                   4556.8 KB/s


So the job had been running successfully for the first 15 minutes, backing
up over 4 GB of data (a reasonable rate for my hardware), and then it was
canceled due to the 15-minute wait time limit?
Is this the correct behavior? I expected "Max Run Time" to be the limit that
cancels a running job like this.


Then the other job, which was scheduled to start at the same time but
couldn't start before the first one finished:

01-Oct 01:20 dogbert-dir: Backup-Dilbert.2007-10-01_01.05.01 Fatal error:
Max wait time exceeded. Job canceled.


So this job was canceled immediately, since "max wait time had been
exceeded". However, the Bacula documentation says:

Max Wait Time = <time> The time specifies the maximum allowed time that a
job may block waiting for a resource (such as waiting for a tape to be
mounted, or waiting for the storage or file daemons to perform their
duties), counted from the when the job starts, (not necessarily the same as
when the job was scheduled).


Now, it looks to me that Max Wait Time is counted from the scheduled time
anyway. I increased Max Wait Time to 6000 and ran the jobs manually, and it
worked for now. But I'm still wondering why things went this way.
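
To put numbers on that suspicion, here is a rough Python sketch of the
timeline. It assumes that the scheduled time can be read from the job name
(2007-10-01_01.05.00) and that the timestamp on the director's job report
(01-Oct-2007 01:20:22) is close to the moment the jobs were canceled; both
are my reading of the log, not something Bacula states.

```python
from datetime import datetime

MAX_WAIT_TIME = 900  # seconds, from the job defaults

# Scheduled time is taken from the job name; the end time is the timestamp
# of the director's job report. Treating the report timestamp as the moment
# of cancellation is an assumption.
scheduled = datetime(2007, 10, 1, 1, 5, 0)
reported = datetime(2007, 10, 1, 1, 20, 22)

elapsed = int((reported - scheduled).total_seconds())
print(f"elapsed since schedule: {elapsed} s")
print(f"over Max Wait Time by: {elapsed - MAX_WAIT_TIME} s")
```

Under those assumptions both jobs were canceled roughly 15 minutes after the
scheduled time, which matches the 900-second limit rather than any time
actually spent blocked on a resource.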


Regards,
Timo


