Hi!
I'm playing with the Bacula configuration here (1.37.36) and I noticed
the following behaviour.
I started a backup job to back up our Windows 2003 Server. Everything
ran fine until the Windows box crashed (still investigating why, but I
think that is a separate issue and not Bacula related). Now the director
says the job is still running. Meanwhile I have added another backup job
to the queue, which is waiting for the first job to end.
Now even if I cancel the Windows backup job and the director says it is
properly canceled, it still won't go away. From the documentation I
got the idea that it should continue after a few minutes, but I
waited an hour and nothing happened (log files at the bottom).
Restarting the director solved the problem, although I had to start the
second job again.
My concern is what happens once our Bacula setup goes live: if ONE
client crashes during the nightly backup run, does the complete backup
hang? Is there some sort of timeout that notices the client isn't
sending data anymore? Is there a way to continue the backup from where
it crashed (I doubt this functionality exists)?
If I had a full-backup pool for the crashed machine with a volume
retention time of 2 months and a maximum of 2 volumes, what would happen
in this case:
Sunday: full backup crashes, volume marked as Used
Monday: incremental gets upgraded to full, volume marked as Used
Tue-Sat: incrementals as usual
Sunday: full backup, but the retention time of both Used volumes has not
expired yet, so what happens?
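To make the scenario concrete, here is a rough sketch of the date
arithmetic in Python. It is only a simplified model of the situation and
not how Bacula itself decides anything: the dates, the second volume
name and the 60-day approximation of "2 months" are all assumptions made
up for illustration.

```python
from datetime import date, timedelta

# Assumption: "2 months" volume retention approximated as 60 days.
RETENTION = timedelta(days=60)
# Pool limit from the scenario: at most 2 volumes.
MAX_VOLUMES = 2

# (volume name, date it was marked Used). Dates are hypothetical and
# "noether.full.0002" is a made-up name for the second volume.
used_volumes = [
    ("noether.full.0001", date(2005, 9, 4)),  # Sunday: full backup crashes
    ("noether.full.0002", date(2005, 9, 5)),  # Monday: incremental upgraded to full
]


def reusable_volumes(volumes, today, retention=RETENTION):
    """Return the names of volumes whose retention has expired by `today`."""
    return [name for name, used_on in volumes if today - used_on >= retention]


next_sunday = date(2005, 9, 11)
free = reusable_volumes(used_volumes, next_sunday)
can_create_new = len(used_volumes) < MAX_VOLUMES

print("Reusable volumes on", next_sunday, ":", free)
print("Room for a new volume in the pool:", can_create_new)
```

In this model neither volume is reusable on the second Sunday and the
pool has no room for a third one, which is exactly the situation I am
asking about.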
---------- Logfiles -----------
Running Jobs:
JobId Level Name Status
======================================================================
10 Full Noether.2005-08-31_08.52.29 is running
====
*cancel 10
Automatically selected Job: JobId=10 Job=Noether.2005-08-31_08.52.29
Confirm cancel (yes/no): yes
2901 Job Noether.2005-08-31_08.52.29 not found.
3000 Job Noether.2005-08-31_08.52.29 marked to be canceled.
You have messages.
*messages
31-Aug 09:49 backup-sd: Noether.2005-08-31_08.52.29 Fatal error:
append.c:238 Network error on data channel. ERR=Connection reset by peer
31-Aug 09:49 backup-dir: Max Volume jobs exceeded. Marking Volume
"noether.full.0001" as Used.
Running Jobs:
JobId Level Name Status
======================================================================
10 Full Noether.2005-08-31_08.52.29 has been canceled
11 Increme Backup.2005-08-31_10.00.00 is waiting execution
====
After that I waited about an hour for the job to cancel, but only
restarting the director did the trick.
--
Daniel Holtkamp Riege Software International GmbH
System Administration Mollsfeld 10
40670 Meerbusch, Germany Phone: +49-2159-9148-41
mail: holtkamp [at] riege.com Fax: +49-2159-9148-11
--------------------------------------------------------------------