Hi!

I'm playing with the Bacula configuration here (1.37.36) and I noticed the following behaviour.

I started a backup job for our Windows 2003 Server. Everything ran fine until the Windows box crashed (I'm still investigating why, but I think that is a separate issue and not Bacula related). Now the director says the job is still running. Meanwhile I have added another backup job to the queue, and it is waiting for the first job to end.

Even if I cancel the Windows backup job, and the director says it was properly canceled, the job still won't go away. From the documentation I got the idea that the queue should continue after a few minutes, but I waited an hour and nothing happened (logfiles at the bottom).

Restarting the director solved the problem, although I had to start the second job again.

My concern is what happens once our Bacula setup goes live: if ONE client crashes during the nightly backup run, does the complete backup hang? Is there some sort of timeout that notices the client isn't sending data anymore? Is there a way to continue the backup from where it crashed (I doubt this functionality exists)?

If I had a Full-Backup pool for the crashed machine with a volume retention time of 2 months and a maximum of 2 volumes, what would happen in this case:
Sunday - full backup crashes - volume marked as Used
Monday - incremental gets upgraded to full - volume marked as Used
Tue-Sat - incrementals as usual
Sunday - full backup is due again, but the retention time of both Used volumes has not expired, so what happens?
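
To make the scenario concrete, here is a small Python sketch of how I picture it. It is only a model of my question, not of Bacula's actual logic: I am assuming a volume can be reused only once its retention period has expired, I approximate "2 months" as 60 days, and the dates and the second volume name (noether.full.0002) are made up for illustration.

```python
from datetime import date, timedelta

RETENTION = timedelta(days=60)   # "2 months", approximated as 60 days (assumption)
MAX_VOLUMES = 2                  # pool limit from the scenario

def reusable_volumes(used_on, today, retention=RETENTION):
    """Return the names of volumes whose retention period has expired by `today`."""
    return [name for name, day in used_on.items() if today - day >= retention]

# Hypothetical dates: the crashed full backup ran on Sunday 2005-09-04.
used_on = {
    "noether.full.0001": date(2005, 9, 4),  # Sunday: full crashes, volume marked Used
    "noether.full.0002": date(2005, 9, 5),  # Monday: incremental upgraded to full
}
next_sunday = date(2005, 9, 11)

free = reusable_volumes(used_on, next_sunday)
can_create_new = len(used_on) < MAX_VOLUMES  # is there room for a third volume?

print("reusable volumes:", free)
print("room for a new volume:", can_create_new)
```

Under these assumptions no volume is reusable on the following Sunday and the pool has no room for a third volume, which is exactly the situation I am asking about.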



---------- Logfiles -----------
Running Jobs:
 JobId Level   Name                       Status
======================================================================
    10 Full    Noether.2005-08-31_08.52.29 is running
====
*cancel 10
Automatically selected Job: JobId=10 Job=Noether.2005-08-31_08.52.29
Confirm cancel (yes/no): yes
2901 Job Noether.2005-08-31_08.52.29 not found.
3000 Job Noether.2005-08-31_08.52.29 marked to be canceled.
You have messages.
*messages
31-Aug 09:49 backup-sd: Noether.2005-08-31_08.52.29 Fatal error: append.c:238 Network error on data channel. ERR=Connection reset by peer
31-Aug 09:49 backup-dir: Max Volume jobs exceeded. Marking Volume "noether.full.0001" as Used.

Running Jobs:
 JobId Level   Name                       Status
======================================================================
    10 Full    Noether.2005-08-31_08.52.29 has been canceled
    11 Increme  Backup.2005-08-31_10.00.00 is waiting execution
====

After that I waited about an hour for the canceled job to go away, but only restarting the director did the trick.

--
Daniel Holtkamp                    Riege Software International GmbH
System Administration                                   Mollsfeld 10
40670 Meerbusch, Germany                     Phone: +49-2159-9148-41
mail: holtkamp [at] riege.com                Fax:   +49-2159-9148-11
--------------------------------------------------------------------


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
