Kern Sibbald wrote:
> This document contains the technical details of Bug #935.
>
> Bacula bug #935 reports that during a restore, a large number of files are
> missing and thus not restored. This is really quite surprising because we
> have a fairly extensive regression test suite that explicitly tests for
> this kind of problem many times.
>
> Despite our testing, there is indeed a bug in Bacula that has the following
> characteristics:
>
> 1. It happens only when multiple simultaneous Jobs are run (regardless of
>    whether or not data spooling is enabled), and happens only when the
>    Storage daemon is changing from one Volume to another -- i.e. the
>    backups span multiple Volumes.
>
> 2. It has only been observed on disk-based backups, but not on tape.
>
> 3. Under the right circumstances (timing), it could and probably does
>    happen on tape backups.
>
> 4. It seems to be timing dependent and requires multiple clients to
>    reproduce, although under the right circumstances it should be
>    reproducible with a single client doing multiple simultaneous backups.
>
> 5. Analysis indicates that it happens most often when the clients are slow
>    (e.g. doing Incremental backups).
>
> 6. It has been verified to exist in versions 2.0.x and 2.2.x.
>
> 7. It should also be in version 1.38, but could not be reproduced in
>    testing, perhaps due to timing considerations or the fact that the test
>    FD daemons were version 2.2.2.
>
> 8. The data is correctly stored on the Volume, but incorrect index
>    (JobMedia) records are stored in the database (the JobMedia record
>    generated during the Volume change contains the index of the new Volume
>    rather than the previous Volume). This is described in more detail
>    below.
>
> 9. You can prevent the problem from occurring either by turning off
>    multiple simultaneous Jobs or by ensuring that simultaneously running
>    Jobs do not span Volumes. E.g. you could manually mark Volumes as full
>    when they are sufficiently large.
>
> 10. If you are not running multiple simultaneous Jobs, you will not be
>     affected by this bug.
>
> 11. If you are running multiple simultaneous Jobs to tapes, I believe
>     there is a reasonable probability that this problem could show up when
>     Jobs are split across tapes.
>
> 12. If you are running multiple simultaneous Jobs to disk, I believe there
>     is a high probability that this problem will show up when Jobs are
>     split across disk Volumes.
>
> ===============================
>
> The problem comes from the fact that when the end of a Volume is reached,
> the SD must generate a JobMedia (index) record for each of the Jobs that
> is currently running. Since each Job runs in a separate thread, the thread
> that does the Volume switch marks all the other threads (Jobs) with a flag
> that tells them to update the catalog index (JobMedia). Sometime later,
> when each flagged thread attempts another write to the Volume, it creates
> its JobMedia record.
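The deferred-flag mechanism Kern describes can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, for illustration only; this is not Bacula's actual SD code). The volume-switch path flags each concurrent Job, but by the time a flagged Job performs its next write and emits its JobMedia record, the shared current-volume index has already advanced:

```python
# Minimal sketch of the race described above. All names are hypothetical;
# this illustrates the timing bug, not Bacula's actual Storage daemon code.

current_volume = 1      # shared SD state: the Volume currently mounted
jobmedia_records = []   # (job_id, volume_index) rows as written to the catalog

class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.need_jobmedia = False  # flag set by the volume-switch path

    def write_block(self):
        # Deferred JobMedia creation: this runs on the Job's *next* write,
        # after the switch has already bumped current_volume.
        if self.need_jobmedia:
            # BUG: records the new Volume's index instead of the Volume
            # this Job's data was actually written to (current_volume - 1).
            jobmedia_records.append((self.job_id, current_volume))
            self.need_jobmedia = False

def switch_volume(jobs):
    global current_volume
    current_volume += 1              # mount the next Volume
    for job in jobs:
        job.need_jobmedia = True     # tell each Job to update the catalog

jobs = [Job(1), Job(2)]
switch_volume(jobs)                  # end of Volume 1 is reached
for job in jobs:
    job.write_block()                # each flagged Job emits its record

print(jobmedia_records)              # [(1, 2), (2, 2)] -- both should cite Volume 1
```

Presumably the repair is to capture the index of the Volume being left at switch time, rather than reading the shared index when the deferred record is finally written; restores then consult the correct Volume for those file indexes.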
If I read everything correctly, I believe we would be immune to this bug at this time. While we certainly use concurrent jobs, each job is written to a recycled volume each night. We have no jobs that span a volume at any time.

Would that be a correct analysis?

DAve