Kern Sibbald wrote:
> This document contains the technical details of Bug #935.
>
> Bacula bug #935 reports that during a restore, a large number of files are
> missing and thus not restored. This is really quite surprising because we
> have a fairly extensive regression test suite that explicitly tests for
> this kind of problem many times.
>
> Despite our testing, there is indeed a bug in Bacula that has the following
> characteristics:
>
> 1. It happens only when multiple simultaneous Jobs are run (regardless of
>    whether or not data spooling is enabled), and happens only when the
>    Storage daemon is changing from one Volume to another -- i.e. the
>    backups span multiple Volumes.
>
> 2. It has only been observed on disk-based backups, but not on tape.
>
> 3. Under the right circumstances (timing), it could and probably does
>    happen on tape backups.
>
> 4. It seems to be timing dependent and requires multiple clients to
>    reproduce, although under the right circumstances it should be
>    reproducible with a single client doing multiple simultaneous backups.
>
> 5. Analysis indicates that it happens most often when the clients are slow
>    (e.g. doing Incremental backups).
>
> 6. It has been verified to exist in versions 2.0.x and 2.2.x.
>
> 7. It should also be in version 1.38, but could not be reproduced in
>    testing, perhaps due to timing considerations or the fact that the test
>    FD daemons were version 2.2.2.
>
> 8. The data is correctly stored on the Volume, but incorrect index
>    (JobMedia) records are stored in the database (the JobMedia record
>    generated during the Volume change contains the index of the new Volume
>    rather than the previous Volume). This is described in more detail
>    below.
>
> 9. You can prevent the problem from occurring either by turning off
>    multiple simultaneous Jobs or by ensuring that simultaneously running
>    Jobs do not span Volumes. E.g. you could manually mark Volumes as full
>    when they are sufficiently large.
>
> 10. If you are not running multiple simultaneous Jobs, you will not be
>     affected by this bug.
>
> 11. If you are running multiple simultaneous Jobs to tapes, I believe
>     there is a reasonable probability that this problem could show up when
>     Jobs are split across tapes.
>
> 12. If you are running multiple simultaneous Jobs to disk, I believe there
>     is a high probability that this problem will show up when Jobs are
>     split across disk Volumes.
>
> ===============================
>
> The problem comes from the fact that when the end of a Volume is reached,
> the SD must generate a JobMedia (index) record for each of the Jobs that
> is currently running. Since each Job runs in a separate thread, the thread
> that does the Volume switch marks all the other threads (Jobs) with a flag
> that tells them to update the catalog index (JobMedia). Sometime later,
> when each flagged thread attempts another write to the Volume, it creates
> its JobMedia record.
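The deferred-flag mechanism Kern describes can be illustrated with a minimal, self-contained sketch (all names here are hypothetical, for illustration only; this is not Bacula's actual SD code). The volume-switch path flags each concurrent Job, but by the time a flagged Job performs its next write and emits its JobMedia record, the shared current-volume index has already advanced:

```python
# Minimal sketch of the race described above. All names are hypothetical;
# this illustrates the timing bug, not Bacula's actual Storage daemon code.

current_volume = 1      # shared SD state: the Volume currently mounted
jobmedia_records = []   # (job_id, volume_index) rows as written to the catalog

class Job:
    def __init__(self, job_id):
        self.job_id = job_id
        self.need_jobmedia = False  # flag set by the volume-switch path

    def write_block(self):
        # Deferred JobMedia creation: this runs on the Job's *next* write,
        # after the switch has already bumped current_volume.
        if self.need_jobmedia:
            # BUG: records the new Volume's index instead of the Volume
            # this Job's data was actually written to (current_volume - 1).
            jobmedia_records.append((self.job_id, current_volume))
            self.need_jobmedia = False

def switch_volume(jobs):
    global current_volume
    current_volume += 1              # mount the next Volume
    for job in jobs:
        job.need_jobmedia = True     # tell each Job to update the catalog

jobs = [Job(1), Job(2)]
switch_volume(jobs)                  # end of Volume 1 is reached
for job in jobs:
    job.write_block()                # each flagged Job emits its record

print(jobmedia_records)              # [(1, 2), (2, 2)] -- both should cite Volume 1
```

Presumably the repair is to capture the index of the Volume being left at switch time, rather than reading the shared index when the deferred record is finally written; restores then consult the correct Volume for those file indexes.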
If I read everything correctly, I believe we would be immune to this bug at this time. While we certainly use concurrent jobs, each job is written to a recycled volume each night. We have no jobs that span a volume at any time.

Would that be a correct analysis?

DAve