
Kern Sibbald wrote:
> Hello,
> 
> I regret to have to announce that there is a rather serious bug in Bacula.
> 
> Bacula bug #935 reports that during a restore, a large number of files are 
> missing and thus not restored.  This is really quite surprising because we 
> have a fairly extensive regression test suite that explicitly tests for this 
> kind of problem many times.
> 
> Despite our testing, there is indeed a bug in Bacula that has the following 
> characteristics:
> 
> 1. It happens only when multiple simultaneous Jobs are run (regardless of 
> whether or not data spooling is enabled).
> 
> 2. It has only been observed on disk-based backups, not on tape.
> 
> 3. Under the right circumstances (timing), it could and probably does happen 
> on tape backups.
> 
> 4. It seems to be timing dependent, and requires multiple clients to 
> reproduce.
> 
> 5. Analysis indicates that it happens most often when the clients are slow 
> (e.g. doing Incremental backups).
> 
> 6. It has been verified to exist in versions 2.0.x and 2.2.x.
> 
> 7. It should also be in version 1.38, but could not be reproduced in testing, 
> perhaps due to timing considerations or the fact that the test FD daemons 
> were version 2.2.2.
> 
> 8. The data is correctly stored on the Volume, but incorrect index (JobMedia)
> records are stored in the database.  (The JobMedia record generated during
> the Volume change contains the index of the new Volume rather than the
> previous Volume.)
> 
> 9. You can prevent the problem from occurring either by turning off multiple
> simultaneous Jobs or by ensuring that, while running multiple simultaneous
> Jobs, those Jobs do not span Volumes.  E.g. you could manually mark
> Volumes as full when they are sufficiently large.
> 
> 10. If you are not running multiple simultaneous Jobs, you will not be 
> affected by this bug.
> 
> 11. If you are running multiple simultaneous Jobs to tapes, I believe there
> is a reasonable probability that this problem could show up when Jobs are
> split across tapes.
> 
> 12. If you are running multiple simultaneous Jobs to disks, I believe there
> is a high probability that this problem will show up when Jobs are split
> across disk Volumes.
> 
> I have uploaded patches to bug #935 (bugs.bacula.org) that will correct
> versions 2.2.0, 2.2.1, and 2.2.2.  The patch has been tested only on version
> 2.2.2 and passes all regression tests as well as the specific test that
> reproduced the problem.
> 
> After a little more testing, I plan to release version 2.2.3 probably on 
> Monday the 10th or Tuesday.
> 
> At this time, I do not have a patch for 2.0.x versions, and unless there is 
> some really compelling reason to create one, I would prefer not -- it would 
> not be a huge effort to back port the patch, but it would require rather 
> extensive testing.  Though it is hard to make a specific recommendation, I
> believe the wisest and simplest course is either to patch version 2.2.x, if
> that is what you are currently running, or to upgrade to version 2.2.3 when
> it is released.

My personal recommendation would be to release a patch for all versions
back to at least 1.38.x if the bug can be verified there. I know that not
many people are running that version anymore, but if this bug is serious
enough that the software does not work correctly, I would worry that
someone will install one of these versions (the latest release of a minor
series, e.g. the latest 1.38.x) without knowing about the problem. In
principle, what you've discovered is that all versions of Bacula back to
at least 2.0.x are a time bomb of sorts and really should not be used at
all. I can't think of any past bug in a non-beta release that carried such
a real risk of data loss, and I think not fixing the problem in older
releases would not be good for Bacula's image. I'm running 2.0.3
presently, and it's only 6 months old. I'm sure you can imagine there are
many places that do not allow upgrades of major products except at
certain times of the year.

Not trying to give you a hard time, but I'm not sure how it would look
to abandon such recent versions of software.

PS: Does this affect spooled simultaneous jobs, or only simultaneous
jobs that are simultaneously writing to storage?

--
 ---- _  _ _  _ ___  _  _  _
 |Y#| |  | |\/| |  \ |\ |  | |Ryan Novosielski - Systems Programmer II
 |$&| |__| |  | |__/ | \| _| |[EMAIL PROTECTED] - 973/972.0922 (2-0922)
 \__/ Univ. of Med. and Dent.|IST/AST - NJMS Medical Science Bldg - C630


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
