On Fri, 7 Sep 2007, Dan Langille wrote:

> On 8 Sep 2007 at 2:42, Doytchin Spiridonov wrote:
>> At least we found a working solution (no concurrent jobs, because with
>> concurrent jobs bacula was useless) hoping they will fix it sometime
>> when they receive enough proof that there IS a bug. You can reopen it
>> (as I'm not going to do it after I've got several times a response
>> "can't replicate, so there are no bugs") at bugs.bacula.org
> What do you suggest we do if we are unable to replicate the bug?
> What course[s] of action would you suggest?

OK, I will suggest a course of action.

I am sure that you would agree that enough people have reported this issue 
now to confirm that there is a major problem with concurrent job 
processing that is unrelated to any hardware issues.

I am sure you would also agree that people are running bacula for a 
reason, and they expect to be able to restore their data, and consequently 
they cannot enter into a testing regime with production systems to debug 
this. I can reproduce this problem at will, but I cannot use my own 
systems nor any customer systems for debugging it further, nor give access 
to anyone else to do the same. Now that it is known that setting Maximum 
Concurrent Jobs greater than 1 can lead to volume corruption, no system 
that I manage can use concurrent jobs until the cause is known and fixed. 
And this applies to everyone using bacula: test your restores 
regularly.

Presumably "you" (developers, not just you personally) have testing 
systems for which the actual backed up data is not important, and that can 
therefore be used to investigate this issue, and have a way to verify the 
structural integrity of the saved data volumes. You cannot expect folks 
running bacula in production to have the same. Since the developers also 
presumably have an interest in the functionality of the code base, and are 
familiar with the structure of that code, I would suggest that for such a 
major issue, failing to reproduce the problem after a number of successful 
restores is not sufficient cause to stop investigating it: it has to be 
worked on until the cause is known. Let 
me state again that this is a major show-stopper problem. Obviously 
Doytchin has spent considerable time on it already, and his efforts allow 
both him and me, and probably many others, to run backups with a 
reasonable expectation of being able to restore.

I have some spare hardware that I can probably rig up for testing, but I 
have a business to run and my time is therefore limited. I am willing, 
however, to assist in whatever way I can, given these constraints.

Steve
----------------------------------------------------------------------------
Steve Thompson                 E-mail:      smt AT vgersoft DOT com
Voyager Software LLC           Web:         http://www DOT vgersoft DOT com
39 Smugglers Path              VSW Support: support AT vgersoft DOT com
Ithaca, NY 14850
   "186,300 miles per second: it's not just a good idea, it's the law"
----------------------------------------------------------------------------

