On 08/04/2014 06:43 PM, Josh Fisher wrote:

 ...

Have you set PreferMountedVolumes=no in the Job resource in bacula-dir.conf? If three jobs start and want to write to volumes in the same pool, then all three can be assigned the same volume. In fact, if PreferMountedVolumes=yes (the default), then all three WILL be assigned the same volume unless the pool restricts the maximum number of jobs that the volume may contain. However, your device (drive) restricts the maximum concurrent jobs to 2. Therefore one of those three jobs will not be able to select the drive where the volume is mounted and will be forced to select another unused drive. That third job will nevertheless select the same volume as the other two and attempt to move the volume from the drive it is in to the drive it has been assigned. The configuration has a built-in race condition.
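A minimal sketch of the configuration that triggers this behavior (resource and client names here are hypothetical; the directives themselves are standard Bacula ones):

```conf
# bacula-sd.conf -- the drive allows at most 2 concurrent jobs
Device {
  Name = FileStorage-Drive0          # hypothetical name
  Maximum Concurrent Jobs = 2
  # Archive Device, Media Type, etc. omitted
}

# bacula-dir.conf -- three jobs writing to the same pool; with the
# default Prefer Mounted Volumes = yes, all three are proposed the
# same next available volume from that pool
Job {
  Name = Backup-Client1              # hypothetical; likewise Client2/Client3
  Pool = Default
  # Prefer Mounted Volumes = yes     # the default
}
```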
This is the first time that I have heard this explained so clearly, and now I am going to try to duplicate the problem. By the way, I am not sure I would classify this as a race condition, because in theory the SD is not blocked; the third job simply waits until the Volume is free (at least, that is what I programmed). However, this is clearly very inefficient.

I would like to fix this, but one must keep in mind an important difficulty with Bacula: the SD knows what is going on with Volumes, but the Dir does not, and it is the Dir that proposes Volumes to the SD. Currently there is no good atomic way to pass the information from the SD to the Dir so that it can make better decisions.

So, with the (current) constraint that the solution must involve changing only the SD algorithm, how could one prevent this from happening? I have some ideas, but I wonder what you think.


Setting PreferMountedVolumes=no causes the three jobs to select a drive that is NOT already mounted with a volume from the pool. This allows jobs writing to the same pool to select different volumes from the pool, rather than all selecting the same next available volume. This has its own caveats: it does not always prevent two jobs from selecting the same volume, in which case they will want to swap the volume back and forth between drives, which is another type of race condition. I have used this method successfully for a pool containing only full backups by setting PreferMountedVolumes=no in the Job resource and MaximumVolumeJobs=1 in the Pool resource. Since Bacula selects the volume for a job atomically, this forces an exclusive set of volumes for each job, thus preventing the race condition. Concurrency is then limited only by the number of drives, but at the "expense" of creating a greater number of smaller volume files. I quote "expense" because on a disk vchanger it is usually not a big issue to have more volume files; doing this with a tape autochanger would use many more tapes and be truly more expensive. Of course, unlimited concurrency is theoretical, since the hardware limits the USEFUL concurrency.
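The working combination described above might look like this (the Job and Pool names are hypothetical; the directives are the standard ones):

```conf
# bacula-dir.conf
Job {
  Name = Full-Backup                 # hypothetical name
  Pool = FullPool
  Prefer Mounted Volumes = no        # pick a drive without a mounted pool volume
}

Pool {
  Name = FullPool                    # hypothetical name
  Maximum Volume Jobs = 1            # each volume accepts exactly one job, so
                                     # atomic volume selection gives every job
                                     # its own volume -- no swapping between drives
}
```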

I really do not like the PreferMountedVolumes = No option (I have probably said this many times), but I find your use of it very well explained and very interesting.

Best regards,
Kern

...


_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
