Hi,

Good points on a number of things but a few comments need to be made.

1) I'm not attempting to use spooling as a backup method that I want to restore from. I'm using spooling for what it's intended: to avoid "shoe-shining". I back up a number of clients' servers at remote sites, and their network connections can sometimes be saturated while backing up. With spooling, the tape is only moving when it needs to be, which is good for the wear and tear :) There are multiple clients, and I'm trying to use different pools of tapes for them, which is what's got me into this predicament. In terms of spool size and running out of space, that is a consideration even in the current version, so with some careful management this problem could be avoided.

2) I don't believe writing to a disk-based volume and then migrating to tape would work for me. For restores, wouldn't that require me to first restore the disk-based volume from tape to disk and then restore the files I need from that disk volume (which is really its own Storage Device)? I haven't really looked into this scenario, but the bits that I have read led me to believe that the restore scenario would be like that.

3) In terms of waiting for a Volume to be inserted I have the luxury of having a tape auto-loader doing the work for me. In my proposed scenario Bacula could check to see what Volume it requires as the Job finishes its spooling and if the tape is not in the drive it could issue an mtx-changer command and have the autoloader load it. I see some potential issues here with timing and deadlocking for the drive between jobs but some careful queue management could ensure this works.

4) You're absolutely right about Bacula positioning the tape before the Job starts. Looking at the code, I see that data spooling begins only after Bacula acquires the Storage Device, which wouldn't work so well with Jobs needing Volumes from multiple Pools. With how the checks currently work, getting the "ok" to start the job wouldn't end up being too much different (I say with a wink); possibly some shuffling around in the order of the sub-routines? In reality I think this is where the major part of the work would be needed, since there is potential for some major failure here.

With all this being said, I'll definitely bring it up next time Kern asks for "wish-list" suggestions. In the meantime I can simply do away with the multiple Pools and make sure that same-Level Jobs happen in the same time frames, and I should have the behaviour that I want, minus the separate Pools.

Thanks everyone for your help!

Sean

Arno Lehmann wrote:

Hello,

Sean O'Grady wrote:

Hi,

...

I believe I have sorted out what my issue with this is. As I didn't post my complete configs and only the ones that I thought would be relevant I ended up only giving half the picture. What was missing was that there is another set of Pool tapes and different Jobs that run using these Pools (that also do data spooling) at the same time as the Jobs I showed before.


Ok, so this explains it.

Looking at src/dird/jobq.c I see the following which hopefully Kern or someone else in touch with the code can enlighten a bit more for me.


Well, I'm not in touch with the code, but still...

 >SNIP
if (njcr->store == jcr->store && njcr->pool != jcr->pool) {
    skip_this_jcr = true;
    break;
}
 >SNIP

This says to me that as long as the Pools of the queued Jobs match, the Jobs will all run concurrently. Jobs with mismatching Pools, however, will instead queue and wait for the storage device to free up as previous jobs complete.


That's about it, I'd say.

It's probably not this simple but some behaviour equivalent to ...

if (njcr->store == jcr->store && njcr->pool != jcr->pool &&
    !njcr->spool_data) {
   skip_this_jcr = true;
   break;
}


... should allow for Jobs to queue with different Pools that have spooling on.


Your idea might be possible, but there are some other things to consider.
One is that Bacula positions the tape *before* starting a job, i.e. before starting to spool data.


I was wondering about this, but I can see some good reason as well. I guess that Kern's idea was that a job should only run when everything indicates that it can run at all.

So, making sure tape space is available is one important preparation.

To ensure that Jobs complete, the storage daemon and the director would need some further checks:

1) when spooling from the client completes, is the Storage device available for append?
2) if the Storage device is available, is a Volume from a Pool suitable for this Job currently loaded (if not, load it)?
3) when the Job completes, check the status of the queued Jobs, grab the next Job whose spooling is complete, and go to 2) again


Although I can see advantages in your scenario I also see some disadvantages.

Spool space is one important thing - allowing jobs to spool without being sure when they will be despooled can use up much or even all of your disk space, thus preventing jobs from running smoothly that otherwise could run fine.

Then, I think it's a good idea to have jobs finish as soon as possible, which would not be the case if they started, spooled data, and then had to wait for someone to insert the right volume. Bacula keeps some network connections open for each job, so it even wastes resources (although this should not be a serious problem).

Finally, I think spooling as bacula does it now is not the best approach to your needs. A spooled job is not available for restore and not considered done, so it's not yet a useful backup. A better approach would be to backup to a disk based volume first, and later migrate the job to tape.

My question now changes to "Is there a way for Jobs that use different Pools to run concurrently, as long as the Job definitions are set to spool data?" as outlined in the example above (or something similar)?

Or of course maybe Bacula can already handle this and I'm just missing it :)


This time you're not :-)

But, considering that Kern seems to have the development version in a state that approaches beta stability, I assume he will release the version 1.38 in the next few months.

After that, he will probably ask for feature requests and suggestions. This would be the best time to present your ideas once more.

Anyway, I'd vote for job migration :-)

Arno

Thanks,
Sean

Arno Lehmann wrote:

Hi.

Sean O'Grady wrote:

Well its good to know that Bacula will do what I need!

Guess now I need to determine what I've done wrong in my configs ...

I'm short-forming all the config information to reduce the size of the e-mail, but I can post my full configs if necessary. Anywhere I have "Maximum Concurrent Jobs" I've posted that section of the config. If there is something else besides "Maximum Concurrent Jobs" needed in the configs to get this behaviour to happen, and I'm missing it, please let me know.




The short form is ok :-)

Now, after reading through it I actually don't see any reason why only one job at a time is run.

Perhaps someone else can...

Still, I have some questions.
First, which version of bacula do you use?
Then, do you perhaps use job overrides concerning the pools or the priorities in your schedule?
And, finally, are all the jobs scheduled to run at the same level, e.g. full, and do they actually do so? Perhaps you have a job running at Full level, and the others are scheduled to run incremental, so they have to wait for the right media (of pool DailyPool).


Arno

Any suggestions appreciated!

Sean

In bacula-dir.conf ...

Director {
 Name = mobinet-dir1
 DIRport = 9101                # where we listen for UA connections
 QueryFile = "/etc/bacula/query.sql"
 WorkingDirectory = "/data/bacula/working"
 PidDirectory = "/var/run"
 Maximum Concurrent Jobs = 10
 Password = "****"         # Console password
 Messages = Daemon
}

JobDefs {
  Name = "MobinetDef"
  Storage = polaris-sd
  Schedule = "Mobinet-Cycle"
  Type = Backup
  Max Start Delay = 32400 # 9 hours
  Max Run Time = 14400 # 4 hours
  Rerun Failed Levels = yes
  Maximum Concurrent Jobs = 5
  Reschedule On Error = yes
  Reschedule Interval = 3600
  Reschedule Times = 2
  Priority = 10
  Messages = Standard
  Pool = Default
  Incremental Backup Pool = MobinetDailyPool
  Differential Backup Pool = MobinetWeeklyPool
  Full Backup Pool = MobinetMonthlyPool
  SpoolData = yes
}

JobDefs {
  Name = "SiriusWebDef"
  Storage = polaris-sd
  Schedule = "SiriusWeb-Cycle"
  Type = Backup
  Max Start Delay = 32400 # 9 hours
  Max Run Time = 14400 # 4 hours
  Rerun Failed Levels = yes
  Maximum Concurrent Jobs = 5
  Reschedule On Error = yes
  Reschedule Interval = 3600
  Reschedule Times = 2
  Priority = 10
  Messages = Standard
  Pool = Default
  Incremental Backup Pool = MobinetDailyPool
  Differential Backup Pool = MobinetWeeklyPool
  Full Backup Pool = MobinetMonthlyPool
  SpoolData = yes
}

Storage {
 Name = polaris-sd
 Address = "****"
 SDPort = 9103
 Password = "****"
 Device = "PowerVault 122T VS80"
 Media Type = DLTIV
 Maximum Concurrent Jobs = 10
}

In bacula-sd.conf

Storage {                             # definition of myself
  Name = polaris-sd
  SDPort = 9103                       # Director's port
  WorkingDirectory = "/data/bacula/working"
  Pid Directory = "/var/run"
  Maximum Concurrent Jobs = 10
}


Device {
  Name = "PowerVault 122T VS80"
  Media Type = DLTIV
  Archive Device = /dev/nst0
  Changer Device = /dev/sg1
  Changer Command = "/etc/bacula/mtx-changer %c %o %S %a"
  AutoChanger = yes
  AutomaticMount = yes               # when device opened, read it
  AlwaysOpen = yes
  LabelMedia = no
  Spool Directory = /data/bacula/spool
  Maximum Spool Size = 14G
}

In bacula-fd.conf on all the clients

FileDaemon {                          # this is me
 Name = polaris-mobinet-ca
 FDport = 9102                  # where we listen for the director
 WorkingDirectory = /data/bacula/working
 Pid Directory = /var/run
 Maximum Concurrent Jobs = 10
}


Arno Lehmann wrote:

Hello,

Sean O'Grady wrote:
...

As an alternative which would be even better - All 5 Jobs start @ 23:00 spooling data from the client, the first Job to complete the spooling from the client starts writing to the Storage Device. Remaining Jobs queue for the Storage Device as it becomes available and as their spooling completes.

Instead what I'm seeing is while the first job executes the additional jobs all have a status of "is waiting on max Storage jobs" and will not begin spooling their data until that first Job has spooled->despooled->written to the Storage Device.

My question is of course "is this possible" to have Concurrent Jobs running and spooling in one of the scenarios above (or another I'm missing).






Well, I guess that this must be a setup problem on your side - after all, this is what I'm doing here and it works (apart from very few cases where jobs are held that *could* start, but I couldn't find out why yet).

From your description, I assume that you forgot to set "Maximum Concurrent Jobs" in all the necessary places, namely in the storage definitions.

I noticed that the same message is printed when the director has to wait for a client, though. (This is not yet confirmed; I noticed it only yesterday and couldn't verify it yet.)

If so I'll send out more details of my config to see if anyone can point out what I'm doing wrong.






First, verify the settings you have - there are directives in the client's config, the sd config, and the director configuration where you need to apply the right settings for your setup.

Arno


Thanks,
Sean

--
Sean O'Grady
System Administrator
Sheridan College
Oakville, Ontario


-------------------------------------------------------
This SF.Net email is sponsored by Oracle Space Sweepstakes
Want to be the first software developer in space?
Enter now for the Oracle Space Sweepstakes!
http://ads.osdn.com/?ad_id=7412&alloc_id=16344&op=click
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users