> Copypools
> Extract capability (#25)
> Continued enhancement of bweb
> Threshold triggered migration jobs (not currently in list, but will be
> needed ASAP)
> Client triggered backups
> Complete rework of the scheduling system (not in list)
> Performance and usage instrumentation (not in list)

> Item #1, which was the number one rated project, isn't even on your
> radar screen.  I can understand that Copy pools would be #1, but can
> you comment on why Accurate backups don't appear on your list?

I'm primarily concerned with restoring data to the state of the last
backup. If there is a partial delete of the files on disk, then I'll get
what I want back by restoring the most recent backup and skipping files
that already exist. If there is a total loss of data on disk, then I'll
be restoring the last full and any incrementals after that fact to roll
me forward to the last known state. I really don't *want* Bacula trying
to figure out what was and wasn't there at a point in time. 

From my seat, #1 (as written) is really an artifact of the job
orientation of file storage in Bacula. I'd like to see that change to a
file version orientation, but that's a major change, and the current
setup can be worked around for places where I really care about the
presence/absence problem. The list I gave are things that I can't work
around.

Unless I've totally misunderstood #1, that is. 

> Could you give me a few details of what the scheduling problems were?

The biggest problem is contacting a large number of clients without
blocking the director. The current setup has to sort out the schedules,
group the clients into reasonable size blocks that don't exceed the
MaxJobs parm, and start hacking through them. If one client doesn't
respond, then that job slot is out of service until the connect timer
expires (and Bacula tries several times to contact the client before
giving up), so the problem escalates as the number of clients increases.
Switching to an external scheduler gets a lot more work through the
same director: it knows how many job execution slots are available,
runs a script to verify that the client can be reached, and submits the
backup job only if the client is reachable and a job slot is free.
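
A minimal sketch of that kind of pre-check, to make the idea concrete.
Everything here is illustrative: the function names (tcp_reachable,
dispatch) are hypothetical, the reachability test is assumed to be a
plain TCP connect to the File Daemon port (9102 is the Bacula default),
and the submit step is left as a callable you supply, since how jobs
get handed to the director is site-specific.

```python
import socket
from typing import Callable, Iterable, List, Tuple

FD_PORT = 9102  # default Bacula File Daemon port; adjust for your site


def tcp_reachable(host: str, port: int = FD_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unresolvable name, etc.: treat as unreachable.
        return False


def dispatch(
    clients: Iterable[str],
    free_slots: int,
    is_reachable: Callable[[str], bool],
    submit: Callable[[str], None],
) -> Tuple[List[str], List[str], List[str]]:
    """Submit jobs only for reachable clients while job slots remain.

    Returns (submitted, unreachable, deferred). Clients are deferred,
    without being probed, once all free slots have been used.
    """
    submitted: List[str] = []
    unreachable: List[str] = []
    deferred: List[str] = []
    for client in clients:
        if len(submitted) >= free_slots:
            deferred.append(client)      # no slot left; retry next cycle
        elif not is_reachable(client):
            unreachable.append(client)   # never ties up a director slot
        else:
            submit(client)               # hypothetical hand-off to the director
            submitted.append(client)
    return submitted, unreachable, deferred
```

The point of the design is that an unreachable client costs only the
short probe timeout in the external script, instead of holding a
director job slot until the connect timer expires. In a real setup
is_reachable would be tcp_reachable (or whatever check your scheduler
runs), and submit would be whatever your site uses to start a job.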

Second, the Bacula scheduler is completely internal to Bacula, and is
ignorant of anything else that is going on in the environment. It can't
take into account other workload priorities (especially in an
environment where a fixed number of devices have to be shared between
Bacula and non-Bacula uses, in some cases not even in the same OS
instance). Ditto network bandwidth, and CPU in virtualized environments.
Shutting off the internal scheduler entirely and using the enterprise
scheduler in its place lets Bacula's work interleave with the rest of
the environment, and the job scheduler can account for it properly.

Third, by moving the complexity of schedule management out of the
director entirely, I improve the uptime of my backup system. I've
suggested in the past moving the Bacula configuration completely into
the database; in this configuration I only have to add the client to the
config file, and schedule it in the scheduler according to the
enterprise workload calendar that I already have going for all the other

Since you're familiar with MVS, think about a tool like OPC or Tivoli
Workload Scheduler. For Unix, compare with the Sun Grid1 scheduler
(which also works nicely for Bacula use). There's also an open-source
variant on Grid1 whose name escapes me at the moment.

(BTW, this kills off items 11, 12, 24 as well)

> By the way, I never imagined one Director could handle 2000 clients.

Well, it can -- IFF it's only doing resource scheduling and job
execution monitoring. I don't think it would be possible to handle that
many if it were also trying to do schedule initiation too. 
