On Monday 25 February 2008 18.47:50 [EMAIL PROTECTED] wrote:
> In the message dated: Sat, 23 Feb 2008 12:40:43 +0100,
> Kern Sibbald used the subject line
> <[Bacula-users] Improving job scheduling flexibility>
> and wrote:
>
> => Hello,
> =>
> => As you know, current job scheduling has a few deficiencies, particularly
> => if for some reason your backups get blocked (a bad tape driver or
> => operator intervention required), which can lead to a big pile of
> => duplicate jobs being scheduled.
>
> Or if a job takes so long that it is still running when the next instance
> of the same job is launched (i.e., a backup that takes more than 24 hours).
>
> [SNIP!]
> =>
> => My current idea is to create a new "DuplicateJobs" resource and a new
> => Duplicate Jobs directive which would point to the duplicate jobs
> => resource.
>
> Sounds great!
>
> => The reason for the resource is that there are just too many different
> => variations that it would require a lot of new directives, and it seems a
> => shame to add them to every Job.
> =>
> => My current design calls for a Duplicate Jobs resource that looks
> => something like the following:
> =>
> => DuplicateJobs {
>
> [SNIP!]
>
> =>
> =>   Job Proximity = <time-interval> (0)
> =>
> => }
> =>
>
> [SNIP!]
>
> =>
> => Finally Job Proximity is to allow a bit of overlap. For example, if a
> => job has been running 20 minutes or ran 20 minutes ago, you might want to
> => not apply the rules.
>
> Could you elaborate on what this means to you a bit more?
I think I was confused and stated it backwards. Anyway, the Job Proximity
directive was proposed by David Boyes, so perhaps he could give us a
definitive definition :-)

>
> I see the distinction here being mainly in terms of jobs that take a "long"
> time vs. a "short" time. If the entire job normally takes 30 minutes, I
> don't really care whether there's a duplicate, and it doesn't matter to me
> if the duplicate starts 1 minute after the original or 29 minutes after.
>
> However, if the job normally takes 18 hours, then the conditions are very
> different. In this case, I really, really, really don't want a duplicate
> running if there's a lot of overlap--this would have a major effect on disk
> loads on the client, on network traffic, and on disk/cpu/media resources on
> the bacula server. However, if the original job is near completion
> when the duplicate is launched, then I don't want to cancel the duplicate.
> In this case, the reasoning is that canceling the duplicate would result in
> a long window with no backups, in an effort to close a small window of
> duplicate (simultaneous) backups running.

I can see the usefulness of the above, and don't want to rule it out, but for
this cut, it probably requires more time to implement than I have for the
current enhancement. This go-around, I am really targeting the problem of
multiple jobs being scheduled and piling up awaiting execution due to
something "blocking" or taking too long.

>
> Here's a very complicated proposal, which will almost certainly be
> rejected, that really leverages Bacula's database backend and gives a
> really powerful feature:
>
>     if the job historically takes over $DURATION [minutes|hours|days]
>     and the current job is at least $PERCENTAGE complete, then allow the
>     duplicate to run, otherwise kill the duplicate
>
>         in this case, $DURATION would be determined from database stats,
>         as an average of previous runs of the same job at the same
>         level.
>
>         I could also see an algorithm that gives more weight to the
>         duration of the most recent backups if the standard deviation
>         of the average vs. the most recent backups is greater than a
>         specified value. This is because a given backup is more likely
>         to take "almost as much" time as the most recent backup of the
>         same level than as much time as a much earlier backup.
>
>         similarly, the $PERCENTAGE value could be expressed as a range,
>         incorporating the standard deviation in the backup duration
>

I think you have something there, so you might want to put the above into a
Feature Request. I don't think it will get implemented in the near future
due to the long list of big, important projects that we have, but it would
be a good way to ensure that the idea is not lost.

>
>
> [As an aside, I'd like to see this kind of predictive/AI capability put
> into more of bacula, particularly in the scheduling. It would be wonderful
> to use the historic records to allow bacula to schedule jobs most
> efficiently, in a way similar to Amanda, rather than hard-coding specific
> times in each job resource.]

Virtually everyone that I have talked to, especially in companies, says that
they do not like Amanda's way of scheduling jobs. That said, I don't rule
out doing something like they do, and certainly the new "Max Full Age"
directive goes in that direction. However, at the current time, I would
suggest that if you would like AI features, by all means turn off Bacula
scheduling and implement a Perl script that does the scheduling. After you
have a bit of experience with your system, I would be really interested in
hearing about it. I suspect that you will find that it takes a lot of work
and many iterations to get AI-type features working correctly -- at least
that would be the case for me.

Best regards,

Kern

>
> =>
> => As you can see, there is a lot of room for clarification of what should
> => be done, and also a need for a bit more functionality ...
> => -- in other words a bit more design is needed before beginning the
> => implementation.
> =>
> => Comments?
> =>
> => Best regards,
> =>
> => Kern
> =>
>
> ----
> Mark Bergman    [EMAIL PROTECTED]    215-662-7310
> System Administrator     Section of Biomedical Image Analysis
> Department of Radiology            University of Pennsylvania
>       PGP Key at: https://www.rad.upenn.edu/sbia/bergman

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
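
For anyone who wants to experiment with the duration/percentage rule Mark
describes above, outside of Bacula as Kern suggests, here is a minimal sketch
in Python. It is only an illustration under stated assumptions: the names
DuplicatePolicy and allow_duplicate are hypothetical, the two thresholds
stand in for $DURATION and $PERCENTAGE and are not Bacula directives, the
list of past run times is assumed to have been pulled from the catalog by
some other means, and "percent complete" is estimated simply as elapsed
time divided by the historical average.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Sequence


@dataclass
class DuplicatePolicy:
    """Hypothetical thresholds; neither field is a real Bacula directive."""
    long_job_seconds: float   # stands in for $DURATION
    min_fraction_done: float  # stands in for $PERCENTAGE, as 0.0 to 1.0


def allow_duplicate(history_seconds: Sequence[float],
                    elapsed_seconds: float,
                    policy: DuplicatePolicy) -> bool:
    """Return True to let the duplicate run, False to kill it.

    history_seconds: durations of previous runs of the same job at the
                     same level (assumed to come from the catalog).
    elapsed_seconds: how long the currently running instance has run.
    """
    if not history_seconds:
        # Assumption: with no history we cannot estimate progress,
        # so be conservative and do not allow the duplicate.
        return False

    expected = mean(history_seconds)

    # The rule as stated only allows duplicates for historically long jobs.
    if expected <= policy.long_job_seconds:
        return False

    # Assumption: progress is approximated by elapsed / expected, capped at 1.
    fraction_done = min(elapsed_seconds / expected, 1.0)
    return fraction_done >= policy.min_fraction_done


if __name__ == "__main__":
    policy = DuplicatePolicy(long_job_seconds=6 * 3600, min_fraction_done=0.9)
    past_runs = [18 * 3600, 17 * 3600, 19 * 3600]
    print(allow_duplicate(past_runs, 17 * 3600, policy))
    print(allow_duplicate(past_runs, 2 * 3600, policy))
```

This follows the rule literally, so a duplicate of a short job is killed as
well. Weighting recent runs more heavily, or turning the percentage into a
range based on the standard deviation, would replace the plain mean() call
and the single threshold comparison.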