I'm soliciting comments about my thinking re: possibly doing incrementals of the TSM DB instead of some of the daily fulls I'm doing now. I'm braced for tomatoes, so fire away. :)
My big DB2 backup customers are going through the exercise of measuring "How frequently do we really need to do fulls?". I'm hoping to convince them that their current regime of daily fulls _plus_ some, on top of retaining all the logs, can be scaled back. But this made me think: I've never done the same exercise with my own DB backups, i.e. the TSM DB.

My infrastructure has more than 200G of TSM database running at the moment, split across 12 servers. Currently, I start my TSM DB backups a little after 0400, and have to struggle to get all the various copies and migrations complete by the time the backup window opens in the evening. Now, I'm making an unreasonable number of copies at the moment: one onsite copy and -THREE- offsite copies, two of which are electronically vaulted. While the percentage savings would be the same for any of us, my absolute savings are starting to feel compelling.

My first response to scaling back was a shudder, but I'm trying to see it logically. If I trust the substrate, and do (say) an incremental DB backup 2 out of 3 days, or some similar schedule, how much exposure am I _really_ adding? If I go from daily fulls to every-Nth-day fulls, with incrementals in between:

+ I add some amount (how much?) to DB restoration time in the event of a disaster. My knee-jerk estimate is that applying the successive incrementals won't take long relative to the full. Perhaps it's linear with size?
+ I increase my exposure to media failure by some amount. This seems negligible: if I've got a reasonable number of extra copies of my DB backups, the chance of media failure is acceptably tiny.
+ I add exposure to several new code paths in the TSM server codebase. A bug in incremental application would mean I'd have to revert to the full, possibly increasing my lost-data period. That's probably negligible too.

I couldn't come up with any more negatives. Oh, and:

+ My TSM administrator will have the willies about it for a while.
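To put rough numbers on the restore-time and media-failure questions (and on tape written per day), here is a back-of-envelope sketch in Python. It is a minimal model under loudly stated assumptions, not anything TSM computes: each incremental is assumed to hold only the pages changed since the previous DB backup, so a worst-case restore (disaster just before the next full) reads the full plus all N-1 incrementals; copies are assumed to fail independently; and the inputs (20 GB per server, roughly one server's share of the 200G, 5% daily page change, 1% chance a given copy is unreadable) are invented for illustration. `Plan` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    db_gb: float         # size of one full DB backup (hypothetical value)
    change_frac: float   # fraction of DB pages changed per day (hypothetical)
    full_every: int      # a full every N days, incrementals on the other N - 1 days
    copies: int          # copies of each backup, e.g. 1 onsite + 3 offsite
    p_unreadable: float  # chance any single copy is unreadable (hypothetical)

    def chain_gb(self) -> float:
        # Worst-case restore reads the full plus every incremental in the series.
        # Assumes each incremental holds one day's changed pages (change_frac * db_gb).
        return self.db_gb + (self.full_every - 1) * self.change_frac * self.db_gb

    def gb_written_per_day(self) -> float:
        # One full-plus-incrementals chain is written per cycle of full_every days.
        return self.chain_gb() / self.full_every

    def p_chain_lost(self) -> float:
        # A backup is lost only if all of its copies are unreadable; the chain is
        # lost if any one of its full_every members is lost. Treating failures as
        # independent is optimistic when copies share media, drives or handling.
        p_member_lost = self.p_unreadable ** self.copies
        return 1 - (1 - p_member_lost) ** self.full_every


if __name__ == "__main__":
    daily = Plan(db_gb=20, change_frac=0.05, full_every=1, copies=4, p_unreadable=0.01)
    mixed = Plan(db_gb=20, change_frac=0.05, full_every=3, copies=4, p_unreadable=0.01)
    for name, plan in (("daily fulls", daily), ("full every 3rd day", mixed)):
        print(f"{name}: {plan.gb_written_per_day():.1f} GB/day written, "
              f"worst-case restore reads {plan.chain_gb():.1f} GB, "
              f"P(chain lost) = {plan.p_chain_lost():.1e}")
```

With those made-up inputs, the every-third-day plan writes about 7.3 GB/day instead of 20, a worst-case restore reads 22 GB instead of 20 (10% more), and the chance that some link in the chain is unreadable roughly triples, from about 1e-8 to 3e-8. Treating restore time as linear in GB read is itself an assumption; real restores also pay per-volume mount and positioning costs for each extra incremental.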
Accepting those risks, I win:

+ Dramatically smaller use of backup landing pad and offsite generation resources.
+ Dramatically smaller use of primary tape.
+ More wall-clock time for e.g. expiration and such.

Anybody see something I'm missing?

- Allen S. Rout