After some off-list discussions, we've set up a meeting for this
Thursday to talk about the six-month retention request.  The IT group
leader has said he's fine with "daily for a week, weekly for a month,
and monthly for two months", but I'm thinking a 14-day dumpcycle with
everything kept for a month would be both sufficient and simpler, given
how amanda does scheduling.
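
To make that concrete, here's the back-of-the-envelope comparison I have
in mind - the tier counts for the IT scheme are just my reading of his
wording, so treat them as assumptions:

    # Rough comparison of what each retention scheme actually provides.
    # Tier counts are my interpretation of "daily for a week, weekly for
    # a month, monthly for two months".
    it_scheme = {"daily": 7, "weekly": 4, "monthly": 2}
    it_points = sum(it_scheme.values())     # ~13 restore points
    it_oldest_days = 2 * 30                 # oldest point ~2 months back

    # 14-day dumpcycle, every nightly run kept for a month: one restore
    # point per night, and every DLE gets a full at least every 14 days.
    mine_points, mine_oldest_days = 30, 30

    print(f"tiered scheme: ~{it_points} restore points, oldest ~{it_oldest_days} days")
    print(f"14d / 1 month: ~{mine_points} restore points, oldest ~{mine_oldest_days} days")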

Going with a 14-day instead of a 7-day dumpcycle is mostly to reduce the
nightly network load from fulls - if we go by the current TSM backup
times and assume no parallel dumps, it's looking like about 8 days to
make a full pass over everything.  In
reality it would take less time than that (because of parallelization)
under normal circumstances, but I don't want to tempt fate in the event
of non-normal circumstances which might slow things down temporarily.
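
For anyone who wants to check my arithmetic, the rough numbers are below.
The ~40T total comes from the TSM figures in the quoted thread; the
per-stream rate isn't measured, it's just what an ~8-day full pass
implies:

    # Back-of-envelope numbers behind the dumpcycle choice.  ~40 TB total
    # is from the earlier thread; the serial rate is only what an ~8-day
    # full pass implies, not something I've benchmarked.
    total_tb, full_pass_days = 40, 8
    implied_mb_per_s = total_tb * 1e6 / (full_pass_days * 86400)   # ~58 MB/s

    # amanda spreads full dumps across the dumpcycle, so the nightly share
    # of fulls is roughly total / dumpcycle.
    for dumpcycle in (7, 14):
        print(f"dumpcycle {dumpcycle:2d}d: ~{total_tb / dumpcycle:.1f} TB of fulls per night")
    print(f"implied serial throughput: ~{implied_mb_per_s:.0f} MB/s")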

Once retention is defined, the (uncompressed) storage requirements are
straightforward to calculate, so that'll be covered.
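
Roughly, with ~40 TB of fulls and ~0.7 TB/day of incrementals (both from
the TSM numbers in the quoted thread) and a month of retention:

    # Uncompressed vtape storage estimate.  40 TB of fulls, ~0.7 TB/day of
    # incrementals and ~30 days of retention are all taken from elsewhere
    # in this thread; tweak to taste.
    total_tb, incr_tb_per_day, retention_days = 40, 0.7, 30
    for dumpcycle in (7, 14):
        daily_tb = total_tb / dumpcycle + incr_tb_per_day   # ~3.6 or ~6.4 TB/day
        print(f"dumpcycle {dumpcycle:2d}d: ~{daily_tb:.1f} TB/day, "
              f"~{daily_tb * retention_days:.0f} TB on vtapes for {retention_days} days")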

I'll probably do some limited testing with VDO to see whether it does me
any good, but I don't expect it to: tarring everything up (instead of
storing individual files) will greatly reduce the number of identical
blocks for VDO to deduplicate, and having VDO compress outside of amanda
would complicate dump size estimates.  So I'm assuming no VDO.

That all seem sane?


The remaining question is what kind of CPU horsepower will be needed to
manage everything and compress the resulting volume of data (roughly
3.5T/day with a 14-day dumpcycle, or 6.5T/day at 7 days).  Any thoughts
on what that's likely to require?
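
To frame it in throughput terms (the 8-hour dump window and the per-core
compression rate below are both guesses on my part, so correct me if
they're off):

    # Daily volume expressed as a sustained compression rate, averaged over
    # 24h and also squeezed into an assumed 8-hour dump window.  ~40 MB/s
    # per core for gzip-level compression is only a ballpark; I'd benchmark
    # against real dump data before sizing hardware.
    PER_CORE_MB_S = 40
    for daily_tb in (3.5, 6.5):
        for window_h in (24, 8):
            mb_per_s = daily_tb * 1e6 / (window_h * 3600)
            print(f"{daily_tb} TB/day over {window_h:2d}h: ~{mb_per_s:.0f} MB/s, "
                  f"~{mb_per_s / PER_CORE_MB_S:.1f} cores @ {PER_CORE_MB_S} MB/s/core")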


On Tue, Nov 24, 2020 at 04:14:41AM -0600, Dave Sherohman wrote:
> On Mon, Nov 23, 2020 at 11:28:37PM +0100, Stefan G. Weichinger wrote:
> > Am 16.11.20 um 14:25 schrieb Dave Sherohman:
> > I am a bit surprised by the fact you haven't yet received any reply on
> > the list so far (maybe per direct/private reply).
> 
> I received one accidentally-off-list reply, as already mentioned.  But,
> aside from that, I interpreted it as just the list acting up - if you
> check the headers on the message you replied to, I sent it on Monday the
> 16th, but it didn't go out to the list until Friday the 20th.  So
> getting on-list replies on the 24th is right in keeping with that
> schedule...
> 
> > Your "project" and the related questions could start a new thread
> > without problems ;-)
> 
> True.  But here's a new subject line, at least.  :)
> 
> > * how dynamic is your data: are the incremental changes big or small ...
> 
> We're currently doing backup via Tivoli Storage Manager.  The daily TSM
> output shows a total of about 700GB per day in "Total number of bytes
> transferred".  Most hosts are only sending some MB or maybe a dozen GB.
> The substantial majority comes from two database servers (400GB and
> 150GB/day).
> 
> I only have access to the output emitted by the TSM client as it runs,
> so I don't know what space is used on the server, but this 700GB/day
> is the raw data size.  ("Objects compressed by: 0%")
> 
> > * what $dumpcycle is targetted?
> 
> Seven days is a nice default, but, given the scale of data here and the
> request for maintaining 6 months of backups, I'm thinking 30 days might
> be more sane.
> 
> Back when I was using amanda 20 years ago, I recall a lot of people
> would run a 7-day tapecycle, then monthly and annual full archival
> backups.  I assume something like that would be possible with vtapes as
> well, so that could be an option for maintaining a seven-day dumpcycle
> without needing an exabyte of storage.
> 
> And, personally, I think the 6 month retention is massive overkill in
> any case.  I've been in this job for just over a decade, and I could
> probably count the number of restores in that time on my fingers, and
> none of them needed data more than a week old.
> 
> > * parallelity: will your new amanda server have multiple NICs etc / plan
> > for a big holding disk (array)
> 
> We tend to default to 4 NICs on new server purchases and have gone
> higher.  But we've only done active/passive bonding so far, which is
> basically just single-NIC throughput.  We tried a higher-capacity mode
> once, but the campus data center and I weren't able to get all the
> pieces to coordinate properly to make it work.  (It was some years ago,
> so I don't recall the details of the problems.)
> 
> Holding disk size is one of the things I'm looking for advice on.  The
> largest DLE is currently a 19T NAS, but the admin responsible for that
> system agrees that it should be split into multiple filesystems, even
> aside from backup-related reasons.  Assuming it doesn't get split, would
> a 20T holding disk be sufficient, or does it need to be 2x the largest
> DLE?
> 
> > * fast network is nice, but this results in a bottleneck called
> > *storage* -> fast RAID arrays, maybe SSDs.
> 
> My boss isn't particularly price-sensitive, but I doubt that he could
> swallow the cost of putting all the vtapes on SSD, so hopefully it won't
> come to that.  SSD for the holding disk should be doable.
> 
> > I'd start with asking: how do your current backups look like?
> > 
> > What is the current rate of new/changed data generated?
> 
> Covered that above, but, to quickly reiterate, we're using Tivoli
> Storage Manager, which runs daily incrementals totaling approx. 700GB
> (uncompressed) per day, the bulk of which is 400GB from one database
> server and 150GB from a second database server.  Both are running
> mysql/mariadb, if that matters.
> 
> > * how long does it take to copy all the 40TB into my amanda box (*if* I
> > did a FULL backup every time)?
> 
> The 400GB/day server takes about 8 hours to do its daily run.  If we
> assume that data rate and *no* parallelization, it comes out to a bit
> over a week for 40T.
> 
> However, I assume that's being throttled by the TSM server, because I
> get approximately double that rate when copying disk images on my kvm
> servers, and those are using remote glusterfs disk mounts, so the data
> is crossing the network multiple times.
> 
> > * what grade of parallelity is possible?
> 
> As much as the network capacity will support, really.  Our current
> backups kick off simultaneously for almost all servers (the one
> exception is that 400G/day db server, which starts earlier).  About half
> finish within a minute or so (only backing up a couple hundred MB or
> less) and most are complete within half an hour.  It's pretty much just
> db servers (the ones I've mentioned already, plus some postgresql
> machines with between 10 and 50G/day) that take longer than an hour to
> complete.
> 
> -- 
> Dave Sherohman


-- 
Dave Sherohman
