On 1/5/11 12:00 PM, rory_f wrote:
I want to ensure tapes are filled 100% each time where possible. I've written a 
script in Python to look at a directory, figure out its size, and create a 
disklist which will ensure a roughly equal size for each disklist file - so for 
instance it will try to create a disklist file that contains entries in groups 
of 400GB - the size of a tape. I know Amanda will fill a tape to 100% where 
possible but sometimes, if it is using compression, this doesn't work, and the 
first two tapes will fill 500GB+ and then the last tape will be left with 
200GB. This is a waste of 200GB - I'm trying to make sure all tapes are full 
where possible and not waste any space.

Not to be rude, but that's a false economy.

It could just as easily be said that you would be wasting tape capacity by not 
using compression.

You are asking to put no more than 400GB on each tape, and thus no more than 1200GB on the set of 3. Then you are complaining that the 1200GB is unevenly distributed across the 3 tapes, because compression allowed more than 400GB on each of the first 2. Stated another way, you are asking that the "wasted" (or unused) 300GB or so of space be spread across all 3 tapes rather than sitting on the last one, or else that compression be turned off so that you can imagine you are not wasting tape.

500GB on a 400GB tape means that you are getting about 20% compression. If that is consistent, have your Python script queue up somewhere between 1400GB and 1500GB for backup, depending on how close you want to shave it (the higher the figure, the higher the risk of overrunning the last tape). Then you are being economical with your tape usage, getting a couple hundred more GB onto the set of tapes than you were originally planning.
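
To make that arithmetic concrete, here is a rough sketch of how the script could pick its target. The function name, its parameters, and the 5% safety margin are my own placeholders, not anything Amanda provides; the 400GB and 500GB figures are the ones from this thread.

```python
def planned_queue_gb(native_tape_gb, tapes, observed_gb_per_tape, safety_margin=0.05):
    """Estimate how many GB of data to queue for a set of tapes.

    native_tape_gb       -- uncompressed capacity of one tape (400 in this thread)
    tapes                -- number of tapes in the set (3 in this thread)
    observed_gb_per_tape -- data that actually fit on a full tape (about 500 here)
    safety_margin        -- hypothetical fraction held back so the last tape
                            is less likely to overrun
    """
    # Ratio of data written to native capacity, e.g. 500 / 400 = 1.25.
    ratio = observed_gb_per_tape / native_tape_gb
    # Capacity of the whole set if that ratio holds, e.g. 400 * 1.25 * 3 = 1500.
    effective_gb = native_tape_gb * ratio * tapes
    # Hold back the safety margin.
    return effective_gb * (1.0 - safety_margin)


if __name__ == "__main__":
    # No margin gives the upper end of the range above, about 1500GB.
    print(round(planned_queue_gb(400, 3, 500, safety_margin=0.0)))
    # The default 5% margin lands at about 1425GB, inside 1400GB to 1500GB.
    print(round(planned_queue_gb(400, 3, 500)))
```

This only holds up if the observed ratio stays consistent from run to run, so the margin is the knob to turn when it does not.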

Of course, compressibility varies widely. Huge directories of TIFF and JPEG files can be essentially incompressible. Typical Unix directories of predominantly text-based stuff, like log files or configuration files, are highly compressible, and repetitive things like Apache access logs can compress as much as 10:1. So you have to know your data to plan efficiently for what you are trying to do.
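
One cheap way to get a feel for your data is to compress a sample of it and look at the ratio. The sketch below uses Python's zlib purely as a stand-in; whatever compression Amanda or the tape drive actually applies will give different numbers, so treat the result as a rough indication only. The function names and the 1 MiB sample size are my own choices.

```python
import os
import zlib


def compression_ratio(data):
    """Return original size / compressed size for a bytes object (1.0 if empty)."""
    if not data:
        return 1.0
    return len(data) / len(zlib.compress(data))


def sample_file_ratio(path, sample_bytes=1 << 20):
    """Estimate compressibility from the first sample_bytes of a file."""
    with open(path, "rb") as f:
        return compression_ratio(f.read(sample_bytes))


if __name__ == "__main__":
    # Repetitive, log-like text compresses very well.
    log_like = b'10.0.0.1 - - "GET /index.html HTTP/1.1" 200 512\n' * 10000
    print("log-like:", compression_ratio(log_like))
    # Random bytes stand in for already-compressed data such as JPEGs.
    print("random:  ", compression_ratio(os.urandom(1 << 16)))
```

Running sample_file_ratio over a handful of representative files from each directory would show which disklist entries are likely to shrink and which are not.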


--
---------------

Chris Hoogendyk

-
   O__  ---- Systems Administrator
  c/ /'_ --- Biology & Geology Departments
 (*) \(*) -- 140 Morrill Science Center
~~~~~~~~~~ - University of Massachusetts, Amherst

<hoogen...@bio.umass.edu>

---------------

Erdös 4

