On 15/8/22 06:44, Dale wrote:
Howdy,

With my new fiber internet, my poor disks are getting a workout, and
also filling up.  First casualty: my backup disk.  I have one directory
that is . . . well . . . huge.  It's about 7TB or so.  This is where it
stands right now, and it's still trying to pack in files.


/dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb


Right now, I'm using rsync, which doesn't compress files but does
update only the things that have changed.  I'd like to find some
software, though maybe there is already a tool I'm unaware of, that
compresses the data but otherwise works a lot like rsync.  I looked in
app-backup and there are a lot of options, but I'm not sure which fits
best for what I want to do.  Again: back up a directory, compress, and
only update with changed or new files.  Generally it only adds files,
but sometimes a file gets replaced as well.  Same name but different size.

I was trying to go through the list in app-backup one by one but, to be
honest, most of the included links just go to github or something and
usually don't say anything about how the tool actually works.  As far
as seeing whether it does what I want, it's useless.  It sort of
reminds me of quite a few USE flag descriptions.

I plan to buy another hard drive pretty soon.  Next month is possible.
If there is nothing available that does what I want, is there a way to
set up rsync so it backs up files starting with "a" through "k" to one
spot and then backs up "l" through "z" to another?  I could then split
the files into two parts.  I use a script to do this now, if one could
call my little things scripts, so even a complicated command could
work; I just may need help figuring out the command.
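
For what it's worth, a hedged sketch of that kind of split using
rsync's filter rules (the paths are made up, and it assumes the
top-level names all start with a lower-case letter - widen the
character classes if not):

  rsync -av --include='/[a-k]*' --exclude='/*' /mnt/data/ /mnt/backup1/
  rsync -av --include='/[l-z]*' --exclude='/*' /mnt/data/ /mnt/backup2/

The leading "/" anchors each pattern at the top of the transfer, so
top-level files and directories get routed to one disk or the other,
while everything inside an included directory still comes along.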

Thoughts?  Ideas?

Dale

:-)  :-)

The questions you need to ask are: how compressible is the data, and how much duplication is in there?  Rsync's biggest disadvantage is that it doesn't keep history, so if you need to restore something from last week you are SOL.  Honestly, rsync is not a backup program and should only be used the way you use it for data you don't value; an rsync archive is a disaster waiting to happen from a backup point of view.

Look into dirvish - it uses hard links to keep files current but safe, and it is easy to restore from (each snapshot looks like an exact copy, so you just cp the files back if needed).  The downsides are that it hammers the hard disk and has no compression, so its only space saving is deduplication via history (my backups stabilised at about 2x the original size for ~2 years of history) - though you can put it on something like btrfs, which has filesystem-level compression.
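
If you're curious about the mechanics dirvish builds on, it's
essentially rsync's --link-dest option.  A minimal sketch, with
made-up dates and paths:

  # Unchanged files become hard links into yesterday's snapshot,
  # so each snapshot looks like a full copy but only changed
  # files consume new space.
  rsync -a --link-dest=/mnt/backup/2022-08-14 \
        /home/dale/data/ /mnt/backup/2022-08-15/

Dirvish just automates creating, naming and expiring those snapshots
for you.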

My current program is borgbackup, which is very sophisticated in how it stores data - it's probably your best bet, in fact.  I am storing literally tens of TB of raw data on a 4TB USB3 disk, going back years.  And yes, I do restore regularly - not just for disasters, but for space-efficient long-term storage I access only rarely.

e.g.:

A single host:

------------------------------------------------------------------------------
                       Original size      Compressed size Deduplicated size
All archives:                3.07 TB              1.96 TB            151.80 GB

                       Unique chunks         Total chunks
Chunk index:                 1026085             22285913
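
To give a concrete idea of the workflow (the paths and archive naming
here are hypothetical - see the borg docs for the details):

  borg init --encryption=repokey /mnt/backup/repo    # one-time repo setup
  borg create --compression auto,zstd,11 --stats \
        /mnt/backup/repo::'{hostname}-{now}' /home/dale/data
  borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 /mnt/backup/repo

Each borg create is incremental in practice: new data is split into
chunks and deduplicated against everything already in the repo, which
is where numbers like the ones above come from.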


Then there is my offline storage - it backs up ~15 hosts (in repos like the above) plus data storage like 22 years of email etc.  Each host backs up to its own repo, then the offline storage backs that up.  The deduplicated size is the actual on-disk size.  Compression varies, as it's whatever I used at the time each backup was taken - currently I have it set to "auto,zstd,11", but settings can be mixed in the same repo (a repo is a single backup set; you can nest repos, which is what I do - so ~45TB stored on a 4TB offline disk).  One advantage of a system like this is that chunked data rarely changes, so it's only the differences that get backed up (read the borgbackup docs - interesting).

------------------------------------------------------------------------------
                       Original size      Compressed size Deduplicated size
All archives:               28.69 TB             28.69 TB              3.81 TB

                       Unique chunks         Total chunks
Chunk index:
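
Restores are straightforward too - a sketch with a made-up archive name:

  borg list /mnt/backup/repo                        # show archive names
  borg extract /mnt/backup/repo::myhost-2022-08-15  # extracts into the current directory

Or mount a repo read-only with "borg mount" and copy files out like
any other filesystem.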


