On Mon, 15 Aug 2022 04:33:44 -0400,
Dale wrote:
> 
> William Kenworthy wrote:
> >
> > On 15/8/22 06:44, Dale wrote:
> >> Howdy,
> >>
> >> With my new fiber internet, my poor disks are getting a workout, and
> >> also filling up.  First casualty, my backup disk.  I have one directory
> >> that is . . . well . . . huge.  It's about 7 TB or so.  This is where it
> >> is right now and it's still trying to pack in files.
> >>
> >>
> >> /dev/mapper/8tb            7.3T  7.1T  201G  98% /mnt/8tb
> >>
> >>
> >> Right now, I'm using rsync, which doesn't compress files but does just
> >> update things that have changed.  I'd like to find some way, maybe a
> >> piece of software or a tool I'm unaware of, to compress data but
> >> otherwise work a lot like rsync.  I looked in app-backup and there are a
> >> lot of options, but I'm not sure which fits best for what I want to do.
> >> Again, back up a directory, compress, and only update with changed or
> >> new files.  Generally, it only adds files, but sometimes a file gets
> >> replaced as well.  Same name but different size.
> >>
> >> I was trying to go through the list in app-backup one by one but to be
> >> honest, most of the links included only go to github or something and
> >> usually don't tell anything about how it works or anything.  Basically,
> >> as far as seeing if it does what I want, it's useless.  It sort of
> >> reminds me of quite a few USE flag descriptions.
> >>
> >> I plan to buy another hard drive pretty soon.  Next month is possible.
> >> If there is nothing available that does what I want, is there a way to
> >> use rsync and have it set to back up files starting with "a" through "k"
> >> to one spot and then back up "l" through "z" to another?  I could then
> >> split the files into two parts.  I use a script to do this now, if one
> >> could call my little things scripts, so even a complicated command could
> >> work; I just may need help figuring out the command.
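> >>
> >> (Something along these lines might do the split with plain rsync filter
> >> rules; this is only a rough sketch, and the source and destination paths
> >> are made up:
> >>
> >>   rsync -av --include='/[a-kA-K]*' --exclude='/*' /path/to/videos/ /mnt/backup1/
> >>   rsync -av --include='/[l-zL-Z]*' --exclude='/*' /path/to/videos/ /mnt/backup2/
> >>
> >> The leading '/' anchors each pattern to the top of the transfer, so each
> >> run copies only the top-level names in that range plus everything under
> >> them; names starting with digits or punctuation would need an extra run.)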
> >>
> >> Thoughts?  Ideas?
> >>
> >> Dale
> >>
> >> :-)  :-)
> >>
> > The questions you need to ask are how compressible the data is and how
> > much duplication is in there.  Rsync's biggest disadvantage is that it
> > doesn't keep history, so if you need to restore something from last
> > week you are SOL.  Honestly, rsync is not a backup program and should
> > only be used the way you do for data you don't really value, as an
> > rsync archive is a disaster waiting to happen from a backup point of
> > view.
> >
> > Look into dirvish - it uses hard links to keep files current but safe
> > and is easy to restore (each run looks like an exact copy, so you just
> > cp the files back if needed).  Downside is it hammers the hard disk and
> > has no compression, so its only deduplication is via history (my
> > backups stabilised at about 2x the original size for ~2 yrs of
> > history), though you can use something like btrfs, which has
> > filesystem-level compression.
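> >
> > (The hard-link approach is the same idea plain rsync offers via
> > --link-dest; a rough sketch, with made-up paths:
> >
> >   today=$(date +%F)
> >   rsync -a --delete --link-dest=/mnt/backup/latest \
> >       /home/user/data/ /mnt/backup/$today/
> >   ln -nsf /mnt/backup/$today /mnt/backup/latest
> >
> > Unchanged files are hard-linked against the previous snapshot, so every
> > run looks like a full copy but only changed files take new space.)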
> >
> > My current program is borgbackup, which is very sophisticated in how
> > it stores data - it's probably your best bet, in fact.  I am storing
> > literally tens of TB of raw data on a 4 TB USB3 disk (going back
> > years), and yes, I do restore regularly - not just for disasters but
> > also as space-efficient long-term storage I access only rarely.
> >
> > e.g.:
> >
> > A single host:
> >
> > ------------------------------------------------------------------------------
> >
> >                   Original size   Compressed size   Deduplicated size
> > All archives:           3.07 TB           1.96 TB           151.80 GB
> >
> >                   Unique chunks      Total chunks
> > Chunk index:            1026085          22285913
> >
> >
> > Then there is my offline storage - it backs up ~15 hosts (in repos
> > like the above) plus data stores like 22 years of email etc.  Each
> > host backs up to its own repo, then the offline storage backs that up.
> > The deduplicated size is the actual on-disk size ... compression varies
> > as it's whatever I used at the time the backup was taken ... currently I
> > have it set to "auto,zstd,11" but it can be mixed in the same repo (a
> > repo is a single backup set - you can nest repos, which is what I do -
> > so ~45 TB stored on a 4 TB offline disk).  One advantage of a system
> > like this is that chunked data rarely changes, so it's only the
> > differences that are backed up (read the borgbackup docs - interesting).
> >
> > ------------------------------------------------------------------------------
> >
> >                   Original size   Compressed size   Deduplicated size
> > All archives:          28.69 TB          28.69 TB             3.81 TB
> >
> >                   Unique chunks      Total chunks
> > Chunk index:
> >
> >
> >
> >
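> > A minimal borg run looks something like this (the repo path, source
> > paths and compression setting here are only examples):
> >
> >   borg init --encryption=repokey /mnt/backup/host1.borg
> >   borg create --stats --compression auto,zstd,11 \
> >       /mnt/backup/host1.borg::{hostname}-{now} /home /etc
> >   borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 12 \
> >       /mnt/backup/host1.borg
> >
> > Each "borg create" stores only chunks the repo hasn't seen before, which
> > is where the deduplicated sizes above come from, and "borg prune"
> > controls how much history you keep.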
> 
> 
> For the particular drive in question, it is 99.99% videos.  I don't want
> to lose any quality, but I'm not sure how much they can be compressed to
> be honest.  It could be they are already as compressed as they can be
> without losing resolution etc.  I've been lucky so far.  I don't think
> I've ever needed something back from a backup after it had already been
> overwritten with what I lost on the working copy.  Example: I update a
> video only to find the newer copy is corrupt and want the old one back.
> I've done it a time or two, but I tend to find that before I do backups.
> Still, it is a downside and something I've thought about before.  I
> figure when it does happen, it will be something hard to replace.  Just
> letting the devil have his day.  :-(
> 
> For that reason, I find the versioned type of backups interesting.  It is
> a safer method.  You can have a new file but also keep an older file as
> well, just in case the new file takes a bad turn.  It is an interesting
> thought, and one not only I should consider but anyone really. 
> 
> As I posted in another reply, I found a 10TB drive that should be here
> by the time I do a fresh set of backups.  This will give me more time to
> consider things.  Have I said this before a while back???  :/ 
> 

zfs would solve your problem of corruption, even without versioning.
You do a scrub at short intervals, and at least you would know if a
file is corrupted.  Of course, redundancy such as mirroring is better,
and backups take a very short time because when sending from one zfs
pool to another it knows exactly what it needs to send.
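
For example (the pool and dataset names below are made up):

  zpool scrub tank                        # verify checksums of every block
  zfs snapshot tank/videos@2022-08-15     # cheap point-in-time snapshot
  zfs send -i @2022-08-01 tank/videos@2022-08-15 | \
      ssh backupbox zfs recv backup/videos

An incremental send like that transfers only the blocks that changed
between the two snapshots.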

-- 
Your life is like a penny.  You're going to lose it.  The question is:
How do you spend it?

         John Covici wb2una
         cov...@ccs.covici.com
