I have not seen any replies to your question. I can’t speak to Bareos at that 
volume of data, though I see no reason why it could not handle it. Below are 
my thoughts on how I would approach it, along with answers to some of your 
other questions.

* The number of files will impact things more than the total data size. More 
files mean a larger database, longer scan times, etc.
* I have easily seen Bareos saturate links well above 100Mbit. That said, 
100Mbit is very slow for the initial full backup of 200TB. You are looking at 
a minimum of 6 months, assuming the data does not compress. For the initial 
backup you might want to use sneakernet with a Raspberry Pi and a Drobo. This 
is what I do: the full backup is done on site at gigabit speeds, then I carry 
the entire setup to the other site and do a volume migration to the real 
server.
https://fasterdata.es.net/home/requirements-and-expectations/

* Look at the Bareos client-side compression options. On bandwidth-constrained 
hosts (this includes cloud, because of cost) I use gzip turned all the way up. 
This will peg one CPU core, but for text data it drastically reduces the 
volume of data over the wire. Something like lz4 has a much lower CPU impact 
and still gets about 70% of the compression of gzip. If you have the CPU 
cores to burn, and a test shows it still saturates your 100Mbit link with 
compression on, maybe use it to get that backup time down. If this is all 
video, already compressed images, or cram files, it likely just burns CPU for 
no benefit. Bareos gives you a report at the end of a job of how well it 
compressed.

* How Bareos checks for changed files: with the accurate setting 
(recommended), the server sends the client a list of the files it knows 
about, and the client compares against that list. This process is very fast. 
By default Bareos won’t use checksums to compare; it only checks 1. does the 
file exist, and 2. is the filesystem metadata newer than in the 
database/catalog (file has changed). Incrementals with Bareos are much faster 
than rsync. (I have moved PBs of data with rsync.)
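
As a rough illustration of why a metadata-only comparison is cheap, here is a 
minimal Python sketch of the idea. It is not Bareos code: the function names, 
the catalog layout (relative path mapped to modification time), and the use of 
mtime alone are assumptions made for the example.

```python
import os

def snapshot(root):
    """Build a hypothetical 'catalog': relative path -> mtime in nanoseconds."""
    catalog = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            catalog[os.path.relpath(full, root)] = os.stat(full).st_mtime_ns
    return catalog

def changed_files(root, catalog):
    """Return files that are new, or whose mtime is newer than the catalog entry.

    Only one stat() per file is needed; file contents are never read,
    so no checksums are computed.
    """
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root)
            known_mtime = catalog.get(rel)
            if known_mtime is None or os.stat(full).st_mtime_ns > known_mtime:
                changed.append(rel)
    return sorted(changed)
```

The cost scales with the number of files rather than the number of bytes, 
which fits the point above that file count matters more than total size.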


With 200TB of data you will want a lot of tape; otherwise you're looking at 
400TB+ of disk. If you're new to backup: you have to build a new “full” every 
so often. Given your network is 100Mbit, I would look at the Always 
Incremental feature of Bareos. This will let you avoid the 180 days a new 
full backup would take over the network. You still have to write 200TB every 
so often, but that can all be done on the Bareos server side. I recommend 
tape just for cost; you need about 66 LTO7 tapes or 33 LTO8 tapes. LTO7 is 
still the best value, but LTO8 has come down in cost a lot and LTO9 is 
scheduled for GA this year. You will also want a few tape drives and a fast 
spool pool of disks to do this right. This 2x minimum size is one downside 
backup systems have compared to rsync.
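
For anyone who wants to check the arithmetic, here is a small Python sketch. 
The assumptions are mine: decimal units (1TB = 10^12 bytes), a link running 
at full line rate with no protocol overhead, no compression, and native tape 
capacities of 6TB for LTO7 and 12TB for LTO8. I am also assuming the tape 
counts above refer to holding two fulls (about 400TB).

```python
import math

def transfer_days(data_tb, link_mbit):
    """Days needed to move data_tb terabytes over a link_mbit Mbit/s link."""
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_mbit * 1e6)
    return seconds / 86400

def tapes_needed(data_tb, tape_native_tb):
    """Whole tapes needed to hold data_tb at native (uncompressed) capacity."""
    return math.ceil(data_tb / tape_native_tb)

print(round(transfer_days(200, 100)))   # 185 days at 100Mbit, about 6 months
print(round(transfer_days(200, 1000)))  # 19 days at gigabit
print(tapes_needed(400, 6))             # 67 LTO7 tapes for two fulls
print(tapes_needed(400, 12))            # 34 LTO8 tapes for two fulls
```

Rounding up to whole tapes gives 67 and 34, in line with the roughly 66 and 
33 quoted above. Real throughput will be lower than line rate, so treat the 
day counts as a floor.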

An all-disk solution will be faster, because a big RAID-Z2 will have greater 
bandwidth for the VirtualFull, but it will be expensive. You could look at 
something like 45Drives to turn into your SD. I use a mix of disk and tape 
with Bareos (again, at a fraction of your size).

I would personally split this into several jobs using wildcards in filesets, 
so that instead of one 200TB job you have several jobs of a few TB each. This 
will also let you run jobs in parallel, recover better from a full backup 
failure, not have to copy 200TB when you do a full, etc.
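
One possible way to plan such a split, sketched in Python. This is only a 
planning aid, not anything Bareos provides: the function name, the directory 
paths, and the sizes are hypothetical, and it groups whole directories rather 
than using wildcards. It packs directories into jobs of at most a target size 
using a greedy first-fit, largest first.

```python
def split_into_jobs(dir_sizes_tb, target_tb):
    """Group directories into jobs of at most target_tb each.

    dir_sizes_tb maps a directory path to its size in TB. A directory larger
    than target_tb ends up alone in its own job.
    """
    jobs = []  # each entry is [total_tb, [paths]]
    by_size = sorted(dir_sizes_tb.items(), key=lambda kv: kv[1], reverse=True)
    for path, size in by_size:
        for job in jobs:
            if job[0] + size <= target_tb:
                job[0] += size
                job[1].append(path)
                break
        else:
            # No existing job has room, so start a new one.
            jobs.append([size, [path]])
    return [sorted(paths) for _total, paths in jobs]

# Hypothetical directory sizes in TB
sizes = {"/data/a": 3.0, "/data/b": 2.5, "/data/c": 1.5,
         "/data/d": 1.0, "/data/e": 0.5}
for paths in split_into_jobs(sizes, target_tb=4.0):
    print(paths)
```

Each resulting group would then become the file list of one fileset and job.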



Brock Palen
[email protected]
www.mlds-networks.com
Websites, Linux, Hosting, Joomla, Consulting



> On Apr 15, 2021, at 10:49 AM, Steve Eppert <[email protected]> wrote:
> 
> Hi.
> I need to backup around 200 TB of data (with many small files) with around 1 
> TB per week new/changed data. Currently I simply rsync the data to an offsite 
> location using a 100 MBit/s connection.
> 
> While searching for solutions for making the rsync faster (because of the 
> many small files an rsync almost never uses the full 100 MBit/s) I stumbled 
> across Bareos.
> 
> A question I could not find an answer to in the docs is: how does the 
> bareos-filedaemon check for changed data when doing an incremental backup? 
> Does the daemon hold some kind of database or does it check each file against 
> the Bareos server? I'm wondering if a Bareos incremental backup job might be 
> faster than the rsync.
> 
> Also after looking at the docs I'm considering purchasing a tape loader to 
> backup a specific subset of more valuable data to tape.
> Is it possible to have incremental backups to disk and do a regular full 
> backup of only a subset of this data to tape?
> 
> Is it possible to get filesystem access to the incremental backed up data on 
> disk or is the Bareos interface the only way to access this data?
> 
> Thanks!
> Steve
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "bareos-users" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> To view this discussion on the web visit 
> https://groups.google.com/d/msgid/bareos-users/ccddf8c0-4fcc-4230-994f-157b9a2d1b06n%40googlegroups.com.

