Re: [Hampshire] Open source network backup with de-dupe.
Adrian Bridgett wrote:
> I've come to the conclusion that there aren't any decent open source
> backup products. Yes, I do actually have it on my todo list to write
> one :-)

:-) It's hard. I've not been able to find a decent contract for my
company, so I've had to cease development of DIASER, which is a shame.
We reached beta-2 but have had to set a cut-off point. So the whole
caboodle may well be up for sale in a few weeks' time, as Interlinux
Ltd has brought the project IP and finances to safety by entering a
dormant state (this is a gradual process). I don't want the community
to use something that can't be, or isn't, well maintained or sustained.

--
Damian
http://interlinux.co.uk
DIASER RoadMap http://bit.ly/1Vtdp5
Re: [Hampshire] Open source network backup with de-dupe.
On Thu, Jul 15, 2010 at 21:11:25 +0100, Keith Edmunds wrote:
> However, Chris is right: you cannot *know* that two files are the same
> unless you compare them, byte by byte. If hashes are good enough for
> you, just backup the hashes and save lots of time and diskspace!

My understanding on this point is that in fact a hash _is_ good enough,
or rather that the odds of a hash not being good enough are
sufficiently low (cf. corruption on hard disks etc.) that it's
irrelevant. For instance, see:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=122945

Adrian
--
bitcube.co.uk - Expert Linux infrastructure consultancy
Puppet, Debian, Red Hat, Ubuntu, CentOS
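To put a rough number on "sufficiently low", here is a back-of-the-envelope
birthday-bound estimate. It is only a sketch: the pool size is an arbitrary
assumption, and the disk error rate is the commonly quoted order of
magnitude, not a measured figure.

    # Birthday-bound estimate of the odds of ANY hash collision in a
    # dedupe pool: roughly n^2 / 2^(b+1) for n items and b-bit hashes,
    # valid while n is far below 2^(b/2).

    n = 10**12        # assumed pool size: a trillion files/chunks
    bits = 160        # e.g. SHA-1

    p_collision = n**2 / 2**(bits + 1)
    print(f"collision odds ~ {p_collision:.1e}")   # ~ 3.4e-25

    # For comparison, an unrecoverable disk read error is commonly
    # quoted at around 1 in 10^14 bits read (~1e-14), i.e. billions of
    # times more likely than the collision above.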
Re: [Hampshire] Open source network backup with de-dupe.
> Being able to take snapshots every few minutes and sync them to a
> remote datacentre really is rather nice

Yep. I've done that with LVM snapshots and rsync. Very handy :-)

Vic.
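For anyone who hasn't tried it, the pattern is: snapshot, mount, copy,
tear down. Below is a minimal sketch of that sequence; the volume group,
LV and snapshot names, mount point, and destination are all placeholders,
and Vic's actual setup may well differ.

    #!/usr/bin/env python3
    """Minimal sketch of the LVM-snapshot-then-rsync pattern.
    Device names, mount point, and destination are hypothetical."""
    import subprocess

    VG, LV, SNAP = "vg0", "data", "data-snap"   # placeholder names
    MOUNT = "/mnt/snap"
    DEST = "backup@remote.example.com:/srv/backups/data/"

    def run(*cmd):
        subprocess.run(cmd, check=True)

    # 1. Take a point-in-time snapshot (needs free extents in the VG).
    run("lvcreate", "--size", "1G", "--snapshot",
        "--name", SNAP, f"/dev/{VG}/{LV}")
    try:
        # 2. Mount it read-only and copy the frozen view off-site.
        run("mount", "-o", "ro", f"/dev/{VG}/{SNAP}", MOUNT)
        try:
            run("rsync", "-a", "--delete", f"{MOUNT}/", DEST)
        finally:
            run("umount", MOUNT)
    finally:
        # 3. Drop the snapshot so it doesn't fill up and invalidate itself.
        run("lvremove", "-f", f"/dev/{VG}/{SNAP}")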
Re: [Hampshire] Open source network backup with de-dupe.
On Thu, Jul 15, 2010 at 09:15:23 +0100, James Courtier-Dutton wrote:
>> I've come to the conclusion that there aren't any decent open source
>> backup products. Yes, I do actually have it on my todo list to write
[snip]
> So, in summary, it is not good enough to replace the system currently
> at my customer's that cost over £10 !

There's a good reason they can charge 100K :-)

VSS and other snapshotting technologies (particularly those built into
decent storage arrays) are the way forward if you can afford them.
Being able to take snapshots every few minutes and sync them to a
remote datacentre really is rather nice :D See also "continuous data
protection".

TBH, my personal attitude is that snapshots are great for whole-box
restores, while file-level backups are good for digging out a single
file. Good sysadmin practice should almost remove the need ever to use
backups in enterprise environments. Testing on pre-production
environments and then rolling out onto production boxes (or
flip-flopping environments, where you clone A to B, upgrade B, then
flip the service over to B) works a treat. Backups then become
emergency-only ("oh crap, we've been hacked" and/or "the database is
corrupted"), and it's a question of how much loss of data you can
afford, time-wise.

Adrian
--
bitcube.co.uk - Expert Linux infrastructure consultancy
Puppet, Debian, Red Hat, Ubuntu, CentOS
Re: [Hampshire] Open source network backup with de-dupe.
On Thu, 15 Jul 2010 20:56:01 +0100, james.dut...@gmail.com said:
> Say you change one byte in a large file. rsync will not send the
> entire file again, it will only send the changes.

That's what BackupPC does. However, Chris is right: you cannot *know*
that two files are the same unless you compare them, byte by byte. If
hashes are good enough for you, just backup the hashes and save lots
of time and diskspace!

--
Keith Edmunds
Tiger Computing Ltd - "The Linux Specialists"
Helping businesses make the most of Linux
http://www.tiger-computing.co.uk
Re: [Hampshire] Open source network backup with de-dupe.
On 15 July 2010 20:44, Chris Dennis wrote:
> On 15/07/10 15:39, James Courtier-Dutton wrote:
>>
>> Take 1 central site PC called "A".
>> Take two remote-site PCs called "B" and "C".
>>
>> B has already sent a full backup to A.
>> C wishes to send a full backup to A, but lots of the data on C is the
>> same as on B.
>> C generates hashes of its files, and only sends the hashes to A.
>> A responds to C saying which hashes it has not already got from B.
>> C then only sends a subset of the data, i.e. data that was not
>> already sent from B.
>>
>> Thus, a lot of WAN bandwidth is saved.
>
> The problem is that hash collisions can occur. Two files with the same
> hash are /probably/ the same file, but probably isn't good enough -- a
> backup system has to be 100% sure. And the only way to be certain is
> to get both files and compare them byte by byte.

There are algorithms that detect differences without having to send the
entire file. For example, rsync uses them. Say you change one byte in a
large file: rsync will not send the entire file again, it will only
send the changes. If I followed your statement, I would have to stop
using rsync.

Kind Regards

James
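For the curious, the mechanism rsync uses is a per-block weak "rolling"
checksum, backed by a stronger per-block hash. Here is a toy sketch of
the rolling part only; it is not rsync's actual code, just the
arithmetic that lets the checksum slide along a file in constant time
per byte.

    # Toy rsync-style rolling weak checksum (not rsync's real code).
    # a = sum of window bytes, b = position-weighted sum, both mod 2^16.
    # "Rolling" means the checksum of window [i+1, i+n] is derived in
    # O(1) from that of [i, i+n-1], so one side can slide it across a
    # file hunting for blocks the other side already holds.

    M = 1 << 16

    def weak_checksum(block: bytes):
        a = sum(block) % M
        b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
        return a, b

    def roll(a, b, out_byte, in_byte, n):
        """Slide the window one byte: drop out_byte, take in in_byte."""
        a = (a - out_byte + in_byte) % M
        b = (b - n * out_byte + a) % M
        return a, b

    data = b"change one byte in a large file"
    a, b = weak_checksum(data[0:8])
    a, b = roll(a, b, data[0], data[8], 8)
    assert (a, b) == weak_checksum(data[1:9])   # same as recomputing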
Re: [Hampshire] Open source network backup with de-dupe.
On 15/07/10 15:39, James Courtier-Dutton wrote:
> Take 1 central site PC called "A".
> Take two remote-site PCs called "B" and "C".
>
> B has already sent a full backup to A.
> C wishes to send a full backup to A, but lots of the data on C is the
> same as on B.
> C generates hashes of its files, and only sends the hashes to A.
> A responds to C saying which hashes it has not already got from B.
> C then only sends a subset of the data, i.e. data that was not already
> sent from B.
>
> Thus, a lot of WAN bandwidth is saved.

The problem is that hash collisions can occur. Two files with the same
hash are /probably/ the same file, but probably isn't good enough -- a
backup system has to be 100% sure. And the only way to be certain is to
get both files and compare them byte by byte.

BackupPC uses hashes for file names, but it also checks for hash
collisions and deals with them when they happen.

> There is also the possibility of doing this on a site basis. So, one
> machine at the site de-dupes all the data for that site, and then just
> sends the de-duped data over the WAN link.

That could work, but it needs more software at the client end. Does
rsync do anything like that?

cheers

Chris
--
Chris Dennis                  cgden...@btinternet.com
Fordingbridge, Hampshire, UK
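For illustration, here is one way a pool can use hashes for speed and
still stay collision-safe, roughly the behaviour Chris describes. This
is only a sketch of the idea, not BackupPC's actual code, and the pool
path is made up: look up by hash, byte-compare against what is already
stored, and only then reuse it.

    # Collision-safe, hash-indexed dedupe pool (a sketch of the idea,
    # not BackupPC's implementation). Files are pooled under their
    # content hash; on a hash match we still byte-compare before
    # reusing, so a collision costs nothing but an extra pool suffix.
    import filecmp, hashlib, os, shutil

    POOL = "/var/backups/pool"   # hypothetical pool location

    def file_hash(path: str) -> str:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def store(path: str) -> str:
        """Return a pool path whose contents equal `path`, adding it if new."""
        digest = file_hash(path)
        n = 0
        while True:
            candidate = os.path.join(POOL, f"{digest}_{n}")
            if not os.path.exists(candidate):
                shutil.copy(path, candidate)      # genuinely new content
                return candidate
            if filecmp.cmp(path, candidate, shallow=False):
                return candidate                  # true duplicate: reuse
            n += 1                                # hash collision: next slot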
Re: [Hampshire] Open source network backup with de-dupe.
On 15 July 2010 15:14, Keith Edmunds wrote:
> Hi James
>
> You're being unrealistic.
>
>> The documentation gives no explanation of what WAN bandwidth it will
>> use.
>
> How can it? It depends on how much data you back up; more accurately,
> it depends on how much data has changed since the last backup.
>
>> It reads as if it gets all the data into a central location, and then
>> de-dupes it.
>
> It does.
>
>> This is not good for WAN bandwidth at all. If the same file is on two
>> computers, I only want one computer to send the file once.
>
> Explain how the server can ascertain that the data is the same on both
> clients without getting a full copy of the data. Note: not ascertain
> that it may be the same, but that it IS the same.

Take 1 central site PC called "A".
Take two remote-site PCs called "B" and "C".

B has already sent a full backup to A.
C wishes to send a full backup to A, but lots of the data on C is the
same as on B.
C generates hashes of its files, and only sends the hashes to A.
A responds to C saying which hashes it has not already got from B.
C then only sends a subset of the data, i.e. data that was not already
sent from B.

Thus, a lot of WAN bandwidth is saved.

There is also the possibility of doing this on a site basis. So, one
machine at the site de-dupes all the data for that site, and then just
sends the de-duped data over the WAN link.

Another view of this can be: when sending files from C to A, A compares
the hashes sent by C with its entire file store, and not just the
single file that C is sending.

Kind Regards

James
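A minimal sketch of that exchange, reduced to set logic. The function
names and paths are made up, and the real network plumbing plus the
server-side collision checks (as in the byte-compare sketch further up
the thread) are omitted.

    # Sketch of the hash exchange James describes (names invented).
    # C sends only hashes; A replies with the ones it lacks; C then
    # uploads just those files.
    import hashlib, os

    def hash_file(path):
        with open(path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest()

    # --- on C (client) ---
    def client_manifest(root):
        return {hash_file(os.path.join(d, f)): os.path.join(d, f)
                for d, _, files in os.walk(root) for f in files}

    # --- on A (server) ---
    def wanted_hashes(offered, pool):
        """Given the set of hashes C offers, return those A hasn't got."""
        return offered - pool

    # --- putting it together (single process, for illustration) ---
    pool = set()                                  # hashes A already stores
    manifest = client_manifest("/data")           # C's files, keyed by hash
    need = wanted_hashes(set(manifest), pool)     # A asks only for these
    to_upload = [manifest[h] for h in need]       # C sends this subset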
Re: [Hampshire] Open source network backup with de-dupe.
Hi James

You're being unrealistic.

> The documentation gives no explanation of what WAN bandwidth it will
> use.

How can it? It depends on how much data you back up; more accurately,
it depends on how much data has changed since the last backup.

> It reads as if it gets all the data into a central location, and then
> de-dupes it.

It does.

> This is not good for WAN bandwidth at all. If the same file is on two
> computers, I only want one computer to send the file once.

Explain how the server can ascertain that the data is the same on both
clients without getting a full copy of the data. Note: not ascertain
that it may be the same, but that it IS the same.

> Also, it sounds very much like a Linux-only solution

In your original post you made no mention of the requirement that it
back up Windows systems, and given that this is a Linux ML, it's not
unreasonable to discuss Linux solutions. However, BackupPC does back up
Windows systems, although it doesn't use VSS.

> So, in summary, it is not good enough to replace the system currently
> at my customer's that cost over £10 !
> That product uses RPC to set off a VSS snapshot on the remote Windows
> machine so that backups work better.
> It also handles WAN bandwidth better.

So keep using it then. Those who want to use Windows servers usually
understand that such a decision will be costly; those who don't
understand that usually find it out quite quickly.
Re: [Hampshire] Open source network backup with de-dupe.
On 14 July 2010 19:56, Adrian Bridgett wrote:
> On Wed, Jul 14, 2010 at 19:45:00 +0100, Keith Edmunds wrote:
>> On Wed, 14 Jul 2010 12:25:10 +0100, james.dut...@gmail.com said:
>>
>>> Does anyone know of any open source backup programs that do de-dupe
>>> for the express purpose of reducing traffic over the WAN?
>>
>> BackupPC. Recommended.
>
> Snap :-)
>
> + dedupes between backups and across boxes
> + nice gui
> - file layout is sadly not rsyncable from the raw FS
>
> I've come to the conclusion that there aren't any decent open source
> backup products. Yes, I do actually have it on my todo list to write
> one :-)
>
> PS: hantslug.org.uk is backed up using BackupPC

The documentation gives no explanation of what WAN bandwidth it will
use. It reads as if it gets all the data into a central location and
then de-dupes it. This is not good for WAN bandwidth at all. If the
same file is on two computers, I only want one computer to send the
file once.

Also, it sounds very much like a Linux-only solution, because its
explanation of how it does Windows backups does not use VSS, and it
will therefore have problems with locked files.

So, in summary, it is not good enough to replace the system currently
at my customer's that cost over £10 ! That product uses RPC to set off
a VSS snapshot on the remote Windows machine so that backups work
better. It also handles WAN bandwidth better.

Kind Regards

James
Re: [Hampshire] Open source network backup with de-dupe.
On Wed, Jul 14, 2010 at 19:45:00 +0100, Keith Edmunds wrote:
> On Wed, 14 Jul 2010 12:25:10 +0100, james.dut...@gmail.com said:
>
>> Does anyone know of any open source backup programs that do de-dupe
>> for the express purpose of reducing traffic over the WAN?
>
> BackupPC. Recommended.

Snap :-)

+ dedupes between backups and across boxes
+ nice gui
- file layout is sadly not rsyncable from the raw FS

I've come to the conclusion that there aren't any decent open source
backup products. Yes, I do actually have it on my todo list to write
one :-)

PS: hantslug.org.uk is backed up using BackupPC

Adrian
--
bitcube.co.uk - Expert Linux infrastructure consultancy
Puppet, Debian, Red Hat, Ubuntu, CentOS
Re: [Hampshire] Open source network backup with de-dupe.
On Wed, 14 Jul 2010 12:25:10 +0100, james.dut...@gmail.com said:
> If two sites have the same data, and one site has already sent the
> initial backup seeding to the central site, the second site should not
> need to also send the same data.

Further to my earlier reply: yes, it does need to send the data
(otherwise the server won't know it's the same). However, BackupPC only
*stores* one copy of the data.

--
Keith Edmunds
Tiger Computing Ltd - "The Linux Specialists"
Helping businesses make the most of Linux
http://www.tiger-computing.co.uk
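Storing one copy can be as simple as content-addressed pooling plus
hardlinks. The sketch below shows the general idea only, not BackupPC's
real on-disk layout; paths and names are invented, and collision
handling is left out (see the byte-compare sketch earlier in the
thread).

    # "Store once" pooling via hardlinks (the general idea, not
    # BackupPC's actual layout). Each received file is hardlinked to a
    # pool entry named after its content hash, so two clients sending
    # identical files consume the space of one.
    import hashlib, os

    POOL = "/var/backups/pool"       # hypothetical paths
    BACKUPS = "/var/backups/hosts"

    def sha256_of(path: str) -> str:
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    def pool_in(received: str, host: str, relpath: str) -> None:
        """Move a just-received file into the pool, deduping by content."""
        entry = os.path.join(POOL, sha256_of(received))
        if os.path.exists(entry):
            os.remove(received)           # content already pooled
        else:
            os.rename(received, entry)    # first copy becomes the pool entry
        dest = os.path.join(BACKUPS, host, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.link(entry, dest)              # per-host view is just a hardlink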
Re: [Hampshire] Open source network backup with de-dupe.
On Wed, 14 Jul 2010 12:25:10 +0100, james.dut...@gmail.com said:
> Does anyone know of any open source backup programs that do de-dupe
> for the express purpose of reducing traffic over the WAN?

BackupPC. Recommended.

--
Keith Edmunds
Tiger Computing Ltd - "The Linux Specialists"
Helping businesses make the most of Linux
http://www.tiger-computing.co.uk