On Fri, May 28, 2010 at 12:32 AM, Eric Bollengier
<eric.bolleng...@baculasystems.com> wrote:

> Hello Robert,
>
> What would be the result if you did an Incremental backup instead of a Full
> backup? Imagine that you have 1% changes per day; it will give something
> like
>
>   total_size = 30GB + 30GB * 0.01 * nb_days
>
> (instead of 30GB * nb_days)
>
> I'm quite sure it will give a "compression" like 19:1 for 20 backups...
>
> This kind of comparison is the big argument of dedup companies: "do 20 full
> backups and you will have a 20:1 dedup ratio", but do 19 incrementals + 1
> full and this ratio will fall down to 1:1... (It's not exactly true either,
> because you can save space with multiple systems having the same data.)

The idea was to in some way simulate a few things all at once. This kind of
test could show how multiple similar OSes could dedupe (20 Windows OSes, for
example: you only have to store those bits once for any number of Windows
machines), whereas using Bacula's incrementals you have to store the bits
once per machine, and then again when you do your next full each week or
month. It was also meant to show how much you could save when doing your
fulls each week or month; a similar effect would happen for the differentials
too. It wasn't meant to be all-inclusive, just to show some trends that I was
interested in. In our environment, since everything is virtual, we don't save
the OS data and only try to save the minimum that we need; that doesn't work
for everyone though.
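To make the arithmetic concrete, here is a minimal sketch of Eric's sizing
formula (the 30GB client, 1% daily change rate, and 20-backup retention are
his example numbers, not measurements from my test):

    # Illustration of the full-vs-incremental sizing argument; not Bacula code.
    full_size_gb = 30.0   # one full backup
    change_rate = 0.01    # 1% of the data changes per day
    num_backups = 20      # backups kept

    # 20 fulls: a dedup engine only has to keep the 30GB once, which is
    # where the vendors' "20:1 dedup ratio" claim comes from.
    all_fulls = full_size_gb * num_backups

    # 1 full + 19 incrementals: only the changed 1% is written each day.
    full_plus_incr = full_size_gb + full_size_gb * change_rate * (num_backups - 1)

    print(f"20 fulls:          {all_fulls:.1f} GB")       # 600.0 GB
    print(f"1 full + 19 incr:  {full_plus_incr:.1f} GB")  # 35.7 GB
    print(f"effective ratio:   {all_fulls / full_plus_incr:.1f}:1")  # ~16.8:1

The 600 GB / 35.7 GB ratio works out to roughly 17:1, in the ballpark of
Eric's 19:1 estimate. The incremental stream is already mostly unique data,
which is why feeding it to a dedup engine yields a ratio near 1:1.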
> > [image: backup.png]
> >
> > This chart shows that using the sync method, the data's compression grew
> > in an almost linear fashion, while the Bacula data stayed close to 1x
> > compression. My suspicion is that since the Bacula tape format inserts
> > job information regularly into the stream file and lessfs uses a fixed
> > block size, lessfs is not able to find much duplicate data in the Bacula
> > stream.

> You are right, we have a current project to add a new device format that
> will be compatible with a dedup layer. I don't know yet how it will work,
> because I can imagine that each dedup system works differently, and finding
> a common denominator won't be easy. A first proof of concept will certainly
> use LessFS (it is already on my radar). But as you said, with block size,
> alignment, etc., it's not so easy.

I think each dedupe file system could work very well with each file stored as
its own object instead of inside a stream. That way the start of the file
always falls on a boundary that the deduplication file system uses. You might
be able to use sparse files for a stream and always sparse up to the block
alignment; that would make the stream file look really large compared to what
it actually uses on a non-deduped file system. I still think that if Bacula
laid the data down in the same file structure as on the client, organized by
jobID, with some small Bacula files to hold permissions, etc., it would be
the most flexible for all dedupe file systems, because they would see the
individual files they are expecting.
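A quick way to see the alignment problem: with a fixed block size, even a few
bytes of metadata at the front of a stream shift every later block boundary,
so blocks containing identical data hash differently. This is a toy sketch,
not lessfs's or Bacula's actual on-disk formats; the 4KB block size and the
header are made up:

    import hashlib
    import os

    BLOCK = 4096  # assumed fixed dedup block size

    def block_hashes(data):
        """Hash each fixed-size block, the way a fixed-block dedup fs would."""
        return {hashlib.sha1(data[i:i + BLOCK]).hexdigest()
                for i in range(0, len(data), BLOCK)}

    file_data = os.urandom(256 * 1024)   # stand-in for a client file

    # Stored as a plain file: block boundaries line up, everything dedupes.
    plain = block_hashes(file_data)

    # Stored in a stream behind a small header (think interleaved job
    # records): every boundary shifts, and no block matches the plain copy.
    shifted = block_hashes(b"JOB-HEADER:42\n" + file_data)

    print(f"shared blocks: {len(plain & shifted)} of {len(plain)}")  # 0 of 64

Padding each file's start out to a multiple of BLOCK, as the sparse-file idea
suggests, would put the boundaries back in step, at the cost of a larger
apparent stream size.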
> > Although Data Domain's variable block size feature allows it much better
> > compression of Bacula data, rsync still achieved an almost 2x greater
> > compression than Bacula.

> The compression on disk is better; on the network layer and the remote disk
> I/O system, this is another story. BackupPC is smarter on this part (but
> has problems with big sets of files).

I'm not sure I understand exactly what you mean. I understand that BackupPC
can cause a file system to fail to mount because it exhausts the number of
hard links the fs can support. Luckily, with a deduplication file system you
don't have this problem, because you just copy the bits and the fs does the
work of finding the duplicates. A dedupe fs can even store only a small part
of a file (if most of the file is duplicate and only a small part is unique),
where BackupPC would have to write the whole file. I don't want Bacula to
adopt what BackupPC is doing; I think it's a step backwards.

> > In conclusion, lessfs is a great file system and can benefit from
> > variable block sizes, if they can be added, for both regular data and
> > Bacula data. Bacula could also greatly benefit by providing a format
> > similar to a native file system on lessfs, and even see a good benefit
> > on DataDomain.

> Yes, variable block size and dynamic alignment seem to be the edge of the
> technology, but they are also heavily covered by patents (and those
> companies are not very friendly). And I can imagine that it's easy to ask
> for them, and a little more complex to implement :-)

That is one of the reasons I said "if it could be implemented". If there is
anything I know about OSS, it's that there are some amazing people with an
ability to think so far outside the box that these things have not been able
to stop the progress of OSS.
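For anyone curious what "variable block size and dynamic alignment" buys you:
content-defined chunking places block boundaries wherever a rolling hash over
the last few dozen bytes hits a chosen bit pattern, so boundaries follow the
content rather than the byte offset, and an inserted header only disturbs the
first chunk. This is a naive sketch with arbitrary parameters, nothing like
Data Domain's patented method or any planned Bacula format:

    import hashlib
    import os

    B, W, MASK = 31, 48, 0x0FFF   # hash base, window size, boundary mask
    MOD = 1 << 32
    BW = pow(B, W, MOD)           # B**W, to drop the byte leaving the window

    def chunk_hashes(data):
        """Split at boundaries chosen by a windowed rolling hash."""
        hashes, start, rolling = set(), 0, 0
        for i, byte in enumerate(data):
            rolling = (rolling * B + byte) % MOD
            if i >= W:
                rolling = (rolling - data[i - W] * BW) % MOD
            # Boundary whenever the hash's low 12 bits are zero (~4KB chunks).
            if i - start >= W and (rolling & MASK) == 0:
                hashes.add(hashlib.sha1(data[start:i + 1]).hexdigest())
                start = i + 1
        hashes.add(hashlib.sha1(data[start:]).hexdigest())
        return hashes

    file_data = os.urandom(256 * 1024)
    plain = chunk_hashes(file_data)
    shifted = chunk_hashes(b"JOB-HEADER:42\n" + file_data)

    # The hash depends only on the last W bytes, so boundaries resynchronize
    # shortly after the header and nearly every chunk still dedupes.
    print(f"shared chunks: {len(plain & shifted)} of {len(plain)}")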
Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

------------------------------------------------------------------------------
_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users