Re: [Bacula-users] Bacula tape format vs. rsync on deduplicated file systems

Robert LeBlanc Fri, 28 May 2010 07:46:09 -0700

On Fri, May 28, 2010 at 12:32 AM, Eric Bollengier <eric.bolleng...@baculasystems.com> wrote:


> Hello Robert,
> What would the result be if you did incremental backups instead of full
> backups?
> Imagine that you have 1% change per day; it will give something like
> total_size = 30GB + 30GB*0.01 * nb_days
> (instead of 30GB * nb_days)
>
> I'm quite sure it will give a "compression" of something like 19:1 for 20
> backups...
>
> This kind of comparison is the big argument of dedup companies: "do 20 full
> backups, and you will have a 20:1 dedup ratio", but do 19 incrementals + 1
> full and this ratio will fall to 1:1... (It's not exactly true either,
> because you can save space with multiple systems having the same data.)
>
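
To put rough numbers on the formula quoted above, here is a small Python
sketch. It assumes the figures from the quote (30GB full, 1% change per day)
and takes "20 backups" to mean 1 full followed by 19 incrementals; the function
and variable names are only illustrative.

```python
FULL_GB = 30.0        # size of one full backup, from the quoted formula
DAILY_CHANGE = 0.01   # 1% of the data changes per day (quoted assumption)
BACKUPS = 20          # number of backups being compared

def all_fulls(n, full=FULL_GB):
    """Space used when every one of the n backups is a full."""
    return full * n

def full_plus_incrementals(n, full=FULL_GB, change=DAILY_CHANGE):
    """Space used by 1 full followed by n - 1 incrementals."""
    return full + full * change * (n - 1)

fulls_gb = all_fulls(BACKUPS)
incr_gb = full_plus_incrementals(BACKUPS)
ratio = fulls_gb / incr_gb

print(f"{BACKUPS} fulls:                {fulls_gb:.1f} GB")
print(f"1 full + {BACKUPS - 1} incrementals: {incr_gb:.1f} GB")
print(f"apparent 'compression':  {ratio:.1f}:1")
```

Under these assumptions the ratio comes out at roughly 17:1, the same order of
magnitude as the estimate in the quote.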

The idea was to simulate a few things at once. This kind of test can show how
well multiple similar OSes dedupe (with 20 Windows machines, for example, you
only have to store the OS bits once for any number of machines). With Bacula's
incrementals, you have to store those bits once per machine, and then again
each time you run your next full, weekly or monthly. It was also meant to show
how much you could save on those weekly or monthly fulls; a similar effect
would apply to the differentials. It wasn't meant to be all-inclusive, just to
show some trends that I was interested in. In our environment, since
everything is virtual, we don't save the OS data and only try to save the
minimum that we need, though that doesn't work for everyone.


> > [image: backup.png]
> >
> > This chart shows that using the rsync method, the data's compression grew
> > in an almost linear fashion, while the Bacula data stayed close to 1x
> > compression. My suspicion is that since the Bacula tape format regularly
> > inserts job information into the stream file and lessfs uses a fixed block
> > size, lessfs is not able to find much duplicate data in the Bacula stream.
>
> You are right. We have a current project to add a new device format that
> will be compatible with a dedup layer. I don't know yet how it will work,
> because I can imagine that each dedup system works differently, and finding
> a common denominator won't be easy. A first proof of concept will certainly
> use LessFS (it is already on my radar). But as you said, depending on block
> size, alignment, etc., it's not so easy.
>

I think that, in some ways, every dedupe file system works very well when each
file is stored on its own instead of inside a stream. That way the start of
each file always falls on a block boundary that the deduplication file system
uses. You might be able to use a sparse file for the stream and always pad out
to the next block boundary with a hole; that would make the stream file look
much larger than the space it actually uses on a non-deduplicating file
system. I still think that if Bacula laid the data down in the same file
structure as on the client, organized by jobID, with some small Bacula files to
hold permissions, etc., it would be the most flexible option for all dedupe
file systems, because they would see the individual files they are expecting.
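
To make the alignment argument concrete, here is a toy Python sketch of
fixed-block deduplication. It is not lessfs code: the 4096-byte block size, the
file sizes and the helper names are all assumptions, and zero padding stands in
for the sparse holes mentioned above. It compares two "backups" of ten files,
where only the first file grew by 10 bytes, stored as individual files, as one
concatenated stream, and as a stream padded to block boundaries.

```python
import hashlib
import random

BLOCK = 4096  # assumed fixed block size of the dedup layer (hypothetical)

rng = random.Random(0)  # fixed seed so the run is repeatable

def rand_bytes(n):
    """n pseudo-random bytes, standing in for file contents."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")

def dedup_ratio(objects, block=BLOCK):
    """Blocks written divided by unique blocks kept, over all stored objects."""
    hashes = [
        hashlib.sha256(obj[i:i + block]).digest()
        for obj in objects
        for i in range(0, len(obj), block)
    ]
    return len(hashes) / len(set(hashes))

def pad_to_block(data, block=BLOCK):
    """Zero-pad to the next block boundary (stands in for a sparse hole)."""
    return data + bytes(-len(data) % block)

# Two backups of ten files; on day 2 only the first file grew by 10 bytes.
day1 = [rand_bytes(40_000) for _ in range(10)]
day2 = [day1[0] + b"0123456789"] + day1[1:]

# Each file stored on its own: every file starts on a block boundary.
per_file = dedup_ratio(day1 + day2)

# Files concatenated into one stream per backup: the 10 extra bytes shift
# everything that follows, so later blocks no longer line up.
stream = dedup_ratio([b"".join(day1), b"".join(day2)])

# Stream with each file padded out to a block boundary.
padded = dedup_ratio([
    b"".join(pad_to_block(f) for f in day1),
    b"".join(pad_to_block(f) for f in day2),
])

print(f"individual files: {per_file:.2f}x")
print(f"plain stream:     {stream:.2f}x")
print(f"padded stream:    {padded:.2f}x")
```

In this sketch the individual files and the padded stream both dedupe at
close to 2x, while the plain stream stays close to 1x, because a single small
change early in the stream misaligns every block after it.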


> > Although Data Domain's variable block size feature allows it much
> > better compression of Bacula data, rsync still achieved an almost 2x
> > greater compression over Bacula.
>
> The compression on disk is better; on the network layer and the remote IO
> disk system, it is another story. BackupPC is smarter on this part (but has
> problems with big sets of files).
>

I'm not sure I understand exactly what you mean. I understand that BackupPC
can cause a file system to fail to mount because it exhausts the number of
hard links the fs can support. Luckily, with a deduplicating file system you
don't have this problem, because you just copy the bits and the fs does the
work of finding the duplicates. A dedupe fs can even store only a small part
of a file (if most of the file is duplicate and only a small part is unique),
where BackupPC would have to write the whole file. I don't want Bacula to
adopt what BackupPC is doing; I think it's a step backwards.


> > In conclusion, lessfs is a great file system and can benefit from variable
> > block sizes, if it can be added, for both regular data and Bacula data.
> > Bacula could also greatly benefit by providing a format similar to a native
> > file system on lessfs and even a good benefit on DataDomain.
>
> Yes, variable block size and dynamic alignment seem to be the cutting edge
> of the technology, but they are also heavily covered by patents (and those
> companies are not very friendly). And I can imagine that it's easy to ask
> for them, and a little more complex to implement :-)


That is one of the reasons I qualified it with "if it can be added". If there
is one thing I know about OSS, it is that there are some amazing people who
can think so far outside the box that obstacles like these have not been able
to stop the progress of OSS.
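
For anyone who hasn't seen the variable block size idea in action, here is a
toy Python sketch of content-defined chunking. It is only an illustration
under my own assumptions, not lessfs's or Data Domain's algorithm: the
gear-style rolling hash, the mask, the minimum chunk size and all names are
made up for the example. It inserts 10 bytes at the front of some data and
checks how many chunks of the shifted copy were already stored.

```python
import hashlib
import random

# Hypothetical parameters for the illustration.
GEAR = [random.Random(1).getrandbits(32) for _ in range(1)]  # placeholder, replaced below
_table_rng = random.Random(1)
GEAR = [_table_rng.getrandbits(32) for _ in range(256)]  # per-byte random values
MASK = 0x3FF      # cut when the low 10 bits are zero (about 1 KiB between cuts)
MIN_SIZE = 64     # never cut chunks shorter than this
FIXED = 1024      # block size used for the fixed-block comparison

def fixed_chunks(data, size=FIXED):
    """Split at fixed offsets, as a fixed-block dedup layer would."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data, mask=MASK, min_size=MIN_SIZE):
    """Split where a rolling hash of the recent bytes hits a pattern."""
    out, start, h = [], 0, 0
    for i, b in enumerate(data):
        # Older bytes are shifted out of the 32-bit value, so the hash
        # depends only on the most recent bytes, not on the offset.
        h = ((h << 1) + GEAR[b]) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            out.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        out.append(data[start:])
    return out

def shared_fraction(old_chunks, new_chunks):
    """Fraction of new chunks whose content is already stored."""
    known = {hashlib.sha256(c).digest() for c in old_chunks}
    hits = sum(hashlib.sha256(c).digest() in known for c in new_chunks)
    return hits / len(new_chunks)

n = 200_000
data = random.Random(2).getrandbits(8 * n).to_bytes(n, "little")
shifted = b"0123456789" + data  # same data, pushed 10 bytes along

fixed_shared = shared_fraction(fixed_chunks(data), fixed_chunks(shifted))
cdc_shared = shared_fraction(cdc_chunks(data), cdc_chunks(shifted))

print(f"fixed blocks reused:    {fixed_shared:.0%}")
print(f"variable chunks reused: {cdc_shared:.0%}")
```

Because the cut points follow the content rather than the offset, the chunk
boundaries fall back into step shortly after the inserted bytes, so most of
the variable-size chunks are reused while the fixed-size blocks are not.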

Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

------------------------------------------------------------------------------

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
