On Thu, Apr 8, 2010 at 12:39 AM, Kern Sibbald <k...@sibbald.com> wrote:
> Hello,
>
> I haven't seen the original messages, so I am not sure I understand the
> full concept here, so my remarks may not be pertinent.
>
> However, from what I see, this is basically similar to what BackupPC does.
> The big problem I have with it is that it does not scale well to thousands
> of machines.
>
> If I were thinking about changing the disk Volume format, I would start by
> looking at how git handles storing objects, and whether git can scale to
> handle a machine with 40 million file entries.
>
> One thing is sure: unless some new way of implementing hardlinks is
> devised, you will never see Bacula using hard links in the volumes. That
> is a sure way to make your machine unbootable if you scale large enough.
> Just back up enough clients with BackupPC and one day you will find that
> fsck no longer works. I suspect that it will require only a couple hundred
> million hardlinks before a Linux machine will no longer boot.
>
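For anyone following along, the git approach Kern mentions is content-addressable storage: each object is stored under the hash of its contents, so identical data is stored exactly once, with no hard links involved. A minimal sketch in Python (the layout is loosely modeled on git's loose-object fan-out; `store_object` and the directory scheme here are my own illustration, not anything in Bacula or git):

```python
# Hypothetical sketch of git-style content-addressable storage.
# SHA-1 is used only because git does; nothing here is Bacula code.
import hashlib
import os
import tempfile

def store_object(store_dir, data):
    """Store data under its content hash; identical content is stored once."""
    digest = hashlib.sha1(data).hexdigest()
    subdir = os.path.join(store_dir, digest[:2])   # fan out like git: ab/cdef...
    os.makedirs(subdir, exist_ok=True)
    path = os.path.join(subdir, digest[2:])
    if not os.path.exists(path):                   # a duplicate costs nothing extra
        with open(path, "wb") as f:
            f.write(data)
    return digest

# Two clients backing up the same file produce a single stored object.
with tempfile.TemporaryDirectory() as store:
    d1 = store_object(store, b"same file contents")
    d2 = store_object(store, b"same file contents")
    assert d1 == d2
```

Whether this scales to 40 million file entries per client is exactly the open question; git itself moves loose objects into packfiles to keep directory counts manageable.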
It wasn't my intention that Bacula create the hard links itself, as BackupPC does; I figured that anyone who wanted that could run a script outside of Bacula. What I'm thinking of is the ability to offload data compression to the file system in general, or alternatively have Bacula compress the data. The reason is that dedup technologies cannot deduplicate Bacula's current tape format very well: from what I can tell of the format, every 64K of duplicate data carries a unique header, which renders the block unique and therefore not a candidate for dedup.

I had two ideas for overcoming this problem. One was a slightly modified Bacula tape format for disks that would move the unique header information to the front or the back of the job stream, with the format creating a sparse file whose job files start at a user-defined block size. The other came from thinking about storing tier-3 data on the same dedup device or file system: done a certain way, we could get 'free' backups. If Bacula backed up to the same device with a hierarchical file system approach, then the original files and the files Bacula backed up would look the same to the dedup engine. It would also be easy to recover from a total failure of the Director and SD (I'm thinking disaster recovery).

I've been running Bacula backups on a dedup box for almost a year and can't get better than 4x, when I believe the data we have should dedup at about 10x. With dedup becoming more popular, I'm just trying to make Bacula even more appealing for those who want to dedup. If people are using straight disk, then compression could be enabled by Bacula and the format might be a little different (like tar'd bzip2 archives), but most newer file systems are starting to support on-the-fly compression, so I don't know how critical that is.
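To make the header problem concrete, here is a toy illustration (an assumed block layout for demonstration only, not Bacula's actual volume format): hashing identical 64 KiB payloads with and without an interleaved per-block unique header shows the dedup opportunity collapsing to nothing.

```python
# Toy model of block-level dedup, not Bacula's real on-disk format.
# A dedup engine stores one copy per distinct block hash.
import hashlib

BLOCK = 64 * 1024
payload = b"x" * BLOCK            # the same 64 KiB of file data, repeated

def distinct_blocks(n, with_header):
    """Hash n copies of the payload, optionally prefixed by a unique header."""
    hashes = set()
    for seq in range(n):
        header = seq.to_bytes(8, "big") if with_header else b""
        hashes.add(hashlib.sha256(header + payload).digest())
    return len(hashes)

# 100 identical data blocks:
dup_friendly = distinct_blocks(100, with_header=False)  # 1 distinct block
dup_hostile = distinct_blocks(100, with_header=True)    # 100 distinct blocks
print(dup_friendly, dup_hostile)                        # prints: 1 100
```

Moving the unique headers out of line (to the front or back of the job stream, as proposed above) would leave long runs of plain file data that a dedup engine can actually match.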
These are all ideas meant to start a discussion about what a file-aware SD, if one is implemented, should offer to give maximum flexibility and to leverage features appearing in current and future file systems.

Thanks,
Robert LeBlanc
Life Sciences & Undergraduate Education Computer Support
Brigham Young University

_______________________________________________
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users