Jeffrey J. Kosowsky wrote at about 08:57:54 -0400 on Tuesday, June 2, 2009:
> Peter Walter wrote at about 06:27:35 -0400 on Tuesday, June 2, 2009:
> > I have read with interest various threads on this list concerning
> > methods of how to back up a backuppc server to a remote file system over
> > the internet. My impression from reading the threads is that there is no
> > *good* way - that rsync is a poor choice if you have many hardlinks, and
> > methods like copying a "snapshot" of a block-level device are
> > inefficient if only a relatively small proportion of the data changes. I
> > have tried both methods, and am not satisfied with the performance and
> > efficiency of either. In addition, BackupPC is not compatible with
> > 'cloud' storage systems - at least the ones I have looked at do not seem
> > to support hardlinks.
> >
> > As a Linux newbie, I have only a partial understanding of the technology
> > underlying Linux and BackupPC, but I get the impression that the problem
> > with an rsync-like solution is that processing hardlinks is very
> > expensive in terms of cpu time and memory resources. This may be a
> > stupid question, but, if hardlinks are the problem, has any thought been
> > given to adding to BackupPC an option to use some form of database
> > (text, SQL or otherwise) to associate hashes to files, instead? It seems
> > to me that using hardlinks is in fact using that feature of the file
> > system *as* a database, a use that does not appear to be optimal ... if
> > I have misunderstood, please educate me :-)
> >
> > Peter
> >
>
> Indeed this has been discussed many times before ;) -- see the archives.
>
> That being said, I agree that using a database to store both the
> hardlinks along with the metadata stored in the attrib files would be
> a more elegant, extensible, and platform-independent solution though
> presumably it would require a major re-write of BackupPC.
>
> I certainly understand why BackupPC uses hardlinks since it allows for
> an easy way to do the pooling and in a sense as you suggest uses the
> filesystem as a rudimentary database.
>
> On the other hand, as I and others have mentioned before, moving to a
> database would add the following advantages:
>
> 1. Platform and filesystem independence -- BackupPC would no longer
>    depend on the specific hard link behaviors of Linux and associated
>    filesystems.
>
> 2. It would be easier to extend the attrib notion to store extended
>    attributes whether for Linux (e.g., selinux attributes), Windows
>    (e.g., ACL attributes) or any other OS.
>
> 3. The pool could be split among multiple disks and filesystems since
>    it would no longer depend on hard-link behavior.
>
> 4. Backing up BackupPC backups would be much easier and faster since
>    you no longer would have hard links to worry about -- just back up
>    the database and any portion of the pool that you want to.
>
> 5. The whole system would be more elegant and extensible since all
>    types of metadata could be stored in the database rather than being
>    stored in various files in the BackupPC tree. For example,
>    - You wouldn't need the kludge of file mangling
>    - Checksums could be stored in the database rather than being
>      appended in a non-standard way to the end of the file
>    - File level encryption could easily be added
>    - Alternative file-level compression schemes could easily be
>      supported
>    - The host-specific config data (and maybe even all the config
>      data) could be stored in tables rather than in individual
>      config files
>    - The 'backups' file could also be stored as a table
>
> 6. Presumably a database architecture would also make it easier to
>    have more granular control over user access and permissions at the
>    feature and file level.
>
> The challenge though is that to do this right (i.e. in a way that is
> both elegant and extensible) would require a substantial if not almost
> complete re-write of BackupPC and I'm not sure that Craig (or anybody
> else for that matter) is willing to sign up for that...
>
> Still, it would be awesome to combine the simplicity and pooling
> structure of BackupPC with the flexibility of a database
> architecture...
>
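To make the idea of associating hashes with files concrete, here is a minimal sketch, using Python's built-in sqlite3 module, of a pool kept in database tables instead of hardlinks. The schema, the table and function names (open_pool, add_file, list_backup, read_file) and the use of SHA-256 are all hypothetical and do not reflect how BackupPC actually works; the content BLOB stands in for a pool file that a real design would more likely keep on disk and reference by location.

```python
import hashlib
import sqlite3


def open_pool(path=":memory:"):
    """Create a hypothetical schema: one row per unique content (pool)
    and one row per file per backup (files), replacing hardlinks."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS pool (
            digest TEXT PRIMARY KEY,  -- content hash (SHA-256 here, an assumption)
            data   BLOB NOT NULL      -- stand-in for a pool file on any disk
        );
        CREATE TABLE IF NOT EXISTS files (
            backup_id INTEGER NOT NULL,
            path      TEXT NOT NULL,
            digest    TEXT NOT NULL REFERENCES pool(digest),
            mode      INTEGER NOT NULL,  -- attrib-style metadata lives in columns
            mtime     INTEGER NOT NULL,
            PRIMARY KEY (backup_id, path)
        );
    """)
    return conn


def add_file(conn, backup_id, path, data, mode=0o644, mtime=0):
    """Record one file in one backup; identical content is pooled once."""
    digest = hashlib.sha256(data).hexdigest()
    # INSERT OR IGNORE keeps a single pool row per distinct digest.
    conn.execute("INSERT OR IGNORE INTO pool (digest, data) VALUES (?, ?)",
                 (digest, data))
    # The per-backup entry points at the pool by digest, not by hardlink.
    conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?, ?, ?)",
                 (backup_id, path, digest, mode, mtime))
    conn.commit()
    return digest


def list_backup(conn, backup_id):
    """Return (path, digest) pairs for one backup with a single query."""
    cur = conn.execute(
        "SELECT path, digest FROM files WHERE backup_id = ? ORDER BY path",
        (backup_id,))
    return cur.fetchall()


def read_file(conn, backup_id, path):
    """Fetch a file's content by joining its entry to the pool."""
    row = conn.execute(
        "SELECT p.data FROM files f JOIN pool p ON p.digest = f.digest "
        "WHERE f.backup_id = ? AND f.path = ?",
        (backup_id, path)).fetchone()
    return None if row is None else row[0]


if __name__ == "__main__":
    conn = open_pool()
    add_file(conn, 1, "/etc/hosts", b"127.0.0.1 localhost\n")
    add_file(conn, 2, "/etc/hosts", b"127.0.0.1 localhost\n")  # unchanged
    add_file(conn, 2, "/etc/motd", b"hello\n")
    print(conn.execute("SELECT COUNT(*) FROM pool").fetchone()[0])  # 2
```

Because each backup's entries reference the pool by digest, listing or restoring a backup is a query rather than a walk over a tree of attrib files, and extra metadata (ACLs, extended attributes, checksums) could be added as columns or tables without touching the pooled data.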
One more advantage of a database architecture:

7. Reconstructing incremental backups would be simpler and faster since
   the database could point directly to the file rather than having to
   crawl through a tree of attrib files to reconstruct the hierarchy of
   which files have changed or not.

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/