Hi all,

On Mon, Aug 31, 2009 at 04:32:14PM -0400, Jeffrey J. Kosowsky wrote:
> In a very real sense, the current implementation already uses an
> artificial database structure - albeit a slow, proprietary,
> non-extensible, non-optimizable version. To wit, the attrib files
> present in each and every pc directory. The real essence of my
> suggestion is to replace the scattered myriad of attrib linear
> databases with a single relational database that can benefit from all
> the features, speed, tools, and optimizations of modern databases. As
> has been mentioned many times in the past, such a move would solve
> many, many problems though would obviously require some significant
> development work.

I suppose this is the most important argument _for_ trying the SQL
approach - maybe just for storing file attributes?

On the other hand, we rely on one atomic file system feature: the
hardlink count, which is what drives file expiration. That would be
more difficult with a database (prone to DB<->filesystem
inconsistencies).

Maybe we should move this discussion to the -devel list? Or somebody
should come up with a database scheme, so we could start discussing
details - possibly figuring out that the requirements are difficult to
meet with a database?

I'm just skeptical that it is possible to store file system layout
more efficiently than a file system does - and I suppose we'd need to
represent the complete directory structure of the backups in the
database. We'd end up with loads of entries pointing to a
file.id(int8), which is the equivalent of the inode number in the
filesystem world. File attributes would have to be stored in a
separate table, since they may differ from host to host while the file
content is identical (and I'm not sure how to do that efficiently,
taking extended attributes like ACLs, resource forks etc. into
account - you'll either get into JOIN hell or you'll start storing
serialized data).

Of course, a database might allow lookups like "which backups
reference file x". Also, standard databases are not good at querying
hierarchical structures.
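To make the layout question a bit more concrete, here is a minimal sketch using Python's sqlite3 module. Everything in it is an assumption for illustration only: the table and column names (file, backup, entry, xattrs) and the helper functions are hypothetical and not taken from BackupPC. It takes the "serialized data" route for extended attributes, keeps per-backup attributes apart from the content row, answers the "which backups reference file x" lookup, and replaces the hardlink count with a reference check at expiry time.

```python
import sqlite3

# Hypothetical schema for discussion; none of these names come from BackupPC.
SCHEMA = """
CREATE TABLE file (              -- one row per unique content (the "inode")
    id     INTEGER PRIMARY KEY,
    digest TEXT NOT NULL UNIQUE
);
CREATE TABLE backup (
    id   INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    num  INTEGER NOT NULL,
    UNIQUE (host, num)
);
CREATE TABLE entry (             -- one row per path per backup
    backup_id INTEGER NOT NULL REFERENCES backup(id),
    path      TEXT NOT NULL,
    file_id   INTEGER NOT NULL REFERENCES file(id),
    mode      INTEGER,           -- attributes live here, not in file,
    xattrs    TEXT,              -- so they can differ per host/backup
    PRIMARY KEY (backup_id, path)
);
CREATE INDEX entry_by_file ON entry(file_id);
"""


def open_db(path=":memory:"):
    """Open a database and create the sketch schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, host, num, path, digest, mode=0o644, xattrs=None):
    """Record that backup (host, num) holds `path` with content `digest`."""
    conn.execute("INSERT OR IGNORE INTO file (digest) VALUES (?)", (digest,))
    conn.execute(
        "INSERT OR IGNORE INTO backup (host, num) VALUES (?, ?)", (host, num)
    )
    conn.execute(
        "INSERT INTO entry (backup_id, path, file_id, mode, xattrs) "
        "SELECT b.id, ?, f.id, ?, ? FROM backup b, file f "
        "WHERE b.host = ? AND b.num = ? AND f.digest = ?",
        (path, mode, xattrs, host, num, digest),
    )


def backups_referencing(conn, digest):
    """Answer 'which backups reference file x' as (host, num, path) rows."""
    return conn.execute(
        "SELECT b.host, b.num, e.path FROM entry e "
        "JOIN backup b ON b.id = e.backup_id "
        "JOIN file f ON f.id = e.file_id "
        "WHERE f.digest = ? ORDER BY b.host, b.num, e.path",
        (digest,),
    ).fetchall()


def expire_backup(conn, host, num):
    """Delete one backup; return digests that no backup references any more."""
    unreferenced = "id NOT IN (SELECT file_id FROM entry)"
    with conn:  # one transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM entry WHERE backup_id = "
            "(SELECT id FROM backup WHERE host = ? AND num = ?)",
            (host, num),
        )
        conn.execute(
            "DELETE FROM backup WHERE host = ? AND num = ?", (host, num)
        )
        orphans = [
            row[0]
            for row in conn.execute(
                f"SELECT digest FROM file WHERE {unreferenced} ORDER BY digest"
            )
        ]
        conn.execute(f"DELETE FROM file WHERE {unreferenced}")
    return orphans
```

Note that the digests returned by expire_backup() would still have to be removed from disk in a separate step, which is exactly where the DB<->filesystem inconsistency could creep in. Storing the full path as a string also sidesteps the hierarchy problem rather than solving it: listing a single directory would need a prefix query or a parent-id column.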
Hierarchies come more naturally to filesystems (but only up to a
certain point - traversing is still expensive).

These are just my random thoughts. I suppose it's worth spending some
time discussing/designing/developing a database layout - we'll learn a
lot, and either

a) it looks like it's worth trying to implement it - hey, then we'd
   already have a database layout!

b) we get convinced that it's not worth it or that it's getting too
   complicated - hey, then we've tried, and we get something out of
   the process to show to people claiming that a database would
   improve things.

Tino.

PS: Another weird thought just crossed my mind: maybe separating pool
data from backups might be worth a try. That is: only store zero-byte
files in the pool (or maybe files with some metadata like the MD5 in
them), which get hardlinked to backups, then have a second pool which
contains the real data and no hardlinks (the implicit connection being
the pool file name). Creating and changing pool files is a rather
central operation (done by BackupPC_dump/_link/_nightly). That way, we
could decouple the extensive directory lookups while traversing a
backup from the data reading/writing - the file pool could be
separated from the data pool.

Without detailed knowledge of the code, I suppose it should be doable
as a proof-of-concept hack. Of course, this should be a configurable
setting, since it only makes sense when there are actually separate
physical volumes for metadata and file data.

-- 
"What we nourish flourishes." - "Was wir nähren erblüht."

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de

_______________________________________________
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/