I have seen some discussion on other boards about using a database to
provide the de-duplication the way BackupPC does.  It is a love-it-or-hate-it
idea.  Although it sounds appealing, consider this: database-backed email
servers are typically outperformed by their file-based counterparts by
significant margins.  If you want a truly high-performance email system, you
put the user and alias data in the database, because it is good at storing
that type of data, and store the emails themselves as files.

I do think there would be a performance advantage to splitting some of the
file metadata, such as the hash, date, size, and modification time, into a
database while keeping the raw file contents on the filesystem.  The
database will outperform the filesystem at accessing that kind of data, and
the filesystem will perform even better because its I/O will be reduced.
However, implementing this would be a lot of work for only small gains.  The
effort would be more profitably spent on improving filesystems or
implementing some sort of delayed-write system.

On Tue, Jun 2, 2009 at 7:15 PM, Holger Parplies <wb...@parplies.de> wrote:

> Hi,
>
> Jeffrey J. Kosowsky wrote on 2009-06-02 14:26:44 -0400 [Re:
> [BackupPC-users] why hard links?]:
> > Les Mikesell wrote at about 12:32:14 -0500 on Tuesday, June 2, 2009:
> >  > Jeffrey J. Kosowsky wrote:
> >  > > [...]
> >  > >  > If you have to add an extra system call to lock/unlock around
> some
> >  > >  > other operation you'll triple the overhead.
> >  > >
> >  > > I'm not sure how you definitively get to the number "triple". Maybe
> >  > > more maybe less.
>
> I agree. It's probably more.
>
> >  > Ummm, link(), vs. lock(),link(),unlock() equivalents, looks like 3x
> the
> >  > operations to me - and at least the lock/unlock parts have to involve
> >  > system calls even if you convert the link operation to something else.
> >
> > 3x operations != 3x worse performance
> > Given that disk seeks times and input bandwidth are typical
> > bottlenecks, I'm not particularly worried about the added
> > computational bandwidth of lock/unlock.
>
> Since you can't lock() the file you are about to create (can you?), you'll
> probably need a different file - either one big global lock file or one on
> the
> directory level. I'm not familiar with the kernel code, but I wouldn't be
> surprised if that got you the disk seeks you are worried about.
>
> >  > > Les - I'm really not sure why you seem so intent on picking apart a
> >  > > database approach.
> >  >
> >  > I'm not. I'm encouraging you to show that something more than black
> >  > magic is involved. [...]
> >
> > I never claimed performance. My claims have been around flexibility,
> > extendability, and transportability.
>
> And I'm worried about complexity and robustness:
> 1. Complexity
>   What additional skills do you need to set up the BackupPC version you are
>   imagining and keep it running?
> 2. Complexity
>   Who is going to write and, more importantly, debug the code? How do you
> test
>   all the new cases that can go wrong? How do people feel about entrusting
>   vital data to a system they no longer have a basic understanding of?
> 3. Complexity
>   When everything goes wrong, what can you still do with the data?
> Currently,
>   you can locate a file in the file system (file mangling is not that
>   complicated) or even with an FS debugging tool in an image of an
>   unmountable FS and BackupPC_zcat it to get the contents. Attributes are
> lost
>   that way, but for regaining the contents of a few crucial files, this can
>   work quite well. It could be made to even restore the attributes with
> only
>   slightly more requirements (intact attribs file). With a database, can
> you
>   do anything at all without a completely running BackupPC system? What are
>   the exact requirements? Database file? Database engine? Accessible pool
>   file system?
> 4. Robustness, points of failure
>   How do you handle losing single files, on-disk corruption of a few files?
>   Losing/corrupting many files? Your database?
>
> > I think all (or nearly all) of my 7 claimed advantages are
> > self-evident.
>
> Yes, mostly, though they were claimed in a different thread. I hope
> everyone
> has multiple MUAs open ...
>
> 1. I don't see how "platform and filesystem independence" fits together
> with
>   the use of a database, though. You are currently dependent on a POSIX
> file
>   system. How is depending on one of a set of databases any better?
>
> 4. How does backing up the database and *a portion of the pool* work? Sure,
>   you can make anything fault-tolerant, but are missing files faults of
> which
>   you *want* to be tolerant?
>   But yes, backing up the complete pool would be easier, though it's your
>   responsibility to get it right (i.e. consistent), and there's probably no
>   sane way to check.
>
> 5.1. Why is file name mangling a kludge, and in what way is storing file
> names
>     in a database better?
>
> 5.2. What is non-standard about defining a file format any way you like?
> It's
>     not like compressed pool files would otherwise adhere to a particular
>     known file format. But yes, treating compressed and uncompressed files
>     alike would be nice.
>
> 5.3. I'm not really sure encrypting files *on the server* does much, unless
>     you are thinking of a remote storage pool. In particular, you need to
> be
>     able to decrypt files not only for restoration, but also for pooling
>     (unless you want an intermediate copy and an extra comparison).
>
> 5.5. Configuration stored in the database? Is that supposed to be an
>     advantage?
>
> 6. If you mean access controlled by the database (different database
> users),
>   I don't really see why you are worried about access to the *meta data*
> when
>   the actual contents remain readable (you're not saying that it being such
> a
>   huge amount of data is a security feature, are you?).
>   If you mean that a database will make it easier to implement file level
>   access control, I honestly don't see how.
>
> 7. How that? If you are less concerned about how much space you use, you
> can
>   store things in a way that they can be accessed faster. But I still think
>   you are mistaken in that multiple attrib files would need to be read.
> I've
>   had to read so much discussion on this today that I won't check the code
>   now, but I'd reason that for attrib file pooling to make any sense, the
>   default would be an identical attrib file (compared to the reference
>   backup) if no files in the directory were changed.
>   Or, differently, if BackupPC *would* need to scan multiple attrib files,
>   your delete-file-from-backups script would only ever need to modify one
>   attrib file for any file it deletes, right? ;-)
>
> > Plus, I don't want my backup system to be
> > filesystem dependent because I might have other reasons for picking
> > other filesystems or my OS of the future (or of today) might not even
> > support the filesystem features required.
>
> The same arguments hold against incorporating a database.
>
> > I think good system design calls for abstracting the backup software from
> > the underlying filesystem.
>
> Well, the only thing you are abstracting from are hardlinks, which are
> POSIX
> standard. I wouldn't be surprised if there were other POSIX dependencies.
> BackupPC currently makes no other assumptions about the file system, does
> it?
> Well, file size maybe - you need a file system capable of storing large
> enough
> files. And long enough paths. I look forward to the introduction of
> $Conf{PathSeparator} ...
>
> Regards,
> Holger
>
>
> _______________________________________________
> BackupPC-users mailing list
> BackupPC-users@lists.sourceforge.net
> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki:    http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/
>