First off... thanks for all the input. I need to research a couple of things before I dive too deep into this project ;)

The rest of my response is inline:


On Sun, 24 Jul 2005, Brian Chrisman wrote:

James Washer wrote:

The big issues with backups IMHO is getting a consistent snapshot. If you start copying a large data file, and it changes after you've copied the beginning, you're screwed.


An exact snapshot will be one of the hardest qualities to achieve. It's not my intention to modify the current rsync algorithm to add this feature (if it isn't already part of the system... I don't know all the details of the rsync algorithm), so I may have to develop a method for freezing a snapshot of a file. Anyone know of any good algorithms?
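One crude userland approach (just a sketch of the idea, not rsync's algorithm, and the function name is my own invention) is to copy the file and then verify that its size and mtime didn't change underneath you, retrying a few times if they did:

```python
import os
import shutil

def copy_stable(src, dst, retries=3):
    """Copy src to dst, retrying if the source changes mid-copy.

    Records size and mtime before the copy, copies, then checks
    again.  If they differ, the copy may be torn, so try again.
    Not a true snapshot -- just a cheap consistency check.
    """
    for _ in range(retries):
        before = os.stat(src)
        shutil.copy2(src, dst)
        after = os.stat(src)
        if (before.st_size, before.st_mtime) == (after.st_size, after.st_mtime):
            return True   # file was quiet for the whole copy
    return False          # still changing; caller must decide what to do
```

Of course this only narrows the race window; a file that changes constantly (a busy database, say) would still need filesystem- or application-level snapshot support.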

Other backup schemes I've used involved BCLs (block change logging), which allow the admin to back up only those blocks that have been updated since the last master. This is a GREAT space saver for large data farms with low update rates. I tend to work on larger multi-terabyte systems, so minimizing backup times is very important.


Rsync uses a method similar to BCLs insofar as it will only sync changed files -- based on modification time and size, or a file checksum. Clearly BCLs sit at a considerably lower filesystem abstraction "level" (and are considerably faster), but I'd prefer to stick to a userland process operating on top of the complete filesystem abstraction for portability reasons (and ease of programming).
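The idea behind that "quick check" can be sketched in a few lines (this is my own illustration of the concept, not rsync's actual code; rsync with -c would compare full checksums instead):

```python
import os

def needs_sync(src, dst):
    """Decide whether a file needs to be transferred, rsync-style:
    transfer when the destination is missing, or differs in size
    or modification time.  Whole-second mtime comparison, since
    many filesystems and archive formats truncate timestamps.
    """
    if not os.path.exists(dst):
        return True
    s, d = os.stat(src), os.stat(dst)
    return s.st_size != d.st_size or int(s.st_mtime) != int(d.st_mtime)
```

A file that is rewritten in place with the same size and a restored mtime would slip past a check like this, which is exactly why the checksum fallback exists.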

It's not my intention to create a complete enterprise backup solution... at least not at this stage. I just need to back up my users' files, databases, etc. when they go home at 5 every night. But, at the same time, I'd like a backup system that is flexible enough to create hourly incremental backups, back up to any media I want at any time I want, and is scriptable. So... perhaps it hits in the "middle ground" -- it's not ntbackup or Veritas, but somewhere in between.
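For the hourly incrementals, one space-efficient scheme is to hard-link unchanged files to the previous snapshot and copy only what changed -- the same trick as `rsync --link-dest` or `cp -al` followed by rsync. A minimal sketch (my own illustration: single directory only, no recursion, permissions, or error handling):

```python
import os
import shutil

def incremental_backup(src_dir, prev_snap, new_snap):
    """Snapshot src_dir into new_snap.  Files unchanged since
    prev_snap (same size and mtime) are hard-linked, so they cost
    no extra disk space; changed files get a fresh copy.
    """
    os.makedirs(new_snap, exist_ok=True)
    for name in os.listdir(src_dir):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue
        prev = os.path.join(prev_snap, name)
        dst = os.path.join(new_snap, name)
        s = os.stat(src)
        if (os.path.isfile(prev)
                and os.stat(prev).st_size == s.st_size
                and int(os.stat(prev).st_mtime) == int(s.st_mtime)):
            os.link(prev, dst)      # unchanged: zero extra space
        else:
            shutil.copy2(src, dst)  # changed: store a fresh copy
```

Because every snapshot directory looks like a full backup, restoring is just a plain copy -- which ties in with the point below about not needing the backup software to get your files back.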


Are you planning on your backup scheme being able to handle "hot" backups?


Yes... but functionality will be limited. I'm probably going to make use of the Ostrich Algorithm -- stick my head in the sand, and pretend the problem doesn't exist -- when it comes to filesystem locks, databases, and other complex backup issues. At least for the prototype. Again, this comes back to the portability of the code IMHO. If I bloat it too much it becomes difficult to modify, and will only work with specific systems.

I guess what I'm trying for is a "core" backup utility that can perform advanced backup operations -- like incremental backups -- very well, but that can be easily supplemented for your specific installation via scripting, modification of code, or add-on software.


This makes particular sense in open source software. With proprietary backup solutions, I always leaned against block level incrementals such as provided by veritas, because without their software, there was pretty much no way to recover your data... This meant that things like license management or lame bugs could end up biting you in the butt at the very worst possible time. There have always been file/tar-based products out there, but like you said, not so efficient.


Great point Jim! This is by far my most hated feature of proprietary backup software. It costs $1000 for the software, and I have to have it loaded to get to my files?! Ridiculous! This will not be a feature of my software. By default, all files will be stored as they appear in your filesystem. Of course, you'll have the option to compress, encrypt, and do whatever else people do to files these days. Because files are stored this way, if /etc and /home are backed up to a USB hard disk, you could boot from a live disk and configure it to serve your files until your server is back online.

Thanks again for all of the ideas!  Keep them coming!

- Sebastian

_______________________________________________
RLUG mailing list
[email protected]
http://lists.rlug.org/mailman/listinfo/rlug