First off... thanks for all the input. I need to research a couple of
things before I dive too deep into this project ;)
The rest of my response is inline:
On Sun, 24 Jul 2005, Brian Chrisman wrote:
James Washer wrote:
The big issues with backups IMHO is getting a consistent snapshot. If you
start copying a large data file, and it changes after you've copied the
beginning, you're screwed.
An exact snapshot will be one of the hardest qualities to achieve. It's
not my intention to modify the current rsync algorithm to add this feature
(if it isn't already part of the system... I don't know all the details of
the rsync algorithm), so I may have to develop a method for freezing a
snapshot of a file. Does anyone know of any good algorithms?
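One cheap approximation, short of a real filesystem snapshot, is to copy the file and then verify that its size and modification time didn't change while you were reading it, retrying on a mismatch. A minimal sketch (the function name `copy_if_stable` and the retry count are my own, not anything from rsync):

```python
import os
import shutil

def copy_if_stable(src, dst, retries=3):
    """Copy src to dst, retrying if the file changed mid-copy.

    Consistency is only approximated: we compare (mtime, size) before
    and after the copy, and a mismatch means the file was modified
    while we were reading it, so we try again.
    """
    for _ in range(retries):
        before = os.stat(src)
        shutil.copy2(src, dst)
        after = os.stat(src)
        if (before.st_mtime_ns, before.st_size) == \
           (after.st_mtime_ns, after.st_size):
            return True   # no modification observed during the copy
    return False          # file kept changing; caller must handle it
```

This still races with writers that rewrite a file in place without touching its length between mtime ticks, which is why databases and the like need their own quiescing mechanism.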
Other backup schemes I've used involved BCLs (block change logging), which
allow the admin to back up only those blocks that have been updated since
the last master. This is a GREAT space saver for large data farms with low
update rates. I tend to work on larger multi-terabyte systems, so
minimizing backup times is very important.
Rsync uses a method similar to BCLs insofar as it will only sync changed
files -- based on modification time and size, or a file checksum.
Clearly BCLs operate at a considerably lower filesystem abstraction level
(and are considerably faster), but I'd prefer to stick to a userland
process operating on top of the complete filesystem abstraction, for
portability reasons (and ease of programming).
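The userland change-detection rule is simple enough to sketch. This is my own illustration of the idea (the name `needs_sync` is hypothetical, and real rsync's quick check and rolling-checksum machinery are more involved):

```python
import hashlib
import os

def needs_sync(src, dst, checksum=False):
    """Decide whether src must be re-copied to dst, rsync-style.

    The quick check compares modification time and size; with
    checksum=True, a full-content hash is compared instead (slower,
    but immune to files whose timestamps lie).
    """
    if not os.path.exists(dst):
        return True
    if checksum:
        def digest(path):
            h = hashlib.md5()
            with open(path, 'rb') as f:
                for chunk in iter(lambda: f.read(65536), b''):
                    h.update(chunk)
            return h.digest()
        return digest(src) != digest(dst)
    s, d = os.stat(src), os.stat(dst)
    # Whole-second mtimes, since not every filesystem keeps sub-second
    # resolution on both ends of the transfer.
    return (int(s.st_mtime), s.st_size) != (int(d.st_mtime), d.st_size)
```

The quick check is what makes whole-tree scans cheap; the checksum path is the fallback for paranoid runs.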
It's not my intention to create a complete enterprise backup solution...
at least not at this stage. I just need to back up my users' files,
databases, etc. when they go home at 5 every night. But, at the same time,
I'd like a backup system that is flexible enough to create hourly
incremental backups, back up to any media I want at any time I want, and
be scriptable. So... perhaps it hits the "middle ground" -- it's not
NTBackup or Veritas, but somewhere in between.
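For cheap hourly incrementals, one well-known trick (the same one behind rsync's `--link-dest`) is to hard-link unchanged files out of the previous snapshot, so every snapshot looks like a full backup but only changed files consume space. A minimal sketch under my own naming (`incremental_snapshot` is hypothetical, and it ignores symlinks, permissions on directories, and deletions):

```python
import os
import shutil

def incremental_snapshot(src_dir, prev_snap, new_snap):
    """Build new_snap from src_dir, hard-linking files that are
    unchanged since prev_snap instead of copying them again."""
    for root, dirs, files in os.walk(src_dir):
        rel = os.path.relpath(root, src_dir)
        os.makedirs(os.path.join(new_snap, rel), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            prev = os.path.join(prev_snap, rel, name)
            dst = os.path.join(new_snap, rel, name)
            s = os.stat(src)
            if (os.path.exists(prev)
                    and os.stat(prev).st_size == s.st_size
                    and int(os.stat(prev).st_mtime) == int(s.st_mtime)):
                os.link(prev, dst)      # unchanged: share the old copy
            else:
                shutil.copy2(src, dst)  # changed or new: store fresh
```

Because every snapshot directory is a complete tree, expiring old backups is just `rm -rf` on the oldest snapshot; the hard links keep shared files alive.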
Are you planning on your backup scheme being able to handle "hot" backups?
Yes... but functionality will be limited. I'm probably going to make use
of the Ostrich Algorithm -- stick my head in the sand, and pretend the
problem doesn't exist -- when it comes to filesystem locks, databases, and
other complex backup issues. At least for the prototype. Again, this
really comes back to the portability of the code, IMHO. If I bloat it too
much, it becomes difficult to modify and will only work with specific
systems.
I guess what I'm trying for is a "core" backup utility that can perform
advanced backup operations -- like incremental backups -- very well, but
that can be easily supplemented for your specific installation via
scripting, modification of code, or addon software.
This makes particular sense in open source software. With proprietary backup
solutions, I always leaned against block level incrementals such as provided
by Veritas, because without their software, there was pretty much no way to
recover your data... This meant that things like license management or lame
bugs could end up biting you in the butt at the very worst possible time.
There have always been file/tar-based products out there, but like you said,
not so efficient.
Great point, Jim! This is by far my most hated feature of proprietary
backup software. It costs $1000 for the software, and I have to have it
loaded to get to my files?! Ridiculous! This will not be a feature of my
software. By default all files will be stored as they appear in your
filesystem. Of course, you'll have the options to compress, encrypt, and
do whatever else people do to files these days. Because of this storage
method, if, for example, /etc and /home are backed up to a USB hard disk,
you could boot a live disk and configure it to serve your files until
your server is back online.
Thanks again for all of the ideas! Keep them coming!
- Sebastian
_______________________________________________
RLUG mailing list
[email protected]
http://lists.rlug.org/mailman/listinfo/rlug