I've got an "alpha" version of my script written in bash (I'll be changing
the language soon) that copies any number of local source files to any
number of local destination files, and allows for exceptions. On *nix
systems this works great because it can back up open files.
Unfortunately, open files in Windows cannot be accessed without expensive
software -- I'm backing up our current (soon to be replaced with Linux)
Windows file server over SMB, and all open files are being ignored.
So... I'm going to define "portability" as "it will work on *nix".
As far as snapshots go: I'm thinking of offering limited snapshot
capability in the script so that data files that depend upon each other
can be frozen. Extending these features to the entire filesystem will not
work efficiently, or... well, at all. Full filesystem snapshots can be
taken, as James suggested, with aid from the Kernel and LVM2. So, if a
snapshot is required before backup, the system can be scripted to take the
snapshot, then execute my scripts -- but, again, custom server
configuration will be required to use this feature. I have no info on
performance of the LVM2 snapshot method... but the workstations for the
robotics lab just arrived, so I'll test it.
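For the curious, the wrapper I'm imagining is only a few lines. Here's a sketch -- volume group, volume, and mount point names are all invented, and it prints the commands instead of running them so the plan can be reviewed (or piped to sh) before touching a live system:

```shell
# Sketch of the "snapshot, then back up" wrapper. vg0/data, /mnt/snap,
# and /backup are placeholder names -- adjust for your setup. The
# function emits the command sequence rather than executing it.
snapshot_backup_plan() {
  vg=$1
  lv=$2
  mnt=$3
  snap="${lv}-snap"
  # Reserve 1G of copy-on-write space for writes that land while the
  # snapshot exists; the backup reads a frozen view of the filesystem.
  echo "lvcreate --snapshot --size 1G --name $snap /dev/$vg/$lv"
  echo "mount -o ro /dev/$vg/$snap $mnt"
  echo "rsync -a $mnt/ /backup/$lv/"
  echo "umount $mnt"
  echo "lvremove -f /dev/$vg/$snap"
}

snapshot_backup_plan vg0 data /mnt/snap
```

Run as-is it just prints the five commands; once I've tested LVM2 on the new workstations I'll know whether the 1G copy-on-write reservation is sensible.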
Any ideas on a fast method for freezing a snapshot of individual files?
- Sebastian
On Mon, 25 Jul 2005, Sebastian Smith wrote:
First off... thanks for all the input. I need to research a couple of things
before I dive too deep into this project ;)
The rest of my response is inline:
On Sun, 24 Jul 2005, Brian Chrisman wrote:
James Washer wrote:
The big issues with backups IMHO is getting a consistent snapshot. If you
start copying a large data file, and it changes after you've copied the
beginning, you're screwed.
An exact snapshot will be one of the hardest qualities to achieve. It's not
my intention to modify the current rsync algorithm to add this feature (if it
isn't already part of the system... I don't know all the details of the rsync
algorithm), so I may have to develop a method for freezing a snapshot of a
file. Anyone know of any good algorithms?
Other backup schemes I've used involved BCLs (block change logging),
which allow the admin to back up only those blocks that have been updated
since the last master. This is a GREAT space saver for large data farms
with low update rates. I tend to work on larger multi-terabyte systems,
so minimizing backup times is very important.
Rsync uses a method similar to BCLs insofar as it will only sync changed
files -- based on modification time and size, or a file checksum. Clearly,
BCLs operate at a considerably lower level of filesystem abstraction (and
are considerably faster), but I'd prefer to stick to a userland process operating
on top of the complete filesystem abstraction for portability reasons (and
ease of programming).
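To illustrate the two change tests for anyone following along, here they are in miniature -- a size-plus-mtime quick check (rsync's default) and a full-content checksum (what rsync's -c does). The helper names and the demo files are mine:

```shell
# Two ways to decide "has this file changed?", mirroring rsync's
# quick check (size + mtime) and its whole-file checksum mode.
same_quick() {
  # Same size, and neither file is newer than the other => "unchanged".
  [ "$(wc -c < "$1")" -eq "$(wc -c < "$2")" ] &&
    ! [ "$1" -nt "$2" ] && ! [ "$1" -ot "$2" ]
}

same_checksum() {
  # Compare actual contents via POSIX cksum (CRC + length).
  [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

# Demo: two files with identical size and mtime but different bytes.
tmp=$(mktemp -d)
printf 'hello' > "$tmp/a"
printf 'jello' > "$tmp/b"
touch -r "$tmp/a" "$tmp/b"   # copy a's mtime onto b

same_quick "$tmp/a" "$tmp/b"    && echo "quick check: unchanged"
same_checksum "$tmp/a" "$tmp/b" || echo "checksum: changed"
```

The quick check is fooled here, which is exactly why rsync offers the checksum mode -- and why it's slower: it has to read every byte.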
It's not my intention to create a complete enterprise backup solution... at
least not at this stage. I just need to back up my users' files, databases,
etc when they go home at 5 every night. But, simultaneously, I'd like a
backup system that is flexible enough to create hourly incremental backups,
backup to any media I want at any time I want, and is scriptable. So...
perhaps it hits in the "middle ground" -- it's not ntbackup or Veritas, but
somewhere in between.
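The hourly incrementals I have in mind would lean on hard links -- the same trick rsync's --link-dest option uses: files unchanged since the last snapshot become hard links into it, so each hourly snapshot only costs the space of what actually changed. A toy demo of the idea (paths invented):

```shell
# Toy demo of hard-link incrementals: snap.2 is a full-looking
# snapshot, but its unchanged files share disk blocks with snap.1.
tmp=$(mktemp -d)
mkdir -p "$tmp/src" "$tmp/snap.1" "$tmp/snap.2"
echo 'unchanged' > "$tmp/src/keep"
cp "$tmp/src/keep" "$tmp/snap.1/keep"   # hour 1: real copy

# Hour 2: the file didn't change, so link it instead of copying.
if cmp -s "$tmp/src/keep" "$tmp/snap.1/keep"; then
  ln "$tmp/snap.1/keep" "$tmp/snap.2/keep"   # zero extra data blocks
else
  cp "$tmp/src/keep" "$tmp/snap.2/keep"
fi

# The real thing would be roughly:
#   rsync -a --delete --link-dest=/backup/latest /home/ /backup/$(date +%F.%H)/
# driven from cron, e.g.:  0 * * * * /usr/local/bin/hourly-backup.sh
```

Every snapshot directory looks complete, so restoring "yesterday at 3pm" is just a cp -a away -- no proprietary catalog needed.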
Are you planning on your backup scheme being able to handle "hot" backups?
Yes... but functionality will be limited. I'm probably going to make use of
the Ostrich Algorithm -- stick my head in the sand, and pretend the problem
doesn't exist -- when it comes to filesystem locks, databases, and other
complex backup issues. At least for the prototype. Again, this really hits
the portability aspect of the code IMHO. If I bloat it too much it becomes
difficult to modify, and will only work with specific systems.
I guess what I'm trying for is a "core" backup utility that can perform
advanced backup operations -- like incremental backups -- very well, but that
can be easily supplemented for your specific installation via scripting,
modification of code, or addon software.
This makes particular sense in open source software. With proprietary
backup solutions, I always leaned against block level incrementals such as
provided by Veritas, because without their software, there was pretty much
no way to recover your data... This meant that things like license
management or lame bugs could end up biting you in the butt at the very
worst possible time. There have always been file/tar-based products out
there, but like you said, not so efficient.
Great point Jim! This is by far my most hated feature of proprietary backup
software. It costs $1000 for the software, and I have to have it loaded to
get to my files?! Ridiculous! This will not be a feature of my software.
By default all files will be stored as they appear in your filesystem. Of
course, you'll have options to compress, encrypt, and do whatever else
people do to files these days. Because of this storage method, if /etc and
/home are backed up to a USB hard disk, for example, you could boot from a
live disk and configure it to serve your files until your server is back online.
Thanks again for all of the ideas! Keep them coming!
- Sebastian
_______________________________________________
RLUG mailing list
[email protected]
http://lists.rlug.org/mailman/listinfo/rlug