Re: Backup software

2010-12-17 Thread Chris O'Connell
Hey Mark,

IMO there are a few important features:
1.  The backups must be mountable, allowing for file browsing and single
file restoration.
2.  The backup should NOT be file based, it should be image based.
3.  Encrypted backups.  I want the backups to be encrypted and I want the
encryption to be self contained in the backup.  This means that you can take
the backup to ANY computer with the backup software and open the backup file
by entering a password.
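
A minimal sketch of what "self contained" could mean for point 3, assuming the backup file starts with a small header holding a random salt plus a password check value. The header layout, the constant names and the iteration count are illustrative assumptions, not taken from Acronis or any other product, and the actual bulk encryption (e.g. AES) is left out because it needs a third-party library.

```python
import hashlib
import hmac
import os

SALT_LEN = 16            # assumed header layout: 16-byte salt + 32-byte check value
ITERATIONS = 100_000     # assumed work factor; tune for the target hardware
CHECK_LABEL = b"backup-header-check"  # hypothetical constant, only used for the check value


def derive_key(password: str, salt: bytes) -> bytes:
    """Stretch the password into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt,
                               ITERATIONS, dklen=32)


def make_header(password: str) -> bytes:
    """Build the header that travels inside the backup file itself."""
    salt = os.urandom(SALT_LEN)
    key = derive_key(password, salt)
    check = hmac.new(key, CHECK_LABEL, hashlib.sha256).digest()
    return salt + check


def unlock(header: bytes, password: str):
    """Return the data key if the password matches the header, else None."""
    salt, check = header[:SALT_LEN], header[SALT_LEN:]
    key = derive_key(password, salt)
    expected = hmac.new(key, CHECK_LABEL, hashlib.sha256).digest()
    return key if hmac.compare_digest(check, expected) else None
```

Because the salt and check value live in the backup file, any machine with the software and the password can re-derive the key; nothing has to be copied from the machine that made the backup.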

Acronis has all of these features, but it's expensive and I don't believe
the software will run on the Linux NAS devices you've specified.

--Chris

On Fri, Dec 17, 2010 at 8:10 AM, Mark Woodward  wrote:

> While I've got some free time on my hands, I decided to start work on a
> project. At its core, it is very much like a standard backup system.
> What makes it different from a regular backup is what you do with the
> data retrieved after the backup. I know it is a long shot or even a
> fool's errand to start anything so pedantic and well traveled, but there
> is a specific need that I believe has been identified, but requires
> "backup" done in a specific manner.  Anyway, who knows? I'm already
> testing and using some of the core pieces and I have to say, I like it.
>
> My target OS are Windows, MacOS, and Linux. It will run on desktops,
> servers, and even some of these little NAS boxes that run Linux.
>
> My question for you guys is what do you *want* in a backup. We've all
> used these feature laden things that are out there, 99% of which is
> pointless.  What are "must haves?" What is something you've wanted but
> can't find? What are features that are most pointless and why?
> ___
> Discuss mailing list
> Discuss@blu.org
> http://lists.blu.org/mailman/listinfo/discuss
>
___
Discuss mailing list
Discuss@blu.org
http://lists.blu.org/mailman/listinfo/discuss


Re: Backup software

2010-12-17 Thread Matt Shields
On Fri, Dec 17, 2010 at 8:22 AM, Chris O'Connell wrote:

> Hey Mark,
>
> IMO I think there are a few important features:
> 1.  The backups must be mountable, allowing for file browsing and single
> file restoration.
> 2.  The backup should NOT be file based, it should be image based.
> 3.  Encrypted backups.  I want the backups to be encrypted and I want the
> encryption to be self contained in the backup.  This means that you can
> take
> the backup to ANY computer with the backup software and open the backup
> file
> by entering a password.
>
> Acronis has all of these features, but it's expensive and I don't believe
> the software will run on these linux nas devices you've specified.
>
> --Chris
>
> On Fri, Dec 17, 2010 at 8:10 AM, Mark Woodward 
> wrote:
>
> > While I've got some free time on my hands, I decided to start work on a
> > project. At its core, it is very much like a standard backup system.
> > What makes it different from a regular backup is what you do with the
> > data retrieved after the backup. I know it is a long shot or even a
> > fools errand to start anything so pedantic and well traveled, but there
> > is a specific need that I believe has been identified, but requires
> > "backup" done in a specific manner.  Anyway, who knows? I'm already
> > testing and using some of the core pieces and I have to say, I like it.
> >
> > My target OS are Windows, MacOS, and Linux. It will run on desktops,
> > servers, and even some of these little NAS boxes that run Linux.
> >
> > My question for you guys is what do you *want* in a backup. We've all
> > used these feature laden things that are out there, 99% of which is
> > pointless.  What are "must haves?" What is something you've wanted but
> > can't find? What are features that are most pointless and why?

The ability to work not only through its native communication method
(usually a custom port/protocol), but also over an SSH tunnel.  Sometimes
you don't want to open up that port to the outside world even if it is
encrypted, so it should be able to start an SSH tunnel itself.  Have that
be part of the configuration, where you can specify the username/password or
keypair it uses to establish its own SSH tunnel.  Also, be able to specify a
bandwidth limit on transfers.
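
For the bandwidth limit, one simple approach is to pace the copy loop so the average rate never exceeds a cap. The sketch below is an illustration only; the function name and parameters are made up for this example, and the clock and sleep functions are injectable so the pacing can be checked without waiting.

```python
import time


def throttled_copy(src, dst, max_bytes_per_sec, chunk_size=64 * 1024,
                   clock=time.monotonic, sleep=time.sleep):
    """Copy src to dst, sleeping as needed to stay under max_bytes_per_sec.

    src and dst are binary file-like objects. Returns the bytes copied.
    """
    start = clock()
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(chunk)
        sent += len(chunk)
        # Earliest moment at which `sent` bytes are allowed to have gone out.
        earliest = start + sent / max_bytes_per_sec
        delay = earliest - clock()
        if delay > 0:
            sleep(delay)
    return sent
```

The same loop works whether `dst` is a local file or a socket that happens to be forwarded through an SSH tunnel, since the pacing only depends on the byte count and the clock.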

Ability to do full, differential, and incremental backups.
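
The difference between the three modes comes down to which timestamp a file's modification time is compared against. A minimal sketch, assuming modification times are a good enough change signal (real tools often also look at size, inode or checksums); the function and parameter names are hypothetical:

```python
from typing import Dict, List, Optional


def select_files(mtimes: Dict[str, float], mode: str,
                 last_full: Optional[float],
                 last_any: Optional[float]) -> List[str]:
    """Pick the paths to back up.

    mtimes    maps path -> modification time
    last_full is the time of the last full backup
    last_any  is the time of the most recent backup of any kind
    """
    if mode == "full":
        cutoff = None          # everything, regardless of age
    elif mode == "differential":
        cutoff = last_full     # changed since the last FULL backup
    elif mode == "incremental":
        cutoff = last_any      # changed since the last backup of ANY kind
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    if cutoff is None:
        return sorted(mtimes)
    return sorted(path for path, mtime in mtimes.items() if mtime > cutoff)
```

A restore from differentials needs the last full plus the latest differential; a restore from incrementals needs the last full plus every incremental since.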

Like Chris mentioned, the ability to easily browse the backups and pull out
specific files/directories.

Since it's not good practice to touch the raw files for MySQL (probably
not for PostgreSQL either), be able to exclude paths.  But also be able to
specify that there is a database and do a DB backup.

-matt


RE: Backup software

2010-12-17 Thread Edward Ned Harvey
> From: discuss-boun...@blu.org [mailto:discuss-boun...@blu.org] On Behalf
> Of Mark Woodward
> 
> My question for you guys is what do you *want* in a backup. We've all
> used these feature laden things that are out there, 99% of which is
> pointless.  What are "must haves?" What is something you've wanted but
> can't find? What are features that are most pointless and why?

There are different requirements for laptops and servers.

Laptops:
Run frequently (minimum once daily), silently, in the background, low enough
priority that users don't generally notice or care.  Does not need to scan
the entire filesystem to see which files have changed.  It may seem obvious
now, but must run while the OS is running.  Must be able to exclude files
and directories (even filetypes, etc)...  Backup to remote server.  The
remote server must have sufficient security as to prevent Jane from reading
Tarzan's backups.  And ability to do baremetal restore.  Ability to browse
the backup to retrieve a specific file or subset of files.  Compression goes
without saying.  Simple for users to restore their own stuff without IT
help.

Bonus features, not requirements:  Able to run over WAN.  Able to
efficiently handle sparse files, such as guest VM images.
Centralized management, so administrators can quickly see how recently
somebody's backup was successful... And even threshold alerts if somebody's
backup hasn't run in a week or something like that...
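
The threshold alert can be as small as a comparison between each client's last successful backup time and a maximum age. A sketch under the assumption that the server keeps a per-client "last success" timestamp; the names are illustrative:

```python
from datetime import datetime, timedelta
from typing import Dict, List, Optional


def stale_clients(last_success: Dict[str, Optional[datetime]],
                  now: datetime,
                  max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Return clients whose last good backup is missing or older than max_age."""
    stale = []
    for client, when in last_success.items():
        if when is None or now - when > max_age:
            stale.append(client)
    return sorted(stale)
```

A client that has never completed a backup (`None`) is reported too, which is usually the case people forget.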

Servers:
Actually, most of the above applies for servers too.  The main difference is
assigning priorities to the various aspects listed above.  Like, for
example, I consider it an absolute requirement that servers are able to
back up the whole filesystem without scanning the whole filesystem for
changes, but on a laptop, that might be acceptable if only it's able to
complete fast enough.  But on a server, if you don't have instant snapshots
etc, there is just absolutely no way possible to finish any significantly
sized backups in a reasonable amount of time.

Beyond this, I think I'm rambling.



Re: Backup software

2010-12-17 Thread Chris O'Connell
Agreed, bare metal restore is a must.

On Fri, Dec 17, 2010 at 9:46 AM, Edward Ned Harvey wrote:

> > From: discuss-boun...@blu.org [mailto:discuss-boun...@blu.org] On Behalf
> > Of Mark Woodward
> >
> > My question for you guys is what do you *want* in a backup. We've all
> > used these feature laden things that are out there, 99% of which is
> > pointless.  What are "must haves?" What is something you've wanted but
> > can't find? What are features that are most pointless and why?
>
> There are different requirements for laptops and servers.
>
> Laptops:
> Run frequently (minimum once daily), silently, in the background, low
> enough
> priority that users don't generally notice or care.  Does not need to scan
> the entire filesystem to see which files have changed.  It may seem obvious
> now, but must run while the OS is running.  Must be able to exclude files
> and directories (even filetypes, etc)...  Backup to remote server.  The
> remote server must have sufficient security as to prevent Jane from reading
> Tarzan's backups.  And ability to do baremetal restore.  Ability to browse
> the backup to retrieve a specific file or subset of files.  Compression
> goes
> without saying.  Simple for users to restore their own stuff without IT
> help.
>
> Bonus features, not requirements:  Able to run over WAN.  Able to
> efficiently handle sparse files, such as guest VM's efficiently.
> Centralized management, so administrators can quickly see how recently
> somebody's backup was successful... And even threshold alerts if somebody's
> backup hasn't run in a week or something like that...
>
> Servers:
> Actually, most of the above applies for servers too.  The main difference
> is
> assigning priorities to the various aspects listed above.  Like, for
> example, I consider it an absolute requirement that servers are able to
> backup the whole filesystem without scanning the whole filesystem for
> changes, but on a laptop, that might be acceptable if only it's able to
> complete fast enough.  But on a server, if you don't have instant snapshots
> etc, there is just absolutely no way possible to finish any significantly
> sized backups in a reasonable amount of time.
>
> Beyond this, I think I'm rambling.
>


Re: Backup software

2010-12-17 Thread Matt Shields
On Fri, Dec 17, 2010 at 8:44 AM, Matt Shields  wrote:

> On Fri, Dec 17, 2010 at 8:22 AM, Chris O'Connell wrote:
>
>> Hey Mark,
>>
>> IMO I think there are a few important features:
>> 1.  The backups must be mountable, allowing for file browsing and single
>> file restoration.
>> 2.  The backup should NOT be file based, it should be image based.
>> 3.  Encrypted backups.  I want the backups to be encrypted and I want the
>> encryption to be self contained in the backup.  This means that you can
>> take
>> the backup to ANY computer with the backup software and open the backup
>> file
>> by entering a password.
>>
>> Acronis has all of these features, but it's expensive and I don't believe
>> the software will run on these linux nas devices you've specified.
>>
>> --Chris
>>
>> On Fri, Dec 17, 2010 at 8:10 AM, Mark Woodward 
>> wrote:
>>
>> > While I've got some free time on my hands, I decided to start work on a
>> > project. At its core, it is very much like a standard backup system.
>> > What makes it different from a regular backup is what you do with the
>> > data retrieved after the backup. I know it is a long shot or even a
>> > fools errand to start anything so pedantic and well traveled, but there
>> > is a specific need that I believe has been identified, but requires
>> > "backup" done in a specific manner.  Anyway, who knows? I'm already
>> > testing and using some of the core pieces and I have to say, I like it.
>> >
>> > My target OS are Windows, MacOS, and Linux. It will run on desktops,
>> > servers, and even some of these little NAS boxes that run Linux.
>> >
>> > My question for you guys is what do you *want* in a backup. We've all
>> > used these feature laden things that are out there, 99% of which is
>> > pointless.  What are "must haves?" What is something you've wanted but
>> > can't find? What are features that are most pointless and why?
>
> The ability for it to work not only through it's native communication
> method (usually custom port/protocol), but also be able to run over an SSH
> tunnel.  Sometimes you don't want to open up that port to the outside world
> even if it is encrypted, so being able to tell it to start an SSH tunnel.
>  Have that be part of the configuration where you can specify the
> username/password or keypair to let it establish it's own SSH tunnel.  Also,
> be able to specify a bandwidth limit on transfers.
>
> Ability to do full, differentials and incremental backups.
>
> Like Chris mentioned, the ability to easily browse the backups and pull out
> specific files/directories.
>
> Since it's not a good practice to touch the raw files for MySQL (probably
> not for PostgreSQL either), be able to exclude path's.  But also be able to
> specify there is a database and do a db backup.
>
> -matt
>
>
>
Another useful feature is that the system should automatically verify the
backups to make sure they are good and alert you when there's a problem.
 Nothing like needing to go back to your backups and realizing that they
haven't been running for months.
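
One way to make verification automatic is to record a checksum manifest when the backup is written and re-check it on a schedule. A minimal sketch, assuming the backup is a plain directory tree; the function names are made up for this example:

```python
import hashlib
from pathlib import Path
from typing import Dict, List, Tuple


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256."""
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def verify(root: Path, manifest: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (path, problem) pairs; an empty list means the backup checks out."""
    problems = []
    for rel, expected in manifest.items():
        path = root / rel
        if not path.is_file():
            problems.append((rel, "missing"))
        elif sha256_of(path) != expected:
            problems.append((rel, "checksum mismatch"))
    return problems
```

Anything `verify` returns is what should trigger the alert; an empty manifest or a manifest that was never written is itself worth alerting on.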

-matt


Re: Backup software

2010-12-17 Thread Rob Hasselbaum
On Fri, Dec 17, 2010 at 9:46 AM, Edward Ned Harvey wrote:

>
> There are different requirements for laptops and servers.
>
> Laptops:
> Run frequently (minimum once daily), silently, in the background, low
> enough
> priority that users don't generally notice or care.  Does not need to scan
> the entire filesystem to see which files have changed.  It may seem obvious
> now, but must run while the OS is running.  Must be able to exclude files
> and directories (even filetypes, etc)...  Backup to remote server.  The
> remote server must have sufficient security as to prevent Jane from reading
> Tarzan's backups.  And ability to do baremetal restore.  Ability to browse
> the backup to retrieve a specific file or subset of files.  Compression
> goes
> without saying.  Simple for users to restore their own stuff without IT
> help.
>


Sorry if this has been mentioned already. I'm coming late to this thread.
But I've found BackupPC to be an excellent solution for laptop backup over
the network. BackupPC is free, runs on a server, and works by "reaching out"
to workstations over the network once a day (or however often you like) for
full and incremental backups. I particularly like it because it requires no
client software (works via SAMBA, rsync over SSH, or rsyncd), and it is very
resilient to laptops coming and going on the network. If a laptop
disappears, it simply tries again later. It has a web interface with
separate authentication for administrators and users so users can restore
their own files. It will email you and/or the PC owner if something
consistently goes wrong (e.g. a PC doesn't get backed up for X number of
days). Files get backed up to the server file system and it's smart about
disk usage such that if the same file appears in multiple backups, it is
only stored once.
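
The "only stored once" idea can be illustrated with a content-addressed pool: each file body is stored under the hash of its contents, and every backup just records which hash belongs to which path. This is a sketch of the general technique, not BackupPC's actual code; the `Pool` class and `snapshot` function are hypothetical names.

```python
import hashlib
from pathlib import Path
from typing import Dict


class Pool:
    """Stores each distinct file body exactly once, keyed by its SHA-256."""

    def __init__(self, root: Path):
        self.root = root
        root.mkdir(parents=True, exist_ok=True)

    def store(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        target = self.root / digest
        if not target.exists():      # identical content is already in the pool
            target.write_bytes(data)
        return digest

    def load(self, digest: str) -> bytes:
        return (self.root / digest).read_bytes()


def snapshot(pool: Pool, files: Dict[str, bytes]) -> Dict[str, str]:
    """Record one backup as a mapping of path -> pool digest."""
    return {name: pool.store(data) for name, data in files.items()}
```

Two nightly snapshots of an unchanged file then cost one pool entry plus two small index records, regardless of which machine the file came from.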

I started using it years ago and it has been quietly chugging along ever
since. Highly recommended.

http://backuppc.sourceforge.net/


Re: Backup software

2010-12-17 Thread David Miller
On Fri, Dec 17, 2010 at 10:05 AM, Chris O'Connell wrote:

> Agreed, bare metal restore is a must.
>
>
I'm curious, since bare metal restores have been mentioned: what are everyone's
thoughts on using a PXE boot image to pre-seed or kickstart the server,
then using Puppet or Chef to bring the server to a known configuration?  This
makes dealing with production vs. testing environments much easier too, as
it's simple to ensure that they are identical and repeatable.

This approach also avoids any issues where a server was just discovered to
have been hacked but when it was hacked is unclear.  Rather than spending
time trying to find your last clean image you can just fire up a clean
install quickly and consistently.
--
David


Re: Backup software

2010-12-17 Thread Matt Shields
On Fri, Dec 17, 2010 at 10:36 AM, David Miller  wrote:

> On Fri, Dec 17, 2010 at 10:05 AM, Chris O'Connell wrote:
>
> > Agreed, bare metal restore is a must.
> >
> >
> I'm curious since bare metal restores have been mentioned what is
> everyone's
> thoughts on using a pxe boot image to pre-seed or kickstart the server.
> Then use puppet or chef to bring the server to a known configuration.  This
> makes dealing with production vs testing environments much easier too as
> it's simple to insure that they are identical and repeatable.
>
> This approach also avoids any issues where a server was just discovered to
> have been hacked but when it was hacked is unclear.  Rather than spending
> time trying to find your last clean image you can just fire up a clean
> install quickly and consistently.
> --
> David

PXE booting has been my approach.  I use Cobbler for PXE with Puppet for
configuration.  Currently the only things I back up are Cobbler, Puppet, and
data (custom code or databases).  From those three things I can restore any
machine.  This approach also makes it easy to scale horizontally,
since adding a new server means just adding it to a pre-defined group, which
then tells it to load specific configurations and data.

-matt


Re: Backup software

2010-12-17 Thread Richard Pieri
On Dec 17, 2010, at 8:10 AM, Mark Woodward wrote:
> 
> My question for you guys is what do you *want* in a backup. We've all 
> used these feature laden things that are out there, 99% of which is 
> pointless.  What are "must haves?" What is something you've wanted but 
> can't find? What are features that are most pointless and why?

Backups must be simple to make and simple to maintain, and they must be easily 
automated.

Disaster recovery must be simple.  I need to be able to put a tape in the 
drive, type a command or three, and undump the backup to the file system.  In 
particular, I need to be able to do this from a live disc or netboot.

Individual file restores must be as simple as a full system restore.

Anything that facilitates these requirements is good.  Time Machine is an 
example of a "rich" backup system that meets these requirements.  Creating 
backups is simple: plug in a drive, click "yes" when OS X asks to use it for 
Time Machine.  Automation is automatic: Time Machine makes an incremental 
backup every hour.  Disaster recovery is simple: boot the OS X installation 
media, tell it to restore from a Time Machine backup, sit back and let it go.  
Individual restores are equally simple, and Time Machine offers two nearly 
identical methods.  For files, open the containing directory (folder) in 
Finder, Enter Time Machine from the Time Machine menulet, browse for when, 
select your files, and restore.  For applications that are aware of Time 
Machine, such as Mail and Address Book, repeat the same process but from the 
applications themselves rather than Finder.

Anything that can get in the way of these requirements is bad.  Anything that 
complicates restoration or makes disaster recovery take longer than booting a 
disc and undumping a tape is bad.  Examples include Legato Networker and 
Windows Backup.  The former requires catalog files to do any data restores, and 
these catalogs are not stored with the backup data in an accessible form.  If 
you lose a catalog or a catalog is corrupted then you have to scan the entire 
dump set to rebuild the catalog, which takes as long as a restore would, and 
only then can the data be restored (caveat: this is 15-year-old information;
Legato may have fixed this glaring flaw since).  The latter requires a full OS installation on
the hardware being restored before a backup can be restored onto it, and the 
result is not a perfect 1:1 restoration of the original.

Everything else is nice to have.  Some nice to have things, such as incremental 
backups, data compression and encryption, are mandatory for specific 
environments.  I do not consider them to be universally required in a backup 
system.

--Rich P.




Re: Backup software

2010-12-17 Thread Jerry Feldman
On 12/17/2010 10:05 AM, Matt Shields wrote:
> Another useful feature is that the system should automatically verify the
> backups to make sure they are good and alert you when there's a problem.
>  Nothing like needing to go back to your backups and realizing that they
> haven't been running for months.
Agreed. This happened to both the BLU and me. Actually, in my case it
was interesting. First, I forgot to reset my crontab when I had
previously upgraded the OS. But, even worse, I was backing up using
tar on a 32-bit system and the backup picked up a virtual machine. I
could recover all the data before the VM but not after it. With the BLU,
we had added some additional security on the logins because we were
seeing increased attacks. However, this also prevented the backups from
working, and without a notifier we were not aware that they were not
actually running.

-- 
Jerry Feldman 
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846




RE: Backup software

2010-12-17 Thread Edward Ned Harvey
> From: David Miller [mailto:davi...@gmail.com]
> 
> I'm curious since bare metal restores have been mentioned what is
> everyone's thoughts on using a pxe boot image to pre-seed or kickstart the
> server.  Then use puppet or chef to bring the server to a known
> configuration.  This makes dealing with production vs testing environments
> much easier too as it's simple to insure that they are identical and
> repeatable.

Baremetal restores are mostly just important for laptops and workstations...
And one-of-a-kind servers.  Actually, there's a better way to say this.

Automated complete-system build scripts are excellent whenever you have
machines that are unspecialized clones with a (relatively) static
configuration.  They are not useful, for example, to help you build an
Active Directory, MS SQL, and SharePoint server.  Nor will they prevent Joe
User from wasting time reinstalling all his apps and reconfiguring all his
personal settings after a laptop is wiped out.

Baremetal restore is necessary for some situations...  And automated system
reinstall is excellent for other situations.




Re: Backup software

2010-12-18 Thread Mark Woodward
On 12/17/2010 08:22 AM, Chris O'Connell wrote:
> Hey Mark,
>
> IMO I think there are a few important features:
> 1.  The backups must be mountable, allowing for file browsing and 
> single file restoration.
May I ask about this requirement? Does it need to be mountable? You can
provide browsing with single-file restoration in a number of ways, but
"mountable" implies a file system construct, and I'm not sure that is
feasible in a reasonable amount of time.

> 2.  The backup should NOT be file based, it should be image based.

That is actually at direct odds with the purpose of the backup. One of 
the main purposes is to provide extensive information about the files 
being backed up and why.

> 3.  Encrypted backups.  I want the backups to be encrypted and I want 
> the encryption to be self contained in the backup.  This means that 
> you can take the backup to ANY computer with the backup software and 
> open the backup file by entering a password.
The thing about passwords, and yes, encryption is high on my list of 
things that are must haves, is storage of them. If you require 
encryption, then you must either have the user enter the password at the 
time of backup, or you must store the password for use at a later date 
for automated backup.

Would you be satisfied with storing the password on the backup machine
in a format that may be vulnerable, as long as the backup target media
never sees the password and the data is in an AES-encrypted format?
>
> Acronis has all of these features, but it's expensive and I don't 
> believe the software will run on these linux nas devices you've specified.
Acronis is a good product, for what it is, but let's just say "that"
market is served by products like Acronis. The market I'm targeting is
much less concerned with "backup," "restore," and "disaster recovery,"
and far more concerned with integrity and longevity.  I mean, yes, the
data *MUST* be retrievable, but it wouldn't typically be used for full
system backup.

It is concerned more with backing up information than it is backing up 
systems. Does that make sense?

>
> --Chris
>
> On Fri, Dec 17, 2010 at 8:10 AM, Mark Woodward wrote:
>
> While I've got some free time on my hands, I decided to start work
> on a
> project. At its core, it is very much like a standard backup system.
> What makes it different from a regular backup is what you do with the
> data retrieved after the backup. I know it is a long shot or even a
> fools errand to start anything so pedantic and well traveled, but
> there
> is a specific need that I believe has been identified, but requires
> "backup" done in a specific manner.  Anyway, who knows? I'm already
> testing and using some of the core pieces and I have to say, I
> like it.
>
> My target OS are Windows, MacOS, and Linux. It will run on desktops,
> servers, and even some of these little NAS boxes that run Linux.
>
> My question for you guys is what do you *want* in a backup. We've all
> used these feature laden things that are out there, 99% of which is
> pointless.  What are "must haves?" What is something you've wanted but
> can't find? What are features that are most pointless and why?


Re: Backup software

2010-12-18 Thread Mark Woodward
On 12/17/2010 08:44 AM, Matt Shields wrote:
>
>
> The ability for it to work not only through it's native communication 
> method (usually custom port/protocol), but also be able to run over an 
> SSH tunnel.  Sometimes you don't want to open up that port to the 
> outside world even if it is encrypted, so being able to tell it to 
> start an SSH tunnel.  Have that be part of the configuration where you 
> can specify the username/password or keypair to let it establish it's 
> own SSH tunnel.  Also, be able to specify a bandwidth limit on transfers.
I'm not sure that applies to the market I'm targeting, I need to think 
about it.
>
> Ability to do full, differentials and incremental backups.
Absolutely, compression as well.
>
> Like Chris mentioned, the ability to easily browse the backups and 
> pull out specific files/directories.
Already planned.
>
> Since it's not a good practice to touch the raw files for MySQL 
> (probably not for PostgreSQL either), be able to exclude path's.  But 
> also be able to specify there is a database and do a db backup.
For now, I'm going to punt on databases. That may be a version 1.1 feature.
>
> -matt
>
>



Re: Backup software

2010-12-18 Thread Mark Woodward
On 12/17/2010 09:46 AM, Edward Ned Harvey wrote:
>> From: discuss-boun...@blu.org [mailto:discuss-boun...@blu.org] On Behalf
>> Of Mark Woodward
>>
>> My question for you guys is what do you *want* in a backup. We've all
>> used these feature laden things that are out there, 99% of which is
>> pointless.  What are "must haves?" What is something you've wanted but
>> can't find? What are features that are most pointless and why?
>>  
> There are different requirements for laptops and servers.
>
> Laptops:
> Run frequently (minimum once daily), silently, in the background, low enough
> priority that users don't generally notice or care.  Does not need to scan
> the entire filesystem to see which files have changed.  It may seem obvious
> now, but must run while the OS is running.  Must be able to exclude files
> and directories (even filetypes, etc)...  Backup to remote server.  The
> remote server must have sufficient security as to prevent Jane from reading
> Tarzan's backups.  And ability to do baremetal restore.  Ability to browse
> the backup to retrieve a specific file or subset of files.  Compression goes
> without saying.  Simple for users to restore their own stuff without IT
> help.
>
Keeping Jane from reading Tarzan's files is an interesting
requirement. It's obvious when said, but it didn't occur to me. It's a
potentially difficult problem.  Would merely matching the user name to
the owner of the files be enough, or would you also require full group
access?

So, to get access to a backup set, you would need a user to be created
for you by an admin (or some automated tool, I'm not sure) and then you
would only be able to see the files which you own. Would that be OK?

For compression, a user can specify a level from 1 to 10, where 1 is no
compression and 10 is maximum.

I also support encryption, although one has to compress before encrypting,
or the compression doesn't work (encrypted data looks random and won't shrink).
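
The ordering point can be checked with nothing but zlib. In the sketch below, `os.urandom` output stands in for ciphertext, on the assumption that good encryption produces data indistinguishable from random bytes. The mapping of the 1-10 user level onto zlib's 0-9 is also an assumption made for this example, not a description of the actual implementation.

```python
import os
import zlib


def zlib_level(user_level: int) -> int:
    """Map an assumed user-facing level 1..10 onto zlib's 0..9 (0 = store only)."""
    if not 1 <= user_level <= 10:
        raise ValueError("level must be between 1 and 10")
    return user_level - 1


def compressed_size(data: bytes, user_level: int = 10) -> int:
    """Size of the data after zlib compression at the given user level."""
    return len(zlib.compress(data, zlib_level(user_level)))


# Repetitive plaintext shrinks dramatically...
plain = b"backup " * 10_000
# ...while random bytes of the same length (a stand-in for ciphertext) do not.
ciphertext_like = os.urandom(len(plain))

plain_size = compressed_size(plain)
cipher_size = compressed_size(ciphertext_like)
```

Comparing `plain_size` and `cipher_size` against `len(plain)` shows why the pipeline has to be compress first, then encrypt.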
> Bonus features, not requirements:  Able to run over WAN.  Able to
> efficiently handle sparse files, such as guest VM's efficiently.
> Centralized management, so administrators can quickly see how recently
> somebody's backup was successful... And even threshold alerts if somebody's
> backup hasn't run in a week or something like that...
>
Sparse files are interesting. I hadn't thought of those. Not sure how to 
handle them. Got a suggestion?
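
One possible answer, sketched under the assumption that the backup format can store (offset, data) extents instead of a flat byte stream: read the file in blocks, skip blocks that are entirely zero, and keep the total size so the holes can be recreated on restore. The block size and function names are illustrative. On platforms that support it, `os.SEEK_DATA` and `os.SEEK_HOLE` can find the holes without reading them at all.

```python
from typing import BinaryIO, Iterable, Iterator, Tuple

BLOCK_SIZE = 4096  # assumed block size for this sketch


def data_extents(stream: BinaryIO,
                 block_size: int = BLOCK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (offset, block) for every block that contains a non-zero byte."""
    offset = 0
    while True:
        block = stream.read(block_size)
        if not block:
            break
        if any(block):               # all-zero blocks are treated as holes
            yield offset, block
        offset += len(block)


def rebuild(extents: Iterable[Tuple[int, bytes]], total_size: int) -> bytes:
    """Recreate the original contents; anything not in an extent is zeros."""
    out = bytearray(total_size)      # starts zero-filled
    for offset, block in extents:
        out[offset:offset + len(block)] = block
    return bytes(out)
```

`rebuild` works in memory to keep the example short; a real restore would seek and write into the output file so the holes stay unallocated on disk.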
> Servers:
> Actually, most of the above applies for servers too.  The main difference is
> assigning priorities to the various aspects listed above.  Like, for
> example, I consider it an absolute requirement that servers are able to
> backup the whole filesystem without scanning the whole filesystem for
> changes, but on a laptop, that might be acceptable if only it's able to
> complete fast enough.  But on a server, if you don't have instant snapshots
> etc, there is just absolutely no way possible to finish any significantly
> sized backups in a reasonable amount of time.
>
In version 1.0 I'm stuck; it's a file walker. In the next version I'm thinking
of using OS-specific file access monitoring for incremental backup and
access logging.
> Beyond this, I think I'm rambling.
>



Re: Backup software

2010-12-18 Thread Jerry Feldman
On 12/18/2010 12:18 PM, Mark Woodward wrote:
> On 12/17/2010 08:22 AM, Chris O'Connell wrote:
>> Hey Mark,
>>
>> IMO I think there are a few important features:
>> 1.  The backups must be mountable, allowing for file browsing and 
>> single file restoration.
> May I ask about this requirement? Does it need to be mountable? You can 
> do Browse-able with single file restoration a number of ways, but 
> "mountable" implies a file system construct, and I'm not sure that is 
> feasible in a reasonable amount of time.
>
>> 2.  The backup should NOT be file based, it should be image based.
> That is actually at direct odds with the purpose of the backup. One of 
> the main purposes is to provide extensive information about the files 
> being backed up and why.
I was originally going to question Chris on this, but an image backup to
another physical drive does make sense, especially on reading his
subsequent posts. I personally prefer a file-based backup, but an
image-based backup has a distinct advantage in case of a drive failure.
One advantage of a file-by-file backup, such as rsnapshot, is that you
have an incremental backup, so if I somehow change or overwrite a file,
I can go back to a previous version of that file. But, I can also use a
source control system such as git. I think you need to look at the
system you are backing up, and recovery from a failed drive. With
Windows, I certainly would do image backups because of certain unmovable
files and the registry.

-- 
Jerry Feldman 
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846




Re: Backup software

2010-12-18 Thread Jack Coats
Mark,

What is on your current version of a feature list?  I know lots of
things have been brought up and considered.  What are you considering
as reasonable for your first cut?  Additional versions?

I would like to see some 'exits' or scripts that are user generated
that could quiesce a database before it is backed up, and start it back
up afterwards.  Possibly one at the 'start' of the backup and another at
the end, in case there are particular tasks that users want to do.  Like
scan /tmp before a backup for files that need to go away, and even be
able to shut a system down after a backup is complete, or just about
anything that could be scripted.  Also, if the scripts aren't there, it
is a non-problem, just go on with backups. ... All these are things I
have actually used in backup products.
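
A rough sketch of how such exits could be wired in, assuming Python and
hook scripts that are plain executables.  The names run_hook and
backup_with_hooks are made up for illustration; do_backup stands in for
whatever the real backup step is.

```python
import os
import subprocess


def run_hook(path):
    """Run an optional hook script; a missing script is not an error."""
    if not path or not os.path.isfile(path):
        return None  # no hook present: just go on with the backup
    # check=True raises CalledProcessError if the script exits non-zero
    return subprocess.run([path], check=True).returncode


def backup_with_hooks(do_backup, pre_hook=None, post_hook=None):
    """Run pre_hook, then do_backup(), then post_hook even if the backup fails."""
    run_hook(pre_hook)
    try:
        return do_backup()
    finally:
        run_hook(post_hook)
```

Running the post hook in a finally block means a database that was
quiesced by the pre hook gets started again even when the backup itself
dies partway through.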

Another thing that would be nice is the ability to do an 'on demand user
backup' for files or directory structures.  I have done this just before
I want to run a script that might trash my test data.  This way I could
get my test data back.  Nice for regression testing.


Re: Backup software

2010-12-18 Thread Jerry Feldman
I think the bottom line for this discussion is really that you first
need to define your requirements as well as what you are going to back
up and what the security constraints are. How often do you need to
restore individual files, and in the case of a catastrophic failure,
such as a head crash, how fast can you recover? Another issue is the
quality of the backup. If your backup is corrupted, then it is useless.
Another question is how often to back up. This can affect how much data
you can afford to have to recreate after a failure. In a commercial situation
you also may have a legal requirement to save files for a number of
years. Once you list your requirements, then you can set your strategy
and eventually choose the method and tools.
I also prefer a backup to disk because that drive will most likely be
available, but in the case of things like a surge or fire, any attached
device can be fried, so you might want an offsite backup. In my office
we back up to an attached Western Digital 2TB (RAID1) system. The device
is very, very slow, but meets our needs. We also back up to New York
every night, and New York backs us up to tape. If our building burned
down, we would be able to fully recover home and important working
directories, although they would be about 2 days old. This works for us.
But, the nature of the business dictates the needs.

-- 
Jerry Feldman 
Boston Linux and Unix
PGP key id: 537C5846
PGP Key fingerprint: 3D1B 8377 A3C0 A5F2 ECBB  CA3B 4607 4319 537C 5846




Re: Backup software

2010-12-18 Thread David Kramer
On 12/17/2010 08:10 AM, Mark Woodward wrote:
> While I've got some free time on my hands, I decided to start work on a 
> project. At its core, it is very much like a standard backup system. 
> What makes it different from a regular backup is what you do with the 
> data retrieved after the backup. I know it is a long shot or even a 
> fools errand to start anything so pedantic and well traveled, but there 
> is a specific need that I believe has been identified, but requires 
> "backup" done in a specific manner.  Anyway, who knows? I'm already 
> testing and using some of the core pieces and I have to say, I like it.
> 
> My target OS are Windows, MacOS, and Linux. It will run on desktops, 
> servers, and even some of these little NAS boxes that run Linux.
> 
> My question for you guys is what do you *want* in a backup. We've all 
> used these feature laden things that are out there, 99% of which is 
> pointless.  What are "must haves?" What is something you've wanted but 
> can't find? What are features that are most pointless and why?

I've talked about my backup methodology, and I don't think it's common
or standard, but I'll throw it out again so you can laugh heartily at it.

I try to only back up data and config files, not binaries.  All bin
directories (except under /usr/local) are excluded.  My theory (which
has been put to practice) is that I can restore the OS through regular
means, upgrade packages as needed, then restore my data, mail, scripts,
websites, etc.  It also gives me the ability to install a newer version,
restore all of my data and most of my configs, doing a diff to bring
them up to the newer versions as needed (which I've done).  I do not
back up my MythTV video files at all, which did actually bite me once,
but I worked my way through denial, anger, recovery.

The upshot is the .tgz backup for my server is about 14GB right now.
That gets copied to an external USB hard drive that is normally
unplugged from both power and data.  I also bought a 16GB thumb drive to
bring to work as an offsite backup (but I encrypt that).
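
For anyone who wants to try the same idea, here is a minimal sketch in
Python using the standard tarfile module.  It is only an approximation
of the scheme described above: skip_binaries and make_backup are
hypothetical names, and the rule "drop anything with a path component
named bin unless it is under usr/local" is a simplification (it would
also drop a plain file that happens to be called bin).

```python
import os
import tarfile


def skip_binaries(tarinfo):
    """Exclude 'bin' directories unless they live under usr/local."""
    parts = tarinfo.name.split("/")
    if "bin" in parts and not tarinfo.name.startswith("usr/local/"):
        return None  # None drops the entry; a dropped directory is not descended into
    return tarinfo


def make_backup(archive_path, root):
    """Write a gzip-compressed tar of everything under root, minus excluded bin dirs."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for entry in sorted(os.listdir(root)):
            # arcname keeps archive paths relative to root, e.g. "etc/fstab"
            tar.add(os.path.join(root, entry), arcname=entry, filter=skip_binaries)
```

Calling make_backup("/mnt/usb/server.tgz", "/") would be the obvious
use, though the paths here are just examples.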

Image backups can be nice and fast and simple, but it means your backup
media has to be several times the storage of your entire system (for
multiple backups), you only have the option of getting back to where you
were, and you have to trust that it was a real snapshot with all files
synchronized.

With regard to not letting Tarzan and Jane see each other's files, I
would be careful of trying to solve that one, because it forces you to
not only have intimate details of the filesystem involved, but what
authentication system the backed up system was using, etc.


RE: Backup software

2010-12-18 Thread Edward Ned Harvey
> From: Mark Woodward [mailto:ma...@mohawksoft.com]
> 
> The permissions to keep Jane from reading tarzan's files is an
> interesting one. It's obvious when said, but didn't occur to me. It's a
> potentially difficult problem.  Would merely matching the user name to
> the owner of the files be enough or would you also require full group
> access?

Well, I have no idea what you have in mind, but ... I usually solve this
problem by backing up onto standard file shares of some kind.  If it's a
CIFS share on a Windows server, I simply set the ACLs to allow only Jane to
access Jane's backup directory.  If it's Apple, I create separate shares
which are only accessible by their owner.  And so on.


> So, to get access to a backup set, you would need a user to be created
> for you by an admin (or some automated tool, I'm not sure) and then you
> would only be able to see the files which you own. Would that be OK?

Well, yes.  But of course, it's desirable if it's based on a pre-existing
credentials system such as AD.


> For compression, a user can specify a level 1-10, 1 is no compression,
> and 10 is full.

This is a low-priority request, basically irrelevant to anything, but I'm an
idealist so I like to promote.  I think it's desirable to allow a choice of
compression algorithm.  LZO is always so fast that it effectively removes
large sequential repeated patterns (such as zero-filled files) but LZO will
never become processor bound, because it's just so darn fast, and generally
pretty wimpy compression.  But good whenever you have really fast IO
channels, because it'll never slow you down and sometimes speed you up.
Gzip/Zip seem to be industry standard, and as far as I can tell, they have
no discernible advantage and should be considered antiquated.  Bzip2 is also often
used, and it always loses to LZMA, so I think bzip2 should be retired.  LZMA
(7-zip, XZ) if you set level-1 compression, is both faster and stronger
compression than any level of gzip or bzip2.  End result is:  The only
compressions I ever use are lzop and xz -1 (and 7-zip)
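
If you want to check this kind of claim against your own data, a small
sketch along these lines would do it.  It assumes Python and only the
codecs in the standard library (deflate via zlib, bzip2, and xz at
preset 1); LZO is not in the standard library, so it is left out.  The
names CODECS and compare are hypothetical, and results will vary a lot
with the data and the machine.

```python
import bz2
import lzma
import time
import zlib

# Each entry maps a label to (compress function, decompress function).
CODECS = {
    "deflate-6": (lambda d: zlib.compress(d, 6), zlib.decompress),
    "bzip2-9": (lambda d: bz2.compress(d, 9), bz2.decompress),
    "xz-1": (lambda d: lzma.compress(d, preset=1), lzma.decompress),
}


def compare(data):
    """Return {label: (compressed_size, seconds)} after verifying each round trip."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        start = time.perf_counter()
        packed = compress(data)
        elapsed = time.perf_counter() - start
        if decompress(packed) != data:
            raise ValueError(f"{name} failed to round-trip")
        results[name] = (len(packed), elapsed)
    return results
```

Feed compare() a few hundred megabytes of representative files rather
than a synthetic buffer; the ranking can change between text, databases
and already-compressed media.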


> Sparse files are interesting. I hadn't thought of those. Not sure how to
> handle them. Got a suggestion?

Not really.  It's something a filesystem either supports, or doesn't.  For
example, if you have a sparse file inside a ZFS filesystem, and you do an
incremental ZFS send...  Then ZFS is intelligent enough to instantly
identify and send only the changed blocks.  No scanning or anything.
However, there is no such functionality in NTFS, EXT3/4, HFS+, or most
filesystems...

In most filesystems...  Let's suppose you have a disk which will read
500Mbit/s (which is typical of a 7200rpm SATA drive).  If you simply read a
non-sparse file from start to end, then of course it will take some amount
of time, based on the speed of the disk.  But if you read a sparse file from
start to end, then the filesystem will generate zero-fill for all the sparse
sections, and it does this faster than the disk could have read real data.
I'd estimate about 10x faster.  

So unless your filesystem has a way of providing an index of the sparse
sections of a file (which no filesystem does, AFAIK)...  And unless you're
using ZFS Send...  The best alternative is to simply read the whole sparse
file from start to end, as fast as possible.  And this is NOT fast, even in
a sparse file.  Oh well.  The world stinks!!!
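
One possible exception worth testing: some platforms can report where
the holes are through lseek() with SEEK_DATA and SEEK_HOLE (Solaris
offers this, and newer Linux kernels do as well).  The sketch below
assumes Python on a platform where os.SEEK_DATA exists; data_extents is
a hypothetical helper name.  On a filesystem that does not track holes,
the call simply reports the whole file as one data extent, so the
worst case is the same as reading it start to end.

```python
import errno
import os


def data_extents(path):
    """Return [(start, end)] byte ranges that the filesystem reports as holding data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as exc:
                if exc.errno == errno.ENXIO:
                    break  # only a hole remains between offset and end of file
                raise
            # SEEK_HOLE returns the next hole, or end of file if there is none
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end))
            offset = end
    finally:
        os.close(fd)
    return extents
```

A backup tool could then read only the returned ranges and record the
gaps as holes, instead of reading zero-fill for the sparse sections.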


> In version 1.0 I'm stuck; it's a file walker. Next version I'm thinking
> of using OS-specific file access monitoring for incremental backup and
> access logging.

FYI, easier said than done.  Just so you know.  Yes, it's possible (just
look at dropbox, and ifolder, and sugarsync...)  but I have to assume there
are tricks and unreliability issues and booby traps, or else there would be
more products out there capable of doing that.



Re: Backup software

2010-12-19 Thread Mark Woodward
On 12/18/2010 11:55 PM, Edward Ned Harvey wrote:
>> From: Mark Woodward [mailto:ma...@mohawksoft.com]
>>
>> The permissions to keep Jane from reading tarzan's files is an
>> interesting one. It's obvious when said, but didn't occur to me. It's a
>> potentially difficult problem.  Would merely matching the user name to
>> the owner of the files be enough or would you also require full group
>> access?
>>  
> Well, I have no idea what you have in mind, but ... I usually solve this
> problem by backing up onto standard file shares of some kind.  If it's a
> CIFS share on a Windows server, I simply set the ACLs to allow only Jane to
> access Jane's backup directory.  If it's Apple, I create separate shares
> which are only accessible by their owner.  And so on.
>
Well, that's more or less what I intend to do as well. This is basically 
a file system to file system backup. I hesitate to call it a "backup," 
because there are a lot of preconceived notions about what a backup is.  
This is more of a targeted information management system. It is intended 
to handle a number of "businessy" requirements, but at its core, it 
really is just a type of backup program.

Yeah, ACLs and so on are probably the way to go if this were a typical
backup, but it needs to operate as a stand-alone product as well.

>> So, to get access to a backup set, you would need a user to be created
>> for you by an admin (or some automated tool, I'm not sure) and then you
>> would only be able to see the files which you own. Would that be OK?
>>
> Well, yes.  But of course, it's desirable if it's based on a pre-existing
> credentials system such as AD.
>
Yes, that's a good idea and I could enumerate AD and get all the
hierarchy and so on, but ACLs are a PITA to parse and get right. For
instance, built-in accounts as seen by AD are the built-ins on the AD
server.  So, if the AD server is more or less liberal for the built-ins
than the targeted systems, then the rights will be incorrect, giving too
much or too little access, so users can either access data they
shouldn't or can't access data they should.

I think a positive user rights grant by an admin is less problematic to
get right the first time, and I can maybe work on the AD matching and
enumeration later. Besides, it would need to connect to the AD server to
validate the credentials, and there is a strong likelihood that this
will operate outside a domain.


>> For compression, a user can specify a level 1-10, 1 is no compression,
>> and 10 is full.
>>  
> This is a low-priority request, basically irrelevant to anything, but I'm an
> idealist so I like to promote.  I think it's desirable to allow a choice of
> compression algorithm.  LZO is always so fast that it effectively removes
> large sequential repeated patterns (such as zero-filled files) but LZO will
> never become processor bound, because it's just so darn fast, and generally
> pretty wimpy compression.  But good whenever you have really fast IO
> channels, because it'll never slow you down and sometimes speed you up.
> Gzip/Zip seem to be industry standard, and as far as I can tell, they have
> no discernible advantage and should be considered antiquated.  Bzip2 is also often
> used, and it always loses to LZMA, so I think bzip2 should be retired.  LZMA
> (7-zip, XZ) if you set level-1 compression, is both faster and stronger
> compression than any level of gzip or bzip2.  End result is:  The only
> compressions I ever use are lzop and xz -1 (and 7-zip)
>
If you can recommend a BSD licensed compression library that meets your 
wish list, I'll add it.
>
>> Sparse files are interesting. I hadn't thought of those. Not sure how to
>> handle them. Got a suggestion?
>>  
> Not really.  It's something a filesystem either supports, or doesn't.  For
> example, if you have a sparse file inside a ZFS filesystem, and you do an
> incremental ZFS send...  Then ZFS is intelligent enough to instantly
> identify and send only the changed blocks.  No scanning or anything.
> However, there is no such functionality in NTFS, EXT3/4, HFS+, or most
> filesystems...
>
> In most filesystems...  Let's suppose you have a disk which will read
> 500Mbit/s (which is typical of a 7200rpm SATA drive).  If you simply read a
> non-sparse file from start to end, then of course it will take some amount
> of time, based on the speed of the disk.  But if you read a sparse file from
> start to end, then the filesystem will generate zero-fill for all the sparse
> sections, and it does this faster than the disk could have read real data.
> I'd estimate about 10x faster.
>
> So unless your filesystem has a way of providing an index of the sparse
> sections of a file (which no filesystem does, AFAIK)...  And unless you're
> using ZFS Send...  The best alternative is to simply read the whole sparse
> file from start to end, as fast as possible.  And this is NOT fast, even in
> a sparse file.  Oh well.  The world stinks!!!
>