Re: [BackupPC-users] Filter by file type

2007-01-27 Thread Holger Parplies
Hi,

Matteo Barbieri wrote on 16.01.2007 at 15:46:47 [[BackupPC-users] Filter by 
file type]:
> [...]
> I have to backup some Win Laptops, but I only want to save some file types.
> So I set $Conf{BackupFilesOnly} to '*.doc', but BackupPC saves only 
> Word files that are in the "root" of the module (setting it to '*/*.doc' 
> saves only files in a subdirectory).
> Is there a way to save only files with certain extensions recursively?

I'll assume you're using rsync(d) as transfer method, because that's the case
I'd like to answer.

From looking at the BackupPC code and one or two rsync man pages, I'd expect
the following to work.

Leave $Conf{BackupFilesOnly} empty and add the following elements at
the *end* of the $Conf{RsyncArgs} array:

'--include=*/',
'--include=*.doc',
'--exclude=*',

(meaning: include any directory anywhere in the tree (not the files in
  it, just the directory skeleton),
  include any .doc files,
  exclude every other file).
These patterns are adapted from rsync's man page, which makes them likely to
actually work.
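
Putting it together, the tail of the array might look something like this
(only a sketch; the comment stands for whatever options your existing
$Conf{RsyncArgs} already contains):

$Conf{RsyncArgs} = [
    # ... your existing rsync options (--numeric-ids, --perms, ...) ...
    '--include=*/',     # keep the directory skeleton everywhere
    '--include=*.doc',  # keep .doc files at any depth
    '--exclude=*',      # drop everything else
];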

I can't see any trick to make $Conf{BackupFilesOnly} do what you want, but
you can use $Conf{BackupFilesExclude}, even if that sounds paradoxical.

$Conf{BackupFilesExclude} = [ '+ */', '+ *.doc', '*' ];

See the rsync man page for details on why that should work.

Regards,
Holger




Re: [BackupPC-users] every hour backup

2007-01-27 Thread John Pettitt
Phong Nguyen wrote:
> Hi all,
>
> I just would like to know if it is possible to make an incremental
> backup of a host every hour.
> I don't know how to set the value for $Conf{IncrPeriod}, since it just
> takes a value counted in days.
> Thanks a lot
>
> Phong Nguyen
>
> Axone S.A.
> Geneva / Swiss
>
>   


The value is in days but will happily accept numbers less than 1 - just 
set it to a number slightly less than 1/24.  You'll also need to mess 
with the blackout periods (make sure there isn't one).  You will also 
need to account for full backups, which may take more than an hour to 
run.  Oh, and make sure you've configured enough simultaneous jobs to 
allow your backup to always run.
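
For example, something along these lines (a rough sketch, untested; the
first two can go in the host's config, the last two are server-wide, and
the values are only illustrative):

$Conf{IncrPeriod}      = 0.04;     # a bit less than 1/24 of a day, i.e. roughly hourly
$Conf{BlackoutPeriods} = [];       # no blackout window, so jobs can start any time
$Conf{WakeupSchedule}  = [0..23];  # wake the server every hour so it can schedule the job
$Conf{MaxBackups}      = 4;        # enough simultaneous jobs that this host always gets a slot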

What problem are you actually trying to solve?  Keep in mind that you 
may have open file contention issues as well.

John



Re: [BackupPC-users] Signal =PIPE?

2007-01-27 Thread Travis Fraser
On Sat, 2007-01-27 at 17:26 -0500, Randy Barlow wrote:
>   Every now and then (2 - 3 days or so) I get an e-mail like this:
> 
> > The following hosts had an error that is probably caused by a
> > misconfiguration.  Please fix these hosts:
> >   - abc.ece.ncsu.edu (aborted by signal=PIPE)
> > 
> > Regards,
> > PC Backup Genie
> 
> It's always about the same PC, and I don't get these errors with other
> machines.  The things that make this PC unique from my other machines
> are that it is a remote host (so backup over Internet), and I use rsync
> over ssh rather than rsyncd.  It's not something I'm worried about, but
> I was wondering if anyone might have any helpful comments...
> 
Have you tried increasing the client timeout?
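
For reference, that's $Conf{ClientTimeout}, which you can override in the
per-host config (the value here is just an example):

$Conf{ClientTimeout} = 72000;   # seconds; raise it if the rsync-over-ssh run is slow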
-- 
Travis Fraser <[EMAIL PROTECTED]>




[BackupPC-users] Backuppc host discovery for linux computers

2007-01-27 Thread Ted To

Hi,

Reading the documentation on host discovery, it's not clear to me how to set
up my network so that BackupPC knows which DHCP-assigned IP address to
associate with a Linux box.  Can someone offer some help or point me in the
right direction?

Thanks,
Ted To


[BackupPC-users] Signal =PIPE?

2007-01-27 Thread Randy Barlow

Howdy,

Every now and then (2 - 3 days or so) I get an e-mail like this:

> The following hosts had an error that is probably caused by a
> misconfiguration.  Please fix these hosts:
>   - abc.ece.ncsu.edu (aborted by signal=PIPE)
> 
> Regards,
> PC Backup Genie

It's always about the same PC, and I don't get these errors with other
machines.  The things that make this PC unique from my other machines
are that it is a remote host (so backup over Internet), and I use rsync
over ssh rather than rsyncd.  It's not something I'm worried about, but
I was wondering if anyone might have any helpful comments...

Randy



Re: [BackupPC-users] Long: How BackupPC handles pooling, and how transfer methods affect bandwidth usage

2007-01-27 Thread Les Mikesell
Timothy J. Massey wrote:

>  > > As a start, how about a utility that simply clones one host to another
>  > > using only the pc/host directory tree, and assumes that none of the
>  > > source files are in the pool, just like it would during a brand-new
>  > > rsync backup?
>  >
>  > That would be better than nothing, but if you have multiple full runs
>  > that you want to keep you'll have to transfer a lot of duplicates that
>  > could probably be avoided.
> 
> Correct.  But it's a proof of concept that can be refined.  I understand 
> that some sort of inode or hash caching is required.  But the first step 
> can be done with the parts we've already got.

Agreed, but it's a lot easier to design in the out-of-band info you'll 
need later than to try to figure out where to put it afterwards.

>  > But what is the advantage over just letting the remote server make its
>  > run directly against the same targets?
> 
> I thought a lot of Holger's points were good.  But for me, it comes down 
> to two points:
> 
> Point 1:  Distributing Load
> ===
> I have hosts that take, across a LAN, 12 hours to back up.  The deltas 
> are not necessarily very big:  there's just *lots* of files.  And these 
> are reasonably fast hosts:  >2GHz Xeon processors, 10k and 15k RPM 
> drives, hardware SCSI (and now SAS) RAID controllers, etc.
> 
> I want to store the data in multiple places, both on the local LAN and 
> in at least 2 remote locations.  That would mean 3 backups.  It's 
> probably not going to take 36 hours to do that, but it's going to take a 
> *lot* more than 12...
> 
> Other times, it's not the host's fault, but the Internet connection. 
> Maybe it's a host that's behind a low-end DSL that only offers 768k up 
> (or worse).  It's hard enough to get *one* backup done over that, let 
> alone two.
> 
> So how can I speed this up?

Brute force approach: park a Linux box with a big disk on the local LAN 
side.  Do scripted stock rsync backups to this box to make full 
uncompressed copies, with each host in its own directory.  It's not as 
elegant as a local BackupPC, but you get quick access to a copy 
locally, plus you offload any issues you might have in the remote 
transfer.  I actually use this approach in several remote offices, 
taking advantage of an existing box that also provides VPN and some file 
shares.  One upside is that you can add the -C option to the ssh 
command that runs rsync to get compression on the transfer (although, 
starting over, I'd use OpenVPN as the VPN and add compression there).
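
A minimal sketch of such a scripted copy (the host and path names here are
made up):

# pull an uncompressed mirror of one client onto the local staging box,
# compressing on the wire with ssh -C
rsync -a --delete -e "ssh -C" somehost:/home/ /backups/somehost/home/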

> And once one remote BackupPC server has the data, the 
> rest can get it over the very fast Internet connections that they have 
> between them.  So I only have to get the data across that slow link 
> once, and I can still get it to multiple remote locations.

For this case you might also want to do a stock rsync copy of the 
backups on the remote LAN to an uncompressed copy at the central 
location, then point 2 or more backuppc instances that have faster 
connections at that copy. Paradoxically, stock rsync with the -z option 
can move data more efficiently than just about anything but it requires 
the raw storage at both ends to be uncompressed.
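
Again just a sketch, with invented names, of what that central copy step
could look like:

# pull the staging box's uncompressed copies across the slow link
rsync -az --delete remote-gw:/backups/ /central/staging/remote-office/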

This might be cumbersome if you have a lot of  individual hosts to add 
but it isn't bad if everyone is already saving the files that need 
backup onto one or a few servers at the remote sites.

As I've mentioned before, I raid-mirror to an external drive weekly to 
get an offsite copy.


> On top of this, the BackupPC server has a much easier task to replicate 
> a pool than the host does in the first place.  Pooling has already been 
> taken care of.  We *know* which files are new, and which ones are not.

I don't think you can count on any particular relationship between local 
and remote pools.

> There are only two things the replication needs to worry about:  1) 
> transferring the new files and seeing if they already exist in the new 
> pool, and 2) integrating these new files into the remote server's own pool.

That happens now if you can arrange for the rsync method to see a raw 
uncompressed copy.  I agree that a more elegant method could be written, 
but so far it hasn't.

> Point 2:  Long-term Management of Data

LVM on top of RAID is probably the right approach for being able to 
maintain an archive that needs to grow and have failing drives replaced.
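
For what it's worth, growing such an archive usually comes down to something
like this (device and volume names invented):

# after adding capacity to the RAID-backed volume group,
# grow the logical volume and the filesystem on it
lvextend -L +500G /dev/vg_backup/lv_backuppc
resize2fs /dev/vg_backup/lv_backuppc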

> However, with the ability to migrate hosts from one host to another, I 
> can have tiers of BackupPC servers.  As hosts are retired, I still need 
> to keep their data.  7 years was not chosen for the fun of it:  SOX 
> compliance requires it.  However, I can migrate it *out* of my 
> first-line backup servers onto secondary servers.

Again there is a brute force fix: keep the old servers with the old data 
but add new ones at whatever interval is necessary to keep current data. 
You'll have to rebuild the pool of any still-existing files, but as a 
tradeoff you get some redundancy.


Re: [BackupPC-users] Long: How BackupPC handles pooling, and how transfer methods affect bandwidth usage

2007-01-27 Thread Timothy J. Massey
Les Mikesell <[EMAIL PROTECTED]> wrote on 01/26/2007 09:53:11 PM:

 > Timothy J. Massey wrote:
 > >
 > > As a start, how about a utility that simply clones one host to another
 > > using only the pc/host directory tree, and assumes that none of the
 > > source files are in the pool, just like it would during a brand-new
 > > rsync backup?
 >
 > That would be better than nothing, but if you have multiple full runs
 > that you want to keep you'll have to transfer a lot of duplicates that
 > could probably be avoided.

Correct.  But it's a proof of concept that can be refined.  I understand 
that some sort of inode or hash caching is required.  But the first step 
can be done with the parts we've already got.

 > But what is the advantage over just letting the remote server make its
 > run directly against the same targets?

I thought a lot of Holger's points were good.  But for me, it comes down 
to two points:

Point 1:  Distributing Load
===
I have hosts that take, across a LAN, 12 hours to back up.  The deltas 
are not necessarily very big:  there's just *lots* of files.  And these 
are reasonably fast hosts:  >2GHz Xeon processors, 10k and 15k RPM 
drives, hardware SCSI (and now SAS) RAID controllers, etc.

I want to store the data in multiple places, both on the local LAN and 
in at least 2 remote locations.  That would mean 3 backups.  It's 
probably not going to take 36 hours to do that, but it's going to take a 
*lot* more than 12...

Other times, it's not the host's fault, but the Internet connection. 
Maybe it's a host that's behind a low-end DSL that only offers 768k up 
(or worse).  It's hard enough to get *one* backup done over that, let 
alone two.

So how can I speed this up?

I could use a faster host.  Unfortunately, I've already *got* a pretty 
powerful host, and it is doing *its* job just fine, so why do I want to 
spend multiple thousands of dollars on this?  Short answer:  that is not 
possible.

I could use a faster Internet connection.  Usually, if a faster option 
were available affordably, they'd already have it.  Even a T1, at 
$400/month, only offers 1.5Mb up.  Not a lot.  So getting a dramatically 
faster Internet connection is not possible, either.

The other way to manage this is to distribute the load to multiple 
systems.  By being able to replicate between BackupPC servers, I can 
still limit the number of backups the host must perform to 1 (with a 
local BackupPC server).  The BackupPC server can then take on the load 
of performing multiple time-consuming replications with remote BackupPC 
servers.  I'm not kidding when I say that the task can take a week for 
all I care, as long as it can get one week's worth of backups done 
during that time.  And once one remote BackupPC server has the data, the 
rest can get it over the very fast Internet connections that they have 
between them.  So I only have to get the data across that slow link 
once, and I can still get it to multiple remote locations.

On top of this, the BackupPC server has a much easier task to replicate 
a pool than the host does in the first place.  Pooling has already been 
taken care of.  We *know* which files are new, and which ones are not. 
There are only two things the replication needs to worry about:  1) 
transferring the new files and seeing if they already exist in the new 
pool, and 2) integrating these new files into the remote server's own pool.

By distributing the load, we can get more backups replicated out to more 
places more quickly, with exactly zero increase in load on the most 
important devices in the entire process:  the hosts and their Internet 
connections.  Those are the machines that have "real" work to do, 
servicing real people with real tasks.  I cannot load these machines 
24x7.  The *only* person who cares about the BackupPC machines (until 
something is lost) is me.  They can stay 100% utilized 24x7 for all I care.

Point 2:  Long-term Management of Data
==
With BackupPC, you have a single, intertwined pool that stores data for 
all hosts.  Viewed as a static entity (the data and hosts I need today, 
or even over a couple of weeks), that's fine.  However, over time, I 
envision this getting unwieldy.  As hosts come and go, and as hosts' 
data needs change (usually upward), and as data storage requirements 
increase, this single, solid, unbreakable, indivisible pool still needs 
to be managed.

We are right now envisioning needing 2TB of space to back up a single 
host:  our mail server, which has less than 100GB of data.  The deltas 
on our mail server are currently in the neighborhood of 50GB/day. 
That's because we have 50GB of mail data, and we all receive at least 
one mail a day.  Now, there are things like transaction logs which can 
reduce this, but greatly increase the complexity of restoring individual 
mail files.

This is just the worst host, but far from unique.  We have other servers 
that have multi-GB daily d