Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-11 Thread Holger Parplies
Hi,

Jeffrey J. Kosowsky wrote on 2009-06-11 00:25:37 -0400 [Re: [BackupPC-users] 
backup the backuppc pool with bacula]:
 Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009:
   Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] 
 backup the backuppc pool with bacula]:
   [...]
   the file list [...] can and has been [optimized] in 3.0 (probably meaning
   protocol version 30, i.e. rsync 3.x on both sides).
 
 Holger, I may be wrong here, but I think that you get the more
 efficient memory usage as long as both client & server are version >= 3.0,
 even if the protocol version is set to < 30 (which is true for BackupPC,
 where it defaults back to version 28). 

Firstly, it's *not* true. BackupPC (as the client-side rsync) is not
version >= 3.0. It's not even really rsync at all, and I doubt File::RsyncP
is more memory efficient than rsync, even if the core code is in C and copied
from rsync.

Secondly, I'm *guessing* that for an incremental file list you'd need a
protocol modification. My understanding is that instead of one big file list
comparison done before the transfer, 3.0 does partial file list comparisons during
transfer (otherwise it would need to traverse the file tree at least twice,
which is something you'd normally avoid). That would clearly require a
protocol change, wouldn't it?

Actually, I would think that rsync < 3.0 *does* need to traverse the file tree
twice, so the change might even have been made because of the wish to speed up
the transfer rather than to decrease the file list size (it does both, of
course, as well as better utilize network bandwidth by starting the transfer
earlier and allowing more parallelism between network I/O and disk I/O -
presuming my assumptions are correct).

 But I'm not an expert and my understanding is that the protocols themselves
 are not well documented other than looking through the source code.

Neither am I. I admit that I haven't even looked for documentation (or at the
source code). It just seems logical to implement it that way.

I can't rule out that the optimization could be possible with the older
protocol versions, but then, why wouldn't rsync have always operated that way?

 and how the rest of the community deals with getting pools of
 100+GB offsite in less than a week of transfer time.

100 Gigs might be feasible - it depends more on the file sizes and how 
many directory entries you have, though.  And you might have to make the 
first copy on-site so subsequently you only have to transfer the changes.
   
   Does anyone actually have experience with rsyncing an existing pool to an
   existing copy (as in: verification of obtaining a correct result)? I'm kind
   of sceptical that pool chain renumbering will be handled correctly. At
   least, it seems extremely complicated to get right.
 
 Why wouldn't rsync -H handle this correctly? 

I'm not saying it doesn't. I'm saying it's complicated. I'm asking whether
anyone has actually verified that it does. I'm asking because it's an
extremely rare corner case that the developers may not have had in mind and
thus may not have tested. The massive usage of hardlinks in a BackupPC pool
clearly is something they did not anticipate (or, at least, feel the need to
implement a solution for). There might be problems that appear only in
conjunction with massive counts of inodes with nlinks > 1.

In another thread, an issue was described that *could* have been caused by
this *not* working as expected (maybe crashing rather than doing something
wrong, not sure). It's unclear at the moment, and I'd like to be able to rule
it out on the basis of something more than "it should work, so it probably
does".

I'm also saying that pool backups are important enough to verify the contents
by looking closely at the corner cases we are aware of.

 And the renumbering will change the timestamps which should alert rsync to
 all the changes even without the --checksum flag.

This part I'm not sure on. Is it actually *guaranteed* that a rename(2) must
be implemented in terms of unlink(2) and link(2) (but atomically), i.e. that
it must modify the inode change time? The inode is not really changed, except
for the side effect of (atomically) decrementing and re-incrementing the link
count. By virtue of the operation being atomic, the link count is
*guaranteed* not to change, so I, were I to implement a file system, would
feel free to optimize the inode change away (or simply not implement it in
terms of unlink() and link()), unless it is documented somewhere that updating
the inode change time is mandatory (though it really is *not* an inode change,
so I don't see why it should be).
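
If in doubt, it is easy enough to check how the filesystem at hand actually
behaves. A throwaway sketch in Perl (the temp file path is arbitrary):

#!/usr/bin/perl
# Quick check: does rename() update the ctime on this filesystem?
# Throwaway sketch; the temp file name is arbitrary.
use strict;
use warnings;

my $old = "/tmp/ctime-test.$$";
my $new = "$old.renamed";

open my $fh, '>', $old or die "create $old: $!";
close $fh;

my $before = (stat $old)[10];     # element 10 of stat() is the ctime
sleep 2;                          # so a ctime update would be observable
rename $old, $new or die "rename: $!";
my $after  = (stat $new)[10];

print "rename() ", ($after == $before ? "did NOT change" : "changed"),
      " the ctime here\n";
unlink $new;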

Does rsync even act on the inode change time? File modification time will be
unchanged, obviously. rsync's focus is on the file contents and optionally
keeping the attributes in sync (as far as it can). ctime is an indication that
attributes have been changed (which may mask a content change

Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-11 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 14:31:02 +0200 on Thursday, June 11, 2009:
  Hi,
  
  Jeffrey J. Kosowsky wrote on 2009-06-11 00:25:37 -0400 [Re: [BackupPC-users] 
  backup the backuppc pool with bacula]:
   Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009:
 Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] 
   backup the backuppc pool with bacula]:
 [...]
 the file list [...] can and has been [optimized] in 3.0 (probably 
   meaning
 protocol version 30, i.e. rsync 3.x on both sides).
   
   Holger, I may be wrong here, but I think that you get the more
   efficient memory usage as long as both client & server are version >= 3.0,
   even if the protocol version is set to < 30 (which is true for BackupPC,
   where it defaults back to version 28). 
  
  Firstly, it's *not* true. BackupPC (as the client-side rsync) is not
  version >= 3.0. It's not even really rsync at all, and I doubt File::RsyncP
  is more memory efficient than rsync, even if the core code is in C and copied
  from rsync.
  
I had (perhaps mistakenly) assumed that BackupPC still used rsync,
since at least in the Fedora installation the rpm requires rsync.

Still, I believe you do get at least some of the advantages of rsync
>= 3.0 when you have it on the client side, at least for the rsyncd
method. In fact, this might explain the following situation:
rsync 2.x and rsync method: Backups hang on certain files
rsync 3.x and rsync method: Backups hang on certain files
rsync 3.x and rsyncd method: Backups always work

Perhaps the combination of rsyncd and rsync 3.x on the client is what
allows taking advantage of some of the benefits of version 3.x.

  Secondly, I'm *guessing* that for an incremental file list you'd need a
  protocol modification. I understand it that instead of one big file list
  comparison done before transfer, 3.0 does partial file list comparisons 
  during
  transfer (otherwise it would need to traverse the file tree at least twice,
  which is something you'd normally avoid). That would clearly require a
  protocol change, wouldn't it?

Maybe not, if using rsyncd makes the server into the master so that
it controls the file listing. Stepping back, I think it all depends on
what you define as the protocol - if the protocol is more about recognized
commands and encoding, then the ordering of the file listing may not be
part of the protocol but instead might be more part of the control
structure, which could be protocol-independent if control is ceded
to the master side -- i.e., at least some changes to the control
structure could be made without having to coordinate the change between
master and slave. I'm just speculating, because there isn't much
documentation that I have been able to find.

  
  Actually, I would think that rsync < 3.0 *does* need to traverse the file 
  tree
  twice, so the change might even have been made because of the wish to speed 
  up
  the transfer rather than to decrease the file list size (it does both, of
  course, as well as better utilize network bandwidth by starting the transfer
  earlier and allowing more parallelism between network I/O and disk I/O -
  presuming my assumptions are correct).
  
   But I'm not an expert and my understanding is that the protocols themselves
   are not well documented other than looking through the source code.
  
  Neither am I. I admit that I haven't even looked for documentation (or at the
  source code). It just seems logical to implement it that way.
  
  I can't rule out that the optimization could be possible with the older
  protocol versions, but then, why wouldn't rsync have always operated that 
  way?

You could say the same thing about "why wasn't the protocol always that
way?" ;)

  
   and how the rest of the community deals with getting pools of
   100+GB offsite in less than a week of transfer time.
  
  100 Gigs might be feasible - it depends more on the file sizes and 
   how 
  many directory entries you have, though.  And you might have to make 
   the 
  first copy on-site so subsequently you only have to transfer the 
   changes.
 
 Does anyone actually have experience with rsyncing an existing pool to 
   an
 existing copy (as in: verification of obtaining a correct result)? I'm 
   kind of
 sceptical that pool chain renumbering will be handled correctly. At 
   least, it
 seems extremely complicated to get right.
   
   Why wouldn't rsync -H handle this correctly? 
  
  I'm not saying it doesn't. I'm saying it's complicated. I'm asking whether
  anyone has actually verified that it does. I'm asking because it's an
  extremely rare corner case that the developers may not have had in mind and
  thus may not have tested. The massive usage of hardlinks in a BackupPC pool
  clearly is something they did not anticipate (or, at least, feel the need to
  implement a solution for). There might be problems that appear only in
  conjunction with massive

Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-11 Thread Les Mikesell
Jeffrey J. Kosowsky wrote:
  
 Now that doesn't mean it *couldn't* happen and it doesn't mean we
 shouldn't always be paranoid and test, test, test... but I just don't
 have any good reason to think it would fail algorithmically. Now that
 doesn't mean it couldn't slow down dramatically or run out of memory
 as some have claimed, it just seems unlikely (to me) that it would
 complete without error yet still have some hidden error.

Even if everything is done right, it would depend on the source directory
not changing link targets during the (likely long) transfer process.
Consider what would happen if a collision chain fixup happens and 
renames pool files after rsync reads the directory list and makes the 
inode mapping table but before the transfers start.

   
And the renumbering will change the timestamps which should alert rsync 
 to
all the changes even without the --checksum flag.
   
   This part I'm not sure on. Is it actually *guaranteed* that a rename(2) 
 must
   be implemented in terms of unlink(2) and link(2) (but atomically), i.e. 
 that
   it must modify the inode change time? The inode is not really changed, 
 except
   for the side effect of (atomically) decrementing and re-incrementing the 
 link
   count. By virtue of the operation being atomic, the link count is
   *guaranteed* not to change, so I, were I to implement a file system, would
   feel free to optimize the inode change away (or simply not implement it in
   terms of unlink() and link()), unless it is documented somewhere that 
 updating
   the inode change time is mandatory (though it really is *not* an inode 
 change,
   so I don't see why it should be).
   
 
 Good catch!!! I hadn't realized that this was implementation
 dependent. It seems that most Unix implementations (including BSD)
 have historically changed the ctime; however, Linux (at least
 ext2/ext3) does not, at least as of kernel 2.6.26.6.

I sort of recall some arguments about this in the early reiserfs days.
I guess the cheat-and-short-circuit side won, even though it makes it
impossible to do a correct incremental backup as expected with any
ordinary tool (rsync still can, but it needs a previous copy and a full
block checksum comparison).

 In fact, the POSIX/SUS specification specifically states:
Some implementations mark for update the st_ctime field of renamed
files and some do not. Applications which make use of the st_ctime
field may behave differently with respect to renamed files unless they
are designed to allow for either behavior.
 
 However, it wouldn't be hard to add a touch to the chain renumbering
 routine if you want to be able to identify newly renumbered files. One
 would need to make sure that this doesn't have other unintended side
 effects but I don't think that BackupPC otherwise uses the file mtime.

Or, just do the explicit link/unlink operations to force the filesystem 
to do the right thing with ctime().
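
Roughly like the following sketch - placeholder names, not BackupPC's actual
renumbering code. link() + unlink() forces an inode change (and thus a ctime
update), but the two steps together are no longer atomic, so they must not
race with other pool writers:

# Sketch only: rename a pool file via link() + unlink() so the ctime is
# guaranteed to be updated (plain rename() leaves that implementation-
# defined per POSIX). $old/$new are placeholders; the pair is not atomic.
sub rename_with_ctime_update {
    my ($old, $new) = @_;
    die "$new already exists\n" if -e $new;
    link $old, $new or die "link $old -> $new: $!";
    unlink $old     or die "unlink $old: $!";
}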

   Does rsync even act on the inode change time? 
 No, it doesn't. In fact, I have read that most Linux systems don't allow
 you to set the ctime to anything other than the current system time.

You shouldn't be able to.  But backup-type operations should be able to 
use it to identify moved files in incrementals.

Or are you saying it would be difficult to do this manually with a
special purpose algorithm that tries to just track changes to the pool
and pc files?
   
   I haven't given that topic much thought. The advantage in a special purpose
   algorithm is that we can make assumptions about the data we are dealing 
 with.
   We shouldn't do this unnecessarily, but if it has notable advantages, then 
 why
   not? Difficult isn't really a point. The question is whether it can be 
 done
   efficiently.
 
 I meant more difficult in terms of being sure to track all special
 cases and that one would have to be careful, not that one shouldn't do
 it.
 
 Personally, I don't like the idea of chain collisions and would have
 preferred using full file md5sums which as I have mentioned earlier
 would not be very costly at least for the rsync/rsyncd transfer
 methods under protocol 30.

And I'd like a quick/cheap way so you could just ignore the pool during 
a copy and rebuild it the same way it was built in the first place 
without thinking twice.  And maybe do things like backing up other 
instances of backuppc archives ignoring their pools and merging them so 
you could restore individual files directly.
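
The rebuild step could look roughly like the sketch below. It keys files by a
full-content MD5, which is *not* BackupPC's real pool hash, and it ignores
compression, attrib files and collision chains - so take it purely as an
illustration of the idea, not as a working tool:

#!/usr/bin/perl
# Sketch of "copy pc/ only, rebuild the pool afterwards": walk a copied
# pc/ tree, key each file by a full-content digest, and hardlink
# duplicates together under a pool directory. Illustration only.
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use Digest::MD5;

my ($pc_dir, $pool_dir) = @ARGV;     # e.g. /copy/pc /copy/pool.rebuilt
die "usage: $0 <pc-dir> <pool-dir>\n" unless $pc_dir && $pool_dir;

find(sub {
    return unless -f $_ && !-l $_;
    open my $fh, '<', $_ or return;
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;

    my $subdir = "$pool_dir/" . join('/', split //, substr($digest, 0, 3));
    make_path($subdir) unless -d $subdir;
    my $pool_file = "$subdir/$digest";

    if (-e $pool_file) {
        # already pooled: replace this copy with a link to the pool file
        unlink $_ and link $pool_file, $_
            or warn "relink $File::Find::name: $!";
    } else {
        # first occurrence: link it into the pool
        link $_, $pool_file or warn "pool link $File::Find::name: $!";
    }
}, $pc_dir);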

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-11 Thread Adam Goryachev

Les Mikesell wrote:
 Jeffrey J. Kosowsky wrote:
  
  In fact, the POSIX/SUS specification specifically states:
Some implementations mark for update the st_ctime field of renamed
files and some do not. Applications which make use of the st_ctime
field may behave differently with respect to renamed files unless they
are designed to allow for either behavior.

 However, it wouldn't be hard to add a touch to the chain renumbering
 routine if you want to be able to identify newly renumbered files. One
 would need to make sure that this doesn't have other unintended side
 effects but I don't think that BackupPC otherwise uses the file mtime.
 
 Or, just do the explicit link/unlink operations to force the filesystem 
 to do the right thing with ctime().

That only works as long as the file you are dealing with has nlinks > 1 and
those other files don't vanish in between the unlink/link. rename is an atomic
operation... unlink + link is not.

 And I'd like a quick/cheap way so you could just ignore the pool during 
 a copy and rebuild it the same way it was built in the first place 
 without thinking twice.  And maybe do things like backing up other 
 instances of backuppc archives ignoring their pools and merging them so 
 you could restore individual files directly.

Would that mean your data transfer is equal to the un-pooled size,
though? I.e., if you transfer a single pc/hostname directory with 20
full backups, you would need to transfer 20 x the size of a full backup of
data. When it gets to the other side, you simply add the files from the
first full backup to the pool, and then throw away (and link) the other
19 copies.

That adds simplicity, but does it pose a problem with the data sizes being
transferred?

One optimisation would be to examine the BackupPC log and only send the
files that are not "same", or some such...

Anyway, I'll get out of the way and allow you to continue, I think you
understand the issue better than me by far ... :)

Regards,
Adam


--
Adam Goryachev
Website Managers
www.websitemanagers.com.au



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-11 Thread Les Mikesell
Adam Goryachev wrote:
 
 In fact, the POSIX/SUS specification specifically states:
Some implementations mark for update the st_ctime field of renamed
files and some do not. Applications which make use of the st_ctime
field may behave differently with respect to renamed files unless they
are designed to allow for either behavior.

 However, it wouldn't be hard to add a touch to the chain renumbering
 routine if you want to be able to identify newly renumbered files. One
 would need to make sure that this doesn't have other unintended side
 effects but I don't think that BackupPC otherwise uses the file mtime.
 Or, just do the explicit link/unlink operations to force the filesystem 
 to do the right thing with ctime().
 
 That only works as long as the file you are dealing with has nlinks > 1 and
 those other files don't vanish in between the unlink/link. rename is an
 atomic operation... unlink + link is not.

But that doesn't matter in this case (and it's link/unlink or you lose 
it).  You are working with the pool file name - and you don't really 
want the contents related to that name to atomically change without 
anything else knowing about it anyway.  Originally, backups weren't 
permitted at the same time as the nightly run to avoid that.  Now there 
must be some kind of locking.

 And I'd like a quick/cheap way so you could just ignore the pool during 
 a copy and rebuild it the same way it was built in the first place 
 without thinking twice.  And maybe do things like backing up other 
 instances of backuppc archives ignoring their pools and merging them so 
 you could restore individual files directly.
 
 Would that mean your data transfer is equal to the un-pooled size
 though? ie, if you transfer a single pc/hostname directory with 20
 full backups, you would need to transfer 20 X size of full backup of
 data. When it gets to the other side, you simply add the files from the
 first full backup to the pool, and then throw away (and link) the other
 19 copies.

I'm not sure if rsync figures out the linked copies on the sending side 
or not.  It at least seems possible, and I've always been able to rsync 
any single pc tree.  Tar would only send one copy, but it includes one
instance in each run, so each incremental would repeat files you already
have.

 Adds simplicity, but does it pose a problem with data sizes being
 transferred ?
 
 One optimisation would be to examine the backuppc log, and only send the
 files that are not same or some such...

Some sort of client/server protocol would be needed to get it completely 
right.

-- 
   Les Mikesell
lesmikes...@gmail.com




Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-10 Thread Bowie Bailey
jhaglund wrote:
 I'd really like to know the specifics of the hardlink and inode problem 
 talked about in this thread like how to find out how many I have and what the 
 threshold is for Trouble and how the rest of the community deals with getting 
 pools of 100+GB offsite in less than a week of transfer time.
   


I don't know the details on the problem with rsyncing hardlinks.  I just 
know that rsync cannot deal with the number of hardlinks generated by 
BackupPC.

As to how I get my 750GB of backups offsite...  sneakernet.  :)  I have
a 3-member RAID1 array with the third member being a removable drive
enclosure.  When I need an offsite backup, I pull this drive, deliver it
to a secure storage location, and replace it with a new drive.  It only
takes about 3 hours for the new drive to sync up with the rest of the array.

-- 
Bowie



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-10 Thread Les Mikesell
jhaglund wrote:
 There are several implied references here to likely problems with rsync and 
 how they are all deal breakers.  I've been trying to find a solution to this 
 problem for weeks and have not found any direct documentation or evidence to 
 support what is being said here.  I'm not skeptical, though, I just need to 
 understand what's going on.

It boils down to how much RAM rsync needs to handle all the directory 
entries and hardlinks and the amount of time it takes to wade through 
them.

 Rsync is the only option for me, and I'm rather confused by the other 
 solutions floated in this and other threads.  On-site backup is precarious 
 and viable only in a datacenter type situation imho.  What about the fire 
 scenario?  Getting the data somewhere else is crucial, and in my case I am 
 limited to rsync through rsh.  I'm running rsync 3.0.6 but the server is 
 2.6.x.  I have ~ 1.9 files found by rsync and it always fails on some level.  
 I use -aH but it randomly exits with an unknown error during remote 
 comparison or the initial transfers.  During the transfer phase it says its 
 sending data, but nothing shows up on the server.  The server admins are not 
 aware of any incompatibility with their filesystem and the internet does not 
 seem to deal with this problem, which brings me back to the initial question.

3.x on both ends might help. It claims to not need the whole directory 
in memory at once - but you'll still need to build a table to map all 
the inodes with more than one link (essentially everything) to
re-create the hardlinks, so you have to throw a lot of RAM at it anyway.
You shouldn't actually crash unless you run out of both RAM and swap,
but if you push the system into swap you might as well quit anyway.

Note that if you can do rsync over ssh initiated from the other site, 
you could just run the backuppc server there, or a separate independent 
copy.  Unless you have a lot of duplication among the on-site servers 
there wouldn't be a huge difference in traffic after the initial copy 
and you don't have a single point of failure.

 What does one use if not rsync?

The main alternative is some form of image-copy of the archive 
partition.  This is only practical if you have physical access to the 
server or very fast network connections.

 There's no way to justify or implement backing up the entire pool every time 
 without a lot of bandwidth, which I don't have.  What exactly is rsync's 
 problem?  Do I really need to shut down backuppc every time I want to attempt 
 a sync or would syncing to a local disk and rsync'ing from that be 
 sufficient?  I'd really like to know the specifics of the hardlink and inode 
 problem talked about in this thread like how to find out how many I have and 
 what the threshold is for Trouble and how the rest of the community deals 
 with getting pools of 100+GB offsite in less than a week of transfer time.

100 Gigs might be feasible - it depends more on the file sizes and how 
many directory entries you have, though.  And you might have to make the 
first copy on-site so subsequently you only have to transfer the changes.

 Lots of info requests, I know, but I really appreciate the help.  My ISP and 
 all the experts I've tapped are completely stumped on this one.

The root of the problem is that rsync has to include the entire archive 
in one pass to map the matching hardlinks - and it has to be able to 
hold the directory and inode table in RAM to do it at a usable speed. 
The other limiting issue is that the disk heads have to move around a 
lot to read and re-create all those directory entries and update the 
inode link counts.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-10 Thread Jon Forrest
jhaglund wrote:

 What does one use if not rsync? 

In an admittedly non-backuppc environment I've been
experimenting with using 'rsync -W' (this means
"don't use the rsync algorithm") to see if
problems similar to the ones you describe go away.
I'm still not sure of the result.

Using rsync with the -W argument means that
complete files will be transferred instead
of changed pieces. In an environment where
files tend to change completely, or not at
all, it makes sense to try this because it
means that rsync itself has less to do.

I read somewhere that the rsync algorithm is
intended for environments where disk bandwidth
is greater than network bandwidth. That's a good
way to think about it.

Cordially,

-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforr...@berkeley.edu



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-10 Thread Holger Parplies
Hi,

Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup 
the backuppc pool with bacula]:
 jhaglund wrote:
  There are several implied references here to likely problems with rsync
  and how they are all deal breakers. [...] I just need to understand what's
  going on.
 
 It boils down to how much RAM rsync needs to handle all the directory 
 entries and hardlinks and the amount of time it takes to wade through 
 them.

... where the important part is the hardlinks (see below), because that simply
can't be optimized; the file list - while probably consuming more memory in
total - can be and has been optimized in 3.0 (probably meaning protocol
version 30, i.e. rsync 3.x on both sides).

  I'm running rsync 3.0.6 but the server is 2.6.x.  I have ~ 1.9 files
  found by rsync and it always fails on some level. [...]
 
 3.x on both ends might help. It claims to not need the whole directory 
 in memory at once - but you'll still need to build a table to map all 
 the inodes with more than one link  (essentially everything) to 
 re-create the hardlinks so you have to throw a lot of RAM at it anyway. 

Please read the above carefully. It's not about so many hardlinks (meaning
many links to one pool file), it's about so many files that have more than one
link - whether it's 2 or 32000 is unimportant (except for the size of the
complete file list, which additional hardlinks will make larger). In normal
situations, you have a file with more than one link every now and then. rsync
expects to have to handle a few of them. With a BackupPC pool it's practically
every single file, millions of them or more in some cases. And for each and
every one of them, rsync needs to store (at least) the inode number and the
full path (probably relative to the transfer root) to one link (probably the
first one it encounters, not necessarily the shortest one). Count for yourself:

cpool/1/2/3/12345678911234567892123456789312
pc/foo/0/f%2fhome/fuser/ffoo

pc/hostname/123/f%2fexport%2fhome/fwopp/f.gconf/fapps/fgnome-screensaver/f%25gconf.xml

Round up to a multiple of 8, add maybe 4 bytes of malloc overhead, 4 bytes for
a pointer, and factor in that we're simply no longer used to optimizing
storage requirements at the byte level.
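
If you want a concrete number for your own pool, something like this gives a
rough lower bound. The 32-byte per-entry overhead is a guess, not rsync's
actual data structure, and the %seen hash below needs plenty of memory itself
on a big pool:

#!/usr/bin/perl
# Rough lower bound on rsync -H bookkeeping: one entry per inode with
# nlink > 1, costing one stored path plus some fixed per-entry overhead.
use strict;
use warnings;
use File::Find;

my $top = shift || '/var/lib/backuppc';
my %seen;
my ($entries, $bytes) = (0, 0);

find({ no_chdir => 1, wanted => sub {
    my ($dev, $ino, undef, $nlink) = lstat $_;
    return unless defined $nlink && -f _ && $nlink > 1;
    return if $seen{"$dev:$ino"}++;               # count each inode once
    $entries++;
    $bytes += length($File::Find::name) + 32;     # path + guessed overhead
} }, $top);

printf "%d hardlinked inodes, roughly %.1f MB just for a link table\n",
    $entries, $bytes / 2**20;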


You're probably going to say, "why not simply write that information to
disk or a database?"

Reason 1: That's a lot of temporary space you'll need. If it doesn't fit in
  memory, we're talking about GB, not a few KB.
Reason 2: Access to this table will be in random order. It's not a nice linear
  scan. Chances are, you'll need to read from disk almost every time.
  No cache is going to speed this up much, because no cache will be
  large enough or smart enough to know when which information will be
  needed again. The same applies to a database.
Reason 3: rsync is a general purpose tool. It can't determine ahead of time
  how many hardlink entries it will need to handle. It could only
  react to running out of memory. Except for BackupPC pools, it would
  probably *never* need disk storage.

 You shouldn't actually crash unless you run out of both ram and swap, 
 but if you push the system into swap you might as well quit anyway.

This is the same as reason 2. You should realize that disk is not slightly
slower than RAM, it's many orders of magnitude slower. It won't take 2 hours
instead of 1 hour, it will take 10,000 hours (or more) instead of 1. That is
over one year. Swap works well, as long as your working set fits into RAM.
That is not the case here. [In reality, it might not be quite so dramatic,
but the point is: you don't know. It simply might take a year. Or 10.
Supposing your disks last that long ;-]

  What does one use if not rsync?
 
 The main alternative is some form of image-copy of the archive 
 partition.  This is only practical if you have physical access to the 
 server or very fast network connections.

"Physical access" probably meaning that you can transport your copy to and
from the server. "Never underestimate the bandwidth of a station wagon full
of tapes hurtling down the highway." (Andrew S. Tanenbaum)

  Do I really need to shut down backuppc every time I want to attempt a
  sync or would syncing to a local disk and rsync'ing from that be
  sufficient?

Try something like "time find /var/lib/backuppc -ls > /dev/null" to get a
feeling for just how long only traversing the BackupPC pool and doing a stat()
on each file really takes. Then remember that syncing to a local disk is in
no way simpler than syncing to a remote disk - the bandwidth for copying is
simply higher, so that is the only place you get a speedup.

From a different perspective: either it's going to be fast enough that
shutting down BackupPC won't hurt, or it's going to be *necessary* to shut
down BackupPC, because having it modify the file system would hurt.

Just imagine the pc/ directory links on your copy would point

Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-10 Thread Jeffrey J. Kosowsky
Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009:
  Hi,
  
  Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup 
  the backuppc pool with bacula]:
   jhaglund wrote:
There are several implied references here to likely problems with rsync
and how they are all deal breakers. [...] I just need to understand 
what's
going on.
   
   It boils down to how much RAM rsync needs to handle all the directory 
   entries and hardlinks and the amount of time it takes to wade through 
   them.
  
  ... where the important part is the hardlinks (see below), because that 
  simply
  can't be optimized, the file list - while probably consuming more memory in
  total - can and has been in 3.0 (probably meaning protocol version 30, i.e.
  rsync 3.x on both sides).
  

Holger, I may be wrong here, but I think that you get the more
efficient memory usage as long as both client & server are version >= 3.0,
even if the protocol version is set to < 30 (which is true for BackupPC,
where it defaults back to version 28). 

I think protocol 30 has more to do with the change from MD4 to MD5
checksums, plus the ability to have longer file names (255 characters, I
think), plus other protocol extensions. But I'm not an expert, and my
understanding is that the protocols themselves are not well documented
other than by looking through the source code.

I'm running rsync 3.0.6 but the server is 2.6.x.  I have ~ 1.9 files
found by rsync and it always fails on some level. [...]
   
   3.x on both ends might help. It claims to not need the whole directory 
   in memory at once - but you'll still need to build a table to map all 
   the inodes with more than one link  (essentially everything) to 
   re-create the hardlinks so you have to throw a lot of RAM at it anyway. 
  
  Please read the above carefully. It's not about so many hardlinks (meaning
  many links to one pool file), it's about so many files that have more than 
  one
  link - whether it's 2 or 32000 is unimportant (except for the size of the
  complete file list, which additional hardlinks will make larger). In normal
  situations, you have a file with more than one link every now and then. rsync
  expects to have to handle a few of them. With a BackupPC pool it's 
  practically
  every single file, millions of them or more in some cases. And for each and
  every one of them, rsync needs to store (at least) the inode number and the
  full path (probably relative to the transfer root) to one link (probably the
  first one it encounters, not necessarily the shortest one). Count for 
  yourself:
  
   cpool/1/2/3/12345678911234567892123456789312
   pc/foo/0/f%2fhome/fuser/ffoo
  
  pc/hostname/123/f%2fexport%2fhome/fwopp/f.gconf/fapps/fgnome-screensaver/f%25gconf.xml
  
  Round up to a multiple of 8, add maybe 4 bytes of malloc overhead, 4 bytes 
  for
  a pointer, and factor in that we're simply not used anymore to optimizing
  storage requirements at the byte level.
  
  
  You're probably going to say, why not simply write that information to
  disk/database?.
  
  Reason 1: That's a lot of temporary space you'll need. If it doesn't fit in
memory, we're talking about GB, not a few KB.
  Reason 2: Access to this table will be in random order. It's not a nice 
  linear
scan. Chances are, you'll need to read from disk almost every time.
No cache is going to speed this up much, because no cache will be
large enough or smart enough to know when which information will be
needed again. The same applies to a database.
  Reason 3: rsync is a general purpose tool. It can't determine ahead of time
how many hardlink entries it will need to handle. It could only
react to running out of memory. Except for BackupPC pools, it would
probably *never* need disk storage.
  
   You shouldn't actually crash unless you run out of both ram and swap, 
   but if you push the system into swap you might as well quit anyway.
  
  This is the same as reason 2. You should realize that disk is not slightly
  slower than RAM, it's many orders of magnitude slower. It won't take 2 hours
  instead of 1 hour, it will take 10,000 hours (or more) instead of 1. That is
  over one year. Swap works well, as long as your working set fits into RAM.
  That is not the case here. [In reality, it might not be quite so dramatic,
  but the point is: you don't know. It simply might take a year. Or 10.
  Supposing your disks last that long ;-]
  
What does one use if not rsync?
   
   The main alternative is some form of image-copy of the archive 
   partition.  This is only practical if you have physical access to the 
   server or very fast network connections.
  
  Physical access probably meaning, that you can transport your copy to and
  from the server. Never underestimate the bandwidth of a station wagon full
  of tapes hurtling down the highway

Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-02 Thread Tino Schwarze
On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote:

 Is the blockdevel-level rsync-like solution going to be something 
 publicly available? 

blockdev-level rsync smells like drbd. I'm not sure whether it supports
such huge amounts of unsynchronized data, but it might just be a matter
of configuration.

Tino.

-- 
What we nourish flourishes. - Was wir nähren erblüht.

www.lichtkreis-chemnitz.de
www.craniosacralzentrum.de



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-02 Thread Pieter Wuille
On Tue, Jun 02, 2009 at 11:44:11AM +0200, Tino Schwarze wrote:
 On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote:
 
  Is the blockdevel-level rsync-like solution going to be something 
  publicly available? 

We certainly intend to, but no guarantee it ever gets finished. Except for
implementation there's not that much work left, but we do it in our free
time.

It really seems strange something like that doesn't exist yet (and rsync
itself doesn't support blockdevices).

 blockdev-level rsync smells like drbd. I'm not sure whether it support
 such huge amounts of unsynchronized data, but it might just be a matter
 of configuration.

In fact, I'd say LVM should be able to do this: generate a block-level diff
between two snapshots of the same volume, and create a new snapshot/volume
based on an old one + a diff. E.g. ZFS supports this using send/receive.
So far, I haven't read about support for such a feature in LVM.
On the other hand, I think I've read on this list that using zfs send/receive
for backuppc pools was very slow (but that's on a filesystem level, not
blockdev level).
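
The core of such a tool is simple enough; here is a toy sketch of the "diff"
half in Perl. The chunk size and output format are arbitrary, a matching
"apply" step would be needed on the other side, and a real tool would have to
cope with partial reads and snapshots of differing size:

#!/usr/bin/perl
# Toy block-level diff between two snapshots of the same volume:
# compare chunk by chunk, dump (offset, length, data) for changed chunks.
use strict;
use warnings;

my ($old_dev, $new_dev, $out) = @ARGV;
die "usage: $0 <old-snapshot> <new-snapshot> <diff-file>\n" unless $out;

my $chunk = 1024 * 1024;   # 1 MiB
open my $old_fh, '<:raw', $old_dev or die "$old_dev: $!";
open my $new_fh, '<:raw', $new_dev or die "$new_dev: $!";
open my $out_fh, '>:raw', $out     or die "$out: $!";

my $offset = 0;
while (1) {
    my $r_old = read($old_fh, my $a, $chunk);
    my $r_new = read($new_fh, my $b, $chunk);
    last unless $r_new;                        # end of the new snapshot
    if (!defined $r_old || $a ne $b) {         # changed (or grown) chunk
        print {$out_fh} pack('QN', $offset, length $b), $b;  # needs 64-bit perl
    }
    $offset += $r_new;
}
close $_ for $old_fh, $new_fh, $out_fh;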

DRBD might be a solution too - I haven't looked at it closely. It seems
more meant for high availability, but it can probably be used for offsite
backup too. It has support for recovery after disconnection/failure, so maybe
you can use it to keep older versions on a remote system by forcibly
disconnecting the nodes. I don't know how easy it would be to migrate a
non-DRBD volume either. Does anyone have experience with this in combination
with backuppc?

-- 
Pieter



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-01 Thread Pieter Wuille
On Sun, May 31, 2009 at 11:22:13AM -0400, Stephane Rouleau wrote:
 Pieter Wuille wrote:
  
  This is how we handle backups of the backuppc pool:
  * the pool itself is on a LUKS-encrypted XFS filesystem, on a LVM volume, 
  on a
software RAID1 of 2 1TB disks.
  * twice a week the following procedure is run:
* Freeze the XFS filesystem, sync, lvm-snapshot the encrypted volume
* Unfreeze 
* send the snapshot over ssh to an offsite server (which thus only ever 
  sees
  the encrypted data)
* remove the snapshot
  * The offsite server has 2 smaller disks (not in RAID), and snapshots are sent
    in turn to one and to the other. This means we still have a complete pool if
    something goes wrong during the transfer (which takes +- a day)
  * The consistency of the offsite backups can be verified by exporting them
over NBD (network block device), and mounting them on the
normal backup server (which has the encryption keys)
  
  We use a blockdevice-based solution instead of a filesystem-based one,
  because the many small files (16 million inodes and growing) make those very
  disk- and CPU-intensive (simply doing a "find | wc -l" in the root takes
  hours).
  Furthermore it makes encryption easier.
  We are also working on a rsync-like system for block devices (yet that might
  still take some time...), which would bring the time for synchronising the
  backup server with the offsite one down to 1-2 hours.
  
  Greetz,
  
 
 Pieter,
 
 This sounds rather close to what I'd like to have over the coming months.  I 
 just recently reset our backup pool, and rather stupidly did not select an 
 encrypted filesystem (Otherwise we're on XFS, LVM, RAID1 2x1.5TB).  Figured 
 I'd encrypt the offsite only, but I see now that it'd be much better to send 
 data at the block level.
 
 You mention the capacity of your pool file system, but how much space is 
 typically used on it?  Curious also what kind of connection speed you have 
 with your offsite backup solution.

Some numbers:
* backup server has 1TB of RAID1 storage
  * contains amongst others a 400GiB XFS volume for backuppc
* daily/weekly backups of +- 195GiB of data
* contains 256GiB of backups (expected to increase significantly still)
* contains 16.8 million inodes
* according to LVM snapshot usage, avg. 1.5 GiB of data blocks change on 
this volume daily
* offsite backup server has 2x 500GB of non-RAID storage
  * twice a week, the whole 400GiB volume is sent over a 100Mbps connection (at 
+- 8.1MiB/s)
* that's a huge waste for maybe 5GiB of changed data, but the bandwidth is 
generously provided by the university
* we hope to have a more efficient blockdevice-level synchronisation system 
in a few months
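
For reference, the whole freeze / snapshot / unfreeze / ship / drop cycle can
be driven by a small wrapper along these lines. All names (VG/LV, mount point,
snapshot size, remote destination) are placeholders - adapt and test before
trusting it with a real pool:

#!/usr/bin/perl
# Sketch of the snapshot-and-ship cycle described above, via system().
use strict;
use warnings;

my $mnt    = '/var/lib/backuppc';
my $vg_lv  = '/dev/vg0/backuppc';
my $snap   = '/dev/vg0/backuppc_snap';
my $remote = 'backuppc@offsite.example.org:/srv/pool-snapshot.img';

sub run { system(@_) == 0 or die "failed: @_\n" }

run('sync');
run('xfs_freeze', '-f', $mnt);                      # quiesce the filesystem
eval {
    run('lvcreate', '-s', '-L', '20G',
        '-n', 'backuppc_snap', $vg_lv);             # snapshot the encrypted LV
};
run('xfs_freeze', '-u', $mnt);                      # unfreeze as soon as possible
die $@ if $@;

my ($host, $path) = split /:/, $remote, 2;
run("dd if=$snap bs=1M | ssh $host 'dd of=$path bs=1M'");   # ship the image
run('lvremove', '-f', $snap);                       # drop the snapshot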

PS: sorry for the strange subject earlier - I used a wrong 'from' address first
and forwarded it

-- 
Pieter



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-06-01 Thread Stephane Rouleau
Thanks Pieter,

Is the blockdevel-level rsync-like solution going to be something 
publicly available? 

Stephane

Pieter Wuille wrote:
 On Sun, May 31, 2009 at 11:22:13AM -0400, Stephane Rouleau wrote:
   
 Pieter Wuille wrote:
 
 This is how we handle backups of the backuppc pool:
 * the pool itself is on a LUKS-encrypted XFS filesystem, on a LVM volume, 
 on a
   software RAID1 of 2 1TB disks.
 * twice a week the following procedure is run:
   * Freeze the XFS filesystem, sync, lvm-snapshot the encrypted volume
   * Unfreeze 
   * send the snapshot over ssh to an offsite server (which thus only ever 
 sees
 the encrypted data)
   * remove the snapshot
 * The offsite server has 2 smaller disks (not in RAID), and snapshots are 
 sent
   in turn to one and to the other. This means we still have a complete pool 
 if
   something goes wrong during the transfer (which takes +- a day)
 * The consistency of the offsite backups can be verified by exporting them
   over NBD (network block device), and mounting them on the
   normal backup server (which has the encryption keys)
 
 We use a blockdevice-based solution instead of a filesystem-based one, 
 because
 the many small files (16 million inodes and growing) make those very disk-
 and cpu intensive. (simply doing a find | wc -l in the root takes hours).
 Furthermore it makes encryption easier.
 We are also working on a rsync-like system for block devices (yet that might
 still take some time...), which would bring the time for synchronising the
 backup server with the offsite one down to 1-2 hours.

 Greetz,

   
 Pieter,

 This sounds rather close to what I'd like to have over the coming months.  I 
 just recently reset our backup pool, and rather stupidly did not select an 
 encrypted filesystem (Otherwise we're on XFS, LVM, RAID1 2x1.5TB).  Figured 
 I'd encrypt the offsite only, but I see now that it'd be much better to send 
 data at the block level.

 You mention the capacity of your pool file system, but how much space is 
 typically used on it?  Curious also what kind of connection speed you have 
 with your offsite backup solution.
 

 Some numbers:
 * backup server has 1TB of RAID1 storage
   * contains amongst others a 400GiB XFS volume for backuppc
 * daily/weekly backups of +- 195GiB of data
 * contains 256GiB of backups (expected to increase significantly still)
 * contains 16.8 million inodes
 * according to LVM snapshot usage, avg. 1.5 GiB of data blocks change on 
 this volume daily
 * offsite backup server has 2x 500GB of non-RAID storage
   * twice a week, the whole 400GiB volume is sent over a 100Mbps connection 
 (at +- 8.1MiB/s)
 * that's a huge waste for maybe 5GiB of changed data, but the bandwidth 
 is generously provided by the university
 * we hope to have a more efficient blockdevice-level synchronisation 
 system in a few months

 PS: sorry for the strange subject earlier - I used a wrong 'from' address
 first and forwarded it

   



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-25 Thread Boniforti Flavio

 I have one system where I do backups of backuppc to tape for 
 disaster recover. Here's the system I use:
 - Stop backuppc to quiesce the filesystem. LVM snapshots are 
 not sufficient
   for this, because the disk load gets too high, data flow 
 rate gets too low,
   and the tape starts to 'shoeshine'.
 - Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB 
 tape drive,
   which is sufficient to store the backuppc data from that machine.
 - In case you're curious, the tar command I use is:
 tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 
 /var/lib/backuppc/
 2>&1 > $LOGFILE
 - Restart backuppc.

Wow, thanks.
I do not have tape drives available, but would it also work if I did
this same procedure onto an external USB HDD?

Thanks,
F.



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-25 Thread Boniforti Flavio

 How often do you want to backup the server?  What about using 
 rsync to backup the pool?  I want to mirror my primary backup 
 server to another system daily so I can switch to my backup 
 server quickly.  I was going to use rsync for this.

Well, backing it up once a day would be ok for me too. You will use
rsync because it handles hardlinks, am I right?
My goal is the same as yours: switch immediately onto another server if
my BackupPC server dies.

Can you tell us about your results when you set it up?

Thanks,
F.



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-25 Thread Les Mikesell
Boniforti Flavio wrote:
 I have one system where I do backups of backuppc to tape for 
 disaster recover. Here's the system I use:
 - Stop backuppc to quiesce the filesystem. LVM snapshots are 
 not sufficient
   for this, because the disk load gets too high, data flow 
 rate gets too low,
   and the tape starts to 'shoeshine'.
 - Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB 
 tape drive,
   which is sufficient to store the backuppc data from that machine.
 - In case you're curious, the tar command I use is:
 tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 
 /var/lib/backuppc/
 2>&1 > $LOGFILE
 - Restart backuppc.
 
 Wow, thanks.
 I do not have tape drives available, but would it also work if I'd be
 doing this same procedure onto an external USB HDD?

Yes, but you have to test a restore of a typically-sized archive before 
deciding that this method is suitable for your purpose. While it may 
take a few hours or less to make the tar copy, it will likely take at 
least a few days to restore a usable disk copy with all the hardlinks.

On the other hand, if your external disk has as much space as the live 
partition, you can stop backuppc, unmount the partition, and image-copy 
to a matching partition on the external drive and have a copy that is 
directly usable.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-25 Thread Les Mikesell
Boniforti Flavio wrote:
 How often do you want to backup the server?  What about using 
 rsync to backup the pool?  I want to mirror my primary backup 
 server to another system daily so I can switch to my backup 
 server quickly.  I was going to use rsync for this.
 
 Well, backing it up once a day would be ok for me too. You will use
 rsync because it handles hardlinks, am I right?
 My goal is the same as yours: switch immediately onto another server if
 my BackupPC server dies.
 
 Can you tell us about your results when you set it up?

If you want a 'live' backup on a nearby machine you might look at drbd,
which is sort of like RAID over the network.  If you don't need
auto-failover, though, you can get fairly high availability with normal 
RAID1 in a chassis with swappable drives.  In the fairly likely event of 
a single disk failure, you just swap in a new drive and rebuild the 
mirror.  In the less likely event of a motherboard component failure you 
  yank the drives and move them to a spare chassis that you've kept for 
that purpose.  It is still a good idea to have offsite copies of the 
archive in case of a building disaster or a software or operator error 
that destroys the running copy.

Rsync may work - depending on the size of your archive and the amount of 
RAM you have.

-- 
  Les Mikesell
   lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-21 Thread Dan Pritts
On Tue, May 19, 2009 at 11:35:36AM -0500, Les Mikesell wrote:
 Have you ever restored one of these tapes, and if so, how long did it 
 take?  As a wild guess, I'd expect a couple of days where an image copy 
 would be an hour or two.

I did this with solaris ufsdump/ufsrestore once.

Making the tape took one or two hours.

I gave up and cancelled the restore after 24 hours.

danno



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-21 Thread Rob Terhaar
On Tue, May 19, 2009 at 12:27 PM, Carl Wilhelm Soderstrom
chr...@real-time.com wrote:
 On 05/19 12:04 , Tim Cole wrote:
 How often do you want to backup the server?  What about using rsync to
 backup the pool?  I want to mirror my primary backup server to another
 system daily so I can switch to my backup server quickly.  I was going
 to use rsync for this.

 Rsync memory requirements have historically been too high for this.
 I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap;
 and it crushed the box to the point where I had to power-cycle it.

 Best thing to do is set up a duplicate backuppc server that independently
 backs up the hosts you want to have a redundant backup for.


Try Rsync v3 - it has much lower memory requirements, since it builds
the file list incrementally. I used it at one company to do nightly
syncs of their ~4TB backuppc pool offsite.
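
For anyone trying this, a minimal sketch of such a nightly sync - the host
name and paths are placeholders, and the incremental file list only kicks in
when *both* ends run rsync >= 3.0 (protocol 30):

  # confirm the version on both sides first
  rsync --version
  ssh offsite rsync --version
  # -a preserve attributes, -H preserve hardlinks, --numeric-ids keep
  # uid/gid numbers intact, --delete mirror removals of expired backups
  rsync -aH --numeric-ids --delete /var/lib/backuppc/ offsite:/var/lib/backuppc/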



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-21 Thread Holger Parplies
Hi,

Rob Terhaar wrote on 2009-05-21 13:01:58 -0400 [Re: [BackupPC-users] backup the 
backuppc pool with bacula]:
 [...]
 Try Rsync v3, it has much lower memory requirements since it builds
 the file list incrementally.

by all means, try it. But it's not the file list that is the specific problem
of BackupPC.

 I used it at one company to do nightly
 syncs of their ~4TB backuppc pool offsite.

It's still a matter of file count (used inodes, to be exact), not pool storage
size. rsync V3 may perform significantly better if you have many links to
comparatively few inodes, but if you have many inodes (for some unknown value
of "many"), I am still convinced that you will hit a problem. Feel free to
convince me otherwise, but "works for me" is unlikely to succeed ;-).
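
A quick way to gauge "many" for a given pool - assuming it lives on its own
filesystem mounted at /var/lib/backuppc (an example path) - is to look at the
inodes in use, which is roughly the number of file-list entries an rsync of
the whole pool has to hold:

  df -i /var/lib/backuppc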

Regards,
Holger



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-21 Thread Rob Terhaar
Good point Holger - I don't have hardlink counts or stats, but it
should only take about 10 minutes of work to download/compile rsync 3 and
run a benchmark on your pool ;)
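
Something along these lines, say - the version number and paths are only
examples, and a dry run still builds the full file list and hardlink table,
which is the expensive part:

  tar xzf rsync-3.0.6.tar.gz
  cd rsync-3.0.6 && ./configure && make
  # -n (dry run) walks the pool and tracks hardlinks without copying anything;
  # watch memory usage with top or free in another terminal
  time ./rsync -naH --stats /var/lib/backuppc/ /tmp/pool-dryrun/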


-- 
Sent from my mobile device



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Boniforti Flavio

 Hi,
 
 there is a regular discussion on how to backup/move/copy the 
 backuppc pool. Did anyone try to backup the pool with bacula?

Hello there...

I don't know about bacula, but I would also like to get a backup of
the BackupPC server myself: has anybody got some suggestions and practical
examples?

Thanks,
F.



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Tim Cole


Boniforti Flavio wrote:
 Hi,

 there is a regular discussion on how to backup/move/copy the 
 backuppc pool. Did anyone try to backup the pool with bacula?
 

 Hello there...

 I don't know about bacula, but would like myself also to get a backup of
 the BackupPC server: anybody got some suggestions and practical
 examples?

 Thanks,
 F.

   
How often do you want to backup the server?  What about using rsync to 
backup the pool?  I want to mirror my primary backup server to another 
system daily so I can switch to my backup server quickly.  I was going 
to use rsync for this.


Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Les Mikesell
Boniforti Flavio wrote:
 Hi,

 there is a regular discussion on how to backup/move/copy the 
 backuppc pool. Did anyone try to backup the pool with bacula?
 
 Hello there...
 
 I don't know about bacula, but would like myself also to get a backup of
 the BackupPC server: anybody got some suggestions and practical
 examples?

I think the only way to do it at a reasonable speed is to unmount the 
partition where the archive is stored and image-copy it to an 
equal-sized partition.  Or, if you created it as a RAID1 with a missing 
mirror, you can add/sync a mirror while mounted - but performance will 
suffer badly if you try to do anything else for the duration of the copy, 
and you need to unmount momentarily to get a clean filesystem as you 
fail/remove the mirror.
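
A rough sketch of that add/sync/fail cycle with Linux md - device names and
the mount point are examples, and it assumes the array was originally created
with a 'missing' second member (e.g. mdadm --create /dev/md0 --level=1
--raid-devices=2 /dev/sda1 missing):

  # attach the removable/offsite disk and let it sync
  mdadm /dev/md0 --add /dev/sdb1
  cat /proc/mdstat                  # wait for the resync to finish
  # brief unmount so the copy is a clean filesystem
  umount /var/lib/backuppc
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  mount /var/lib/backuppc           # assumes an fstab entry for the pool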

Some people are copying smaller archives with rsync (-aH), and the 
newest version of rsync is supposed to handle the hardlinks more 
efficiently.  You can always try that and see how long it takes.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Carl Wilhelm Soderstrom
On 05/19 05:51 , Boniforti Flavio wrote:
 I don't know about bacula, but would like myself also to get a backup of
 the BackupPC server: anybody got some suggestions and practical
 examples?

I have one system where I do backups of backuppc to tape for disaster
recovery. Here's the system I use:
- Stop backuppc to quiesce the filesystem. LVM snapshots are not sufficient
  for this, because the disk load gets too high, data flow rate gets too low,
  and the tape starts to 'shoeshine'.
- Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB tape drive,
  which is sufficient to store the backuppc data from that machine.
- In case you're curious, the tar command I use is:
tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 /var/lib/backuppc/ 2>&1 > $LOGFILE
- Restart backuppc.
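
Wrapped up, the whole cycle might look roughly like this - a sketch, not the
script actually in use; the init-script path, exclude file and log file are
assumptions:

  #!/bin/sh
  EXCLUDEFILE=/etc/backuppc/tape-excludes
  LOGFILE=/var/log/backuppc-to-tape.log
  /etc/init.d/backuppc stop                # quiesce the pool
  mt -f /dev/st0 rewind                    # start at the beginning of the tape
  tar -cv --totals --exclude-from=$EXCLUDEFILE \
      -f /dev/st0 /var/lib/backuppc/ > $LOGFILE 2>&1
  /etc/init.d/backuppc start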

-- 
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Carl Wilhelm Soderstrom
On 05/19 12:04 , Tim Cole wrote:
 How often do you want to backup the server?  What about using rsync to 
 backup the pool?  I want to mirror my primary backup server to another 
 system daily so I can switch to my backup server quickly.  I was going 
 to use rsync for this.

Rsync memory requirements have historically been too high for this.
I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap;
and it crushed the box to the point where I had to power-cycle it.

Best thing to do is set up a duplicate backuppc server that independently
backs up the hosts you want to have a redundant backup for.
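
In practice that mostly means giving the second server the same host list and
per-host configs and letting it run its own schedule. A hedged sketch, assuming
a Debian-style layout with the configuration under /etc/backuppc and a second
server named backup2:

  rsync -a /etc/backuppc/ backup2:/etc/backuppc/
  ssh backup2 /etc/init.d/backuppc restart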

-- 
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Les Mikesell
Carl Wilhelm Soderstrom wrote:
 On 05/19 05:51 , Boniforti Flavio wrote:
 I don't know about bacula, but would like myself also to get a backup of
 the BackupPC server: anybody got some suggestions and practical
 examples?
 
 I have one system where I do backups of backuppc to tape for disaster
 recovery. Here's the system I use:
 - Stop backuppc to quiesce the filesystem. LVM snapshots are not sufficient
   for this, because the disk load gets too high, data flow rate gets too low,
   and the tape starts to 'shoeshine'.
 - Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB tape drive,
   which is sufficient to store the backuppc data from that machine.
 - In case you're curious, the tar command I use is:
 tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 /var/lib/backuppc/ 2>&1 > $LOGFILE
 - Restart backuppc.

Have you ever restored one of these tapes, and if so, how long did it 
take?  As a wild guess, I'd expect a couple of days where an image copy 
would be an hour or two.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Les Mikesell
Carl Wilhelm Soderstrom wrote:
 On 05/19 12:04 , Tim Cole wrote:
 How often do you want to backup the server?  What about using rsync to 
 backup the pool?  I want to mirror my primary backup server to another 
 system daily so I can switch to my backup server quickly.  I was going 
 to use rsync for this.
 
 Rsync memory requirements have historically been too high for this.
 I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap;
 and it crushed the box to the point where I had to power-cycle it.
 
 Best thing to do is set up a duplicate backuppc server that independently
 backs up the hosts you want to have a redundant backup for.

These days, I'm not sure 512MB RAM and 'server' belong in the same 
sentence, but it is still hard to deal with the disk head motion needed 
to traverse and recreate all those hardlinks splattered more or less 
randomly across the disk.

-- 
   Les Mikesell
lesmikes...@gmail.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Carl Wilhelm Soderstrom
On 05/19 11:35 , Les Mikesell wrote:
 Have you ever restored one of these tapes, and if so, how long did it 
 take?  As a wild guess, I'd expect a couple of days where an image copy 
 would be an hour or two.

I think it's about 8 hours to create the tape (whereas when using LVM
snapshots it took almost 2 days). I've scanned and recovered files from the
tape, and it takes a few hours, depending on where on the tape the file
is. I've never convinced the company to do a full test restore (we're
contractors, and doing that test would cost more money); but I have
scanned the tapes and checked their integrity, and that takes a healthy long 
time.

An image copy would indeed be much faster. I prefer a file-level backup
rather than a filesystem-level backup simply because recovery in case of
corruption is much better.

-- 
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Holger Parplies
Hi,

Carl Wilhelm Soderstrom wrote on 2009-05-19 11:54:16 -0500 [Re: 
[BackupPC-users] backup the backuppc pool with bacula]:
 On 05/19 11:35 , Les Mikesell wrote:
  Have you ever restored one of these tapes, and if so, how long did it 
  take?  As a wild guess, I'd expect a couple of days where an image copy 
  would be an hour or two.
 
 I think it's about 8 hours to create the tape (whereas when using LVM
 snapshots it took almost 2 days). I've scanned and recovered files from the
 tape, and it takes a few hours, depending on where on the tape the file
 is. I've never convinced the company to do a full test restore (we're
 contractors, and doing that test would cost more money); but I have
 scanned the tapes and checked their integrity, and that takes a healthy long 
 time.
 
 An image copy would indeed be much faster. I prefer a file-level backup
 rather than a filesystem-level backup simply because recovery in case of
 corruption is much better.

it really depends on what you want to do in case of disaster.

1.) Restore the pool
Forget it.

2.) Restore files from previous backups or even single whole backups
No problem.
[Though I have no idea how tar handles restoring a subset of the archive
 that is made up of hardlinks to various files outside this subset. This
 may well prove not to work or require insane amounts of tape seeks - if
 tar seeks on its input at all.]

With an image level copy it's the other way around, but I agree that I
wouldn't trust tape media very far. Has anyone invented tape-RAID-6 yet? :)

Regards,
Holger



Re: [BackupPC-users] backup the backuppc pool with bacula

2009-05-19 Thread Holger Parplies
Hi,

Les Mikesell wrote on 2009-05-19 11:12:25 -0500 [Re: [BackupPC-users] backup 
the backuppc pool with bacula]:
 [...] the newest version of rsync is supposed to handle the hardlinks more 
 efficiently.

reason suggests that this is an urban myth. The newest version of rsync
handles *large file lists* better, not *hardlinks*. To handle hardlinks
better, you would almost certainly need to create a temporary file (which
can easily be several *GB* in size in our cases). I somehow doubt any general
purpose tool would dare do that (let alone find a spot where it can - my /tmp
simply isn't large enough). The issue is that the temporary file will either
be very large or unneeded, and an algorithm able to handle extreme numbers of
hardlinks will probably be slow in the overwhelming majority of cases with
very few hardlinks.
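
A hedged way to see how much hardlink state an rsync -H of a pool would have
to keep is to count the distinct multiply-linked inodes under it (the path is
an example, and the sort itself may spill sizeable temporary files):

  find /var/lib/backuppc -xdev -type f -links +1 -printf '%i\n' | sort -u | wc -l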

  You can always try that and see how long it takes.

That you can. Just remember that your space usage (and hardlink counts) will
grow over time. How does the time it takes grow in proportion to space and/or
hardlink counts? At some random point it will stop working. You may never
reach that point. But what if you do? Can you simply find another solution
then? How long can you keep your pool offline for the copy process? At what
point do you abort the copy process? How can you monitor its progress?

Just some things to think about ...

Regards,
Holger
