Re: [BackupPC-users] backup the backuppc pool with bacula
Hi, Jeffrey J. Kosowsky wrote on 2009-06-11 00:25:37 -0400 [Re: [BackupPC-users] backup the backuppc pool with bacula]: Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009: Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]: [...] the file list [...] can and has been [optimized] in 3.0 (probably meaning protocol version 30, i.e. rsync 3.x on both sides). Holger, I may be wrong here, but I think that you get the more efficient memory usage as long as both client and server are version >= 3.0, even if the protocol version is set to < 30 (which is true for BackupPC, where it defaults back to version 28). Firstly, it's *not* true. BackupPC (as client side rsync) is not version >= 3.0. It's not even really rsync at all, and I doubt File::RsyncP is more memory efficient than rsync, even if the core code is in C and copied from rsync. Secondly, I'm *guessing* that for an incremental file list you'd need a protocol modification. My understanding is that instead of one big file list comparison done before the transfer, 3.0 does partial file list comparisons during the transfer (otherwise it would need to traverse the file tree at least twice, which is something you'd normally avoid). That would clearly require a protocol change, wouldn't it? Actually, I would think that rsync < 3.0 *does* need to traverse the file tree twice, so the change might even have been made because of the wish to speed up the transfer rather than to decrease the file list size (it does both, of course, as well as better utilize network bandwidth by starting the transfer earlier and allowing more parallelism between network I/O and disk I/O - presuming my assumptions are correct). But I'm not an expert and my understanding is that the protocols themselves are not well documented other than looking through the source code. Neither am I. I admit that I haven't even looked for documentation (or at the source code). It just seems logical to implement it that way. I can't rule out that the optimization could be possible with the older protocol versions, but then, why wouldn't rsync have always operated that way? and how the rest of the community deals with getting pools of 100+GB offsite in less than a week of transfer time. 100 Gigs might be feasible - it depends more on the file sizes and how many directory entries you have, though. And you might have to make the first copy on-site so subsequently you only have to transfer the changes. Does anyone actually have experience with rsyncing an existing pool to an existing copy (as in: verification of obtaining a correct result)? I'm kind of sceptical that pool chain renumbering will be handled correctly. At least, it seems extremely complicated to get right. Why wouldn't rsync -H handle this correctly? I'm not saying it doesn't. I'm saying it's complicated. I'm asking whether anyone has actually verified that it does. I'm asking because it's an extremely rare corner case that the developers may not have had in mind and thus may not have tested. The massive usage of hardlinks in a BackupPC pool clearly is something they did not anticipate (or, at least, feel the need to implement a solution for). There might be problems that appear only in conjunction with massive counts of inodes with nlinks > 1. In another thread, an issue was described that *could* have been caused by this *not* working as expected (maybe crashing rather than doing something wrong, not sure). 
It's unclear at the moment, and I'd like to be able to rule it out on the basis of something more than "it should work, so it probably does". I'm also saying that pool backups are important enough to verify the contents by looking closely at the corner cases we are aware of. And the renumbering will change the timestamps which should alert rsync to all the changes even without the --checksum flag. This part I'm not sure on. Is it actually *guaranteed* that a rename(2) must be implemented in terms of unlink(2) and link(2) (but atomically), i.e. that it must modify the inode change time? The inode is not really changed, except for the side effect of (atomically) decrementing and re-incrementing the link count. By virtue of the operation being atomic, the link count is *guaranteed* not to change, so I, were I to implement a file system, would feel free to optimize the inode change away (or simply not implement it in terms of unlink() and link()), unless it is documented somewhere that updating the inode change time is mandatory (though it really is *not* an inode change, so I don't see why it should be). Does rsync even act on the inode change time? File modification time will be unchanged, obviously. rsync's focus is on the file contents and optionally keeping the attributes in sync (as far as it can). ctime is an indication that attributes have been changed (which may mask a content change
Re: [BackupPC-users] backup the backuppc pool with bacula
Holger Parplies wrote at about 14:31:02 +0200 on Thursday, June 11, 2009: Hi, Jeffrey J. Kosowsky wrote on 2009-06-11 00:25:37 -0400 [Re: [BackupPC-users] backup the backuppc pool with bacula]: Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009: Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]: [...] the file list [...] can and has been [optimized] in 3.0 (probably meaning protocol version 30, i.e. rsync 3.x on both sides). Holger, I may be wrong here, but I think that you get the more efficient memory usage as long as both client and server are version >= 3.0, even if the protocol version is set to < 30 (which is true for BackupPC, where it defaults back to version 28). Firstly, it's *not* true. BackupPC (as client side rsync) is not version >= 3.0. It's not even really rsync at all, and I doubt File::RsyncP is more memory efficient than rsync, even if the core code is in C and copied from rsync. I had (perhaps mistakenly) assumed that BackupPC still used rsync since, at least in the Fedora installation, the rpm requires rsync. Still, I believe you do get at least some of the advantages of rsync >= 3.0 when you have it on the client side, at least for the rsyncd method. In fact, this might explain the following situation:
rsync 2.x and rsync method: backups hang on certain files
rsync 3.x and rsync method: backups hang on certain files
rsync 3.x and rsyncd method: backups always work
Perhaps the combination of rsyncd and rsync 3.x on the client is what allows taking advantage of some of the benefits of version 3.x. Secondly, I'm *guessing* that for an incremental file list you'd need a protocol modification. My understanding is that instead of one big file list comparison done before the transfer, 3.0 does partial file list comparisons during the transfer (otherwise it would need to traverse the file tree at least twice, which is something you'd normally avoid). That would clearly require a protocol change, wouldn't it? Maybe not, if using rsyncd makes the server into the master so that it controls the file listing. Stepping back, I think it all depends on what you define as the protocol - if the protocol is more about recognized commands and encoding, then the ordering of the file listing may not be part of the protocol but instead might be more part of the control structure, which could be protocol independent if the control is ceded to the master side -- i.e., at least some changes to the control structure could be made without having to coordinate the change between master and slave. I'm just speculating because there isn't much documentation that I have been able to find. Actually, I would think that rsync < 3.0 *does* need to traverse the file tree twice, so the change might even have been made because of the wish to speed up the transfer rather than to decrease the file list size (it does both, of course, as well as better utilize network bandwidth by starting the transfer earlier and allowing more parallelism between network I/O and disk I/O - presuming my assumptions are correct). But I'm not an expert and my understanding is that the protocols themselves are not well documented other than looking through the source code. Neither am I. I admit that I haven't even looked for documentation (or at the source code). It just seems logical to implement it that way. I can't rule out that the optimization could be possible with the older protocol versions, but then, why wouldn't rsync have always operated that way? 
You could say the same thing about why wasn't the protocol always that way ;) and how the rest of the community deals with getting pools of 100+GB offsite in less than a week of transfer time. 100 Gigs might be feasible - it depends more on the file sizes and how many directory entries you have, though. And you might have to make the first copy on-site so subsequently you only have to transfer the changes. Does anyone actually have experience with rsyncing an existing pool to an existing copy (as in: verification of obtaining a correct result)? I'm kind of sceptical that pool chain renumbering will be handled correctly. At least, it seems extremely complicated to get right. Why wouldn't rsync -H handle this correctly? I'm not saying it doesn't. I'm saying it's complicated. I'm asking whether anyone has actually verified that it does. I'm asking because it's an extremely rare corner case that the developers may not have had in mind and thus may not have tested. The massive usage of hardlinks in a BackupPC pool clearly is something they did not anticipate (or, at least, feel the need to implement a solution for). There might be problems that appear only in conjunction with massive
Re: [BackupPC-users] backup the backuppc pool with bacula
Jeffrey J. Kosowsky wrote: Now that doesn't mean it *couldn't* happen and it doesn't mean we shouldn't always be paranoid and test, test, test... but I just don't have any good reason to think it would fail algorithmically. Now that doesn't mean it couldn't slow down dramatically or run out of memory as some have claimed, it just seems unlikely (to me) that it would complete without error yet still have some hidden error. Even if everything is done right it would depend on the source directory not changing link targets during the (likely long) transfer process. Consider what would happen if a collision chain fixup happens and renames pool files after rsync reads the directory list and makes the inode mapping table but before the transfers start. And the renumbering will change the timestamps which should alert rsync to all the changes even without the --checksum flag. This part I'm not sure on. Is it actually *guaranteed* that a rename(2) must be implemented in terms of unlink(2) and link(2) (but atomically), i.e. that it must modify the inode change time? The inode is not really changed, except for the side effect of (atomically) decrementing and re-incrementing the link count. By virtue of the operation being atomic, the link count is *guaranteed* not to change, so I, were I to implement a file system, would feel free to optimize the inode change away (or simply not implement it in terms of unlink() and link()), unless it is documented somewhere that updating the inode change time is mandatory (though it really is *not* an inode change, so I don't see why it should be). Good catch!!! I hadn't realized that this was implementation dependent. It seems that most Unix implementations (including BSD) have historically changed the ctime; however, Linux (at least ext2/ext3) does not, at least as of kernel 2.6.26.6. I sort of recall some arguments about this in the early reiserfs days. I guess the cheat-and-short-circuit side won, even though it makes it impossible to do a correct incremental backup as expected with any ordinary tool (rsync still can, but it needs a previous copy and a full block checksum comparison). In fact, the POSIX/SUS specification specifically states: "Some implementations mark for update the st_ctime field of renamed files and some do not. Applications which make use of the st_ctime field may behave differently with respect to renamed files unless they are designed to allow for either behavior." However, it wouldn't be hard to add a touch to the chain renumbering routine if you want to be able to identify newly renumbered files. One would need to make sure that this doesn't have other unintended side effects, but I don't think that BackupPC otherwise uses the file mtime. Or, just do the explicit link/unlink operations to force the filesystem to do the right thing with ctime. Does rsync even act on the inode change time? No it doesn't. In fact, I have read that most Linux systems don't allow you to set the ctime to anything other than the current system time. You shouldn't be able to. But backup-type operations should be able to use it to identify moved files in incrementals. Or are you saying it would be difficult to do this manually with a special purpose algorithm that tries to just track changes to the pool and pc files? I haven't given that topic much thought. The advantage in a special purpose algorithm is that we can make assumptions about the data we are dealing with. We shouldn't do this unnecessarily, but if it has notable advantages, then why not? 
Difficult isn't really a point. The question is whether it can be done efficiently. I meant more difficult in terms of being sure to track all special cases and that one would have to be careful, not that one shouldn't do it. Personally, I don't like the idea of chain collisions and would have preferred using full file md5sums which as I have mentioned earlier would not be very costly at least for the rsync/rsyncd transfer methods under protocol 30. And I'd like a quick/cheap way so you could just ignore the pool during a copy and rebuild it the same way it was built in the first place without thinking twice. And maybe do things like backing up other instances of backuppc archives ignoring their pools and merging them so you could restore individual files directly. -- Les Mikesell lesmikes...@gmail.com
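[Editor's note: since the rename(2)/ctime question above turns out to be filesystem-dependent, a quick way to check the behaviour on a given pool filesystem is a sketch like the following. The path is only an example, and `stat -c %Z` assumes GNU coreutils.]

    #!/bin/sh
    # Quick check: does rename(2) update st_ctime on this filesystem?
    # The path below is only an example - run it on the filesystem holding the pool.
    f=/var/lib/backuppc/ctime-test.$$
    touch "$f"
    before=$(stat -c %Z "$f")     # ctime, seconds since the epoch (GNU stat)
    sleep 2                       # make a ctime change detectable
    mv "$f" "$f.renamed"          # a same-filesystem mv is a rename(2)
    after=$(stat -c %Z "$f.renamed")
    rm -f "$f.renamed"
    if [ "$before" = "$after" ]; then
        echo "rename did NOT touch ctime - rsync won't notice renumbered files without --checksum"
    else
        echo "rename updated ctime"
    fi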
Re: [BackupPC-users] backup the backuppc pool with bacula
Les Mikesell wrote: Jeffrey J. Kosowsky wrote: In fact, the POSIX/SUS specification specifically states: "Some implementations mark for update the st_ctime field of renamed files and some do not. Applications which make use of the st_ctime field may behave differently with respect to renamed files unless they are designed to allow for either behavior." However, it wouldn't be hard to add a touch to the chain renumbering routine if you want to be able to identify newly renumbered files. One would need to make sure that this doesn't have other unintended side effects but I don't think that BackupPC otherwise uses the file mtime. Or, just do the explicit link/unlink operations to force the filesystem to do the right thing with ctime. As long as the file you are dealing with has nlinks > 1 and those other files don't vanish in between the unlink/link... rename is an atomic operation; unlink + link is not. And I'd like a quick/cheap way so you could just ignore the pool during a copy and rebuild it the same way it was built in the first place without thinking twice. And maybe do things like backing up other instances of backuppc archives ignoring their pools and merging them so you could restore individual files directly. Would that mean your data transfer is equal to the un-pooled size though? ie, if you transfer a single pc/hostname directory with 20 full backups, you would need to transfer 20 X the size of a full backup of data. When it gets to the other side, you simply add the files from the first full backup to the pool, and then throw away (and link) the other 19 copies. Adds simplicity, but does it pose a problem with data sizes being transferred? One optimisation would be to examine the backuppc log, and only send the files that are not the same or some such... Anyway, I'll get out of the way and allow you to continue, I think you understand the issue better than me by far... :) Regards, Adam -- Adam Goryachev Website Managers www.websitemanagers.com.au
Re: [BackupPC-users] backup the backuppc pool with bacula
Adam Goryachev wrote: In fact, the POSIX/SUS specification specifically states: "Some implementations mark for update the st_ctime field of renamed files and some do not. Applications which make use of the st_ctime field may behave differently with respect to renamed files unless they are designed to allow for either behavior." However, it wouldn't be hard to add a touch to the chain renumbering routine if you want to be able to identify newly renumbered files. One would need to make sure that this doesn't have other unintended side effects but I don't think that BackupPC otherwise uses the file mtime. Or, just do the explicit link/unlink operations to force the filesystem to do the right thing with ctime. As long as the file you are dealing with has nlinks > 1 and those other files don't vanish in between the unlink/link... rename is an atomic operation; unlink + link is not. But that doesn't matter in this case (and it's link/unlink or you lose it). You are working with the pool file name - and you don't really want the contents related to that name to atomically change without anything else knowing about it anyway. Originally, backups weren't permitted at the same time as the nightly run to avoid that. Now there must be some kind of locking. And I'd like a quick/cheap way so you could just ignore the pool during a copy and rebuild it the same way it was built in the first place without thinking twice. And maybe do things like backing up other instances of backuppc archives ignoring their pools and merging them so you could restore individual files directly. Would that mean your data transfer is equal to the un-pooled size though? ie, if you transfer a single pc/hostname directory with 20 full backups, you would need to transfer 20 X the size of a full backup of data. When it gets to the other side, you simply add the files from the first full backup to the pool, and then throw away (and link) the other 19 copies. I'm not sure if rsync figures out the linked copies on the sending side or not. It at least seems possible, and I've always been able to rsync any single pc tree. Tar would only send one, but it includes one instance in each run so each incremental would repeat files you already have. Adds simplicity, but does it pose a problem with data sizes being transferred? One optimisation would be to examine the backuppc log, and only send the files that are not the same or some such... Some sort of client/server protocol would be needed to get it completely right. -- Les Mikesell lesmikes...@gmail.com
Re: [BackupPC-users] backup the backuppc pool with bacula
jhaglund wrote: I'd really like to know the specifics of the hardlink and inode problem talked about in this thread like how to find out how many I have and what the threshold is for Trouble and how the rest of the community deals with getting pools of 100+GB offsite in less than a week of transfer time. I don't know the details on the problem with rsyncing hardlinks. I just know that rsync cannot deal with the number of hardlinks generated by BackupPC. As to how I get my 750GBs of backups offsite... sneakernet. :) I have a 3-member raid 1 array with the third member being a removable drive enclosure. When I need an offsite backup, I pull this drive, deliver it to a secure storage location, and replace with a new drive. It only takes about 3 hours for the new drive to sync up with the rest of the array. -- Bowie
Re: [BackupPC-users] backup the backuppc pool with bacula
jhaglund wrote: There are several implied references here to likely problems with rsync and how they are all deal breakers. I've been trying to find a solution to this problem for weeks and have not found any direct documentation or evidence to support what is being said here. I'm not skeptical, though, I just need to understand what's going on. It boils down to how much RAM rsync needs to handle all the directory entries and hardlinks and the amount of time it takes to wade through them. Rsync is the only option for me, and I'm rather confused by the other solutions floated in this and other threads. On-site backup is precarious and viable only in a datacenter type situation imho. What about the fire scenario? Getting the data somewhere else is crucial, and in my case I am limited to rsync through rsh. I'm running rsync 3.0.6 but the server is 2.6.x. I have ~ 1.9 files found by rsync and it always fails on some level. I use -aH but it randomly exits with an unknown error during remote comparison or the initial transfers. During the transfer phase it says its sending data, but nothing shows up on the server. The server admins are not aware of any incompatibility with their filesystem and the internet does not seem to deal with this problem, which brings me back to the initial question. 3.x on both ends might help. It claims to not need the whole directory in memory at once - but you'll still need to build a table to map all the inodes with more than one link (essentially everything) to re-create the hardlinks so you have to throw a lot of RAM at it anyway. You shouldn't actually crash unless you run out of both ram and swap, but if you push the system into swap you might as well quit anyway. Note that if you can do rsync over ssh initiated from the other site, you could just run the backuppc server there, or a separate independent copy. Unless you have a lot of duplication among the on-site servers there wouldn't be a huge difference in traffic after the initial copy and you don't have a single point of failure. What does one use if not rsync? The main alternative is some form of image-copy of the archive partition. This is only practical if you have physical access to the server or very fast network connections. There's no way to justify or implement backing up the entire pool every time without a lot of bandwidth, which I don't have. What exactly is rsync's problem? Do I really need to shut down backuppc every time I want to attempt a sync or would syncing to a local disk and rsync'ing from that be sufficient? I'd really like to know the specifics of the hardlink and inode problem talked about in this thread like how to find out how many I have and what the threshold is for Trouble and how the rest of the community deals with getting pools of 100+GB offsite in less than a week of transfer time. 100 Gigs might be feasible - it depends more on the file sizes and how many directory entries you have, though. And you might have to make the first copy on-site so subsequently you only have to transfer the changes. Lots of info requests, I know, but I really appreciate the help. My ISP and all the experts I've tapped are completely stumped on this one. The root of the problem is that rsync has to include the entire archive in one pass to map the matching hardlinks - and it has to be able to hold the directory and inode table in RAM to do it at a usable speed. 
The other limiting issue is that the disk heads have to move around a lot to read and re-create all those directory entries and update the inode link counts. -- Les Mikesell lesmikes...@gmail.com
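[Editor's note: to see how big the table described above would be for a particular pool, counting the entries it would have to hold is straightforward; a rough sketch, assuming the pool is in the usual /var/lib/backuppc location, is:]

    #!/bin/sh
    # Gauge the size of rsync -H's problem: total directory entries, and the
    # number of files with more than one link (in a BackupPC pool, nearly all).
    # /var/lib/backuppc is only the usual default location - adjust as needed.
    POOL=/var/lib/backuppc
    printf "directory entries:      " ; find "$POOL" | wc -l
    printf "files with nlink > 1:   " ; find "$POOL" -type f -links +1 | wc -l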
Re: [BackupPC-users] backup the backuppc pool with bacula
jhaglund wrote: What does one use if not rsync? In an admittedly non-backuppc environment I've been experimenting with using 'rsync -W' (this means don't use the rsync algorithm) to see if problems similar to the ones you describe go away. I'm still not sure of the result. Using rsync with the -W argument means that complete files will be transferred instead of changed pieces. In an environment where files tend to change completely, or not at all, it makes sense to try this because it means that rsync itself has less to do. I read somewhere that the rsync algorithm is intended for environments where disk bandwidth is greater than network bandwidth. That's a good way to think about it. Cordially, -- Jon Forrest Research Computing Support College of Chemistry 173 Tan Hall University of California Berkeley Berkeley, CA 94720-1460 510-643-1032 jlforr...@berkeley.edu
Re: [BackupPC-users] backup the backuppc pool with bacula
Hi, Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]: jhaglund wrote: There are several implied references here to likely problems with rsync and how they are all deal breakers. [...] I just need to understand what's going on. It boils down to how much RAM rsync needs to handle all the directory entries and hardlinks and the amount of time it takes to wade through them. ... where the important part is the hardlinks (see below), because that simply can't be optimized; the file list - while probably consuming more memory in total - can be and has been in 3.0 (probably meaning protocol version 30, i.e. rsync 3.x on both sides). I'm running rsync 3.0.6 but the server is 2.6.x. I have ~ 1.9 files found by rsync and it always fails on some level. [...] 3.x on both ends might help. It claims to not need the whole directory in memory at once - but you'll still need to build a table to map all the inodes with more than one link (essentially everything) to re-create the hardlinks, so you have to throw a lot of RAM at it anyway. Please read the above carefully. It's not about so many hardlinks (meaning many links to one pool file), it's about so many files that have more than one link - whether it's 2 or 32000 is unimportant (except for the size of the complete file list, which additional hardlinks will make larger). In normal situations, you have a file with more than one link every now and then. rsync expects to have to handle a few of them. With a BackupPC pool it's practically every single file, millions of them or more in some cases. And for each and every one of them, rsync needs to store (at least) the inode number and the full path (probably relative to the transfer root) to one link (probably the first one it encounters, not necessarily the shortest one). Count for yourself:
cpool/1/2/3/12345678911234567892123456789312
pc/foo/0/f%2fhome/fuser/ffoo
pc/hostname/123/f%2fexport%2fhome/fwopp/f.gconf/fapps/fgnome-screensaver/f%25gconf.xml
Round up to a multiple of 8, add maybe 4 bytes of malloc overhead, 4 bytes for a pointer, and factor in that we're simply not used anymore to optimizing storage requirements at the byte level. You're probably going to say, "why not simply write that information to disk/database?". Reason 1: That's a lot of temporary space you'll need. If it doesn't fit in memory, we're talking about GB, not a few KB. Reason 2: Access to this table will be in random order. It's not a nice linear scan. Chances are, you'll need to read from disk almost every time. No cache is going to speed this up much, because no cache will be large enough or smart enough to know when which information will be needed again. The same applies to a database. Reason 3: rsync is a general purpose tool. It can't determine ahead of time how many hardlink entries it will need to handle. It could only react to running out of memory. Except for BackupPC pools, it would probably *never* need disk storage. You shouldn't actually crash unless you run out of both ram and swap, but if you push the system into swap you might as well quit anyway. This is the same as reason 2. You should realize that disk is not slightly slower than RAM, it's many orders of magnitude slower. It won't take 2 hours instead of 1 hour, it will take 10,000 hours (or more) instead of 1. That is over one year. Swap works well as long as your working set fits into RAM. That is not the case here. [In reality, it might not be quite so dramatic, but the point is: you don't know. 
It simply might take a year. Or 10. Supposing your disks last that long ;-] What does one use if not rsync? The main alternative is some form of image-copy of the archive partition. This is only practical if you have physical access to the server or very fast network connections. Physical access probably meaning that you can transport your copy to and from the server. "Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway." (Andrew S. Tanenbaum). Do I really need to shut down backuppc every time I want to attempt a sync or would syncing to a local disk and rsync'ing from that be sufficient? Try something like time find /var/lib/backuppc -ls > /dev/null to get a feeling for just how long only traversing the BackupPC pool and doing a stat() on each file really takes. Then remember that syncing to a local disk is in no way simpler than syncing to a remote disk - the bandwidth for copying is simply higher, so that is the only place you get a speedup. From a different perspective: either it's going to be fast enough that shutting down BackupPC won't hurt, or it's going to be *necessary* to shut down BackupPC, because having it modify the file system would hurt. Just imagine the pc/ directory links on your copy would point
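[Editor's note: plugging made-up but plausible numbers into the per-entry estimate above shows why the hardlink table gets painful; every figure below is an assumption, not a measurement of rsync.]

    #!/bin/sh
    # Back-of-the-envelope version of the estimate above. All numbers are
    # assumptions: 16 million pooled inodes, an average stored path of 80
    # bytes (rounded up to a multiple of 8), and ~24 bytes of malloc/pointer/
    # inode-number overhead per entry.
    inodes=16000000
    avg_path=80
    overhead=24
    awk -v n="$inodes" -v p="$avg_path" -v o="$overhead" \
        'BEGIN { printf "~%.1f GB just for the hardlink table\n", n*(p+o)/1024/1024/1024 }'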
Re: [BackupPC-users] backup the backuppc pool with bacula
Holger Parplies wrote at about 04:22:03 +0200 on Thursday, June 11, 2009: Hi, Les Mikesell wrote on 2009-06-10 15:45:22 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]: jhaglund wrote: There are several implied references here to likely problems with rsync and how they are all deal breakers. [...] I just need to understand what's going on. It boils down to how much RAM rsync needs to handle all the directory entries and hardlinks and the amount of time it takes to wade through them. ... where the important part is the hardlinks (see below), because that simply can't be optimized; the file list - while probably consuming more memory in total - can be and has been in 3.0 (probably meaning protocol version 30, i.e. rsync 3.x on both sides). Holger, I may be wrong here, but I think that you get the more efficient memory usage as long as both client and server are version >= 3.0, even if the protocol version is set to < 30 (which is true for BackupPC where it defaults back to version 28). I think protocol 30 has more to do with the changes from md4sums to md5sums plus the ability to have longer file names (255 characters I think) plus other protocol extensions. But I'm not an expert and my understanding is that the protocols themselves are not well documented other than looking through the source code. I'm running rsync 3.0.6 but the server is 2.6.x. I have ~ 1.9 files found by rsync and it always fails on some level. [...] 3.x on both ends might help. It claims to not need the whole directory in memory at once - but you'll still need to build a table to map all the inodes with more than one link (essentially everything) to re-create the hardlinks, so you have to throw a lot of RAM at it anyway. Please read the above carefully. It's not about so many hardlinks (meaning many links to one pool file), it's about so many files that have more than one link - whether it's 2 or 32000 is unimportant (except for the size of the complete file list, which additional hardlinks will make larger). In normal situations, you have a file with more than one link every now and then. rsync expects to have to handle a few of them. With a BackupPC pool it's practically every single file, millions of them or more in some cases. And for each and every one of them, rsync needs to store (at least) the inode number and the full path (probably relative to the transfer root) to one link (probably the first one it encounters, not necessarily the shortest one). Count for yourself:
cpool/1/2/3/12345678911234567892123456789312
pc/foo/0/f%2fhome/fuser/ffoo
pc/hostname/123/f%2fexport%2fhome/fwopp/f.gconf/fapps/fgnome-screensaver/f%25gconf.xml
Round up to a multiple of 8, add maybe 4 bytes of malloc overhead, 4 bytes for a pointer, and factor in that we're simply not used anymore to optimizing storage requirements at the byte level. You're probably going to say, "why not simply write that information to disk/database?". Reason 1: That's a lot of temporary space you'll need. If it doesn't fit in memory, we're talking about GB, not a few KB. Reason 2: Access to this table will be in random order. It's not a nice linear scan. Chances are, you'll need to read from disk almost every time. No cache is going to speed this up much, because no cache will be large enough or smart enough to know when which information will be needed again. The same applies to a database. Reason 3: rsync is a general purpose tool. It can't determine ahead of time how many hardlink entries it will need to handle. It could only react to running out of memory. 
Except for BackupPC pools, it would probably *never* need disk storage. You shouldn't actually crash unless you run out of both ram and swap, but if you push the system into swap you might as well quit anyway. This is the same as reason 2. You should realize that disk is not slightly slower than RAM, it's many orders of magnitude slower. It won't take 2 hours instead of 1 hour, it will take 10,000 hours (or more) instead of 1. That is over one year. Swap works well as long as your working set fits into RAM. That is not the case here. [In reality, it might not be quite so dramatic, but the point is: you don't know. It simply might take a year. Or 10. Supposing your disks last that long ;-] What does one use if not rsync? The main alternative is some form of image-copy of the archive partition. This is only practical if you have physical access to the server or very fast network connections. Physical access probably meaning that you can transport your copy to and from the server. Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway
Re: [BackupPC-users] backup the backuppc pool with bacula
On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote: Is the blockdev-level rsync-like solution going to be something publicly available? blockdev-level rsync smells like drbd. I'm not sure whether it supports such huge amounts of unsynchronized data, but it might just be a matter of configuration. Tino. -- What we nourish flourishes. - Was wir nähren erblüht. www.lichtkreis-chemnitz.de www.craniosacralzentrum.de
Re: [BackupPC-users] backup the backuppc pool with bacula
On Tue, Jun 02, 2009 at 11:44:11AM +0200, Tino Schwarze wrote: On Mon, Jun 01, 2009 at 06:15:52PM -0400, Stephane Rouleau wrote: Is the blockdev-level rsync-like solution going to be something publicly available? We certainly intend to, but no guarantee it ever gets finished. Except for implementation there's not that much work left, but we do it in our free time. It really seems strange something like that doesn't exist yet (and rsync itself doesn't support blockdevices). blockdev-level rsync smells like drbd. I'm not sure whether it supports such huge amounts of unsynchronized data, but it might just be a matter of configuration. In fact, I'd say LVM should be able to do this: generate a blocklevel diff between two snapshots of the same volume, and create a new snapshot/volume based on an old one + a diff. E.g. ZFS supports this using send/receive. So far, I haven't read about support for such a feature. On the other hand, I think I've read on this list that using zfs send/receive for backuppc pools was very slow (but that's on a filesystem level, not blockdev level). Drbd might be a solution too - I haven't looked at it closely. It seems more meant for high availability, but probably it can be used for offsite backup too. It has support for recovery after disconnection/failure, so maybe you can use it to keep older versions on a remote system by forcibly disconnecting the nodes. I don't know how easy it would be to migrate a non-drbd volume either. Anyone have experience with this in combination with backuppc? -- Pieter
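[Editor's note: for the ZFS send/receive idea mentioned above, the shape of such a setup would be roughly the following; dataset names, snapshot names and the host are invented, and this is an illustration of the mechanism rather than a tested recipe.]

    #!/bin/sh
    # Hypothetical incremental replication of a pool that lives on a ZFS
    # dataset. tank/backuppc, backup/backuppc and offsite.example.com are
    # placeholders.
    # One-time initial copy:
    zfs snapshot tank/backuppc@base
    zfs send tank/backuppc@base | ssh offsite.example.com zfs receive backup/backuppc
    # Later runs only send the blocks changed since the previous snapshot:
    zfs snapshot tank/backuppc@weekly
    zfs send -i @base tank/backuppc@weekly | \
        ssh offsite.example.com zfs receive -F backup/backuppc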
Re: [BackupPC-users] backup the backuppc pool with bacula
On Sun, May 31, 2009 at 11:22:13AM -0400, Stephane Rouleau wrote: Pieter Wuille wrote: This is how we handle backups of the backuppc pool:
* the pool itself is on a LUKS-encrypted XFS filesystem, on a LVM volume, on a software RAID1 of 2 1TB disks.
* twice a week the following procedure is run:
* Freeze the XFS filesystem, sync, lvm-snapshot the encrypted volume
* Unfreeze
* send the snapshot over ssh to an offsite server (which thus only ever sees the encrypted data)
* remove the snapshot
* The offsite server has 2 smaller disks (not in RAID), and snapshots are sent in turn to one and to the other. This means we still have a complete pool if something goes wrong during the transfer (which takes +- a day)
* The consistency of the offsite backups can be verified by exporting them over NBD (network block device), and mounting them on the normal backup server (which has the encryption keys)
We use a blockdevice-based solution instead of a filesystem-based one, because the many small files (16 million inodes and growing) make those very disk- and CPU-intensive (simply doing a find | wc -l in the root takes hours). Furthermore it makes encryption easier. We are also working on an rsync-like system for block devices (yet that might still take some time...), which would bring the time for synchronising the backup server with the offsite one down to 1-2 hours. Greetz, Pieter, This sounds rather close to what I'd like to have over the coming months. I just recently reset our backup pool, and rather stupidly did not select an encrypted filesystem (otherwise we're on XFS, LVM, RAID1 2x1.5TB). Figured I'd encrypt the offsite only, but I see now that it'd be much better to send data at the block level. You mention the capacity of your pool file system, but how much space is typically used on it? Curious also what kind of connection speed you have with your offsite backup solution. Some numbers:
* backup server has 1TB of RAID1 storage
* contains amongst others a 400GiB XFS volume for backuppc
* daily/weekly backups of +- 195GiB of data
* contains 256GiB of backups (expected to increase significantly still)
* contains 16.8 million inodes
* according to LVM snapshot usage, avg. 1.5 GiB of data blocks change on this volume daily
* offsite backup server has 2x 500GB of non-RAID storage
* twice a week, the whole 400GiB volume is sent over a 100Mbps connection (at +- 8.1MiB/s)
* that's a huge waste for maybe 5GiB of changed data, but the bandwidth is generously provided by the university
* we hope to have a more efficient blockdevice-level synchronisation system in a few months
PS: sorry for the strange subject earlier - I used a wrong 'from' address first and forwarded it -- Pieter
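[Editor's note: stripped of the site-specific details, the procedure described above comes down to something like this sketch; the volume group, mount point, sizes and remote host are placeholders, and error handling is omitted.]

    #!/bin/sh
    # Outline of the freeze / snapshot / ship-the-block-image procedure.
    # /dev/vg0/pool, /var/lib/backuppc and offsite.example.com are placeholders.
    xfs_freeze -f /var/lib/backuppc                 # quiesce the XFS filesystem
    sync
    lvcreate --snapshot --size 20G --name pool-snap /dev/vg0/pool
    xfs_freeze -u /var/lib/backuppc                 # let BackupPC carry on
    # Ship the (still LUKS-encrypted) image; the offsite host never sees the keys.
    dd if=/dev/vg0/pool-snap bs=64M | ssh offsite.example.com 'dd of=/dev/sdb1 bs=64M'
    lvremove -f /dev/vg0/pool-snap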
Re: [BackupPC-users] backup the backuppc pool with bacula
Thanks Pieter, Is the blockdev-level rsync-like solution going to be something publicly available? Stephane Pieter Wuille wrote: On Sun, May 31, 2009 at 11:22:13AM -0400, Stephane Rouleau wrote: Pieter Wuille wrote: This is how we handle backups of the backuppc pool:
* the pool itself is on a LUKS-encrypted XFS filesystem, on a LVM volume, on a software RAID1 of 2 1TB disks.
* twice a week the following procedure is run:
* Freeze the XFS filesystem, sync, lvm-snapshot the encrypted volume
* Unfreeze
* send the snapshot over ssh to an offsite server (which thus only ever sees the encrypted data)
* remove the snapshot
* The offsite server has 2 smaller disks (not in RAID), and snapshots are sent in turn to one and to the other. This means we still have a complete pool if something goes wrong during the transfer (which takes +- a day)
* The consistency of the offsite backups can be verified by exporting them over NBD (network block device), and mounting them on the normal backup server (which has the encryption keys)
We use a blockdevice-based solution instead of a filesystem-based one, because the many small files (16 million inodes and growing) make those very disk- and CPU-intensive (simply doing a find | wc -l in the root takes hours). Furthermore it makes encryption easier. We are also working on an rsync-like system for block devices (yet that might still take some time...), which would bring the time for synchronising the backup server with the offsite one down to 1-2 hours. Greetz, Pieter, This sounds rather close to what I'd like to have over the coming months. I just recently reset our backup pool, and rather stupidly did not select an encrypted filesystem (otherwise we're on XFS, LVM, RAID1 2x1.5TB). Figured I'd encrypt the offsite only, but I see now that it'd be much better to send data at the block level. You mention the capacity of your pool file system, but how much space is typically used on it? Curious also what kind of connection speed you have with your offsite backup solution. Some numbers:
* backup server has 1TB of RAID1 storage
* contains amongst others a 400GiB XFS volume for backuppc
* daily/weekly backups of +- 195GiB of data
* contains 256GiB of backups (expected to increase significantly still)
* contains 16.8 million inodes
* according to LVM snapshot usage, avg. 1.5 GiB of data blocks change on this volume daily
* offsite backup server has 2x 500GB of non-RAID storage
* twice a week, the whole 400GiB volume is sent over a 100Mbps connection (at +- 8.1MiB/s)
* that's a huge waste for maybe 5GiB of changed data, but the bandwidth is generously provided by the university
* we hope to have a more efficient blockdevice-level synchronisation system in a few months
PS: sorry for the strange subject earlier - I used a wrong 'from' address first and forwarded it
Re: [BackupPC-users] backup the backuppc pool with bacula
I have one system where I do backups of backuppc to tape for disaster recovery. Here's the system I use:
- Stop backuppc to quiesce the filesystem. LVM snapshots are not sufficient for this, because the disk load gets too high, the data flow rate gets too low, and the tape starts to 'shoeshine'.
- Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB tape drive, which is sufficient to store the backuppc data from that machine.
- In case you're curious, the tar command I use is: tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 /var/lib/backuppc/ 2>&1 > $LOGFILE
- Restart backuppc.
Wow, thanks. I do not have tape drives available, but would it also work if I'd be doing this same procedure onto an external USB HDD? Thanks, F.
Re: [BackupPC-users] backup the backuppc pool with bacula
How often do you want to backup the server? What about using rsync to backup the pool? I want to mirror my primary backup server to another system daily so I can switch to my backup server quickly. I was going to use rsync for this. Well, backing it up once a day would be ok for me too. You will use rsync because it handles hardlinks, am I right? My goal is the same as yours: switch immediately onto another server if my BackupPC server dies. Can you tell us about your results when you set it up? Thanks, F.
Re: [BackupPC-users] backup the backuppc pool with bacula
Boniforti Flavio wrote: I have one system where I do backups of backuppc to tape for disaster recovery. Here's the system I use:
- Stop backuppc to quiesce the filesystem. LVM snapshots are not sufficient for this, because the disk load gets too high, the data flow rate gets too low, and the tape starts to 'shoeshine'.
- Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB tape drive, which is sufficient to store the backuppc data from that machine.
- In case you're curious, the tar command I use is: tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 /var/lib/backuppc/ 2>&1 > $LOGFILE
- Restart backuppc.
Wow, thanks. I do not have tape drives available, but would it also work if I'd be doing this same procedure onto an external USB HDD? Yes, but you have to test a restore of a typically-sized archive before deciding that this method is suitable for your purpose. While it may take a few hours or less to make the tar copy, it will likely take at least a few days to restore a usable disk copy with all the hardlinks. On the other hand, if your external disk has as much space as the live partition, you can stop backuppc, unmount the partition, and image-copy to a matching partition on the external drive and have a copy that is directly usable. -- Les Mikesell lesmikes...@gmail.com
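[Editor's note: in the spirit of "test a restore first", a minimal sketch of timing such a restore into a scratch area might look like the following; the device and paths are examples, and with a USB disk the archive would be a file rather than /dev/st0.]

    #!/bin/sh
    # Time a trial restore before relying on the tar copy for disaster recovery.
    # /dev/st0 and /mnt/scratch are examples only.
    mkdir -p /mnt/scratch/restore-test
    time tar -xpf /dev/st0 -C /mnt/scratch/restore-test
    # Spot-check that hardlinks were actually recreated:
    find /mnt/scratch/restore-test -type f -links +1 | head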
Re: [BackupPC-users] backup the backuppc pool with bacula
Boniforti Flavio wrote: How often do you want to backup the server? What about using rsync to backup the pool? I want to mirror my primary backup server to another system daily so I can switch to my backup server quickly. I was going to use rsync for this. Well, backing it up once a day would be ok for me too. You will use rsync because it handles hardlinks, am I right? My goal is the same as yours: switch immediately onto another server if my BackupPC server dies. Can you tell us about your results when you set it up? If you want a 'live' backup on a nearby machine you might look at drbd which is sort of like raid over the network. If you don't need auto-failover, though, you can get fairly high availability with normal RAID1 in a chassis with swappable drives. In the fairly likely event of a single disk failure, you just swap in a new drive and rebuild the mirror. In the less likely event of a motherboard component failure you yank the drives and move them to a spare chassis that you've kept for that purpose. It is still a good idea to have offsite copies of the archive in case of a building disaster or a software or operator error that destroys the running copy. Rsync may work - depending on the size of your archive and the amount of RAM you have. -- Les Mikesell lesmikes...@gmail.com
Re: [BackupPC-users] backup the backuppc pool with bacula
On Tue, May 19, 2009 at 11:35:36AM -0500, Les Mikesell wrote: Have you ever restored one of these tapes, and if so, how long did it take? As a wild guess, I'd expect a couple of days where an image copy would be an hour or two. I did this with Solaris ufsdump/ufsrestore once. Making the tape took one or two hours. I gave up and cancelled the restore after 24 hours. danno
Re: [BackupPC-users] backup the backuppc pool with bacula
On Tue, May 19, 2009 at 12:27 PM, Carl Wilhelm Soderstrom chr...@real-time.com wrote: On 05/19 12:04 , Tim Cole wrote: How often do you want to backup the server? What about using rsync to backup the pool? I want to mirror my primary backup server to another system daily so I can switch to my backup server quickly. I was going to use rsync for this. Rsync memory requirements have historically been too high for this. I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap; and it crushed the box to the point where I had to power-cycle it. Best thing to do is set up a duplicate backuppc server that independently backs up the hosts you want to have a redundant backup for. Try Rsync v3, it has much lower memory requirements since it builds the file list incrementally. I used it at one company to do nightly syncs of their ~4TB backuppc pool offsite.
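[Editor's note: for reference, the sort of invocation being discussed here looks roughly like the sketch below; the flags are the ones commonly suggested for pool copies, and the host and paths are examples rather than the poster's actual setup.]

    #!/bin/sh
    # Mirroring a pool offsite with rsync. The incremental file list only helps
    # when BOTH ends run rsync >= 3.0; -H is what makes the hardlink table grow.
    rsync -aH --delete --numeric-ids \
          /var/lib/backuppc/ offsite.example.com:/var/lib/backuppc/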
Re: [BackupPC-users] backup the backuppc pool with bacula
Hi, Rob Terhaar wrote on 2009-05-21 13:01:58 -0400 [Re: [BackupPC-users] backup the backuppc pool with bacula]: [...] Try Rsync v3, it has much lower memory requirements since it builds the file list incrementally. by all means, try it. But it's not the file list that is the specific problem of BackupPC. I used it at one company to do nightly syncs of their ~4TB backuppc pool offsite. It's still a matter of file count (used inodes, to be exact), not pool storage size. rsync V3 may perform significantly better if you have many links to comparatively few inodes, but if you have many inodes (for some unknown value of "many"), I am still convinced that you will hit a problem. Feel free to convince me otherwise, but "works for me" is unlikely to succeed ;-). Regards, Holger
Re: [BackupPC-users] backup the backuppc pool with bacula
Good point, Holger. I don't have hardlink counts or stats, but it should only take about 10 minutes of work to download/compile rsync 3 and run a benchmark on your pool ;)

On 5/21/09, Holger Parplies wb...@parplies.de wrote:
[...] rsync v3 may perform significantly better if you have many links to comparatively few inodes, but if you have many inodes (for some unknown value of "many"), I am still convinced that you will hit a problem. Feel free to convince me otherwise, but "works for me" is unlikely to succeed ;-).

--
Sent from my mobile device
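For anyone who wants to run that experiment, a rough sketch follows. The version number and download URL are only examples from around that time, and /mnt/copy is a placeholder for wherever your test copy lives:

  # Build a private copy of rsync 3 without touching the system package.
  wget http://rsync.samba.org/ftp/rsync/src/rsync-3.0.6.tar.gz
  tar xzf rsync-3.0.6.tar.gz
  cd rsync-3.0.6 && ./configure && make

  # Benchmark against a local copy, recording wall-clock time and peak memory.
  /usr/bin/time -v ./rsync -aH --numeric-ids /var/lib/backuppc/ /mnt/copy/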
Re: [BackupPC-users] backup the backuppc pool with bacula
Hi, there is a regular discussion on how to backup/move/copy the backuppc pool. Did anyone try to backup the pool with bacula?

Hello there...
I don't know about bacula, but I would also like to get a backup of the BackupPC server myself: has anybody got some suggestions and practical examples?

Thanks,
F.
Re: [BackupPC-users] backup the backuppc pool with bacula
Boniforti Flavio wrote:
Hi, there is a regular discussion on how to backup/move/copy the backuppc pool. Did anyone try to backup the pool with bacula? Hello there... I don't know about bacula, but would like myself also to get a backup of the BackupPC server: anybody got some suggestions and practical examples? Thanks, F.

How often do you want to backup the server? What about using rsync to backup the pool? I want to mirror my primary backup server to another system daily so I can switch to my backup server quickly. I was going to use rsync for this.
Re: [BackupPC-users] backup the backuppc pool with bacula
Boniforti Flavio wrote:
Hi, there is a regular discussion on how to backup/move/copy the backuppc pool. Did anyone try to backup the pool with bacula? Hello there... I don't know about bacula, but would like myself also to get a backup of the BackupPC server: anybody got some suggestions and practical examples?

I think the only way to do it at a reasonable speed is to unmount the partition where the archive is stored and image-copy it to an equal-sized partition. Or, if you created it as a RAID1 with a missing mirror, you can add/sync a mirror while mounted - but you still won't have any performance if you try to do anything else for the duration of the copy, and you need to unmount momentarily to get a clean filesystem as you fail/remove the mirror.

Some people are copying smaller archives with rsync (-aH), and the newest version of rsync is supposed to handle the hardlinks more efficiently. You can always try that and see how long it takes.

--
Les Mikesell
lesmikes...@gmail.com
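For anyone who hasn't used the missing-mirror trick Les mentions, the Linux md version goes roughly like this. Device names are placeholders and this is a sketch, not a tested recipe:

  # Done once, when the pool filesystem is first set up: create the array
  # degraded, with the second slot deliberately missing.
  mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 missing

  # To take a copy later: attach a spare disk and let md sync it while mounted.
  mdadm /dev/md0 --add /dev/sdb1
  cat /proc/mdstat          # wait until the resync finishes

  # Unmount briefly so the copy holds a clean filesystem, then detach it.
  umount /var/lib/backuppc
  mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
  mount /var/lib/backuppc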
Re: [BackupPC-users] backup the backuppc pool with bacula
On 05/19 05:51 , Boniforti Flavio wrote:
I don't know about bacula, but would like myself also to get a backup of the BackupPC server: anybody got some suggestions and practical examples?

I have one system where I do backups of backuppc to tape for disaster recovery. Here's the system I use:
- Stop backuppc to quiesce the filesystem. LVM snapshots are not sufficient for this, because the disk load gets too high, the data flow rate gets too low, and the tape starts to 'shoeshine'.
- Dump /var/lib/backuppc to tape with 'tar'. I have a 500GB tape drive, which is sufficient to store the backuppc data from that machine.
- In case you're curious, the tar command I use is:
  tar -cv --totals --exclude-from=$EXCLUDEFILE -f /dev/st0 /var/lib/backuppc/ 2>&1 > $LOGFILE
- Restart backuppc.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
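Wrapped into a script, that procedure might look roughly like the following. The init script path, exclude file and log file are my assumptions for a typical Linux install, not Carl's actual configuration, and the redirection here captures both stdout and stderr in the log:

  #!/bin/sh
  # Hypothetical nightly tape dump of the BackupPC data directory.
  EXCLUDEFILE=/etc/backuppc/tape-excludes
  LOGFILE=/var/log/backuppc-tape.log

  /etc/init.d/backuppc stop          # quiesce the pool filesystem

  mt -f /dev/st0 rewind              # start at the beginning of the tape
  tar -cv --totals --exclude-from=$EXCLUDEFILE \
      -f /dev/st0 /var/lib/backuppc/ > $LOGFILE 2>&1

  /etc/init.d/backuppc start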
Re: [BackupPC-users] backup the backuppc pool with bacula
On 05/19 12:04 , Tim Cole wrote:
How often do you want to backup the server? What about using rsync to backup the pool? I want to mirror my primary backup server to another system daily so I can switch to my backup server quickly. I was going to use rsync for this.

Rsync memory requirements have historically been too high for this. I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap; and it crushed the box to the point where I had to power-cycle it. Best thing to do is set up a duplicate backuppc server that independently backs up the hosts you want to have a redundant backup for.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
Re: [BackupPC-users] backup the backuppc pool with bacula
Carl Wilhelm Soderstrom wrote:
On 05/19 05:51 , Boniforti Flavio wrote:
I don't know about bacula, but would like myself also to get a backup of the BackupPC server: anybody got some suggestions and practical examples?

I have one system where I do backups of backuppc to tape for disaster recovery. Here's the system I use: - Stop backuppc to quiesce the filesystem. [...] - Dump /var/lib/backuppc to tape with 'tar'. [...] - Restart backuppc.

Have you ever restored one of these tapes, and if so, how long did it take? As a wild guess, I'd expect a couple of days where an image copy would be an hour or two.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] backup the backuppc pool with bacula
Carl Wilhelm Soderstrom wrote:
On 05/19 12:04 , Tim Cole wrote:
How often do you want to backup the server? What about using rsync to backup the pool? [...]

Rsync memory requirements have historically been too high for this. I tried rsync'ing 100GB on a machine with 512MB RAM and about the same swap; and it crushed the box to the point where I had to power-cycle it. Best thing to do is set up a duplicate backuppc server that independently backs up the hosts you want to have a redundant backup for.

These days, I'm not sure 512MB RAM and 'server' belong in the same sentence, but it is still hard to deal with the disk head motion needed to traverse and recreate all those hardlinks splattered more or less randomly across the disk.

--
Les Mikesell
lesmikes...@gmail.com
Re: [BackupPC-users] backup the backuppc pool with bacula
On 05/19 11:35 , Les Mikesell wrote:
Have you ever restored one of these tapes, and if so, how long did it take? As a wild guess, I'd expect a couple of days where an image copy would be an hour or two.

I think it's about 8 hours to create the tape (whereas when using LVM snapshots it took almost 2 days). I've scanned and recovered files from the tape, and it takes a few hours, depending on where on the tape the file is. I've never convinced the company to do a full test restore (we're contractors, and doing that test would cost more money); but I have scanned the tapes and checked their integrity, and that takes a healthy long time.

An image copy would indeed be much faster. I prefer a file-level backup rather than a filesystem-level backup simply because recovery in case of corruption is much better.

--
Carl Soderstrom
Systems Administrator
Real-Time Enterprises
www.real-time.com
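For anyone wondering what "scanning the tapes" can mean in practice, two common checks are listing the archive end to end and comparing it against what is on disk. These are my guesses at the commands, not necessarily Carl's exact procedure:

  # Read the whole archive and list its contents - verifies the tape is
  # readable from start to finish without restoring anything.
  mt -f /dev/st0 rewind
  tar -tvf /dev/st0 > /var/log/tape-listing.log

  # Or compare the archive against the live filesystem. Run from / so the
  # member names (leading / stripped by tar at creation) line up, and only
  # while BackupPC is stopped, since the pool changes during normal operation.
  mt -f /dev/st0 rewind
  cd / && tar -dvf /dev/st0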
Re: [BackupPC-users] backup the backuppc pool with bacula
Hi,

Carl Wilhelm Soderstrom wrote on 2009-05-19 11:54:16 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]:
On 05/19 11:35 , Les Mikesell wrote:
Have you ever restored one of these tapes, and if so, how long did it take? As a wild guess, I'd expect a couple of days where an image copy would be an hour or two.

I think it's about 8 hours to create the tape (whereas when using LVM snapshots it took almost 2 days). I've scanned and recovered files from the tape, and it takes a few hours, depending on where on the tape the file is. [...] An image copy would indeed be much faster. I prefer a file-level backup rather than a filesystem-level backup simply because recovery in case of corruption is much better.

it really depends on what you want to do in case of disaster.

1.) Restore the pool
Forget it.

2.) Restore files from previous backups or even single whole backups
No problem. [Though I have no idea how tar handles restoring a subset of the archive that is made up of hardlinks to various files outside this subset. This may well prove not to work or require insane amounts of tape seeks - if tar seeks on its input at all.]

With an image-level copy it's the other way around, but I agree that I wouldn't trust tape media very far. Has anyone invented tape-RAID-6 yet? :)

Regards,
Holger
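To make the uncertainty in case 2 concrete: pulling a single backup out of such an archive is just a pattern-restricted extract along these lines (host name and backup number are made up). Whether it actually works depends on how tar recorded the hardlinks; if the pc/ entries were stored as link records pointing at cpool/ members that are not part of the extract, tar has nothing to link them to:

  # Sketch: restore one backup of one host from the tape archive.
  mkdir -p /tmp/restore && cd /tmp/restore
  mt -f /dev/st0 rewind
  tar -xvf /dev/st0 var/lib/backuppc/pc/somehost/123/

  # If the file data lives in hardlink records referencing cpool/ members,
  # those members would also have to be extracted for the links to resolve.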
Re: [BackupPC-users] backup the backuppc pool with bacula
Hi,

Les Mikesell wrote on 2009-05-19 11:12:25 -0500 [Re: [BackupPC-users] backup the backuppc pool with bacula]:
[...] the newest version of rsync is supposed to handle the hardlinks more efficiently.

reason suggests that this is an urban myth. The newest version of rsync handles *large file lists* better, not *hardlinks*. To handle hardlinks better, you would almost certainly need to create a temporary file (which can easily be several *GB* in size in our cases). I somehow doubt any general purpose tool would dare do that (let alone find a spot where it can - my /tmp simply isn't large enough). The issue is that the temporary file will either be very large or unneeded, and an algorithm able to handle extreme numbers of hardlinks will probably be slow in the overwhelming majority of cases with very few hardlinks.

You can always try that and see how long it takes.

That you can. Just remember that your space usage (and hardlink counts) will grow over time. How does the time it takes grow in proportion to space and/or hardlink counts? At some random point it will stop working. You may never reach that point. But what if you do? Can you simply find another solution then? How long can you keep your pool offline for the copy process? At what point do you abort the copy process? How can you monitor its progress? Just some things to think about ...

Regards,
Holger
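On the monitoring question: nothing elegant, but a couple of low-tech checks at least show whether a long-running pool copy is progressing or just thrashing. This is generic shell run in separate terminals; the log path and copy target are placeholders:

  # Run the copy with per-file output and a final summary, keeping a log.
  rsync -aH -v --stats /var/lib/backuppc/ /mnt/copy/ > /var/log/pool-sync.log 2>&1 &

  # Watch how far it has got and how much memory it is using.
  tail -f /var/log/pool-sync.log
  watch 'du -sh /mnt/copy; ps -o rss,vsz,etime -C rsync'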