Re: [BackupPC-users] Renaming files causes retransfer?
also sprach Holger Parplies <wb...@parplies.de> [2011.04.20.2001 +0200]:
> We'd all like to be able to choose an existing *pool file* as
> reference - this would save us transfers of *any* file already
> existing in the pool (e.g. from other hosts). Unfortunately, this is
> technically not possible without a specialized BackupPC client.
> [...]
> I hope that clears things up a bit.

Yes, thanks!

also sprach Jeffrey J. Kosowsky <backu...@kosowsky.org> [2011.04.21.0707 +0200]:
> Holger Parplies wrote at about 20:01:28 +0200 on Wednesday, April 20, 2011:
> > 4.) There *was* an attempt to write a specialized BackupPC client
> >     (BackupPCd) quite a while back. I believe this was given up for
> >     lack of human resources. I always found this matter rather
> >     interesting, but I've never gotten around to even taking a look
> >     at the code, let alone do anything with it.
>
> Interesting... it might be worthwhile to revisit this in the context
> of the new BackupPC version 4. In particular, with version 4 using
> full-file md5sums (and potentially other SHA checksums), one could
> imagine an rsync extension that, in the absence of a matching file,
> would first compute the local md5sum and transmit it to the BackupPC
> server to look for an exact pool match...

Either that, or maybe it would be possible to work on extending the
rsync protocol so that, with a future version, additional data could
be transferred alongside.

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

it is only the modern that ever becomes old-fashioned.
                                                        -- oscar wilde

spamtraps: madduck.bo...@madduck.net

--
WhatsUp Gold - Download Free Network Management Software
The most intuitive, comprehensive, and cost-effective network
management toolset available today. Delivers lowest initial
acquisition cost and overall TCO of any competing solution.
http://p.sf.net/sfu/whatsupgold-sd
___
BackupPC-users mailing list
BackupPC-users@lists.sourceforge.net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/
Re: [BackupPC-users] Renaming files causes retransfer?
Hi,

martin f krafft wrote on 2011-04-17 16:43:07 +0200 [Re: [BackupPC-users] Renaming files causes retransfer?]:
> also sprach John Rouillard <rouilj-backu...@renesys.com> [2011.04.17.1625 +0200]:
> > > In terms of backuppc, this means that the files will have to be
> > > transferred again, completely, right?
> >
> > Correct.
>
> Actually, I just did a test, using iptables to count bytes between
> the two hosts, and then renamed a 33M file. backuppc, using rsync,
> only transferred 370k. Hence I think that it actually does *not*
> transfer the whole file.

it always feels strange to contradict reality, but, in theory, there is
no way to get around transferring the file. For the rsync algorithm to
work, you need a local reference copy of the file you want to transfer.
While you and I know that there *is* a local copy, BackupPC would need
to know (a) that there is and (b) where to find it. The only available
information at the point in time where this decision needs to be made
is the (new) file name. For this, there is no candidate in the
reference backup (or any other backup, for that matter). So the file
needs to be transferred in full.

We'd all like to be able to choose an existing *pool file* as reference
- this would save us transfers of *any* file already existing in the
pool (e.g. from other hosts). Unfortunately, this is technically not
possible without a specialized BackupPC client.

> (btw, I also think that what I wrote in
> http://comments.gmane.org/gmane.comp.sysutils.backup.backuppc.general/24352
> is wrong, but I shall follow up on this when I have verified my
> findings).

Is that a backuppc-users thread I somehow missed?

I see where your question is going now, so I'll go into a bit more
detail (not sure if any of this was already mentioned in that thread).

1.) BackupPC uses already existing transfer methods for the sake of not
    needing to install anything non-mainstream on the clients. In your
    case, that is probably ssh + rsync. Consequently, BackupPC is
    limited to what the rsync protocol will allow, which does *not*
    include "hey, send me the 1st and 8th 128kB chunk of the file
    before I'll tell you the checksum I have on my side". Such a
    request just doesn't make any sense for standalone rsync. We need
    to select a candidate before we can start transferring blocks that
    don't match (and skip blocks that do). It's really quite obvious,
    if you think about it, and it only gets more complicated (but
    doesn't change) if you go into the details of which rsync end
    plays which role in the file delta exchange. The same is basically
    true for tar and smb, respectively. The remote end decides what
    data to transfer (which is the whole file or nothing), and you can
    take it or ignore it, but you can't prevent it from being
    transferred.

2.) BackupPC reads the first 1MB into memory. It needs to do so to
    determine the pool file name. That should not be a problem
    memory-wise.

3.) BackupPC cannot, obviously, read an arbitrary-size file into
    memory. It also wants to avoid unnecessary (possibly extremely
    large) writes to the pool FS. So it does this:
    - Determine pool file candidates (possibly several, in case of
      pool collisions).
    - Read pool file candidates in parallel with the network transfer.
    - As soon as something doesn't match, discard the respective
      candidate.
    - If that was the last available candidate, copy everything so far
      (which *did* match) from that candidate to a new file. We need
      to get this content from somewhere, and the network stream is,
      obviously, not seekable, so we can't re-get it from there (but
      then, we don't need to and wouldn't want to, because, hopefully,
      our local disk is faster ;-).
    - If the whole candidate file matched our complete network stream,
      we have a pool match and only need to link to that.

4.) There *was* an attempt to write a specialized BackupPC client
    (BackupPCd) quite a while back. I believe this was given up for
    lack of human resources. I always found this matter rather
    interesting, but I've never gotten around to even taking a look at
    the code, let alone do anything with it.

I hope that clears things up a bit.

Regards,
Holger
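Step 3 above - reading pool candidates in parallel with the network
stream and discarding each one at its first mismatch - can be sketched
roughly as follows. This is an illustrative Python sketch with made-up
helper names, not BackupPC's actual (Perl) code:

```python
def match_or_store(stream_chunks, candidates):
    """Compare an incoming stream against pool-file candidates,
    discarding each candidate at its first mismatching chunk.
    `candidates` maps a pool file name to its list of chunks.
    Returns ('match', name) on a full pool match, otherwise
    ('new', received_chunks) - in real BackupPC the already-matched
    prefix would be copied from the last surviving candidate on disk
    rather than kept in memory."""
    live = dict(candidates)
    received = []
    for i, chunk in enumerate(stream_chunks):
        received.append(chunk)
        # Drop every candidate whose next chunk differs from the stream.
        live = {name: chunks for name, chunks in live.items()
                if i < len(chunks) and chunks[i] == chunk}
    for name, chunks in live.items():
        if len(chunks) == len(received):  # fully consumed: pool match
            return ('match', name)
    return ('new', received)
```

On a 'match' result, BackupPC only needs to create a link to the pool
file; otherwise the new content is written out and pooled.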
Re: [BackupPC-users] Renaming files causes retransfer?
Holger Parplies wrote at about 20:01:28 +0200 on Wednesday, April 20, 2011:
> 4.) There *was* an attempt to write a specialized BackupPC client
>     (BackupPCd) quite a while back. I believe this was given up for
>     lack of human resources. I always found this matter rather
>     interesting, but I've never gotten around to even taking a look
>     at the code, let alone do anything with it.

Interesting... it might be worthwhile to revisit this in the context of
the new BackupPC version 4. In particular, with version 4 using
full-file md5sums (and potentially other SHA checksums), one could
imagine an rsync extension that, in the absence of a matching file,
would first compute the local md5sum and transmit it to the BackupPC
server to look for an exact pool match...
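The client side of such an extension could look something like the
following hedged sketch. Everything here is invented for illustration -
the `client_offer` exchange and the server-side `pool_index` of
full-file MD5 digests are assumptions, not part of any rsync or
BackupPC API:

```python
import hashlib

def file_md5(path, chunk_size=65536):
    """Full-file MD5 digest, computed incrementally so arbitrarily
    large files never have to fit in memory."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()

def client_offer(path, pool_index):
    """Hypothetical exchange: offer the digest first; if the server's
    pool already holds that content, no file data need be sent."""
    digest = file_md5(path)
    if digest in pool_index:        # server-side pool lookup
        return ('link', digest)     # server just links to the pool file
    return ('send', digest)         # fall back to a normal transfer
```

The cost of the extra round trip would be one local read plus a digest
lookup - cheap compared to retransmitting a renamed multi-gigabyte file.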
Re: [BackupPC-users] Renaming files causes retransfer?
On Sun, 2011-04-17 at 15:43 -0400, Jeffrey J. Kosowsky wrote:
> Well... you could write a script (or even one-liner) to do the same
> name change (modulo f-mangling) on the last backup... this would be
> pretty easy if your name change is well-defined...

What about the file attributes stored in attrib in each subdir of the
pc tree?

Regards,
Tyler

-- 
A human being should be able to change a diaper, plan an invasion,
butcher a hog, conn a ship, design a building, write a sonnet, balance
accounts, build a wall, set a bone, comfort the dying, take orders,
give orders, cooperate, act alone, solve equations, analyze a new
problem, pitch manure, program a computer, cook a tasty meal, fight
efficiently, die gallantly. Specialization is for insects.
    -- Lazarus Long, Time Enough for Love, by Robert A. Heinlein
[BackupPC-users] Renaming files causes retransfer?
Dear list,

we are facing a policy change requiring people to rename data files in
a trivial way (replace ':' with '-'). In terms of backuppc, this means
that the files will have to be transferred again, completely, right?

Or is there a way in which I can prepare the server for this change and
prevent the completely unnecessary transfer of terabytes of data, just
so backuppc can find out that the data haven't changed?

Thanks,

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

when faced with a new problem, the wise algorithmist will first attempt
to classify it as np-complete. this will avoid many tears and tantrums
as algorithm after algorithm fails.
                                                        -- g. niruta

spamtraps: madduck.bo...@madduck.net
Re: [BackupPC-users] Renaming files causes retransfer?
On Sun, Apr 17, 2011 at 09:23:07AM +0200, martin f krafft wrote:
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-'). In terms of backuppc, this
> means that the files will have to be transferred again, completely,
> right?

Correct.

> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of data,
> just so backuppc can find out that the data haven't changed?

I assume you are using rsync as your backup method. Hence I claim you
can prepare the server, but YMMV, not valid in months with a full moon
or on days whose English name ends in "y", etc. It requires surgery on
your last valid backup to account for the renaming, and may make your
last valid backup invalid for restoration. I have had this work a few
times (I believe so, since the backup time and bytes transferred were
much less than it would take to transfer the files).

I suggest taking 2 full backups just before the rename. The first
captures any data that has changed and can be used to do restores. The
second is what you are going to operate on.

Let's call the top of your backuppc data dir (where the cpool, pool and
pc directories reside) /backuppc. Let's assume the files are being
renamed on the host client1 in the share /data and the (sub)directory
/set1. You will have a directory:

  /backuppc/pc/client1/backupnumber/f%2fdata/fset1

Under there, each file/directory will be represented as ffilename or
fdirectoryname. Change into the /backuppc/pc/client1/backupnumber
directory, where backupnumber is the number of your last full backup.
Navigate to the directory containing a file that is going to be renamed
and change its (mangled) name, for example:

  mv f20110204_11:23.dat f20110204_11-23.dat

Once you have done the surgery on the pc tree and the renames have
occurred on client1, run another full backup. What should happen is:
the rsync full backup will use the last backup (i.e. the full you
operated on) as its reference backup; since the file names in the
reference backup match the file names on client1, it should do a block
comparison rather than transferring the file(s) all over again.

Since you moved the file to the new name, it will still be linked into
the pool. If you copy the file, that will not be the case (and your
data needs will grow, since this surgery won't cause the altered backup
to be linked into the cpool). Rather than move, I suppose you could use
ln if you wanted to keep both sets of names in the modified backup.

Note that there is an attrib file in the same directory as your data.
It is a binary file that is needed to restore information like
uid/gid/mode... when the files are restored. Because you renamed the
files, you won't be able to restore them cleanly from that backup tree,
since there is no entry in the attrib file for the renamed file.
(Linking should get around this issue, but I haven't tried it.)

When I did this (a rename of a bunch of hard disk images backed up
across a WAN), I did a couple of test files first, ran a full backup,
and verified that I didn't move enough data (or take long enough) to
have copied the images again, so I claim it works. Good luck. Use at
your own risk etc.

I hope a supported mechanism to allow this will be in backuppc
version 4, along with some way of easily importing a copy of the data
taken by other means (e.g. in a tarball on a hard drive), as it comes
up often enough with users who have large data sets.

-- 
-- rouilj
John Rouillard
System Administrator
Renesys Corporation
603-244-9084 (cell)
603-643-9300 x 111
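John's per-file mv generalizes to a short script over the whole pc
tree. The following is a rough, hedged sketch - a hypothetical helper,
not a supported BackupPC tool. It assumes the f-mangling described
above and relies on a plain rename preserving the hard link into the
pool; try it on a copy of the backup directory first:

```python
import os

def rename_mangled(backup_root, old=':', new='-'):
    """Walk one pc-tree backup and rename f-mangled entries, replacing
    `old` with `new` in each name. os.rename keeps the inode, so the
    entries stay hard-linked into the pool. attrib files (which don't
    start with 'f') are left untouched."""
    renamed = []
    # Bottom-up, so a directory's contents are handled before the
    # directory itself might be renamed.
    for dirpath, dirnames, filenames in os.walk(backup_root, topdown=False):
        for name in filenames + dirnames:
            if name.startswith('f') and old in name:
                target = name.replace(old, new)
                os.rename(os.path.join(dirpath, name),
                          os.path.join(dirpath, target))
                renamed.append((name, target))
    return renamed
```

Run against e.g. /backuppc/pc/client1/backupnumber, this would turn
f20110204_11:23.dat into f20110204_11-23.dat. As John notes, the attrib
entries still carry the old names, so restores from the modified backup
won't be clean.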
Re: [BackupPC-users] Renaming files causes retransfer?
also sprach John Rouillard <rouilj-backu...@renesys.com> [2011.04.17.1625 +0200]:
> > In terms of backuppc, this means that the files will have to be
> > transferred again, completely, right?
>
> Correct.

Actually, I just did a test, using iptables to count bytes between the
two hosts, and then renamed a 33M file. backuppc, using rsync, only
transferred 370k. Hence I think that it actually does *not* transfer
the whole file.

(btw, I also think that what I wrote in
http://comments.gmane.org/gmane.comp.sysutils.backup.backuppc.general/24352
is wrong, but I shall follow up on this when I have verified my
findings).

-- 
martin | http://madduck.net/ | http://two.sentenc.es/

beauty, brains, availability, personality; pick any two.

spamtraps: madduck.bo...@madduck.net
Re: [BackupPC-users] Renaming files causes retransfer?
On 4/17/11 2:23 AM, martin f krafft wrote:
> Dear list,
>
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-'). In terms of backuppc, this
> means that the files will have to be transferred again, completely,
> right?
>
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of data,
> just so backuppc can find out that the data haven't changed?

If you are on a reasonably fast local LAN with the clients, it may not
be a serious problem, since the server will discard the file as soon as
it detects the duplicate data content, and a transfer isn't much slower
than a normal full's read for checksum comparisons. However, you should
probably force a full run afterward or make the change immediately
ahead of a scheduled full. If you are doing incrementals without
incremental levels, you'll transfer the changed file every run until a
full.

If you are on a slower WAN connection, you might need to follow some of
the other advice about renaming the underlying backuppc archive files.
If I had to do it, I'd try making another hardlink in the last full
tree so that both the old mangled name and the expected new one appear
(i.e. link the new name to the existing one; you don't need to figure
out the pool location) - but I really don't know if that is good advice
or not.

-- 
  Les Mikesell
   lesmikes...@gmail.com
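Les's hardlink variant can be sketched in a few lines - again a
hypothetical helper with the same caveats as the rename surgery
elsewhere in this thread (it assumes the pc-tree layout and f-mangled
names being discussed):

```python
import os

def link_new_name(tree_dir, old_mangled, new_mangled):
    """Create a second hard link under the new mangled name so both
    names appear in the last full. The old name keeps its attrib entry
    (restores stay clean) while rsync finds a reference file under the
    new name; both links point at the same pooled inode."""
    src = os.path.join(tree_dir, old_mangled)
    dst = os.path.join(tree_dir, new_mangled)
    os.link(src, dst)
    return os.path.samefile(src, dst)
```

Because no data is copied, the pool linking is undisturbed; the extra
directory entry costs essentially nothing.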
Re: [BackupPC-users] Renaming files causes retransfer?
martin f krafft wrote at about 09:23:07 +0200 on Sunday, April 17, 2011:
> Dear list,
>
> we are facing a policy change requiring people to rename data files
> in a trivial way (replace ':' with '-'). In terms of backuppc, this
> means that the files will have to be transferred again, completely,
> right?
>
> Or is there a way in which I can prepare the server for this change
> and prevent the completely unnecessary transfer of terabytes of data,
> just so backuppc can find out that the data haven't changed?
>
> Thanks,

Well... you could write a script (or even one-liner) to do the same
name change (modulo f-mangling) on the last backup... this would be
pretty easy if your name change is well-defined... you might even be
able to use the 'rename' Unix utility.