Re: rsync in-place (was Re: rsync 1tb+ each day)
> > Is it possible to tell rsync to update the blocks of the target file 'in-place' without creating the temp file (the 'dot file')? I can guarantee that no other operations are being performed on the file at the same time. The docs don't seem to indicate such an option.
>
> No, it's not possible, and making it possible would require a deep and fundamental redesign and re-implementation of rsync; the result wouldn't resemble the current program much.

I disagree. An --inplace option wouldn't be too hard to implement.

The trick is that when --inplace is specified the block matching algorithm (on the sender) would only match blocks at or after that block's location (on the receiver). No protocol change is required. The receiver can then operate in-place since no matching blocks are earlier in the file.

This could be relaxed to allow a fixed number of earlier blocks, based on the knowledge the receiver will buffer reads. But that is more risky.

Caveat user: if you specify --inplace and the source file has a single byte added to the beginning then the entire file will be sent as literal data.

Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.

Another independent optimization would be to do lazy writes. Currently, if you specify -I (--ignore-times) the output file is written (to a tmp file and then renamed) even if the contents are identical. Instead, creation of the tmp file could be delayed until the output file is known to be different. This is detected either by an out-of-sequence block number from the sender, or any literal data. If the file contains only in-sequence block numbers and no literal data, then there is no need to write anything.

Craig
--
To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.tuxedo.org/~esr/faqs/smart-questions.html
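The matching rule Craig proposes can be illustrated with a minimal Python sketch. It is not rsync code: the 4-byte block size, the MD5 digests, the byte-by-byte scan (no rolling checksum) and the names `block_table`, `inplace_delta` and `apply_inplace` are all assumptions made for illustration. The sender emits a copy only when the old block sits at or after the offset being written, so the receiver can patch the same buffer front to back.

```python
import hashlib

BLOCK = 4  # tiny block size, chosen only to keep the example readable


def block_table(old: bytes) -> dict:
    """Map each full block's digest to the highest offset where it occurs."""
    table = {}
    for off in range(0, len(old) - BLOCK + 1, BLOCK):
        table[hashlib.md5(old[off:off + BLOCK]).digest()] = off
    return table


def inplace_delta(old: bytes, new: bytes) -> list:
    """Sender side: build ("copy", old_offset) / ("lit", bytes) operations.

    A copy is accepted only if old_offset >= the output offset being
    written, i.e. the receiver cannot have overwritten that block yet.
    """
    table = block_table(old)
    ops, lit, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + BLOCK]
        src = None
        if len(chunk) == BLOCK:
            src = table.get(hashlib.md5(chunk).digest())
        # i is also the output offset: everything before it is already emitted
        if src is not None and src >= i:
            if lit:
                ops.append(("lit", bytes(lit)))
                lit.clear()
            ops.append(("copy", src))
            i += BLOCK
        else:
            lit.append(new[i])
            i += 1
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def apply_inplace(buf: bytearray, ops: list, new_len: int) -> None:
    """Receiver side: rewrite buf front to back, then truncate."""
    pos = 0
    for kind, arg in ops:
        # Read the source block fully before writing, since it may overlap
        data = bytes(buf[arg:arg + BLOCK]) if kind == "copy" else arg
        end = pos + len(data)
        if end > len(buf):
            buf.extend(bytes(end - len(buf)))
        buf[pos:end] = data
        pos = end
    del buf[new_len:]
```

Calling `apply_inplace(buf, inplace_delta(old, new), len(new))` on a `bytearray` copy of `old` leaves `buf` equal to `new`. Data removed near the front still lets later blocks be copied, because each one moves toward the start of the file.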
Re: rsync in-place (was Re: rsync 1tb+ each day)
CB == Craig Barratt [EMAIL PROTECTED] wrote the following on Wed, 05 Feb 2003 04:41:22 -0800:

CB> Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.

I'm curious, how does it ensure this?

--
Ben Escoto
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Wed, 5 Feb 2003, Craig Barratt wrote:

> Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.

I hate silent corruption. Much better to have things either work or fail in an obvious way.

I don't have a need for --inplace, so someone who does have a need should say whether my reasoning makes sense in the real world.

For databases, I'd want the --inplace option to rename the file before it starts changing it, and then rename it back. With that mode, rsync would ensure that every file is in one of three states:

1. original
2. new
3. gone (but not forgotten, since the intermediate state file would still be there, just with a temporary name)

You wouldn't want the rename in a bunch of situations:

- on systems where renames are expensive
- where it doesn't hurt to have a mixture of old and new contents in a file
- for raw devices, unless you know how /dev really works on your system

This means that there should either be two --inplace options, like
  --inplace-rename and --inplace-norename
or maybe
  --inplace-dangerous and --inplace-more-dangerous
or a modifier option.

In many cases, writing the documentation is more work than writing the code.

--
Paul Haas [EMAIL PROTECTED]
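The rename-around-the-update mode described above could look roughly like the following Python sketch. It is not an rsync feature: the `.inplace-partial` suffix, the function name `update_inplace_with_rename` and the `patch` callback are hypothetical, chosen only to show the three states.

```python
import os


def update_inplace_with_rename(path: str, patch) -> None:
    """Rename the file aside, modify it in place, then rename it back.

    If patch() raises (or the process dies), nothing exists at `path`,
    but the partially updated data survives under the temporary name.
    """
    work = path + ".inplace-partial"  # hypothetical naming scheme
    os.rename(path, work)             # state 3: "gone (but not forgotten)"
    with open(work, "r+b") as f:
        patch(f)                      # caller seeks and writes the changed blocks
    os.rename(work, path)             # state 2: new
```

A reader of the file then either finds a complete file or gets an obvious "no such file" error, rather than silently reading a mixture of old and new contents.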
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Wed, 5 Feb 2003, Ben Escoto wrote:

> CB == Craig Barratt [EMAIL PROTECTED] wrote the following on Wed, 05 Feb 2003 04:41:22 -0800:
>
> CB> Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.
>
> I'm curious, how does it ensure this?

During the copy, rsync writes to a temporary file in the same directory (the temp file is hidden; it starts with a .). Then, once the transfer is done, it mv's that temp file over the original. My understanding is that mv is atomic under unix, so this action either happens in its entirety or not at all.

Mike
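The write-then-rename pattern described above, as a minimal Python sketch rather than rsync's own code. The function name `replace_atomically` is hypothetical. The temporary file is created in the destination's directory so that the final rename stays within one filesystem.

```python
import os
import tempfile


def replace_atomically(path: str, data: bytes) -> None:
    """Write data to a hidden temp file next to path, then rename it over path."""
    dirname = os.path.dirname(os.path.abspath(path))
    # The leading "." mirrors the hidden "dot file" mentioned in the thread
    fd, tmp = tempfile.mkstemp(prefix=".", dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        os.replace(tmp, path)  # rename(2) on POSIX: readers see old or new, never a mix
    except BaseException:
        # Leave the original untouched and clean up the partial temp file
        try:
            os.unlink(tmp)
        except FileNotFoundError:
            pass
        raise
```

Note that `mkstemp` creates the file with owner-only permissions, so a real tool would also have to copy the original's mode and ownership onto the temp file before renaming.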
Re: rsync in-place (was Re: rsync 1tb+ each day)
2003-02-05T07:41:22 Craig Barratt:

> The trick is that when --inplace is specified the block matching algorithm (on the sender) would only match blocks at or after that block's location (on the receiver).

... and only when the source block in question remains unchanged in the new file?

> No protocol change is required.

But some serious cleverness, resulting in lost opportunities for reusing blocks (where the block is no longer available by the time the receiver gets to that point in the file).

-Bennett
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Wed, Feb 05, 2003 at 09:17:03AM -0800, Mike Rubel wrote:

> > CB> Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.
> >
> > I'm curious, how does it ensure this?
>
> During the copy, rsync writes to a temporary file in the same directory (the temp file is hidden; it starts with a .). Then, once the transfer is done, it mv's that temp file over the original. My understanding is that mv is atomic under unix, so this action either happens in its entirety or not at all.

Even if mv is atomic, how does rsync make sure that the move doesn't happen before the last of the data is written to the tempfile? Does it explicitly fsync the tempfile, or is there some other way it knows?

--
Ben Escoto
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Wed, Feb 05, 2003 at 10:52:45AM -0800, Ben Escoto wrote:

> On Wed, Feb 05, 2003 at 09:17:03AM -0800, Mike Rubel wrote:
>
> > > > CB> Of course, a major issue with --inplace is that the file will be in an intermediate state if rsync is killed mid-transfer. Rsync currently ensures that every file is either the original or new.
> > >
> > > I'm curious, how does it ensure this?
> >
> > During the copy, rsync writes to a temporary file in the same directory (the temp file is hidden; it starts with a .). Then, once the transfer is done, it mv's that temp file over the original. My understanding is that mv is atomic under unix, so this action either happens in its entirety or not at all.
>
> Even if mv is atomic, how does rsync make sure that the move doesn't happen before the last of the data is written to the tempfile? Does it explicitly fsync the tempfile, or is there some other way it knows?

Rsync neither calls fsync nor opens the files with O_SYNC, so if your system crashes before the data is flushed, the file could be in a corrupted state on reboot.

If your system crashes in the middle of an rsync you would probably want to rerun the rsync anyway. After all, unless you are only syncing one file, you would probably have an inconsistent tree (some files up to date, some not).

Adding fsync or O_SYNC would not exactly help performance much either.

--
J.W. Schultz            Pegasystems Technologies
email address: [EMAIL PROTECTED]

Remember Cernan and Schmitt
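For comparison, here is a sketch of what flushing before the rename would involve. As stated above, rsync did not do this at the time, so this Python version is purely illustrative. The name `durable_replace` is hypothetical, and the directory fsync assumes a POSIX-style system where a directory can be opened read-only.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Temp-file-and-rename update that flushes data before the rename."""
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(prefix=".", dir=dirname)
    with os.fdopen(fd, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the kernel
        os.fsync(f.fileno())   # ask the kernel to push the data to stable storage
    os.replace(tmp, path)
    # Also flush the directory entry so the rename itself survives a crash.
    try:
        dirfd = os.open(dirname, os.O_RDONLY)
    except OSError:
        return                 # assumption: skip where directories cannot be opened
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)
```

Each `fsync` forces a wait on the storage device, which is the performance cost mentioned above. Doing this once per file across a large tree would slow a transfer considerably.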
rsync in-place (was Re: rsync 1tb+ each day)
2003-02-04T14:29:48 Kenny Gorman:

> Is it possible to tell rsync to update the blocks of the target file 'in-place' without creating the temp file (the 'dot file')? I can guarantee that no other operations are being performed on the file at the same time. The docs don't seem to indicate such an option.

No, it's not possible, and making it possible would require a deep and fundamental redesign and re-implementation of rsync; the result wouldn't resemble the current program much.

Here's a sketch of the heart of the rsync algorithm (for finer details, see the tech report available from[1]).

Let's call the two endpoints the sender (who has the newer version of the file), and the receiver (who wants to update its older local copy to match that on the sender).

The receiver computes checksums on each block of the destination file, and streams them to the sender.

The sender finds all instances of any of those blocks in the source file. Then the sender transmits instructions to the receiver, describing how to build a spiffy new copy of the newer source file, using a mixture of actual chunks of new contents, and blocks taken from the older version of the file.

The receiver follows these instructions, copying blocks as needed from the old version and combining them with the new bits to construct the new file. It's then moved into place.

This algorithm by nature expects that the old version of the destination file is used as a source for taking blocks, in building the new version. Adjusting this algorithm to work in-place is non-trivial.

-Bennett

[1] URL:http://samba.anu.edu.au/rsync/tech_report/
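The exchange sketched above can be mimicked in a few lines of Python. This is a deliberately simplified model, not rsync's implementation: it uses one MD5 digest per block and tests every byte offset directly, without the cheaper rolling checksum described in the tech report. The names `signatures`, `make_delta` and `patch` and the 4-byte block size are assumptions for illustration.

```python
import hashlib

BLOCK = 4  # toy block size


def signatures(old: bytes) -> dict:
    """Receiver: digest of every full block of the old file, keyed to its offset."""
    return {
        hashlib.md5(old[off:off + BLOCK]).digest(): off
        for off in range(0, len(old) - BLOCK + 1, BLOCK)
    }


def make_delta(sigs: dict, new: bytes) -> list:
    """Sender: describe new as copies of old blocks plus literal data."""
    ops, lit, i = [], bytearray(), 0
    while i < len(new):
        chunk = new[i:i + BLOCK]
        src = sigs.get(hashlib.md5(chunk).digest()) if len(chunk) == BLOCK else None
        if src is not None:
            if lit:
                ops.append(("lit", bytes(lit)))
                lit.clear()
            ops.append(("copy", src))   # block may come from ANY old offset
            i += BLOCK
        else:
            lit.append(new[i])
            i += 1
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops


def patch(old: bytes, ops: list) -> bytes:
    """Receiver: build the new file separately, reading blocks from the intact old one."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + BLOCK] if kind == "copy" else arg
    return bytes(out)
```

With `old = b"AAAABBBBCCCC"` and `new = b"CCCCzzAAAABBBB"`, the delta copies the block at offset 8 first and the blocks at offsets 0 and 4 afterwards. Those later copies read from regions that an in-place writer would already have overwritten, which is why `patch` builds the result separately.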
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Tue, Feb 04, 2003 at 02:37:26PM -0500, Bennett Todd wrote:

> 2003-02-04T14:29:48 Kenny Gorman:
>
> > Is it possible to tell rsync to update the blocks of the target file 'in-place' without creating the temp file (the 'dot file')? I can guarantee that no other operations are being performed on the file at the same time. The docs don't seem to indicate such an option.
>
> No, it's not possible, and making it possible would require a deep and fundamental redesign and re-implementation of rsync; the result wouldn't resemble the current program much.
>
> Here's a sketch of the heart of the rsync algorithm (for finer details, see the tech report available from[1]).
>
> Let's call the two endpoints the sender (who has the newer version of the file), and the receiver (who wants to update its older local copy to match that on the sender).
>
> The receiver computes checksums on each block of the destination file, and streams them to the sender.
>
> The sender finds all instances of any of those blocks in the source file. Then the sender transmits instructions to the receiver, describing how to build a spiffy new copy of the newer source file, using a mixture of actual chunks of new contents, and blocks taken from the older version of the file.
>
> The receiver follows these instructions, copying blocks as needed from the old version and combining them with the new bits to construct the new file. It's then moved into place.
>
> This algorithm by nature expects that the old version of the destination file is used as a source for taking blocks, in building the new version. Adjusting this algorithm to work in-place is non-trivial.

The reason in-place updating is difficult is that rsync expects that unchanged blocks in the old file may be relocated. Data inserted into or removed from the file does not require the rest of the file to be retransmitted. Unchanged blocks will be copied from their old locations in the old file to new locations in the new file. In-place updating requires that blocks not relocate.

It may be possible by disallowing matches having differing offsets. That would require deeper investigation.

--
J.W. Schultz            Pegasystems Technologies
email address: [EMAIL PROTECTED]

Remember Cernan and Schmitt
Re: rsync in-place (was Re: rsync 1tb+ each day)
On 4 Feb 2003, jw schultz [EMAIL PROTECTED] wrote:

> The reason why in-place updating is difficult is that rsync expects the unchanged blocks in the old file may be relocated. Data inserted into or removed from the file does not require the rest of the file to be retransmitted. Unchanged blocks will be copied from the old locations in the old file to new locations in the new file. In-place updates requires that blocks not relocate.
>
> It may be possible by disallowing matches having differing offsets. That would require deeper investigation.

Of course the other place where people want this is for transfers of block devices, where the rename is just not possible.

I looked a little at doing this in librsync. The naive solution is to merely prohibit the delta from referring to blocks that have already been overwritten. I will probably eventually add at least this option.

You might try this in rsync. A lot of other code, to do with e.g. setting permissions, makes the assumption of the rename model, though. It would take a fair amount of testing.

Of course this model really falls down in some cases. Consider the case of one block inserted at the beginning. Then with the naive no-backreferences approach every block will be overwritten just before it's needed. :(

You can imagine a smarter algorithm that does non-sequential writes to the output so as to avoid writing over blocks that will be needed later. Alternatively, if you assume some amount of temporary storage, then it might be possible to still produce output as a stream.

Really for your problem the practical solution is just to dump the whole file, perhaps allowing for sparse blocks. As other people have observed, by design rsync does a lot more disk IO than network.

--
Martin
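The worst case mentioned above can be checked with a small counting model in Python. It assumes whole blocks, a strictly front-to-back writer and a file whose contents are simply shifted by some number of blocks. The function name `reusable_blocks` and its parameters are hypothetical.

```python
def reusable_blocks(n_blocks: int, shift: int) -> int:
    """Count old blocks a sequential in-place writer can still copy.

    Old block j sits at block index j and is needed at index j + shift in
    the new file (shift > 0: blocks inserted in front, shift < 0: blocks
    removed from the front). Writing front to back, block j is still
    intact when it is needed only if its destination is not past its
    current position, i.e. j + shift <= j.
    """
    return sum(1 for j in range(n_blocks) if 0 <= j + shift <= j)
```

Under this model a single block inserted at the front (`shift=1`) leaves nothing reusable, because every block has just been overwritten when its turn comes. Removing a block from the front (`shift=-1`) loses only the removed block.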
Re: rsync in-place (was Re: rsync 1tb+ each day)
On Wed, Feb 05, 2003 at 12:47:49PM +1100, Martin Pool wrote:

> On 4 Feb 2003, jw schultz [EMAIL PROTECTED] wrote:
>
> > The reason why in-place updating is difficult is that rsync expects the unchanged blocks in the old file may be relocated. Data inserted into or removed from the file does not require the rest of the file to be retransmitted. Unchanged blocks will be copied from the old locations in the old file to new locations in the new file. In-place updates requires that blocks not relocate.
> >
> > It may be possible by disallowing matches having differing offsets. That would require deeper investigation.
>
> Of course the other place where people want this is for transfers of block devices, where the rename is just not possible.
>
> I looked a little at doing this in librsync. The naive solution is to merely prohibit the delta from referring to blocks that have been already overwritten. I will probably eventually add at least this option.
>
> You might try this in rsync. A lot of other code to do with e.g. setting permissions makes the assumption of the rename model, though. It would take a fair amount of testing.

I certainly am not interested in coding it. Too small a target use.

> Of course this model really falls down in some cases. Consider the case of one block inserted at the beginning. Then with the naive no backreferences approach every block will be overwritten just before it's needed. :(

I was thinking more in terms of no block relocation at all. Checksums only match if at the same offset. The receiver simply discards (or never gets) info about blocks that are unchanged. It would just lseek and write, with a possible truncate at the end.

> You can imagine a smarter algorithm that does non-sequential writes to the output so as to avoid writing over blocks that will be needed later. Alternatively, if you assume some amount of temporary storage, then it might be possible to still produce output as a stream.

I really doubt it is worthwhile doing to rsync. This principally applies to block-oriented files such as devices and database files. For the most part rsync handles these fine. If someone really does feel they must have this, I'd suggest creating a different tool just for this job. It could operate on just one file at a time and either be smart enough to suss the optimal block size by knowing the file type or accept a block size from the command line. Block sizes for this would clearly be of the power-of-two variety.

--
J.W. Schultz            Pegasystems Technologies
email address: [EMAIL PROTECTED]

Remember Cernan and Schmitt
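The "no relocation at all" variant (compare blocks only at identical offsets, seek and write the ones that differ, truncate at the end) is simple enough to sketch as the kind of separate single-file tool suggested above. This is a Python illustration under assumptions, not an existing utility: the function names, the use of MD5 and running both sides in one process are all hypothetical. It needs Python 3.8 or later.

```python
import hashlib


def block_digests(path: str, block: int) -> list:
    """Receiver: one digest per block of the destination, in file order."""
    digests = []
    with open(path, "rb") as f:
        while chunk := f.read(block):
            digests.append(hashlib.md5(chunk).digest())
    return digests


def fixed_offset_delta(src_path: str, dst_digests: list, block: int):
    """Sender: yield (offset, data) for blocks that differ at the SAME index."""
    with open(src_path, "rb") as f:
        index = 0
        while chunk := f.read(block):
            changed = (index >= len(dst_digests)
                       or hashlib.md5(chunk).digest() != dst_digests[index])
            if changed:
                yield index * block, chunk
            index += 1


def apply_fixed_offset(dst_path: str, delta, new_size: int) -> None:
    """Receiver: seek to each changed block, overwrite it, then truncate."""
    with open(dst_path, "r+b") as f:
        for offset, data in delta:
            f.seek(offset)
            f.write(data)
        f.truncate(new_size)
```

Because a block is only ever compared with the block at its own offset, the sender does one digest comparison per block instead of searching at every byte offset. The price is that any insertion or deletion shifts everything after it and turns the rest of the file into changed blocks.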
Re: rsync in-place (was Re: rsync 1tb+ each day)
jw schultz wrote:

> I was thinking more in terms of no block relocation at all. Checksums only match if at the same offset. The receiver simply discards (or never gets) info about blocks that are unchanged. It would just lseek and write with a possible truncate at the end.

This would seem to help a lot on larger database files. Why look at a 700 byte block of data from a source file and try to find a matching block by fully scanning block checksums at all offsets in an 8G destination datafile? And then do it again for every 700 bytes? (I read the rsync technical paper -- but I might be confused.)

In the case of Oracle data files, the only place a meaningful/syncable delta will occur is at the same offset. Yes, this is a special case -- but it has the potential to really help in rsyncing Oracle datafiles during a hot backup or when syncing from a snapshot to nearstore storage.

This approach should be faster than the -W option for very large Oracle datafiles (which often have only a small number of changed blocks). It should also be faster than deleting the destination files and resending (-W), as has been suggested.

> > You can imagine a smarter algorithm that does non-sequential writes to the output so as to avoid writing over blocks that will be needed later. Alternatively, if you assume some amount of temporary storage, then it might be possible to still produce output as a stream.
>
> I really doubt it is worthwhile doing to rsync. This principally applies to block oriented files such as devices and database files. For the most part rsync handles these fine.

Agreed. The original post still raises an interesting issue -- it should not be faster to remove destination files before running rsync. That is counter to one of the main purposes of rsync -- efficiently detect and send only the deltas.

eric