Re: inefficient: --checksum calculation shouldn't be done for new files
Wayne Davison (way...@samba.org) wrote on 4 July 2011 17:10:
>On Sat, Jul 2, 2011 at 5:46 PM, Carlos Carvalho wrote:
>
>When --checksum is used they're calculated in both ends to see if the file
>should be transferred. This is of course not necessary if the file doesn't
>exist in the destination. However, the checksum is still calculated by the
>sender, which is often a very large overhead.
>
>Would it be possible to avoid it?
>
>
>To do so would involve adding an extra round-trip request to a transfer, so it
>is feasible, but is not currently supported.
[...]
>That all sounds interesting, but would require a new
>--favor-missing-files (or some such) option to tell rsync to use the
>alternate checksum method. It would be interesting to try something
>like that and see how much time it saves in checksum generating vs
>time it consumes in round-trip lag.

Understood. I asked just in case it was an easy optimization, but it looks like a significant complication for a rather rare use.

>As for what is currently possible, see the patches/db.diff,
>patches/checksum-reading.diff, patches/checksum-updating.diff, and
>(possibly) patches/checksum-xattrs.diff patches for example ways to
>make the checksum sending more efficient.

I can't use this because we're the destination and the origins are spread all over the world. Instead I've separated the files we need to checksum and do them in a different rsync run.

Thanks for the detailed explanation.
-- 
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
Re: inefficient: --checksum calculation shouldn't be done for new files
On Sat, Jul 2, 2011 at 5:46 PM, Carlos Carvalho wrote:

> When --checksum is used they're calculated in both ends to see if the file
> should be transferred. This is of course not necessary if the file doesn't
> exist in the destination. However, the checksum is still calculated by the
> sender, which is often a very large overhead.
>
> Would it be possible to avoid it?

To do so would involve adding an extra round-trip request to a transfer, so it is feasible, but is not currently supported. Such a feature would look like this:

Instead of the sender including checksum information for all files in the file-list, it would send a checksum-less list, and let the generator look for files that already exist on the receiver, at which point the generator would send a new request to the sender to ask for the checksum for the file. While waiting for the sender's checksum, it would compute its own checksum on the current file and then wait around for the sender's value, at which point it would compare them, and either request a file transfer or handle the up-to-date file.

That all sounds interesting, but would require a new --favor-missing-files (or some such) option to tell rsync to use the alternate checksum method. It would be interesting to try something like that and see how much time it saves in checksum generating vs time it consumes in round-trip lag.

As for what is currently possible, see the patches/db.diff, patches/checksum-reading.diff, patches/checksum-updating.diff, and (possibly) patches/checksum-xattrs.diff patches for example ways to make the checksum sending more efficient. This presumes that the sender is something that has rarely-changing files, and that caching the checksums based on a more stringent time method (down to the ctime in the case of the first 3 patches) is something that would help your use case.
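To make the caching idea concrete, here is a minimal sketch, assuming a purely in-memory cache keyed on size, mtime and ctime. It is not the code from any of the patches named above (they keep their data in checksum files, xattrs or a database, and their formats are not reproduced here). The `ChecksumCache` class and `file_digest` helper are hypothetical names, and SHA-256 is only a stand-in digest, not necessarily the one rsync uses.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """Read the whole file and return a hex digest (SHA-256 as a stand-in)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


class ChecksumCache:
    """Hypothetical cache: reuse a stored checksum while the file's
    size, mtime and ctime are all unchanged."""

    def __init__(self):
        self._entries = {}  # path -> ((size, mtime_ns, ctime_ns), checksum)
        self.computed = 0   # how many times a file was actually read

    def checksum(self, path):
        st = os.stat(path)
        key = (st.st_size, st.st_mtime_ns, st.st_ctime_ns)
        cached = self._entries.get(path)
        if cached is not None and cached[0] == key:
            return cached[1]  # metadata unchanged: skip re-reading the file
        value = file_digest(path)
        self.computed += 1
        self._entries[path] = (key, value)
        return value
```

Calling `checksum()` twice on an untouched file reads it only once; after the file changes, the stale entry is replaced on the next call. This only pays off on a sender whose files rarely change, which is the presumption stated above.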
Of all those choices, using db.diff (possibly with a sqlite DB) is a pretty nice solution that could speed up checksum transfers by a huge amount while also avoiding sprinkling around a bunch of checksum files (or xattrs).

..wayne..
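The round-trip scheme proposed earlier in this message can be sketched as a small simulation. This is only an illustration of the decision logic under stated assumptions, not rsync's protocol or code: both sides are modelled as dictionaries mapping a file name to its contents, `plan_transfer` is a hypothetical function, and SHA-256 stands in for whatever digest rsync would use. The point it shows is that the sender computes a checksum only for files that already exist on the receiver.

```python
import hashlib


def digest(data):
    """Stand-in whole-file checksum."""
    return hashlib.sha256(data).hexdigest()


def plan_transfer(sender_files, receiver_files):
    """Return (to_transfer, up_to_date, sender_checksums_computed)."""
    to_transfer = []
    up_to_date = []
    sender_checksums = 0
    # The sender's file list carries names only, no checksums.
    for name in sorted(sender_files):
        if name not in receiver_files:
            # Missing on the receiver: request the file, no checksum needed.
            to_transfer.append(name)
            continue
        # File exists on the receiver: this is the extra round trip, where
        # the generator asks the sender for the checksum of this one file.
        sender_sum = digest(sender_files[name])
        sender_checksums += 1
        # Meanwhile the receiver computes the checksum of its own copy.
        local_sum = digest(receiver_files[name])
        if sender_sum == local_sum:
            up_to_date.append(name)
        else:
            to_transfer.append(name)
    return to_transfer, up_to_date, sender_checksums


sender = {"a.txt": b"alpha", "b.txt": b"beta", "c.txt": b"gamma"}
receiver = {"a.txt": b"alpha", "b.txt": b"old beta"}
print(plan_transfer(sender, receiver))
# (['b.txt', 'c.txt'], ['a.txt'], 2)
```

In this example the sender checksums two of its three files; the new file `c.txt` is transferred without one. The simulation does not model the cost of the extra round trip, which is the trade-off the message says would need measuring.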
Re: inefficient: --checksum calculation shouldn't be done for new files
Jamie Lokier (ja...@shareable.org) wrote on 4 July 2011 00:00:
>Carlos Carvalho wrote:
>> When --checksum is used they're calculated in both ends to see if the
>> file should be transferred. This is of course not necessary if the file
>> doesn't exist in the destination. However, the checksum is still
>> calculated by the sender, which is often a very large overhead.
>>
>> Would it be possible to avoid it?
>
>Doesn't the receiver use the checksum to verify it received the file
>with no errors?

Yes, but this always happens, not only with -c. That checksum is calculated during the transfer itself, so it adds no separate pass over the file. The one with -c is computed beforehand, just to decide whether to transfer the file.
Re: inefficient: --checksum calculation shouldn't be done for new files
Carlos Carvalho wrote:
> When --checksum is used they're calculated in both ends to see if the
> file should be transferred. This is of course not necessary if the file
> doesn't exist in the destination. However, the checksum is still
> calculated by the sender, which is often a very large overhead.
>
> Would it be possible to avoid it?

Doesn't the receiver use the checksum to verify it received the file with no errors?

-- Jamie
inefficient: --checksum calculation shouldn't be done for new files
When --checksum is used they're calculated in both ends to see if the file should be transferred. This is of course not necessary if the file doesn't exist in the destination. However, the checksum is still calculated by the sender, which is often a very large overhead.

Would it be possible to avoid it?