On Tue, Aug 26, 2003 at 11:28:12AM -0700, Jon Howell wrote:
> So I was transferring a 2GB virtual machine disk image over a slow
> wireless link. Of course I used --sparse, to keep the image small on the
> destination end as well as on the source end.
>
> Much to my surprise, I noticed that the transfer took a long time even
> when it got past the first 0.5GB of actually-populated file. A little
> sleuthing with strace revealed that the source rsync was dutifully reading
> block after block of zeros, sending them to ssh, who compressed them and
> sent them across the wire(less), where another rsync got the zero blocks,
> realized that they were sparse, and just bode its time until it could do
> one big seek to the next non-sparse block. ("bode its time"? Who writes
> like that?)

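For anyone following along, the receiving-side behaviour described above (skip over zero blocks, then do one seek to the next populated block) can be sketched roughly like this. It is an illustration in Python under my own assumptions, not rsync's actual code; `write_sparse` and its arguments are made-up names.

```python
import os

def write_sparse(out, blocks):
    """Write blocks to the seekable binary file `out`, seeking over zero blocks.

    Hypothetical helper for illustration only; rsync itself is written in C.
    """
    pending = 0  # bytes of zeros seen but not yet skipped over
    for block in blocks:
        if not any(block):          # every byte in the block is zero
            pending += len(block)
            continue
        if pending:
            out.seek(pending, os.SEEK_CUR)  # one big seek past the zeros
            pending = 0
        out.write(block)
    if pending:
        # The file ends in zeros: move to the final size and extend the
        # file to that position without writing any data.
        out.seek(pending, os.SEEK_CUR)
        out.truncate()
```

Whether the skipped ranges really end up as unallocated holes depends on the destination filesystem. Either way the file reads back with the same contents.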
Had you been updating an existing image file, the blocks of zeroes
would have had matches and not been sent. A workaround, if you
do this again in future, would be to first create a destination
file full of zeros, where $block_size is the image size in
1024-byte blocks:

    dd if=/dev/zero of="$dest" bs=1024 count="$block_size"

> Of course, it never survived to see that moment; a cruel SIGINT arrived
> and dispatched both rsyncs.
>
> It seems like the right thing would be for the local end to skim past the
> zero blocks and send some metainformation, to avoid encrypting and
> transferring many GB of zeros.
>
> I worked around the problem by adding -z to compress the stream first
> (blocks of zeros compress remarkably well), and that made the virtual disk
> image transfer go much faster. Of course, all of the .tgzs and .tbzs in
> the same transfer got slower waiting on the source CPU to compress the
> incompressible.

That is what I would have recommended.

> The obvious solution is to <music type=organ register=bass>change the
> protocol</music>, but that seems like a scary thing to do for a
> performance tweak. What about an option for "really-crappy-compression"?
> Something really cheezy (RLE) that can decide in a hurry whether to
> compress away a string of zeros, and if not, just send them raw. That way,
> performance on compressed files stays I/O bound even on systems with pokey
> CPUs, but sparse files are disk-bound on the source system (as they should
> be). (And, of course, --sparse would automatically promote the compression
> level to "really-crappy" if it was at "none" before.)

This is really only an issue when rsync hits a new file.
I agree an RLE of the stream _sounds_ like a good idea. But
even better might be an extra phantom block that represents
all zeros. That too would require a protocol bump.

> Well, okay, they shouldn't even be disk bound; the source system should be
> able to discover the sparsity of the file without making 1.5GB-worth of
> read calls.
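Coming back to the RLE idea for a moment: a very rough sketch of a zeros-only run-length pass might look like the following. This is only my illustration in Python, not anything in the rsync protocol; the token names "zeros" and "data" and both function names are invented for the example.

```python
def encode_zero_runs(blocks):
    """Yield ("zeros", n_bytes) for runs of all-zero blocks and
    ("data", block) for everything else, which is passed through raw."""
    run = 0  # length in bytes of the current run of zero blocks
    for block in blocks:
        if not any(block):      # cheap test: is every byte zero?
            run += len(block)
            continue
        if run:
            yield ("zeros", run)
            run = 0
        yield ("data", block)
    if run:
        yield ("zeros", run)

def decode_zero_runs(tokens):
    """Inverse of encode_zero_runs: expand each zero run back into bytes."""
    for kind, payload in tokens:
        yield bytes(payload) if kind == "zeros" else payload
```

The point of keeping it this crude is that the per-block test is a single scan, so already-compressed data passes through untouched while long stretches of zeros collapse to one small token.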
> Does POSIX (or do specific OSes) offer a call that provides a
> map of allocated regions in the file?

There is no portable way in user-mode to distinguish between a hole
in a sparse file and a block full of zeros.

> Source rsync: 2.5.6
> Destination rsync: 2.5.5
> Diligence: I searched for 'sparse' in the faqomatic, the bug database, the
> current issues page, the TODO document, and the mailing list archive, and
> didn't find anything relevant; please don't flame if I missed an existing
> comment.
>
> Thanks!
>
> --Jon
>
>
> --
> To unsubscribe or change options: http://lists.samba.org/mailman/listinfo/rsync
> Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html
>

--
________________________________________________________________
    J.W. Schultz            Pegasystems Technologies
    email address:          [EMAIL PROTECTED]

        Remember Cernan and Schmitt