Maybe don't sync one big file: split the image into small chunks. Then, whatever the gap size is, rsync might have a better chance of resyncing, especially with --fuzzy. It might not help at all, though, since the number of files would be large.
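
A minimal sketch of the chunking idea, in Python. The function name `split_image`, the `chunk_NNNNNN` file naming, and the 64 MiB default are illustrative assumptions, not anything tested against FOG/partimage images:

```python
import os


def split_image(src_path, out_dir, chunk_size=64 * 1024 * 1024):
    """Split src_path into numbered files of at most chunk_size bytes each.

    Returns the chunk paths in order, so concatenating them restores the image.
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    with open(src_path, "rb") as src:
        index = 0
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of the image reached
            chunk_path = os.path.join(out_dir, f"chunk_{index:06d}")
            with open(chunk_path, "wb") as dst:
                dst.write(data)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

The resulting directory could then be synced instead of the single image file. Note that with fixed-size chunks, data inserted in the middle still shifts the content of every later chunk, so this is something to try rather than a known fix.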

If it is only a "once every few months" thing, you could defragment the machine (virtual or real, no difference), including the MFT, and let the defrag tool put everything at the start of the drive. Ultradefrag and Auslogics (free) Defrag are both usable programs; the latter does not defragment the MFT but is _way_ more efficient for the rest.
Then create the "before" image, install the program, and create the "after" image.

For everything beyond that you need a different approach, but that requires a _lot_ more knowledge about your setup, how those XP machines run etc.

regards,

Joachim Otahal

Matt Van Mater wrote:
I agree with your assessment somewhat Joachim and think you're following the same line of reasoning as I am.  Some details I did not include in my first post:

FOG/partimage does indeed only capture the used blocks in its images when you select "ntfs - resizable".  So running a clean utility (e.g. writing zeros to free space) will not make an impact because partimage does not copy those blocks anyway.  However, the technique you describe would be useful if I was using dd to capture the image.  I am unsure how large a block size partimage uses when copying only the used blocks, so it takes some trial and error to determine the appropriate block size within rsync/rdiff.

Regarding the size of the delta, I had the same exact thought... I have a hunch that the new file I downloaded was included in the middle of the partimage image file and that rsync somehow was not able to associate the last 6.9 GB after the "gap" as existing content.

Regarding the out of memory error, this occurs immediately after executing the command, it does not run for a while and then fail.  It is one reason I gave my VM a very large amount of RAM to compute the deltas; to ensure that it did not run out due to a memory leak or something like that.  The command dies so quickly I am confident that it couldn't even have a chance to consume the entire 16 GB of RAM... it isn't running out of memory, but seems to be some other memory allocation error.

I don't think the fuzzy option will help me, but it is on my list of options to try.  Unfortunately, any test I perform takes a long time to complete due to the size of the image, so it will be a little while before I can report the results.

And in case someone asks "why don't you use rdiff if that seems to work for you?", I would have to install that software on over 325 remote servers over satellite.  I would MUCH prefer to not touch the remote servers and be able to use the existing rsync software.

Matt

On Tue, Mar 20, 2012 at 3:10 PM, Joachim Otahal (privat) <j...@gmx.net> wrote:
Matt Van Mater wrote:

      1. image1 size in bytes: 17,062,442,700
      2. image2 size in bytes: 16,993,256,652

About 70 MB of change between boots with a small program install. That is realistic. It also means that FOG/Partimage only captures the used sectors.
If you captured ALL sectors (used and unused), the rsync difference would be about those 70 MB. You should run a "clean slack" utility before imaging, though, such as Microsoft's precompact.exe (supplied with Virtual PC 2007).
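
The "about 70 MB" figure can be checked from the two sizes quoted above. This is plain arithmetic in Python; the constant and function names are only illustrative:

```python
# Image sizes in bytes, as quoted in the listing above.
IMAGE1_BYTES = 17_062_442_700
IMAGE2_BYTES = 16_993_256_652


def size_delta(a, b):
    """Absolute size difference in bytes between two images."""
    return abs(a - b)


delta = size_delta(IMAGE1_BYTES, IMAGE2_BYTES)
# Show the difference in bytes, decimal megabytes, and binary mebibytes.
print(f"{delta} bytes = {delta / 10**6:.1f} MB = {delta / 2**20:.1f} MiB")
```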

But here it looks like this: roughly the first half of the image contains sectors that did not change between the reboots.
Then, in the middle of the image, some data (~70 MB) got added; rsync cannot find a match across that 70 MB gap and therefore treats everything after it as "new".



  1. Command:
    1. rsync --block-size=512 --only-write-batch=img1toimg2_diff image2 image1
  2. Error message:
    1. ERROR: Out of memory in receive_sums [sender]
    2. rsync error: error allocating core memory buffers (code 22) at util.c(117) [sender=3.0.7]

A block size below the cluster size doesn't make much sense; it only wastes memory, hence the out-of-memory problem. Keep Task Manager open while the command runs and you'll see. AFAIK rsync adjusts the block size dynamically: it uses large blocks (several MB) where nothing changed and switches down to small blocks when there is a change, to keep the amount of data to transfer low.
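
To get a feel for why a tiny block size is expensive, here is a rough Python sketch that counts how many blocks (and therefore how many per-block checksum entries) a file of this size breaks into. The 4096-byte value is assumed to be the NTFS cluster size here, 131072 is just an arbitrary larger value for comparison, and the memory needed per entry is deliberately left out because it depends on the rsync build:

```python
# Size of image1 in bytes, as quoted earlier in the thread.
IMAGE1_BYTES = 17_062_442_700


def block_count(file_size, block_size):
    """Number of blocks needed to cover file_size (integer ceiling division)."""
    return (file_size + block_size - 1) // block_size


# 512 is the value from the failing command; the others are for comparison.
for block_size in (512, 4096, 131072):
    blocks = block_count(IMAGE1_BYTES, block_size)
    print(f"--block-size={block_size}: {blocks:,} blocks")
```

With 512-byte blocks the image breaks into more than 33 million blocks, each needing its own checksum entry, so going up to the cluster size cuts that count by a factor of eight.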

What I don't know of: an option that tells rsync to try harder to find a match within one big file, across a larger desynced region. I only know and use "--fuzzy", which only helps with large numbers of files and only makes sense on slow connections.

Joachim

--
Please use reply-all for most replies to avoid omitting the mailing list.
To unsubscribe or change options: https://lists.samba.org/mailman/listinfo/rsync
Before posting, read: http://www.catb.org/~esr/faqs/smart-questions.html


