On 13 Feb 2006 13:13:51 -0800, Paul Rubin <"http://phr.cx"@NOSPAM.invalid> wrote:
> "VSmirk" <[EMAIL PROTECTED]> writes:
> > Awesome!!! I got as far as segmenting the large file on
> > my own, and I ran out of ideas. I kind of thought about
> > checksums, but I never put the two together.
> >
> > Thanks. You've helped a lot....
>
> The checksum method I described works OK if bytes change
> in the middle of the file but don't get inserted (pieces of
> the file don't move around). If you insert one byte in the
> middle of a 1GB file (so it becomes 1GB+1 bytes) then all
> the checksums after the middle block change, which is no
> good for your purpose.
But of course, the OS will (I hope) give you the exact length of the file, so you *could* assume that the beginning and end are the same, then work towards the middle. Somewhere in between, when you hit the insertion point, both will disagree, and you've found it. Same for deletion.

Of course, if *many* changes have been made to the file, then this will break down. But then, if that's the case, you're going to have to do an expensive transfer anyway, so expensive analysis is justified.

In fact, you could proceed by analyzing the top and bottom checksum lists at the point of failure -- download that frame, do a byte-by-byte compare, and see if you can derive the frameshift. Then compensate, and go back to checksums until they fail again. Actually, that will work just coming from the beginning, too. If instead the region continues to be unrecognizable to the end of the frame, then you need the next frame anyway.

Seems like it could get pretty close to optimal (but we probably are re-inventing rsync).

Cheers,
Terry

--
Terry Hancock ([EMAIL PROTECTED])
Anansi Spaceworks
http://www.AnansiSpaceworks.com

--
http://mail.python.org/mailman/listinfo/python-list
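P.S. The two-ended scan described above can be sketched in a few lines of Python. This is just a rough illustration, not rsync's actual algorithm; the tiny block size, the helper names, and the use of MD5 are all my own assumptions for the demo:

```python
import hashlib

def fwd_sums(data, block):
    """Checksums of blocks aligned to the start of the file."""
    return [hashlib.md5(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def rev_sums(data, block):
    """Checksums of blocks aligned to the end of the file."""
    n = len(data)
    return [hashlib.md5(data[max(0, n - (i + 1) * block):n - i * block]).hexdigest()
            for i in range((n + block - 1) // block)]

def changed_region(old, new, block=4):
    """Walk matching checksums inward from both ends; return byte
    offsets (start, old_end, new_end) bounding the changed region."""
    fo, fn = fwd_sums(old, block), fwd_sums(new, block)
    ro, rn = rev_sums(old, block), rev_sums(new, block)
    lo = 0
    while lo < min(len(fo), len(fn)) and fo[lo] == fn[lo]:
        lo += 1
    hi = 0
    # stop before the forward and backward scans overlap in the shorter file
    while (hi < min(len(ro), len(rn))
           and (lo + hi + 1) * block <= min(len(old), len(new))
           and ro[hi] == rn[hi]):
        hi += 1
    return lo * block, len(old) - hi * block, len(new) - hi * block

# One byte "X" inserted in the middle: only that region needs transfer.
old = b"AAAABBBBCCCCDDDD"
new = b"AAAABBBBXCCCCDDDD"
print(changed_region(old, new))  # -> (8, 8, 9), i.e. insert new[8:9] at old[8]
```

On a real 1GB file you would of course use a much larger block size and stream the data instead of slicing in-memory bytes, but the two inward scans work the same way. Handling *multiple* insertions is where it degenerates into the frame-by-frame resynchronization described above.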