Terry, Yeah, I was sketching out a scenario much like that. It breaks things down pretty well, and it gets my file sync scenario up to much larger files. Even if many changes are made to a file, you can keep track of the byte count and slide the checksum window one byte at a time over the changed region (that is, [abcd]ef, a[bcde]f, ab[cdef]) until a checksum matches again. At that point you can continue up (or down) the file doing only the checksum comparisons, without all the overhead.
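A rough sketch of that sliding-window resync, using an rsync-style weak rolling checksum (the function names, the 16-bit modulus, and the block size are all illustrative assumptions, not anyone's actual protocol):

```python
MOD = 65536  # illustrative 16-bit modulus, as in rsync's weak checksum

def weak_checksum(block):
    """rsync-style weak checksum over a block of bytes: (a, b) pair."""
    a = sum(block) % MOD
    b = sum((len(block) - i) * byte for i, byte in enumerate(block)) % MOD
    return a, b

def roll(a, b, out_byte, in_byte, block_len):
    """Update (a, b) in O(1) when the window slides one byte to the right,
    dropping out_byte on the left and taking in in_byte on the right."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - block_len * out_byte + a) % MOD
    return a, b

def find_resync(data, start, block_len, known_checksums):
    """Slide a block_len window from `start` one byte at a time until its
    weak checksum matches one the remote side already has; return that
    offset, or None if no match is found before the end of data."""
    window = data[start:start + block_len]
    if len(window) < block_len:
        return None
    a, b = weak_checksum(window)
    pos = start
    while True:
        if (a, b) in known_checksums:
            return pos  # checksums line up again here
        if pos + block_len >= len(data):
            return None
        a, b = roll(a, b, data[pos], data[pos + block_len], block_len)
        pos += 1
```

The point of the rolling update is that each one-byte shift costs a couple of arithmetic operations instead of re-summing the whole block, so scanning for the resync point stays cheap even on large files.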
The question in my mind, which I will have to test, is how much overhead this causes. One of the business rules underlying this task is to work with files that are being continuously written to, say by logging systems or database servers. This brings with it some obvious problems of file access, but even where file access isn't an issue, I am very concerned about race conditions where one of the already-handled blocks of data is written to after it has been checksummed. The synced copy on the remote system then no longer represents a true image of the local file. This is one of the reasons I was looking into a device-level solution that would let me know when a hard disk write had occurred. One colleague suggested I was going to have to write assembler to do this, and I may ultimately have to restrict the solutions described here to files that don't have locking and race-condition issues.

Regardless, it's a fun project, and I have to say this list is one of the more polite lists I've been involved with. Thanks! V -- http://mail.python.org/mailman/listinfo/python-list