On Fri, Jan 12, 2001 at 04:03:24PM +1100, Martin Pool wrote:
[Rsync 3.0]
> It will be a more standard request-response protocol, with some kind
> of stream multiplexing inside a single tcp or ssh connection.
> Requests might be `LIST n FILES', or `DELTA file basis'. It's still
> being discussed -- unfortunately largely face-to-face in Canberra.
> rusty has some pre-release code for the framework, and I have part
> of the encoding library in libhsync.
[snip]
> Your suggestions are welcome.
Well, this is probably not too exciting, but I'd like to mention it
nevertheless: The quality of the rolling checksum can be increased a
bit by not just accumulating the byte values, and neither with a
non-zero CHAR_OFFSET (cf. current rsync sources). Instead, use a
256-word lookup table and accumulate the contents of the word at index
x instead of x itself. E.g., adding one byte to the end of the
checksum then becomes:
uint32 a = sum;
uint32 b = sum >> 16;
a += charTable[x];
b += a;
sum = ((a & 0xffff) + (b << 16)) & 0xffffffff;
The values in the table should be purely random.
This adds an overhead of one table lookup per byte. Since the table
ends up in the processor's cache at some point, this should not
constitute much of a performance hit.
BTW, I'm using this modified rolling checksum algorithm in a program
of mine, but since the checksum's size is not important there, I
extended it to 64 bit. IMHO, the modification is absolutely necessary
with 64 bits to get enough entropy into the top bits of the two
halves; to a lesser degree, this might also apply to rsync's 32 bit
version.
Cheers,
Richard
--
__ _
|_) /| Richard Atterer
| \/¯| http://atterer.net
¯ ´` ¯