People have proposed network-endianness, ASCII fields, etc. Here's a straw-man proposal on handling this for people to criticize, ignite, feed to horses, etc. I don't have any specific numbers to back it up, so take it with a grain of salt. Experiments would be pretty straightforward.
Swabbing to/from network endianness is very cheap. On 486s and higher it is a single inlined instruction (bswap) and I think takes about one cycle. On big-endian machines it is free. The cost is barely worth considering: if you are flipping words as fast as you can you will almost certainly be limited by memory bandwidth, not by the work of swapping them.

BER-style variable-length fields, on the other hand, are comparatively expensive, because for every byte you need to look at the top bit, mask it, shift, and continue. If you're going to use a protocol that difficult, I think you might as well use ASCII hex or decimal numbers.

All other things being equal, having a readable protocol is good. A little redundancy in the protocol can help make it readable and also help detect errors. For example, distcc's 4-char commands make it easy for humans to visually parse a packet, and a transmission error almost always shows up immediately as a protocol error. At the same time they're cheap to process -- it's just a uint32 compare.

Arguably we should use x86-endianness because it's the most common architecture at the moment, but I don't think the performance gain justifies using something non-standard. Anyhow, I would hope that if it gets off the ground, this protocol might still be in use in ten years, by which time x86 may no longer be dominant. Big-endian also has the minor advantage that it's easier to read in packet dumps.

Negotiated protocols are a bad idea because they needlessly multiply the test domain. Samba has to deal with Microsoft protocols which are in theory negotiated-endian, but in practice of course Microsoft never test anything but Intel, so BE support is broken and people writing non-x86 servers need to negotiate Intel endianness. Even assuming we're smarter than they are, I don't think we need to make our lives difficult in this way.

Lempel-Ziv is ideal for the exact case of compressing 0x0000000000000001 into a couple of bits.
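A quick way to sanity-check that claim, as a sketch only: the snippet below packs a made-up header of small uint64 values in network byte order and runs it through Python's zlib at its fastest level. zlib (DEFLATE, an LZ77 derivative) is only a stand-in, since lzo is not in the Python standard library, and the field count and values are assumptions rather than anything measured on real traffic.

```python
import struct
import zlib

# Hypothetical header: 64 big-endian uint64 fields holding small values
# (0, 1, 2, 3, 0, 1, ...). Real headers would of course look different.
fields = [i % 4 for i in range(64)]
raw = b"".join(struct.pack(">Q", v) for v in fields)

# zlib stands in for lzo here; level 1 is zlib's fastest setting.
packed = zlib.compress(raw, 1)

# Round-trip check, then report how much the zero-heavy fields shrank.
assert zlib.decompress(packed) == raw
print(len(raw), "bytes raw ->", len(packed), "bytes compressed")
```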
Even a very cheap compressor such as lzo (about half the speed of memcpy) will do well on that kind of case; presumably numbers like uint64 0, 1, 2, etc. will occur often in packet headers and get tightly compressed. I think it will probably deal with filenames for us too.

So, as a straw man:

 - use XDR-like network-endian 32- and 64-bit fields
 - keep all fields 4-byte aligned
 - make strings int32 length-preceded, and padded to a 4-byte boundary
 - don't worry about interning or compressing filenames, just send them as plain UTF-8 relative to a working directory
 - send things like usernames as strings too
 - make operation names (or whatever) be human-readable, either variable-length strings or 4-byte tokens that happen to be readable as ASCII

--
Martin
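For concreteness, here is a rough sketch of what the straw-man encoding listed above might look like, using Python's struct module. Everything named here is hypothetical: the helper functions, the "STAT" token, and the message layout (token, uint64, filename, username) are illustrations of the rules in the list, not an agreed wire format.

```python
import struct

def pack_u64(n):
    """Network-endian (big-endian) unsigned 64-bit field."""
    return struct.pack(">Q", n)

def pack_string(s):
    """int32 length, then UTF-8 bytes, zero-padded to a 4-byte boundary."""
    data = s.encode("utf-8")
    pad = -len(data) % 4
    return struct.pack(">i", len(data)) + data + b"\x00" * pad

def unpack_string(buf, offset):
    """Return (text, offset of the next field)."""
    (length,) = struct.unpack_from(">i", buf, offset)
    start = offset + 4
    end = start + length
    if length < 0 or end > len(buf):
        raise ValueError("bad or truncated string field")
    return buf[start:end].decode("utf-8"), end + (-length % 4)

def pack_token(name):
    """4-byte operation token that happens to be readable as ASCII."""
    tok = name.encode("ascii")
    if len(tok) != 4:
        raise ValueError("token must be exactly 4 ASCII characters")
    return tok

# Hypothetical message: token, a uint64 value, a filename, a username.
msg = (pack_token("STAT") + pack_u64(1)
       + pack_string("src/main.c") + pack_string("martin"))

# Checking the token is a single 32-bit integer compare.
TOKEN_STAT = int.from_bytes(b"STAT", "big")
(token,) = struct.unpack_from(">I", msg, 0)
assert token == TOKEN_STAT

(value,) = struct.unpack_from(">Q", msg, 4)
filename, pos = unpack_string(msg, 12)
username, pos = unpack_string(msg, pos)
print(msg[:4], value, filename, username)
```

Because every field is padded to 4 bytes, the whole message stays 4-byte aligned, and the token is still legible as text in a packet dump.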