The variable-length encoding allows multiple representations of the same value - for example, 1 could be written as:
0000 0001 or it could be (I believe): 1000 0001 1000 0000 0000 0000 Is there anything in the core (or other) implementations that would object to this second form? In particular, the scenario I'm thinking about is where a message is know to be pretty deep - for example: A > B > C > D > D > C > D > D At the moment, my code leaves an optimistic single-byte dummy-value for the prefix, and then backfills this value when the length is known (i.e. after writing this subtree), shuffling the data if needed. I'm toying with making this voluntarily lossy; for example it might (in some cases) decide to leave a longer (2-5) dummy value, and write the alternative form if the value turns out to be small (i.e. if actually only 4 bytes were written). This would reduce the number of times I need to block-copy the data (noting that it might have to copy the same data multiple times - potentially every non-root object might be more than 127 bytes). I'm tempted to spike it (to gauge the performance benefit), but I don't want to waste my time if this is going to make it incompatible with the core implementations (i.e. if they would actively spot this and cause an error). And if it *is* valid, would it be possible to make it explicit in the encoding spec? (or indeed, if it *isn't* valid make it explicit in the encoding spec). Thanks, Marc Gravell -- You received this message because you are subscribed to the Google Groups "Protocol Buffers" group. To post to this group, send email to proto...@googlegroups.com. To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.