Google Protocol Buffers use something similar, which they call " <https://developers.google.com/protocol-buffers/docs/encoding#top_of_page>Base 128 Varints"
https://developers.google.com/protocol-buffers/docs/encoding#varints I prefer the way this handles negative numbers. - Andrew On 10 July 2018 at 02:53, MRAB <pyt...@mrabarnett.plus.com> wrote: > In the regex module I use an encoding scheme when pickling pattern objects > which is based on the way MIDI encodes variable-length integers, and I > think it might have a use in a pickle protocol. > > In the basic format, an integer is split up into 7-bit chunks, each chunk > put into a byte, and the most-significant bit of the byte used to signal > whether the value continues into the following byte. > > And integer must be encoded into the minimum number of bytes, so an > encoded sequence of bytes would never start with 0x80. > > MIDI deviates from the basic idea in order to reduce the number of bytes, > so as sequence of bytes in MIDI _can_ start with x080; this is fine for > MIDI because it doesn't need to represent negative integers. > > The format I'm suggesting uses an initial 0x80 as a way of letting it > encode negative integers. > > Here are a couple of Python functions that summarise the encoding and > decoding (minus obvious optimisations for simplicity): > > > def encode_varint(value: int) -> List[int]: > negative = value < 0 > encoded = [] > > if negative: > final = -1 > else: > final = 0 > > while True: > encoded.append(0x80 | (value & 0x7F)) > value >>= 7 > > if value == final: > break > > if negative: > encoded.append(0x80) > > encoded.reverse() > > encoded[-1] &= 0x7F > > return encoded > > > def decode_varint(encoded: Iterable[int]) -> int: > byte = next(encoded) > > if byte == 0x80: > value = -1 > byte = next(encoded) > else: > value = 0 > > value = (value << 7) | (byte & 0x7F) > > while (byte & 0x80) != 0: > byte = next(encoded) > value = (value << 7) | (byte & 0x7F) > > return value > > > The advantage of encoding integers in this way is that there's no limit to > their size, so there's no need to add a new protocol to support larger > counts. > > They can also make pickles smaller. > > Example: > > # Pickle (None, ) > 0: \x80 PROTO 4 > 2: \x95 FRAME 4 > 11: N NONE > 12: \x85 TUPLE1 > 13: \x94 MEMOIZE (as 0) > 14: . STOP > > Here, FRAME takes an argument of 8 bytes. If you replaced FRAME with a > version that accepted a variable-length count, you could reduce that > argument to 1 byte. > > You also wouldn't need to have different fixed-length versions of an > opcode, e.g. BINUNICODE and SHORT_BINUNICODE. > > Whether you do anything with this is entirely up to the core devs, I just > thought someone might find it useful. > _______________________________________________ > Python-Dev mailing list > Python-Dev@python.org > https://mail.python.org/mailman/listinfo/python-dev > Unsubscribe: https://mail.python.org/mailman/options/python-dev/lists% > 40andros.org.uk >
_______________________________________________ Python-Dev mailing list Python-Dev@python.org https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com