I posted this on SO, but… yeah… 

I'm doing some serial protocol stuff and want to implement a basic byte 
stuffing algorithm in python. Though what this really generalizes to is 
“what is the most pythonic way to transform one sequence of bytes where some 
bytes are passed through 1:1, but others are transformed to longer subsequences 
of bytes?” I’m pretty sure this rules out the use of translate(), which expects 
a 1:1 mapping.


So far, I've come up with 5 different approaches, and each of them has 
something I don't like about it:

1 Via Generator

def stuff1(bits):
    for byte in bits:
        if byte in _EscapeCodes:
            yield PacketCode.Escape
            yield byte ^ 0xFF
        else:
            yield byte

This may be my favorite, but maybe just because I'm kind of fascinated by yield 
based generators. I worried that the generator would make it slow, but it's 
actually the second fastest of the bunch.
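
A quick sketch of how the generator might be consumed, since it yields ints rather than a bytes object. The PacketCode values here are made up for illustration, and wrapping the payload with Start/Stop via itertools.chain is just one assumed way of framing, not something the protocol is known to require.

```python
from itertools import chain

# Hypothetical stand-ins; these values are assumptions, not the real protocol's.
class PacketCode:
    Start = 0x7E
    Stop = 0x7F
    Escape = 0x7D

_EscapeCodes = frozenset({PacketCode.Start, PacketCode.Stop, PacketCode.Escape})

def stuff1(bits):
    for byte in bits:
        if byte in _EscapeCodes:
            yield PacketCode.Escape
            yield byte ^ 0xFF
        else:
            yield byte

payload = bytes([0x01, 0x7E, 0x02])

# bytes() accepts any iterable of ints in range(256), so it can drain the generator.
stuffed = bytes(stuff1(payload))

# The generator composes lazily with other iterables, e.g. to add framing bytes.
framed = bytes(chain([PacketCode.Start], stuff1(payload), [PacketCode.Stop]))
```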


2 Simply bytes()

def stuff2(bits):
    result = bytes()
    for byte in bits:
        if byte in _EscapeCodes:
            result += bytes([PacketCode.Escape, byte ^ 0xFF])
        else:
            result += bytes([byte])
    return result

Constantly has to create single element arrays just to throw them out because 
I'm not aware of any "copy with one additional element" operation. It ties for 
the slowest of the bunch.


3 Use bytearray()

def stuff3(bits):
    result = bytearray()
    for byte in bits:
        if byte in _EscapeCodes:
            result.append(PacketCode.Escape)
            result.append(byte ^ 0xFF)
        else:
            result.append(byte)
    return result

Seems better than the direct bytes() approach, since it can do one byte at a 
time (instead of needing intermediate 1 element collections). It's actually 
slower than the yield generator, though, and it feels brutish. It's middle of 
the pack performance.


4 BytesIO()

def stuff4(bits):
    bio = BytesIO()
    for byte in bits:
        if byte in _EscapeCodes:
            bio.write(bytes([PacketCode.Escape, byte ^ 0xFF]))
        else:
            bio.write(bytes([byte]))
    return bio.getbuffer()

I like the stream based approach here. But it is annoying that there doesn't 
seem to be something like a write1() API that could just add 1 byte, so I have 
to make those intermediate bytes again. If there was a "write single byte", I'd 
like this one. It ties for slowest.
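
A small sketch of what comes back from a BytesIO, in case it matters to callers: getbuffer() returns a memoryview over the internal buffer, while getvalue() returns a separate bytes copy. The byte values written here are arbitrary.

```python
from io import BytesIO

bio = BytesIO()
bio.write(bytes([0x7D, 0x81]))   # write() takes a bytes-like object, not an int
bio.write(b"\x02")

view = bio.getbuffer()           # memoryview onto the internal buffer, no copy
data = bio.getvalue()            # independent bytes object

is_view = isinstance(view, memoryview)
same_content = bytes(view) == data

# While the view is alive the BytesIO can't be resized or closed,
# so release it once done.
view.release()
bio.close()
```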


5 Use replace()

def stuff5(bits):
    escapeStuffed = bytes(bits).replace(
        bytes([PacketCode.Escape]),
        bytes([PacketCode.Escape, PacketCode.Escape ^ 0xFF]))
    stopStuffed = escapeStuffed.replace(
        bytes([PacketCode.Stop]),
        bytes([PacketCode.Escape, PacketCode.Stop ^ 0xFF]))
    return stopStuffed.replace(
        bytes([PacketCode.Start]),
        bytes([PacketCode.Escape, PacketCode.Start ^ 0xFF]))

This is the fastest. But I don't like the way the code reads and the 
intermediate sweeps.
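
One thing the intermediate sweeps make easy to get wrong is ordering: the Escape byte has to be handled in the first pass, otherwise a later pass re-escapes the Escape bytes that an earlier pass inserted. A minimal sketch, with made-up PacketCode values and a hypothetical helper esc():

```python
# Hypothetical stand-ins; these values are assumptions, not the real protocol's.
class PacketCode:
    Start = 0x7E
    Stop = 0x7F
    Escape = 0x7D

def esc(code):
    # Two-byte replacement for a single reserved byte.
    return bytes([PacketCode.Escape, code ^ 0xFF])

payload = bytes([PacketCode.Stop])

# Escape first, then Stop: the Stop byte is escaped exactly once.
good = payload.replace(bytes([PacketCode.Escape]), esc(PacketCode.Escape))
good = good.replace(bytes([PacketCode.Stop]), esc(PacketCode.Stop))

# Stop first, then Escape: the Escape byte added by the first pass
# is itself escaped by the second pass, corrupting the output.
bad = payload.replace(bytes([PacketCode.Stop]), esc(PacketCode.Stop))
bad = bad.replace(bytes([PacketCode.Escape]), esc(PacketCode.Escape))
```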
-- 
https://mail.python.org/mailman/listinfo/python-list
