The following function interpolates bytes, bytearrays, and formatted strings (the latter two auto-converted to bytes) into a bytes (or auto-converted bytearray) format. This function automates much of what some people have recommended for combining ASCII text and binary blobs. The test passes on 2.7.6 as well as 3.3.3, though a 2.7-only version would be simpler.
===============

# bf.py -- Terry Jan Reedy, 2014 Jan 11
"Define byteformat(): a bytes version of str.format as a function."
import re

def byteformat(form, obs):
    '''Return bytes-formatted objects interpolated into a bytes format.

    The bytes or bytearray format has two types of replacement fields.
    b'{}' and b'{:}': The object can be any raw bytes or bytearray object.
    b'{:<format_spec>}': The object can be any object ob that can be
    string-formatted with <format_spec>. Bytearrays are converted to bytes.

    The text encoding is the default (on 3.x, encoding="utf-8", errors="strict").
    Users should explicitly encode to bytes for any other encoding.
    The struct module can be used to produce bytes, such as binary-formatted
    integers, that are not encoded text.

    Test passes on both 2.7.6 and 3.3.3.
    '''

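    # Normalize a bytearray format to bytes before splitting.
    if isinstance(form, bytearray):
        form = bytes(form)
    # The capture group keeps each field's format_spec, so the list alternates
    # literal bytes (even indexes) and format_specs (odd indexes).
    fields = re.split(b'{:?([^}]*)}', form)
    # print(fields)
    if len(fields) != 2*len(obs)+1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    j = 1  # index into fields
    for ob in obs:
        if isinstance(ob, bytearray):
            ob = bytes(ob)
        field = fields[j]
        # Empty spec ({} or {:}): insert ob as raw bytes.
        # Otherwise format ob as text with the spec and encode the result.
        fields[j] = format(ob, field.decode()).encode() if field else ob
        j += 2
    return b''.join(fields)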

# test code
bformat = b"bytes: {}; bytearray: {:}; unicode: {:s}; int: {:5d}; float: {:7.2f}; end"
objects = (b'abc', bytearray(b'def'), u'ghi', 123, 12.3)
result = byteformat(bformat, objects)
result2 = byteformat(bytearray(bformat), objects)
strings = (ob.decode() if isinstance(ob, (bytes, bytearray)) else ob
           for ob in objects)
expect = bformat.decode().format(*strings).encode()

#print(result)
#print(result2)
print(expect)
assert result == result2 == expect

=====
This has been edited from what I posted to issue 3982 to expand the docstrings and to work the same with both bytes and bytearrays on both 2.7 and 3.3. When I posted it before, I thought of it merely as a proof-of-concept prototype. After reading the seemingly endless discussion of possible variations of byte formatting with % and .format, I now present it as a real, concrete proposal.

There are, of course, details that could be tweaked. The encoding is the default, which on 3.x is encoding='utf-8', errors='strict'. This could be changed to an explicit encoding='ascii'; if that were done, the encoding could also be made a parameter that defaults to 'ascii'. The joiner could be defined as type(form)() so that the output type matches the type of the input form. I did not do that because it complicates the test.
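
A minimal sketch of both tweaks, assuming a 3.x-only setting. The name byteformat_enc and its encoding parameter are hypothetical, not part of the proposal as posted. Because bytes.join on 3.x accepts bytearray items, the sketch also skips the per-object coercion.

```python
import re

def byteformat_enc(form, obs, encoding='ascii'):
    """Hypothetical 3.x-only variant of byteformat() (name and encoding
    parameter are assumptions) whose result type follows type(form)."""
    joiner = type(form)()  # b'' for a bytes form, bytearray() for a bytearray form
    # Odd indexes hold format_specs, even indexes hold literal bytes.
    fields = re.split(b'{:?([^}]*)}', bytes(form))
    if len(fields) != 2 * len(obs) + 1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    for j, ob in zip(range(1, len(fields), 2), obs):
        spec = fields[j]
        # Empty spec: pass ob through as bytes-like; otherwise format and encode.
        fields[j] = format(ob, spec.decode(encoding)).encode(encoding) if spec else ob
    return joiner.join(fields)

print(byteformat_enc(b'text: {:s}; raw: {}', ('abc', b'\x00\xff')))
# b'text: abc; raw: \x00\xff'
print(byteformat_enc(bytearray(b'{:5d}|{}'), (42, bytearray(b'xy'))))
# bytearray(b'   42|xy')
print(byteformat_enc(b'{:s}', ('caf\xe9',), encoding='latin-1'))
# b'caf\xe9'
```

With the default encoding='ascii', the last call without encoding='latin-1' raises UnicodeEncodeError, so non-ASCII text is reported instead of being silently encoded as UTF-8.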

The coercion of interpolated bytearray objects to bytes is needed for 2.7 because in 2.7, str/bytes.join raises TypeError for bytearrays in the input sequence. A 3.x-only version could drop this.
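
The 3.x behavior that would let such a version drop the coercion can be checked directly. This small sketch is meant to be run under 3.x only.

```python
# 3.x only: bytes.join accepts any bytes-like item, including bytearray,
# and returns bytes.
joined = b''.join([b'abc', bytearray(b'def')])
print(joined)  # b'abcdef'

# Text items are still rejected, so a str object placed in a b'{}' field
# makes the final join fail with TypeError.
try:
    b''.join([b'abc', 'def'])
except TypeError as exc:
    print('TypeError:', exc)
```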

One objection to the function is that it is neither % nor .format. To me, this is an advantage in that a new function will not be expected to exactly match the % or .format behavior in either 2.x or 3.x. It eliminates the 'matching the old' arguments so we can focus on what actual functionality is needed. There is no need to convert true binary bytes to text with either latin-1 or surrogates. There is no need to add anything to bytes. The code above uses the built-in facilities that we already have, which to me should be the first thing to try, not the last.
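
To illustrate binary data passing through untouched, here is a sketch that packs integers with the struct module and interpolates them next to formatted text. The record layout is made up for the example, and byteformat() is repeated in condensed, 3.x-only form so the block runs on its own.

```python
import re
import struct

def byteformat(form, obs):
    # Condensed 3.x-only copy of byteformat() from the post, for a standalone demo.
    fields = re.split(b'{:?([^}]*)}', bytes(form))
    if len(fields) != 2 * len(obs) + 1:
        raise ValueError('Number of replacement fields not same as len(obs)')
    for j, ob in zip(range(1, len(fields), 2), obs):
        spec = fields[j]
        fields[j] = format(ob, spec.decode()).encode() if spec else ob
    return b''.join(fields)

# Hypothetical record: an ASCII header followed by raw binary fields.
payload = struct.pack('>HI', 0xFFFE, 258)  # big-endian uint16 + uint32, 6 bytes
record = byteformat(b'LEN={:3d};DATA={}', (len(payload), payload))
print(record)  # b'LEN=  6;DATA=\xff\xfe\x00\x00\x01\x02'
```

The bytes b'\xff\xfe' are not valid UTF-8, yet they reach the output unchanged because nothing ever decodes them.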

One new feature that does not match old behavior is that {} and {:} are changed (in 3.x) to indicate bytes, whereas {:s} continues to indicate (in 3.x) unicode text. ({:s} might be changed to mean unicode for 2.7 also, but I did not explore that idea.) Similarly, a new function is free to borrow only the format_spec part of replacement fields and use format(ob, format_spec) to format each object. Anyone who needs the full power of str.format is free to use it explicitly. I think format_specs cover most of what people have asked for.
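
Here is the split step from the code above on its own (3.x), to show what borrowing only the format_spec part amounts to. The capture group in the pattern leaves literal text at even indexes and the bare format_spec of each field at odd indexes. Each non-empty spec then goes straight to the built-in format().

```python
import re

form = b'bytes: {}; bytearray: {:}; int: {:5d}; end'
fields = re.split(b'{:?([^}]*)}', form)
print(fields)
# [b'bytes: ', b'', b'; bytearray: ', b'', b'; int: ', b'5d', b'; end']

specs = fields[1::2]                     # [b'', b'', b'5d']; empty means raw bytes
print(format(123, specs[2].decode()))    # prints "  123"
```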

For future releases, the function could go in the string module. It could otherwise be added to existing or future 2&3 porting packages.

--
Terry Jan Reedy

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev