This feels chatty. I'd like the PEP to call out the specific proposals and put the more verbose motivation later. It took me a long time to realize that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3). Also your mention of bytes.byte() as the counterpart to ord() confused me -- I think it's more similar to chr(). I don't like iterbytes as a builtin, let's keep it as a method on affected types.
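To make the distinction above concrete, here is a minimal sketch that uses only behaviour already present in Python 3. The proposed `bytes.byte()` and `iterbytes()` are not called, since they are still only proposals; `byte_from_int` and `iter_length1_bytes` are hypothetical stand-ins written for illustration, not part of the PEP.

```python
# Existing constructor behaviour (no proposed APIs are used here).
zero_filled = bytes(3)            # integer argument: zero-filled, length 3
from_iterable = bytes([1, 2, 3])  # iterable of ints: one byte per element

print(zero_filled)    # b'\x00\x00\x00'
print(from_iterable)  # b'\x01\x02\x03'


def byte_from_int(value):
    """Hypothetical stand-in for the proposed bytes.byte()."""
    # bytes([value]) raises ValueError if value is outside range(0, 256).
    return bytes([value])


def iter_length1_bytes(data):
    """Hypothetical stand-in for the proposed iterbytes() method."""
    # Iterating a binary sequence yields ints; wrap each one back up.
    for value in data:
        yield bytes([value])


# chr() maps an int to a length 1 str; byte_from_int() is the binary analogue.
assert chr(65) == "A"
assert byte_from_int(65) == b"A"

# ord() reverses both conversions.
assert ord("A") == 65
assert ord(b"A") == 65
```

Seen this way, only the `bytes(3)` spelling is at issue, and the int-to-bytes helper mirrors `chr()` rather than `ord()`.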
On Thu, Aug 14, 2014 at 10:50 PM, Nick Coghlan <ncogh...@gmail.com> wrote:
> I just posted an updated version of PEP 467 after recently finishing
> the updates to the Python 3.4+ binary sequence docs to decouple them
> from the str docs.
>
> Key points in the proposal:
>
> * deprecate passing integers to bytes() and bytearray()
> * add bytes.zeros() and bytearray.zeros() as a replacement
> * add bytes.byte() and bytearray.byte() as counterparts to ord() for
>   binary data
> * add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()
>
> As far as I am aware, that last item poses the only open question,
> with the alternative being to add an "iterbytes" builtin with a
> definition along the lines of the following:
>
>     def iterbytes(data):
>         try:
>             getiter = type(data).__iterbytes__
>         except AttributeError:
>             iter = map(bytes.byte, data)
>         else:
>             iter = getiter(data)
>         return iter
>
> Regards,
> Nick.
>
> PEP URL: http://www.python.org/dev/peps/pep-0467/
>
> Full PEP text:
> =============================
> PEP: 467
> Title: Minor API improvements for bytes and bytearray
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan <ncogh...@gmail.com>
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-03-30
> Python-Version: 3.5
> Post-History: 2014-03-30 2014-08-15
>
>
> Abstract
> ========
>
> During the initial development of the Python 3 language specification, the
> core ``bytes`` type for arbitrary binary data started as the mutable type
> that is now referred to as ``bytearray``. Other aspects of operating in
> the binary domain in Python have also evolved over the course of the Python
> 3 series.
>
> This PEP proposes a number of small adjustments to the APIs of the
> ``bytes`` and ``bytearray`` types to make it easier to operate entirely in
> the binary domain.
>
>
> Background
> ==========
>
> To simplify the task of writing the Python 3 documentation, the ``bytes``
> and ``bytearray`` types were documented primarily in terms of the way they
> differed from the Unicode based Python 3 ``str`` type. Even when I
> `heavily revised the sequence documentation
> <http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained
> that simplifying shortcut.
>
> However, it turns out that this approach to the documentation of these
> types had a problem: it doesn't adequately introduce users to their hybrid
> nature, where they can be manipulated *either* as a "sequence of integers"
> type, *or* as ``str``-like types that assume ASCII compatible data.
>
> That oversight has now been corrected, with the binary sequence types now
> being documented entirely independently of the ``str`` documentation in
> `Python 3.4+
> <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__
>
> The confusion isn't just a documentation issue, however, as there are also
> some lingering design quirks from an earlier pre-release design where there
> was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
> was mutable (with no immutable counterpart).
>
> Finally, additional experience with using the existing Python 3 binary
> sequence types in real world applications has suggested it would be
> beneficial to make it easier to convert integers to length 1 bytes objects.
>
>
> Proposals
> =========
>
> As a "consistency improvement" proposal, this PEP is actually about a few
> smaller micro-proposals, each aimed at improving the usability of the
> binary data model in Python 3.
> Proposals are motivated by one of two main factors:
>
> * removing remnants of the original design of ``bytes`` as a mutable type
> * allowing users to easily convert integer values to a length 1 ``bytes``
>   object
>
>
> Alternate Constructors
> ----------------------
>
> The ``bytes`` and ``bytearray`` constructors currently accept an integer
> argument, but interpret it to mean a zero-filled object of the given
> length. This is a legacy of the original design of ``bytes`` as a mutable
> type, rather than a particularly intuitive behaviour for users. It has
> become especially confusing now that some other ``bytes`` interfaces treat
> integers and the corresponding length 1 bytes instances as equivalent
> input. Compare::
>
>     >>> b"\x03" in bytes([1, 2, 3])
>     True
>     >>> 3 in bytes([1, 2, 3])
>     True
>
>     >>> bytes(b"\x03")
>     b'\x03'
>     >>> bytes(3)
>     b'\x00\x00\x00'
>
> This PEP proposes that the current handling of integers in the bytes and
> bytearray constructors be deprecated in Python 3.5 and targeted for
> removal in Python 3.7, being replaced by two more explicit alternate
> constructors provided as class methods. The initial python-ideas thread
> [ideas-thread1]_ that spawned this PEP was specifically aimed at
> deprecating this constructor behaviour.
>
> Firstly, a ``byte`` constructor is proposed that converts integers
> in the range 0 to 255 (inclusive) to a ``bytes`` object::
>
>     >>> bytes.byte(3)
>     b'\x03'
>     >>> bytearray.byte(3)
>     bytearray(b'\x03')
>     >>> bytes.byte(512)
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     ValueError: bytes must be in range(0, 256)
>
> One specific use case for this alternate constructor is to easily convert
> the result of indexing operations on ``bytes`` and other binary sequences
> from an integer to a ``bytes`` object. The documentation for this API
> should note that its counterpart for the reverse conversion is ``ord()``.
> The ``ord()`` documentation will also be updated to note that while
> ``chr()`` is the counterpart for ``str`` input, ``bytes.byte`` and
> ``bytearray.byte`` are the counterparts for binary input.
>
> Secondly, a ``zeros`` constructor is proposed that serves as a direct
> replacement for the current constructor behaviour, rather than having to
> use sequence repetition to achieve the same effect in a less intuitive
> way::
>
>     >>> bytes.zeros(3)
>     b'\x00\x00\x00'
>     >>> bytearray.zeros(3)
>     bytearray(b'\x00\x00\x00')
>
> The chosen name here is taken from the corresponding initialisation
> function in NumPy (although, as these are sequence types rather than
> N-dimensional matrices, the constructors take a length as input rather
> than a shape tuple).
>
> While ``bytes.byte`` and ``bytearray.zeros`` are expected to be the more
> useful duo amongst the new constructors, ``bytes.zeros`` and
> ``bytearray.byte`` are provided in order to maintain API consistency
> between the two types.
>
>
> Iteration
> ---------
>
> While iteration over ``bytes`` objects and other binary sequences produces
> integers, it is sometimes desirable to iterate over length 1 bytes objects
> instead.
>
> To handle this situation more obviously (and more efficiently) than would
> be the case with the ``map(bytes.byte, data)`` construct enabled by the
> above constructor changes, this PEP proposes the addition of a new
> ``iterbytes`` method to ``bytes``, ``bytearray`` and ``memoryview``::
>
>     for x in data.iterbytes():
>         # x is a length 1 ``bytes`` object, rather than an integer
>
> Third party types and arbitrary containers of integers that lack the new
> method can still be handled by combining ``map`` with the new
> ``bytes.byte()`` alternate constructor proposed above::
>
>     for x in map(bytes.byte, data):
>         # x is a length 1 ``bytes`` object, rather than an integer
>         # This works with *any* container of integers in the range
>         # 0 to 255 inclusive
>
>
> Open questions
> ^^^^^^^^^^^^^^
>
> * The fallback case above suggests that this could perhaps be better
>   handled as an ``iterbytes(data)`` *builtin*, that used
>   ``data.__iterbytes__()`` if defined, but otherwise fell back to
>   ``map(bytes.byte, data)``::
>
>       for x in iterbytes(data):
>           # x is a length 1 ``bytes`` object, rather than an integer
>           # This works with *any* container of integers in the range
>           # 0 to 255 inclusive
>
>
> References
> ==========
>
> .. [ideas-thread1]
>    https://mail.python.org/pipermail/python-ideas/2014-March/027295.html
> .. [empty-buffer-issue] http://bugs.python.org/issue20895
> .. [GvR-initial-feedback]
>    https://mail.python.org/pipermail/python-ideas/2014-March/027376.html
>
>
> Copyright
> =========
>
> This document has been placed in the public domain.
>
> --
> Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/guido%40python.org
>

--
--Guido van Rossum (python.org/~guido)