
Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-07-18 Thread Ethan Furman


On 06/07/2016 02:34 PM, Koos Zevenhoven wrote:


> Why not bytes.viewbytes (or whatever name) so that one could also
> subscript it? And if it were a property, one could perhaps
> conveniently get the n'th byte:
>
> b'abcde'.viewbytes[n]   # compared to b'abcde'[n:n+1]


AFAICT, 'viewbytes' doesn't add much over bytes itself if we add a 'getbyte' 
method.


> Also, would it not be more clear to call the int -> bytes method
> something like bytes.fromint or bytes.fromord and introduce the same
> thing on str? And perhaps allow multiple arguments to create a
> str/bytes of length > 1. I guess this may violate TOOWTDI, but anyway,
> just a thought.


Yes, it would.  Changing to 'bytes.fromint'.

--
~Ethan~
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-07-16 Thread Ethan Furman


On 06/07/2016 10:42 PM, Serhiy Storchaka wrote:

> On 07.06.16 23:28, Ethan Furman wrote:
>
>> * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
>>    ``memoryview.iterbytes`` alternative iterators
>
> "Byte" is an alias to "octet" (8-bit integer) in modern terminology.


Maybe so, but not, to my knowledge, in Python terminology.


> Iterating bytes and bytearray already produces bytes.


No, it produces integers:


>>> for b in b'abcid':
...   print(b)
...
97
98
99
105
100
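
A minimal sketch of the difference, assuming a hypothetical helper named
`iterbytes` (it borrows the PEP's proposed method name but is not an existing
API): plain iteration yields integers, while slicing one element at a time
yields length-1 bytes objects.

```python
def iterbytes(data):
    """Yield each element of a bytes-like sequence as a length-1 bytes object.

    Hypothetical helper; it only approximates the behaviour the PEP proposes.
    """
    for i in range(len(data)):
        # Slicing keeps the binary type, unlike indexing, which returns an int.
        yield bytes(data[i:i + 1])


ints = list(b'abc')                  # plain iteration gives integers
singles = list(iterbytes(b'abc'))    # the helper gives length-1 bytes objects

assert ints == [97, 98, 99]
assert singles == [b'a', b'b', b'c']
```

The slice-based spelling is used here only because it already works on
bytes, bytearray and memoryview alike.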

--
~Ethan~

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-10 Thread Nick Coghlan

On 9 June 2016 at 19:21, Barry Warsaw  wrote:
> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
>
>>Deprecation of current "zero-initialised sequence" behaviour
>>
>>
>>Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>>argument and interpret it as meaning to create a zero-initialised sequence of
>>the given size::
>>
>> >>> bytes(3)
>> b'\x00\x00\x00'
>> >>> bytearray(3)
>> bytearray(b'\x00\x00\x00')
>>
>>This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>>entirely in Python 3.7.
>>
>>No other changes are proposed to the existing constructors.
>
> Does it need to be *actually* removed?  That does break existing code for not
> a lot of benefit.  Yes, the default constructor is a little wonky, but with
> the addition of the new constructors, and the fact that you're not proposing
> to eventually change the default constructor, removal seems unnecessary.
> Besides, once it's removed, what would `bytes(3)` actually do?  The PEP
> doesn't say.

Raise TypeError, presumably. However, I agree this isn't worth the
hassle of breaking working code, especially since truly ludicrous
values will fail promptly with MemoryError - it's only a particular
range of values that fit within the limits of the machine, but also
push it into heavy swapping that are a potential problem.

> Also, since you're proposing to add `bytes.byte(3)` have you considered also
> adding an optional count argument?  E.g. `bytes.byte(3, count=7)` would yield
> b'\x03\x03\x03\x03\x03\x03\x03'.  That seems like it could be useful.

The purpose of bytes.byte() in the PEP is to provide a way to
roundtrip ord() calls with binary inputs, since the current spelling
is pretty unintuitive:

>>> ord("A")
65
>>> chr(ord("A"))
'A'
>>> ord(b"A")
65
>>> bytes([ord(b"A")])
b'A'

That said, perhaps it would make more sense for the corresponding
round-trip to be:

>>> bchr(ord("A"))
b'A'

With the "b" prefix on "chr" reflecting the "b" prefix on the output.
This also inverts the chr/unichr pairing that existed in Python 2
(replacing it with bchr/chr), and is hence very friendly to
compatibility modules like six and future (future.builtins already
provides a chr that behaves like the Python 3 one, and bchr would be
much easier to add to that than a new bytes object method).
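
A rough sketch of the round trip described above, assuming a module-level
function called `bchr` (hypothetical here; no such builtin exists today) that
simply wraps the current `bytes([i])` spelling:

```python
def bchr(i):
    """Return a length-1 bytes object for an integer in range(256).

    Hypothetical stand-in for the builtin floated in this message.
    """
    return bytes([i])


# ord() and bchr() invert each other for binary input,
# just as ord() and chr() do for text.
assert bchr(ord(b"A")) == b"A"
assert ord(bchr(65)) == 65
assert chr(ord("A")) == "A"
```

Because it delegates to the bytes constructor, this version rejects values
outside range(256) with ValueError.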

In terms of an efficient memory-preallocation interface, the
equivalent NumPy operation to request a pre-filled array is
"numpy.full":
http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.full.html
(there's also an in-place mutation operation, "ndarray.fill")

For bytes and bytearray though, that has an unfortunate name collision
with "zfill", which refers to zero-padding numeric values for fixed
width display.

If the PEP just added bchr() to complement chr(), and [bytes,
bytearray].zeros() as a more discoverable alternative to passing
integers to the default constructor, I think that would be a decent
step forward, and the question of pre-initialising with arbitrary
values can be deferred for now (and perhaps left to NumPy
indefinitely)

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Franklin? Lee

On Jun 8, 2016 8:13 AM, "Paul Sokolovsky"  wrote:
>
> Hello,
>
> On Wed, 8 Jun 2016 14:45:22 +0300
> Serhiy Storchaka  wrote:
>
> []
>
> > > $ ./run-bench-tests bench/bytealloc*
> > > bench/bytealloc:
> > >  3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
> > >  11.244s (+237.35%) bench/bytealloc-2-repeat.py
> >
> > If the performance of creating an immutable array of n zero bytes is
> > important in MicroPython, it is worth to optimize b"\0" * n.
>
> No matter how you optimize calloc + something, it's always slower than
> just calloc.

`bytes(n)` *is* calloc + something. It's a lookup of and call to a global
function. (Unless MicroPython optimizes away lookups for builtins, in which
case it can theoretically optimize b"\0".__mul__.)

On the other hand, b"\0" is a constant, and * is an operator lookup that
succeeds on the first argument (meaning, perhaps, a successful branch
prediction). As a constant, it is only created once, so there's no
intermediate object created.

AFAICT, the first requires optimizing global function lookups + calls, and
the second requires optimizing lookup and *successful* application of
__mul__ (versus failure + fallback to some __rmul__), and repetitions of a
particular `bytes` object (which can be interned and checked against). That
means there is room for either to win, depending on the efforts of the
implementers.

(However, `bytearray` has no syntax for literals (and therefore easy
constants), and is a more valid and, AFAIK, more practical concern.)

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Steven D'Aprano

On Wed, Jun 08, 2016 at 10:04:08AM +0200, Victor Stinner wrote:

> It's common that users complain that Python core developers like
> breaking the compatibility at each release.

No more common than users complaining that Python features are badly 
designed and crufty and should be fixed.

Whatever we do, we can't win. If we fix misfeatures, people complain. If 
we don't fix them, people complain. Sometimes the same people, depending 
on their specific needs. "Fix this, because it annoys me, but don't fix 
that, because I'm used to it and it doesn't annoy me any more."

*shrug*

Ultimately it comes down to a subjective feeling as to which is worse. 
My own subjective feeling is that, in the long run, we'll be better off 
fixing bytes than keeping it, and the longer we wait to fix it, the 
harder it will be.

-- 
Steve

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Barry Warsaw

On Jun 08, 2016, at 02:01 AM, Martin Panter wrote:

>Bytes.byte() is a great idea. But what’s the point or use case of
>bytearray.byte(), a mutable array of one pre-defined byte?

I like Bytes.byte() too.  I would guess you'd want the same method on
bytearray for duck typing APIs.

-Barry

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Paul Sokolovsky

Hello,

On Wed, 8 Jun 2016 14:45:22 +0300
Serhiy Storchaka  wrote:

[]

> > $ ./run-bench-tests bench/bytealloc*
> > bench/bytealloc:
> >  3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
> >  11.244s (+237.35%) bench/bytealloc-2-repeat.py
> 
> If the performance of creating an immutable array of n zero bytes is 
> important in MicroPython, it is worth to optimize b"\0" * n.

No matter how you optimize calloc + something, it's always slower than
just calloc.

> For now CPython is the main implementation of Python 3 

Indeed, and it already has bytes(N). So, perhaps nothing should be done
about it except leaving it alone. Perhaps, more discussion should go
into whether there's need for .iterbytes() if there's [i:i+1] already.
(I personally skip that, as I find [i:i+1] perfectly ok, and while I
can't understand how people may be not ok with it up to wanting
something more, I leave such possibility).

> and bytes(n)
> is slower than b"\0" * n in CPython.

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Serhiy Storchaka


On 08.06.16 14:26, Paul Sokolovsky wrote:

> On Wed, 8 Jun 2016 14:05:19 +0300
> Serhiy Storchaka  wrote:
>
>> On 08.06.16 13:37, Paul Sokolovsky wrote:
>>>> The obvious way to create the bytes object of length n is b'\0' *
>>>> n.
>>>
>>> That's very inefficient: it requires allocating useless b'\0', then
>>> a generic function to repeat arbitrary memory block N times. If
>>> there's a talk of Python to not be laughed at for being SLOW, there
>>> would rather be efficient ways to deal with blocks of binary data.
>>
>> Do you have any evidence for this claim?
>
> Yes, it's written above, let me repeat it: bytes(n) is (can be)
> calloc(1, n) underlyingly, while b"\0" * n is a more complex algorithm.
>
>> $ ./python -m timeit -s 'n = 1' -- 'bytes(n)'
>> 100 loops, best of 3: 1.32 usec per loop
>> $ ./python -m timeit -s 'n = 1' -- 'b"\0" * n'
>> 100 loops, best of 3: 0.858 usec per loop
>
> I don't know how inefficient CPython's bytes(n) is or how efficient
> repetition is (maybe 1-byte repetitions are optimized into memset()?), but
> MicroPython (where bytes(n) is truly calloc(n)) gives expected results:
>
> $ ./run-bench-tests bench/bytealloc*
> bench/bytealloc:
>  3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
>  11.244s (+237.35%) bench/bytealloc-2-repeat.py


If the performance of creating an immutable array of n zero bytes is 
important in MicroPython, it is worth to optimize b"\0" * n.


For now CPython is the main implementation of Python 3 and bytes(n) is 
slower than b"\0" * n in CPython.
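
For anyone wanting to repeat the comparison on their own interpreter, a small
sketch using the standard timeit module follows. The size `n` and the loop
counts are arbitrary values picked for illustration, and the timings will
differ between machines, versions and implementations.

```python
import timeit

n = 1000        # arbitrary size chosen for illustration
loops = 10000   # arbitrary loop count, kept small so the script runs quickly

# Both spellings build the same object, so only the timing differs.
assert bytes(n) == b"\0" * n

for stmt in ("bytes(n)", 'b"\\0" * n'):
    # Take the best of three runs, as the timeit command line does.
    best = min(timeit.repeat(stmt, globals={"n": n}, number=loops, repeat=3))
    print("%-12s %.3f usec per loop" % (stmt, best / loops * 1e6))
```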




Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Paul Sokolovsky

Hello,

On Wed, 8 Jun 2016 14:05:19 +0300
Serhiy Storchaka  wrote:

> On 08.06.16 13:37, Paul Sokolovsky wrote:
> >> The obvious way to create the bytes object of length n is b'\0' *
> >> n.
> >
> > That's very inefficient: it requires allocating useless b'\0', then
> > a generic function to repeat arbitrary memory block N times. If
> > there's a talk of Python to not be laughed at for being SLOW, there
> > would rather be efficient ways to deal with blocks of binary data.
> 
> Do you have any evidences for this claim?

Yes, it's written above, let me repeat it: bytes(n) is (can be)
calloc(1, n) underlyingly, while b"\0" * n is a more complex algorithm. 

> 
> $ ./python -m timeit -s 'n = 1' -- 'bytes(n)'
> 100 loops, best of 3: 1.32 usec per loop
> $ ./python -m timeit -s 'n = 1' -- 'b"\0" * n'
> 100 loops, best of 3: 0.858 usec per loop

I don't know how inefficient CPython's bytes(n) is or how efficient
repetition is (maybe 1-byte repetitions are optimized into memset()?), but
MicroPython (where bytes(n) is truly calloc(n)) gives expected results:

$ ./run-bench-tests bench/bytealloc*
bench/bytealloc:
3.333s (+00.00%) bench/bytealloc-1-bytes_n.py
11.244s (+237.35%) bench/bytealloc-2-repeat.py


-- 
Best regards,
 Paul  mailto:pmis...@gmail.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Serhiy Storchaka


On 08.06.16 13:37, Paul Sokolovsky wrote:

>> The obvious way to create the bytes object of length n is b'\0' * n.
>
> That's very inefficient: it requires allocating useless b'\0', then a
> generic function to repeat arbitrary memory block N times. If there's a
> talk of Python to not be laughed at for being SLOW, there would rather
> be efficient ways to deal with blocks of binary data.


Do you have any evidence for this claim?

$ ./python -m timeit -s 'n = 1' -- 'bytes(n)'
100 loops, best of 3: 1.32 usec per loop
$ ./python -m timeit -s 'n = 1' -- 'b"\0" * n'
100 loops, best of 3: 0.858 usec per loop



Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Paul Sokolovsky

Hello,

On Wed, 8 Jun 2016 11:53:06 +0300
Serhiy Storchaka  wrote:

> On 08.06.16 11:04, Victor Stinner wrote:
> >> Currently, the ``bytes`` and ``bytearray`` constructors accept an
> >> integer argument and interpret it as meaning to create a
> >> zero-initialised sequence of the given size::
> >> (...)
> >> This PEP proposes to deprecate that behaviour in Python 3.6, and
> >> remove it entirely in Python 3.7.
> >
> > I'm opposed to this change (presented like that). Please stop
> > breaking the backward compatibility in minor versions.
> 
> The argument for deprecating bytes(n) is that this has different
> meaning in Python 2,

That's an artifact (as in: defect) of "bytes" (apparently) being a flat
alias of "str" in Python2, without trying to validate its arguments. It
would be sad if thinkos in the Python2 implementation dictate how Python3
should work. It's not too late to fix it in Python2 by issuing a CVE
along the lines of "Lack of argument validation in Python2 bytes()
constructor may lead to insecure code."

> and when backporting code to Python 2 or writing
> 2+3 compatible code there is a risk of making a mistake. This argument
> is not applicable to bytearray(n).
> 
> > *If* you still want to deprecate bytes(n), you must introduce a
> > helper working on *all* Python versions. Obviously, the helper must
> > be available and work for Python 2.7. Maybe it can be the six
> > module. Maybe something else.
> 
> The obvious way to create the bytes object of length n is b'\0' * n.

That's very inefficient: it requires allocating a useless b'\0', then a
generic function to repeat an arbitrary memory block N times. If the aim
is for Python not to be laughed at for being SLOW, there should be
efficient ways to deal with blocks of binary data.

> It works in all Python versions starting from 2.6. I don't see the
> need for bytes(n) and bytes.zeros(n). There are no special methods for
> creating a list or a string of size n.

So, see above, unless you specifically mean having bytearray.zeros() and
not having bytes.zeros(). But then the whole purpose of the presented PEP
is to make the API more, not less, consistent. Having random gaps in the
bytes vs bytearray API isn't going to help anyone.

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Serhiy Storchaka


On 08.06.16 02:03, Nick Coghlan wrote:

That said, it occurs to me that there's a reasonably strong
composability argument in favour of a view-based approach: a view will
work with operator.itemgetter() and other sequence consuming APIs,
while special methods won't. The "like-memoryview-but-not" view type
could also take any bytes-like object as input, similar to memoryview
itself.


Something like:

class chunks:
    def __init__(self, seq, size):
        self._seq = seq
        self._size = size

    def __len__(self):
        return len(self._seq) // self._size

    def __getitem__(self, i):
        # Chunk i covers items i*size up to (but excluding) (i+1)*size.
        start = i * self._size
        chunk = self._seq[start: start + self._size]
        if len(chunk) != self._size:
            raise IndexError
        return chunk

(but needs more checks and slices support).

It would be useful for general sequences too.


Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Serhiy Storchaka


On 08.06.16 11:04, Victor Stinner wrote:

>> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
>> argument and interpret it as meaning to create a zero-initialised sequence
>> of the given size::
>> (...)
>> This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
>> entirely in Python 3.7.
>
> I'm opposed to this change (presented like that). Please stop breaking
> the backward compatibility in minor versions.


The argument for deprecating bytes(n) is that this has a different meaning
in Python 2, and when backporting code to Python 2 or writing 2+3
compatible code there is a risk of making a mistake. This argument is not
applicable to bytearray(n).



> *If* you still want to deprecate bytes(n), you must introduce a
> helper working on *all* Python versions. Obviously, the helper must be
> available and work for Python 2.7. Maybe it can be the six module.
> Maybe something else.


The obvious way to create the bytes object of length n is b'\0' * n. It
works in all Python versions starting from 2.6. I don't see the need for
bytes(n) and bytes.zeros(n). There are no special methods for creating a
list or a string of size n.
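
As a sketch of what such a version-independent helper could look like (the
name `zeros` is hypothetical and is not provided by six or the standard
library), repetition can simply be wrapped in a function:

```python
def zeros(n):
    """Return n zero bytes as an immutable byte string.

    Hypothetical helper name. Repetition behaves the same on Python 2.6+
    and Python 3, whereas bytes(n) on Python 2 is just str(n).
    """
    return b'\0' * n


assert zeros(3) == b'\x00\x00\x00'
assert zeros(0) == b''
# bytearray(n) zero-fills on both major versions, so converting it to
# bytes is another spelling that gives the same result.
assert zeros(3) == bytes(bytearray(3))
```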




Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Victor Stinner

Hi,

> Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
> argument and interpret it as meaning to create a zero-initialised sequence
> of the given size::
> (...)
> This PEP proposes to deprecate that behaviour in Python 3.6, and remove it
> entirely in Python 3.7.

I'm opposed to this change (presented like that). Please stop breaking
the backward compatibility in minor versions.

I have been porting Python 2 code to Python 3 for more than 2 years. At first,
Python 3 only proposed to immediately drop Python 2 support using the
2to3 tool. It simply doesn't work because you must port incrementally
all dependencies, so you must write code working with Python 2 and
Python 3 using the same code base. A few people tried to duplicate
repositories, projects, project name, etc. to have one version for
Python 2 and one version for Python 3, but IMHO it's even worse. It's
very difficult to handle dependencies using that.

It took a few years until six was widely used and that pip was popular
enough to be able to add six as a *dependency* (and not put an old
copy in the project).

Basically, you propose to introduce a backward incompatible change for
free (I fail to see the benefit of replacing bytes(n) with
bytes.zeros(n)) and without obvious way to write code compatible with
Python <= 3.6 and Python >= 3.7.

Moreover, a single cycle is way too short to port all code in the wild.

It's common that users complain that Python core developers like
breaking the compatibility at each release. Recently, I saw a list of
applications which need to be ported to Python 3.5, while they work
perfectly on Python 3.4.

*If* you still want to deprecate bytes(n), you must introduce a
helper working on *all* Python versions. Obviously, the helper must be
available and work for Python 2.7. Maybe it can be the six module.
Maybe something else.

In Perl 5, there is a nice "use 5.12;" pragma to explicitly ask to
keep the compatibility with Perl 5.12. This pragma makes it easier to
change the language, since you can port code file by file. I
don't know if it's technically possible in Python, maybe not for all
kinds of backward incompatible changes.

Victor

[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-08 Thread Stephen J. Turnbull

Ethan Furman writes:

 > * Deprecate passing single integer values to ``bytes`` and
 >   ``bytearray``

Why?  This is a slightly awkward idiom compared to .zeros (EITBI etc),
but your 32-bit clock will roll over before we can actually remove it.
There are a lot of languages that do this kind of initialization of
arrays based on ``count``.  If you want to do something useful here,
add an optional argument (here in ridiculous :-) generality:

bytes(count, tile=[0]) -> bytes(tile * count)

where ``tile`` is a Sequence of a type that is acceptable to bytes
anyway, or Sequence[int], which is treated as

b"".join([bytes([i]) for i in tile] * count)

Interpretation of ``count`` of course is bikesheddable, with at least
one alternative interpretation (length of result bytes, with last tile
truncated if necessary).
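
A sketch of the two interpretations, written as free functions because the
bytes constructor takes no such argument. The names `tiled` and
`tiled_to_length` are hypothetical, and both assume a non-empty ``tile``.

```python
def tiled(count, tile=(0,)):
    """Repeat `tile` `count` times and return the result as bytes.

    Hypothetical function modelling the suggested bytes(count, tile=[0]).
    """
    return bytes(tile) * count


def tiled_to_length(length, tile=(0,)):
    """Alternative reading: `length` is the size of the result, and the
    last tile is truncated if necessary. Assumes `tile` is not empty."""
    unit = bytes(tile)
    repeats = -(-length // len(unit))  # ceiling division
    return (unit * repeats)[:length]


assert tiled(3) == b'\x00\x00\x00'
assert tiled(2, [1, 2]) == b'\x01\x02\x01\x02'
assert tiled_to_length(5, b'ab') == b'ababa'
```

With the default tile both functions reduce to the current zero-filling
behaviour of bytes(count).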

 > * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors

This is an API break if you take the deprecation as a mandate (which
eventual removal does indicate).  And backward compatibility for
clients of the bytes API means that we violate TOOWTDI indefinitely,
on a constructor of quite specialized utility.  Yuck.

-1 on both.

Barry Warsaw writes later in thread:

 > We can't change bytes.__getitem__ but we can add another method
 > that returns single byte objects?  I think it's still a bit of a
 > pain to extract single bytes even with .iterbytes().

+1  ISTM that more than the other changes, this is the most important
one.

Steve

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Serhiy Storchaka


On 07.06.16 23:28, Ethan Furman wrote:

Minor changes: updated version numbers, add punctuation.

The current text seems to take into account Guido's last comments.

Thoughts before asking for acceptance?




PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16


Abstract


During the initial development of the Python 3 language specification,
the core ``bytes`` type for arbitrary binary data started as the mutable
type that is now referred to as ``bytearray``. Other aspects of
operating in the binary domain in Python have also evolved over the
course of the Python 3 series.

This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate
entirely in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
   ``memoryview.iterbytes`` alternative iterators


"Byte" is an alias to "octet" (8-bit integer) in modern terminology. 
Iterating bytes and bytearray already produces bytes. Wouldn't this be 
confusing? Maybe name these methods "iterbytestrings", since they add 
str-like behavior?




Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Martin Panter

On 7 June 2016 at 21:56, Nick Coghlan  wrote:
> On 7 June 2016 at 14:33, Paul Sokolovsky  wrote:
>> Ethan Furman  wrote:
>>> Deprecation of current "zero-initialised sequence" behaviour
>>> 
>>>
>>> Currently, the ``bytes`` and ``bytearray`` constructors accept an
>>> integer argument and interpret it as meaning to create a
>>> zero-initialised sequence of the given size::
>>>
>>>  >>> bytes(3)
>>>  b'\x00\x00\x00'
>>>  >>> bytearray(3)
>>>  bytearray(b'\x00\x00\x00')
>>>
>>> This PEP proposes to deprecate that behaviour in Python 3.6, and
>>> remove it entirely in Python 3.7.
>>
>> Why the desire to break applications of thousands and thousands of
>> people?
>
> Same argument as any deprecation: to make existing and future defects
> easier to find or easier to debug.
>
> That said, this is the main part I was referring to in the other
> thread when I mentioned some of the constructor changes were
> potentially controversial and probably not worth the hassle - it's the
> only one with the potential to break currently working code, while the
> others are just a matter of choosing suitable names.

An argument against deprecating bytearray(n) in particular is that
this is supported in Python 2. I think I have (ab)used this fact to
work around the problem with bytes(n) in Python 2 & 3 compatible code.

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Steven D'Aprano

On Wed, Jun 08, 2016 at 02:17:12AM +0300, Paul Sokolovsky wrote:
> Hello,
> 
> On Tue, 07 Jun 2016 15:46:00 -0700
> Ethan Furman  wrote:
> 
> > On 06/07/2016 02:33 PM, Paul Sokolovsky wrote:
> > 
> > >> This PEP proposes to deprecate that behaviour in Python 3.6, and
> > >> remove it entirely in Python 3.7.
> > >
> > > Why the desire to break applications of thousands and thousands of
> > > people? 

I'm not so sure that *thousands* of people are relying on this 
behaviour, but your point is taken that it is a backwards-incompatible 
change.


> > > Besides, bytes(3) behavior is very logical. Everyone who
> > > knows what malloc(3) does also knows what bytes(3) does.

Most Python coders are not C coders. Knowing C is not and should not be 
a pre-requisite for using Python.


> > > Who
> > > doesn't, can learn, and eventually be grateful that learning Python
> > > actually helped them to learn other language as well.

I really don't think that learning Python will help with C.


> > Two reasons:
> > 
> > 1) bytes are immutable, so creating a 3-byte 0x00 string seems
> > ridiculous;
> 
> There's nothing ridiculous in sending N zero bytes over network,
> writing to a file, transferring to a hardware device.

True, but there is a good way of writing N identical bytes, not limited 
to nulls, using the replication operator:

py> b'\xff'*10
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

which is more useful than `bytes(10)` since that can only produce 
zeroes.


> That however
> raises questions e.g. how to (efficiently) fill a (subsection) of
> bytearray with something but a 0

Slicing.

py> b = bytearray(10)
py> b[4:8] = b'\xff'*4
py> b
bytearray(b'\x00\x00\x00\x00\xff\xff\xff\xff\x00\x00')
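
One detail worth spelling out with a short sketch (the variable names are
arbitrary): slice assignment only fills in place when the target slice is as
long as the replacement. An empty target slice inserts instead, and the
bytearray grows.

```python
buf = bytearray(10)

# Same-length replacement: overwrites bytes 4..7 and keeps len(buf) == 10.
buf[4:8] = b'\xff' * 4
assert len(buf) == 10
assert buf == bytearray(b'\x00' * 4 + b'\xff' * 4 + b'\x00' * 2)

# Empty target slice: inserts rather than overwrites, so the array grows.
buf[4:4] = b'\xaa' * 2
assert len(buf) == 12
assert buf[4:6] == b'\xaa\xaa'
```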


-- 
Steve

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Paul Sokolovsky

Hello,

On Tue, 07 Jun 2016 15:46:00 -0700
Ethan Furman  wrote:

> On 06/07/2016 02:33 PM, Paul Sokolovsky wrote:
> 
> >> This PEP proposes to deprecate that behaviour in Python 3.6, and
> >> remove it entirely in Python 3.7.
> >
> > Why the desire to break applications of thousands and thousands of
> > people? Besides, bytes(3) behavior is very logical. Everyone who
> > knows what malloc(3) does also knows what bytes(3) does. Who
> > doesn't, can learn, and eventually be grateful that learning Python
> > actually helped them to learn other language as well.
> 
> Two reasons:
> 
> 1) bytes are immutable, so creating a 3-byte 0x00 string seems
> ridiculous;

There's nothing ridiculous in sending N zero bytes over network,
writing to a file, transferring to a hardware device. That however
raises questions e.g. how to (efficiently) fill a (subsection) of
bytearray with something but a 0, and how to apply all that
consistently to array.array, but I don't even want to bring it,
because the answer will be "we need first to deal with subjects of this
PEP".

> 
> 2) Python is not C, and the vagaries of malloc are not relevant to
> Python.

Yes, but Python has always had some traits nicely similar to C, (%
formatting, os.read/write at the fingertips, this bytes/bytearray
constructor, etc.), and that certainly catered for sizable share of its
audience. It's nice that nowadays Python is truly multi-paradigm and
taught to pre-schools and used by folks who know how to analyze data
much better than how to allocate memory to hold that data in the first
place. But hopefully people who used Python since 1.x as a nice
system-level integration language, concise without much ambiguity
(definitely less than other languages, maybe COBOL excluded) shouldn't
suffer and have their stuff broken.

> 
> However, there is little point in breaking working code, so a 
> deprecation without removal is fine by me.

Thanks.

> 
> --
> ~Ethan~

-- 
Best regards,
 Paul  mailto:pmis...@gmail.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Nick Coghlan

On 7 June 2016 at 15:22, Koos Zevenhoven  wrote:
> On Wed, Jun 8, 2016 at 12:57 AM, Barry Warsaw  wrote:
>> On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote:
>>
>>>On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky  wrote:
>>>> What's wrong with b[i:i+1] ?
>>>It always succeeds while indexing can trigger an IndexError.
>>
>> Right.  You want a method with the semantics of __getitem__() but that
>> returns the desired type.
>>
>
> And if this is called __getitem__ (with slices delegated to
> bytes.__getitem__) and implemented in a class, one has a view. Maybe
> I'm missing something, but I fail to understand what makes this
> significantly more problematic than an iterator. Ok, I guess we might
> also need __len__.

Right, it's the fact that a view is a much broader API than we need,
since most of the operations on the base type are already fine. The
two alternate operations that people are interested in are:

- like indexing, but producing bytes instead of ints
- like iteration, but producing bytes instead of ints

That said, it occurs to me that there's a reasonably strong
composability argument in favour of a view-based approach: a view will
work with operator.itemgetter() and other sequence consuming APIs,
while special methods won't. The "like-memoryview-but-not" view type
could also take any bytes-like object as input, similar to memoryview
itself.
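
The composability point can be sketched as follows. `BytesView` is a hypothetical name used only for illustration; nothing in the thread or the PEP defines this class, and the sketch assumes the wrapped object exposes a C-contiguous buffer:

```python
import operator


class BytesView:
    """Hypothetical read-only view whose items are length-1 bytes objects."""

    def __init__(self, data):
        # memoryview accepts any object that supports the buffer protocol.
        self._mv = memoryview(data).cast('B')

    def __len__(self):
        return len(self._mv)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slices come back as ordinary bytes objects.
            return self._mv[index].tobytes()
        # operator.index rejects non-integers; an out-of-range index
        # raises IndexError, just like normal indexing.
        i = operator.index(index)
        return bytes([self._mv[i]])


view = BytesView(b'abcde')
first_and_last = operator.itemgetter(0, -1)(view)
print(first_and_last)
```

Since the class implements `__getitem__` and `__len__`, plain iteration over the view also yields length-1 bytes objects, which is the sense in which a view covers both of the operations listed above.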

Cheers,
Nick.

P.S. I'm starting to remember why I stopped working on this - I'm
genuinely unsure of the right way forward, so I wasn't prepared to
advocate strongly for the particular approach in the PEP :)

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Ethan Furman


On 06/07/2016 02:33 PM, Paul Sokolovsky wrote:


>> This PEP proposes to deprecate that behaviour in Python 3.6, and
>> remove it entirely in Python 3.7.
>
> Why the desire to break applications of thousands and thousands of
> people? Besides, bytes(3) behavior is very logical. Everyone who knows
> what malloc(3) does also knows what bytes(3) does. Who doesn't, can
> learn, and eventually be grateful that learning Python actually helped
> them to learn other language as well.


Two reasons:

1) bytes are immutable, so creating a 3-byte 0x00 string seems
   ridiculous;

2) Python is not C, and the vagaries of malloc are not relevant to
   Python.

However, there is little point in breaking working code, so a 
deprecation without removal is fine by me.


--
~Ethan~


Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Koos Zevenhoven

On Wed, Jun 8, 2016 at 12:57 AM, Barry Warsaw  wrote:
> On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote:
>
>>On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky  wrote:
>>> What's wrong with b[i:i+1] ?
>>It always succeeds while indexing can trigger an IndexError.
>
> Right.  You want a method with the semantics of __getitem__() but that returns
> the desired type.
>

And if this is called __getitem__ (with slices delegated to
bytes.__getitem__) and implemented in a class, one has a view. Maybe
I'm missing something, but I fail to understand what makes this
significantly more problematic than an iterator. Ok, I guess we might
also need __len__.

-- Koos

> -Barry
>

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Barry Warsaw

On Jun 07, 2016, at 09:40 PM, Brett Cannon wrote:

>On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky  wrote:
>> What's wrong with b[i:i+1] ?
>It always succeeds while indexing can trigger an IndexError.

Right.  You want a method with the semantics of __getitem__() but that returns
the desired type.

-Barry



Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread tritium-list

Ignore that message.  I hit send before brain and hands were fully in sync.

> -Original Message-
> From: tritium-l...@sdamon.com [mailto:tritium-l...@sdamon.com]
> Sent: Tuesday, June 7, 2016 5:51 PM
> To: 'Nick Coghlan' <ncogh...@gmail.com>; 'Barry Warsaw'
> <ba...@python.org>
> Cc: python-dev@python.org
> Subject: RE: [Python-Dev] PEP 467: Minor API improvements to bytes,
> bytearray, and memoryview
> 
> 
> 
> > -Original Message-
> > From: Python-Dev [mailto:python-dev-bounces+tritium-
> > list=sdamon@python.org] On Behalf Of Nick Coghlan
> > Sent: Tuesday, June 7, 2016 5:40 PM
> > To: Barry Warsaw <ba...@python.org>
> > Cc: python-dev@python.org
> > Subject: Re: [Python-Dev] PEP 467: Minor API improvements to bytes,
> > bytearray, and memoryview
> >
> > On 7 June 2016 at 14:31, Barry Warsaw <ba...@python.org> wrote:
> > > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
> > >
> > >>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
> > >>   ``memoryview.iterbytes`` alternative iterators
> > >
> > > +1 but I want to go just a little farther.
> > >
> > > We can't change bytes.__getitem__ but we can add another method
> > > that returns single byte objects?  I think it's still a bit of a
> > > pain to extract single bytes even with .iterbytes().
> > >
> > > Maybe .iterbytes can take a single index argument (blech) or add a
> > > method like .byte_at(i).  I'll let you bikeshed on the name.
> >
> > Perhaps:
> >
> >  data.getbyte(i)
> >  data.iterbytes()
> 
> data.getbyte(index_or_slice_object) ?
> 
> while it might not be... ideal... to create a sliceable live view object,
> we can have a method that accepts a slice, even if we have to create it
> manually (or at least make it convenient for those who wish to wrap a
> bytes object in their own type and blindly pass the first-non-self arg of
> a custom __getitem__ to the method).
> 
> > The rationale for "Why not a live view?" is that an iterator is simple
> > to define and implement, while we know from experience with memoryview
> > and the various dict views that live views are a minefield for folks
> > defining new container types. Since this PEP would in some sense
> > change what it means to implement a full "bytes-like object", it's
> > worth keeping implementation complexity in mind.
> >
> > Cheers,
> > Nick.
> >
> > --
> > Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Nick Coghlan

On 7 June 2016 at 14:33, Paul Sokolovsky  wrote:
> Hello,
>
> On Tue, 07 Jun 2016 13:28:13 -0700
> Ethan Furman  wrote:
>
>> Minor changes: updated version numbers, add punctuation.
>>
>> The current text seems to take into account Guido's last comments.
>>
>> Thoughts before asking for acceptance?
>>
>>
> []
>
>> Deprecation of current "zero-initialised sequence" behaviour
>> 
>>
>> Currently, the ``bytes`` and ``bytearray`` constructors accept an
>> integer argument and interpret it as meaning to create a
>> zero-initialised sequence of the given size::
>>
>>  >>> bytes(3)
>>  b'\x00\x00\x00'
>>  >>> bytearray(3)
>>  bytearray(b'\x00\x00\x00')
>>
>> This PEP proposes to deprecate that behaviour in Python 3.6, and
>> remove it entirely in Python 3.7.
>
> Why the desire to break applications of thousands and thousands of
> people?

Same argument as any deprecation: to make existing and future defects
easier to find or easier to debug.

That said, this is the main part I was referring to in the other
thread when I mentioned some of the constructor changes were
potentially controversial and probably not worth the hassle - it's the
only one with the potential to break currently working code, while the
others are just a matter of choosing suitable names.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread tritium-list



> -Original Message-
> From: Python-Dev [mailto:python-dev-bounces+tritium-
> list=sdamon@python.org] On Behalf Of Nick Coghlan
> Sent: Tuesday, June 7, 2016 5:40 PM
> To: Barry Warsaw <ba...@python.org>
> Cc: python-dev@python.org
> Subject: Re: [Python-Dev] PEP 467: Minor API improvements to bytes,
> bytearray, and memoryview
> 
> On 7 June 2016 at 14:31, Barry Warsaw <ba...@python.org> wrote:
> > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
> >
> >>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
> >>   ``memoryview.iterbytes`` alternative iterators
> >
> > +1 but I want to go just a little farther.
> >
> > We can't change bytes.__getitem__ but we can add another method that
> > returns single byte objects?  I think it's still a bit of a pain to
> > extract single bytes even with .iterbytes().
> >
> > Maybe .iterbytes can take a single index argument (blech) or add a
> > method like .byte_at(i).  I'll let you bikeshed on the name.
> 
> Perhaps:
> 
>  data.getbyte(i)
>  data.iterbytes()

data.getbyte(index_or_slice_object) ?

while it might not be... ideal... to create a sliceable live view object, we
can have a method that accepts a slice, even if we have to create it
manually (or at least make it convenient for those who wish to wrap a bytes
object in their own type and blindly pass the first-non-self arg of a custom
__getitem__ to the method).

> The rationale for "Why not a live view?" is that an iterator is simple
> to define and implement, while we know from experience with memoryview
> and the various dict views that live views are a minefield for folks
> defining new container types. Since this PEP would in some sense
> change what it means to implement a full "bytes-like object", it's
> worth keeping implementation complexity in mind.
> 
> Cheers,
> Nick.
> 
> --
> Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Brett Cannon

On Tue, 7 Jun 2016 at 14:38 Paul Sokolovsky  wrote:

> Hello,
>
> On Tue, 7 Jun 2016 17:31:19 -0400
> Barry Warsaw  wrote:
>
> > On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
> >
> > >* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
> > >   ``memoryview.iterbytes`` alternative iterators
> >
> > +1 but I want to go just a little farther.
> >
> > We can't change bytes.__getitem__ but we can add another method that
> > returns single byte objects?  I think it's still a bit of a pain to
> > extract single bytes even with .iterbytes().
> >
> > Maybe .iterbytes can take a single index argument (blech) or add a
> > method like .byte_at(i).  I'll let you bikeshed on the name.
>
> What's wrong with b[i:i+1] ?
>

It always succeeds while indexing can trigger an IndexError.
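
The difference can be seen with a short example. The `getbyte` helper below is a hypothetical stand-in for the method being discussed (the name was only floated in this thread) and assumes the indexed object yields ints, as bytes and bytearray do:

```python
data = b'abc'

print(data[1])        # indexing yields an int
print(data[1:2])      # slicing yields a length-1 bytes object
print(data[10:11])    # an out-of-range slice quietly yields empty bytes


def getbyte(data, i):
    """Hypothetical helper: indexing semantics, bytes result."""
    value = data[i]          # raises IndexError when i is out of range
    return bytes([value])


print(getbyte(data, 1))
```

With the slice spelling an off-by-one error goes unnoticed, whereas `getbyte(data, 10)` fails immediately with an IndexError.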

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Nick Coghlan

On 7 June 2016 at 14:31, Barry Warsaw  wrote:
> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
>
>>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
>>   ``memoryview.iterbytes`` alternative iterators
>
> +1 but I want to go just a little farther.
>
> We can't change bytes.__getitem__ but we can add another method that returns
> single byte objects?  I think it's still a bit of a pain to extract single
> bytes even with .iterbytes().
>
> Maybe .iterbytes can take a single index argument (blech) or add a method like
> .byte_at(i).  I'll let you bikeshed on the name.

Perhaps:

 data.getbyte(i)
 data.iterbytes()

The rationale for "Why not a live view?" is that an iterator is simple
to define and implement, while we know from experience with memoryview
and the various dict views that live views are a minefield for folks
defining new container types. Since this PEP would in some sense
change what it means to implement a full "bytes-like object", it's
worth keeping implementation complexity in mind.
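
To illustrate how little an iterator needs, here is a sketch written as a free function. `iterbytes` here is a hypothetical stand-in for the proposed method, not an existing API, and it assumes the input iterates as ints the way bytes and bytearray do:

```python
def iterbytes(data):
    """Hypothetical stand-in for the proposed data.iterbytes() method."""
    for value in data:           # iterating bytes/bytearray yields ints
        yield bytes([value])     # wrap each int back into a length-1 bytes


chunks = list(iterbytes(bytearray(b'hi!')))
print(chunks)
```

A container type that already supports iteration gets this behaviour in a few lines, with none of the lifetime or mutation questions that a live view has to answer.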

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Paul Sokolovsky

Hello,

On Tue, 7 Jun 2016 17:31:19 -0400
Barry Warsaw  wrote:

> On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:
> 
> >* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
> >   ``memoryview.iterbytes`` alternative iterators
> 
> +1 but I want to go just a little farther.
> 
> We can't change bytes.__getitem__ but we can add another method that
> returns single byte objects?  I think it's still a bit of a pain to
> extract single bytes even with .iterbytes().
> 
> Maybe .iterbytes can take a single index argument (blech) or add a
> method like .byte_at(i).  I'll let you bikeshed on the name.

What's wrong with b[i:i+1] ?


-- 
Best regards,
 Paul  mailto:pmis...@gmail.com

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Koos Zevenhoven

On Tue, Jun 7, 2016 at 11:28 PM, Ethan Furman  wrote:
>
> Minor changes: updated version numbers, add punctuation.
>
> The current text seems to take into account Guido's last comments.
>
> Thoughts before asking for acceptance?
>
> PEP: 467
> Title: Minor API improvements for binary sequences
> Version: $Revision$
> Last-Modified: $Date$
> Author: Nick Coghlan 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2014-03-30
> Python-Version: 3.5
> Post-History: 2014-03-30 2014-08-15 2014-08-16
>
>
> Abstract
> 
>
> During the initial development of the Python 3 language specification, the 
> core ``bytes`` type for arbitrary binary data started as the mutable type 
> that is now referred to as ``bytearray``. Other aspects of operating in the 
> binary domain in Python have also evolved over the course of the Python 3 
> series.
>
> This PEP proposes four small adjustments to the APIs of the ``bytes``, 
> ``bytearray`` and ``memoryview`` types to make it easier to operate entirely 
> in the binary domain:
>
> * Deprecate passing single integer values to ``bytes`` and ``bytearray``
> * Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
> * Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
> * Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
>   ``memoryview.iterbytes`` alternative iterators
>

Why not bytes.viewbytes (or whatever name) so that one could also
subscript it? And if it were a property, one could perhaps
conveniently get the n'th byte:

b'abcde'.viewbytes[n]   # compared to b'abcde'[n:n+1]

Also, would it not be more clear to call the int -> bytes method
something like bytes.fromint or bytes.fromord and introduce the same
thing on str? And perhaps allow multiple arguments to create a
str/bytes of length > 1. I guess this may violate TOOWTDI, but anyway,
just a thought.

-- Koos

Re: [Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Barry Warsaw

On Jun 07, 2016, at 01:28 PM, Ethan Furman wrote:

>* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
>   ``memoryview.iterbytes`` alternative iterators

+1 but I want to go just a little farther.

We can't change bytes.__getitem__ but we can add another method that returns
single byte objects?  I think it's still a bit of a pain to extract single
bytes even with .iterbytes().

Maybe .iterbytes can take a single index argument (blech) or add a method like
.byte_at(i).  I'll let you bikeshed on the name.

Cheers,
-Barry

[Python-Dev] PEP 467: Minor API improvements to bytes, bytearray, and memoryview

2016-06-07 Thread Ethan Furman


Minor changes: updated version numbers, add punctuation.

The current text seems to take into account Guido's last comments.

Thoughts before asking for acceptance?




PEP: 467
Title: Minor API improvements for binary sequences
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16


Abstract
========

During the initial development of the Python 3 language specification, 
the core ``bytes`` type for arbitrary binary data started as the mutable 
type that is now referred to as ``bytearray``. Other aspects of 
operating in the binary domain in Python have also evolved over the 
course of the Python 3 series.


This PEP proposes four small adjustments to the APIs of the ``bytes``, 
``bytearray`` and ``memoryview`` types to make it easier to operate 
entirely in the binary domain:


* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators


Proposals
=========

Deprecation of current "zero-initialised sequence" behaviour
------------------------------------------------------------

Currently, the ``bytes`` and ``bytearray`` constructors accept an 
integer argument and interpret it as meaning to create a 
zero-initialised sequence of the given size::


    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.6, and remove 
it entirely in Python 3.7.


No other changes are proposed to the existing constructors.


Addition of explicit "zero-initialised sequence" constructors
-------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of 
an explicit ``zeros`` alternative constructor as a class method on both 
``bytes`` and ``bytearray``::


    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

It will behave just as the current constructors behave when passed a 
single integer.


The specific choice of ``zeros`` as the alternative constructor name is 
taken from the corresponding initialisation function in NumPy (although, 
as these are 1-dimensional sequence types rather than N-dimensional 
matrices, the constructors take a length as input rather than a shape 
tuple).
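
As a rough illustration of the intended behaviour, the proposed class method can be approximated today with a free function. The `zeros` function below is a hypothetical stand-in, not part of any released Python, and its error handling is an assumption modelled on the existing constructors:

```python
import operator


def zeros(cls, length):
    """Hypothetical stand-in for the proposed cls.zeros(length) class method."""
    n = operator.index(length)       # accept only true integers
    if n < 0:
        raise ValueError("negative count")
    return cls(b'\x00' * n)          # avoids relying on cls(int)


print(zeros(bytes, 3))
print(zeros(bytearray, 3))
```

Building the result from `b'\x00' * n` means the sketch keeps working even if passing an integer to the constructors were eventually removed.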



Addition of explicit "single byte" constructors
-----------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes 
the addition of an explicit ``byte`` alternative constructor as a class 
method on both ``bytes`` and ``bytearray``::


    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly 
note that ``bytes.byte`` is the inverse operation for binary data, while 
``chr`` is the inverse operation for text data.


Behaviourally, ``bytes.byte(x)`` will be equivalent to the current 
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is 
expected to be easier to discover and easier to read (especially when 
used in conjunction with indexing operations on binary sequence types).


As a separate method, the new spelling will also work better with higher 
order functions like ``map``.
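
The ``map`` use case can be sketched with the spelling that works today. The `byte` function below is a hypothetical stand-in for the proposed ``bytes.byte`` and simply wraps ``bytes([x])``, which the text above states is the equivalent:

```python
def byte(value):
    """Hypothetical stand-in for the proposed bytes.byte(value)."""
    return bytes([value])    # the current, more roundabout spelling


data = b'abc'
pieces = list(map(byte, data))
print(pieces)
print(ord(byte(65)))         # ord() inverts the conversion
```

Because `byte` is a plain callable taking one integer, it can be handed to `map` directly, which the ``bytes([x])`` expression cannot without a lambda.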



Addition of optimised iterator methods that produce ``bytes`` objects
---------------------------------------------------------------------

This PEP proposes that ``bytes``, ``bytearray`` and ``memoryview`` gain 
an optimised ``iterbytes`` method that produces length 1 ``bytes`` 
objects rather than integers::


    for x in data.iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

The method can be used with arbitrary buffer exporting objects by 
wrapping them in a ``memoryview`` instance first::


    for x in memoryview(data).iterbytes():
        # x is a length 1 ``bytes`` object, rather than an integer

For ``memoryview``, the semantics of ``iterbytes()`` are defined such that::

    memview.tobytes() == b''.join(memview.iterbytes())

This allows the raw bytes of the memory view to be iterated over without 
needing to make a copy, regardless of the defined shape and format.
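
The stated invariant can be checked against a pure-Python approximation. The `iterbytes` function below is a hypothetical sketch rather than the proposed optimised method, and it assumes a C-contiguous view so that ``cast('B')`` succeeds:

```python
import array


def iterbytes(memview):
    """Hypothetical sketch of memoryview.iterbytes(): raw bytes, one at a time."""
    flat = memview.cast('B')             # reinterpret the buffer as unsigned bytes
    for i in range(len(flat)):
        yield flat[i:i + 1].tobytes()    # length-1 bytes for each raw byte


# A view over multi-byte items; iteration still proceeds byte by byte.
memview = memoryview(array.array('H', [1, 258]))
joined = b''.join(iterbytes(memview))
print(joined == memview.tobytes())
```

The sketch iterates over raw bytes regardless of the item format of the original view, which matches the "regardless of the defined shape and format" wording above. Unlike the proposal, it makes no claim to being optimised.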


The main advantage this method offers over the ``map(bytes.byte, data)`` 
approach is that it

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-18 Thread Barry Warsaw

On Aug 17, 2014, at 09:39 PM, Antoine Pitrou wrote:

>> need for a special case for a single byte.  We already have a perfectly
>> good spelling:
>> NUL = bytes([0])
>
>That is actually a very cumbersome spelling. Why should I first create a
>one-element list in order to create a one-byte bytes object?

I feel the same way every time I have to write `set(['foo'])`.

-Barry

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Raymond Hettinger


On Aug 14, 2014, at 10:50 PM, Nick Coghlan ncogh...@gmail.com wrote:

> Key points in the proposal:
>
> * deprecate passing integers to bytes() and bytearray()

I'm opposed to removing this part of the API.  It has proven useful
and the alternative isn't very nice.   Declaring the size of fixed length
arrays is not a new concept and is widely adopted in other languages.
One principal use case for the bytearray is creating and manipulating
binary data.  Initializing to zero is common operation and should remain
part of the core API (consider why we now have list.copy() even though
copying with a slice remains possible and efficient).

I and my clients have taken advantage of this feature and it reads nicely.
The proposed deprecation would break our code and not actually make
anything better.

Another thought is that the core devs should be very reluctant to deprecate
anything we don't have to while the 2 to 3 transition is still in progress.   
Every new deprecation of APIs that existed in Python 2.7 just adds another
obstacle to converting code.  Individually, the differences are trivial.  
Collectively, they present a good reason to never migrate code to Python 3.


Raymond


 


Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Nick Coghlan

On 17 August 2014 18:13, Raymond Hettinger raymond.hettin...@gmail.com wrote:

> On Aug 14, 2014, at 10:50 PM, Nick Coghlan ncogh...@gmail.com wrote:
>
>> Key points in the proposal:
>>
>> * deprecate passing integers to bytes() and bytearray()
>
> I'm opposed to removing this part of the API.  It has proven useful
> and the alternative isn't very nice.   Declaring the size of fixed length
> arrays is not a new concept and is widely adopted in other languages.
> One principal use case for the bytearray is creating and manipulating
> binary data.  Initializing to zero is common operation and should remain
> part of the core API (consider why we now have list.copy() even though
> copying with a slice remains possible and efficient).

That's why the PEP proposes adding a zeros method, based on the name
of the corresponding NumPy construct.

The status quo has some very ugly failure modes when an integer is
passed unexpectedly, and tries to create a large buffer, rather than
throwing a type error.
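
The failure mode can be sketched like this. `encode_field` is a hypothetical function invented for the illustration; the behaviour of ``bytes()`` shown is that of current Python 3:

```python
def encode_field(value):
    """Hypothetical serialiser that expects an iterable of ints or a bytes-like object."""
    return bytes(value)


print(encode_field([3]))   # intended use: one byte with value 3
print(encode_field(3))     # accidental int: three zero bytes, and no error
```

With a small integer the result is merely wrong. With a large one, such as a length field read from untrusted input, the same call would try to allocate a buffer of that size instead of raising an exception.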

> I and my clients have taken advantage of this feature and it reads nicely.

If I see bytearray(10) there is nothing there that suggests this
creates an array of length 10 and initialises it to zero to me. I'd
be more inclined to guess it would be equivalent to bytearray([10]).

bytearray.zeros(10), on the other hand, is relatively clear,
independently of user expectations.

> The proposed deprecation would break our code and not actually make
> anything better.
>
> Another thought is that the core devs should be very reluctant to deprecate
> anything we don't have to while the 2 to 3 transition is still in progress.
> Every new deprecation of APIs that existed in Python 2.7 just adds another
> obstacle to converting code.  Individually, the differences are trivial.
> Collectively, they present a good reason to never migrate code to Python 3.

This is actually one of the inconsistencies between the Python 2 and 3
binary APIs:

Python 2.7.5 (default, Jun 25 2014, 10:19:55)
[GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> bytes(10)
'10'
>>> bytearray(10)
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Users wanting well-behaved binary sequences in Python 2.7 would be
well advised to use the future module to get a full backport of the
actual Python 3 bytes type, rather than the approximation that is the
8-bit str in Python 2. And once they do that, they'll be able to track
the evolution of the Python 3 binary sequence behaviour without any
further trouble.

That said, I don't really mind how long the deprecation cycle is. I'd
be fine with fully supporting both in 3.5 (2015), deprecating the main
constructor in favour of the explicit zeros() method in 3.6 (2017) and
dropping the legacy behaviour in 3.7 (2018)

Regards,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Raymond Hettinger


On Aug 17, 2014, at 1:41 AM, Nick Coghlan ncogh...@gmail.com wrote:

> If I see bytearray(10) there is nothing there that suggests this
> creates an array of length 10 and initialises it to zero to me. I'd
> be more inclined to guess it would be equivalent to bytearray([10]).
>
> bytearray.zeros(10), on the other hand, is relatively clear,
> independently of user expectations.

Zeros would have been great but that should have been done originally.
The time to get API design right is at inception.
Now, you're just breaking code and invalidating any published examples.

 
>> Another thought is that the core devs should be very reluctant to deprecate
>> anything we don't have to while the 2 to 3 transition is still in progress.
>> Every new deprecation of APIs that existed in Python 2.7 just adds another
>> obstacle to converting code.  Individually, the differences are trivial.
>> Collectively, they present a good reason to never migrate code to Python 3.
>
> This is actually one of the inconsistencies between the Python 2 and 3
> binary APIs:

However, bytearray(n) is the same in both Python 2 and Python 3.
Changing it in Python 3 increases the gulf between the two.

The further we let Python 3 diverge from Python 2, the less likely that
people will convert their code and the harder you make it to write code
that runs under both.

FWIW, I've been teaching Python full time for three years.  I cover the
use of bytearray(n) in my classes and not a single person out of 3000+
engineers has had a problem with it.   I seriously question the PEP's
assertion that there is a real problem to be solved (i.e. that people
are baffled by bytearray(bufsiz)) and that the problem is sufficiently
painful to warrant the headaches that go along with API changes.

The other proposal to add bytearray.byte(3) should probably be named
bytearray.from_byte(3) for clarity.  That said, I question whether there is
actually a use case for this.   I have never seen code that has a
need to create a byte array of length one from a single integer.
For the most part, the API will be easiest to learn if it matches what
we do for lists and for array.array.

Sorry Nick, but I think you're making the API worse instead of better.
This API isn't perfect but it isn't flat-out broken either.   There is some
unfortunate asymmetry between bytes() and bytearray() in Python 2,
but that ship has sailed.  The current API for Python 3 is pretty good
(though there is still a tension between wanting to be like lists and like
strings both at the same time).


Raymond


P.S.  The most important problem in the Python world now is getting
Python 2 users to adopt Python 3.  The core devs need to develop
a strong distaste for anything that makes that problem harder.






Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Donald Stufft


 On Aug 17, 2014, at 1:07 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote:

 [...]

For the record I’ve had all of the problems that Nick states and I’m
+1 on this change.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Ethan Furman


On 08/17/2014 10:16 AM, Donald Stufft wrote:


For the record I’ve had all of the problems that Nick states and I’m
+1 on this change.


I've had many of the problems Nick states and I'm also +1.

--
~Ethan~

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Raymond Hettinger


On Aug 17, 2014, at 11:33 AM, Ethan Furman et...@stoneleaf.us wrote:

 I've had many of the problems Nick states and I'm also +1.

There are two code snippets below which were taken from the standard library.
Are you saying that:
1) you don't understand the code (as the pep suggests)
2) you are willing to break that code and everything like it
3) and it would be more elegantly expressed as:  
charmap = bytearray.zeros(256)
and
mapping = bytearray.zeros(256)

At work, I have network engineers creating IPv4 headers and other structures
with bytearrays initialized to zeros.  Do you really want to break all their
code?  Nowhere else in Python do we create buffers that way.  Code like
msg, who = s.recvfrom(256) is the norm.

Also, it is unclear if you're saying that you have an actual use case for this
part of the proposal?

   ba = bytearray.byte(65)

And that the code would be better, clearer, and faster than the currently
working form?

   ba = bytearray([65])

Does there really need to be a special case for constructing a single byte?
To me, that is akin to proposing list.from_int(65) as an important special
case to replace [65].

If you must muck with the ever changing bytes() API, then please 
leave the bytearray() API alone.  I think we should show some respect
for code that is currently working and is cleanly expressible in both
Python 2 and Python 3.  We aren't winning users with API churn.

FWIW, I'm guessing that the differing viewpoints in the thread stem
mainly from the proponents' experiences with bytes() rather than
from experience with bytearray(), which doesn't seem to have any
usage problems in the wild.  I've never seen a developer say they
didn't understand what buf = bytearray(1024) means.   That is
not an actual problem that needs solving (or breaking).

What may be an actual problem is code like char = bytes(1024)
though I'm unclear what a user might have actually been trying
to do with code like that.
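
For reference, a minimal sketch of the two constructor modes being argued about, as they behave in current Python 3: an integer argument produces a zero-filled object of that length, while an iterable of integers produces those byte values. The variable names are illustrative only.

```python
# Integer argument: zero-filled object of the given length.
buf = bytearray(4)
zeros = bytes(4)

# Iterable argument: one byte per integer in the iterable.
single = bytearray([65])
triple = bytes([1, 2, 3])

print(buf)     # bytearray(b'\x00\x00\x00\x00')
print(zeros)   # b'\x00\x00\x00\x00'
print(single)  # bytearray(b'A')
print(triple)  # b'\x01\x02\x03'
```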


Raymond


--- excerpts from Lib/sre_compile.py ---

    charmap = bytearray(256)
    for op, av in charset:
        while True:
            try:
                if op is LITERAL:
                    charmap[fixup(av)] = 1
                elif op is RANGE:
                    for i in range(fixup(av[0]), fixup(av[1])+1):
                        charmap[i] = 1
                elif op is NEGATE:
                    out.append((op, av))
                else:
                    tail.append((op, av))

...

    charmap = bytes(charmap) # should be hashable

    comps = {}
    mapping = bytearray(256)
    block = 0
    data = bytearray()
    for i in range(0, 65536, 256):
        chunk = charmap[i: i + 256]
        if chunk in comps:
            mapping[i // 256] = comps[chunk]
        else:
            mapping[i // 256] = comps[chunk] = block
            block += 1
            data += chunk
    data = _mk_bitmap(data)
    data[0:0] = [block] + _bytes_to_codes(mapping)
    out.append((BIGCHARSET, data))
    out += tail
    return out

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Donald Stufft


 On Aug 17, 2014, at 5:19 PM, Raymond Hettinger raymond.hettin...@gmail.com wrote:

 [...]
 
 What may be an actual problem is code like char = bytes(1024)
 though I'm unclear what a user might have actually been trying
 to do with code like that.

I think this is probably correct. I generally don’t think that bytes(1024)
makes much sense at all, especially not as a default constructor. Most likely
it exists to be similar to bytearray().

I don't have a specific problem with bytearray(1024), though I do think it's
more elegantly and clearly described as bytearray.zeros(1024), but not by much.

I find bytes.byte()/bytearray.byte() to be needed as long as there isn't a simple
way to iterate over a bytes or bytearray in a way that yields bytes or bytearrays
instead of integers. To be honest I can't think of a time when I'd actually
*want* to iterate over a bytes/bytearray as integers. Although I realize there
is unlikely to be a reasonable method to change that now. If iterbytes is added
I'm not sure where I'd personally use either bytes.byte() or bytearray.byte().
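
To make the iteration point concrete, here is a small sketch of what iterating over a bytes object yields today, next to one possible workaround. The helper name `iter_single_bytes` is hypothetical and is not part of the PEP or the standard library.

```python
def iter_single_bytes(data):
    """Yield each element of a bytes-like object as a length-1 bytes object."""
    for value in data:
        # Each item produced by iteration is an int in range(256).
        yield bytes([value])


data = b"abc"
print(list(data))                     # [97, 98, 99]
print(list(iter_single_bytes(data)))  # [b'a', b'b', b'c']
```

The same generator works for a bytearray, although it still yields immutable bytes objects rather than bytearrays.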

In general though I think that overloading a single constructor method to do
something conceptually different based on the type of the parameter leads to
these kinds of confusing scenarios and that having differently named constructors
for the different concepts is far clearer.

So given all that, I am:

* +1 for some method of iterating over both types as bytes instead of
  integers.
* +1 on adding .zeros to both types as an alternative and preferred method of
  creating a zero filled instance and deprecating the original method[1].
* -0 on adding .byte to both types as an alternative method of creating a
  single byte instance.
* -1 on changing the meaning of bytearray(1024).
* +/-0 on changing the meaning of bytes(1024), I think that bytes(1024) is
  likely to *not* be what someone wants and that what they really want is
  bytes([N]). I also think that the number one reason for someone to be doing
  bytes(N) is because they were attempting to iterate over a bytes or bytearray
  object and they got an integer. I also think that it's bad that this changes
  from 2.x to 3.x and I wish it hadn't. However I can't decide if it's worth
  reverting this at this time or not.

[1] By deprecating I mean, raise a deprecation warning, or something but my
thoughts on actually removing the other methods are listed explicitly.

---
Donald Stufft
PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA


Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Antoine Pitrou



On 17/08/2014 13:07, Raymond Hettinger wrote:


FWIW, I've been teaching Python full time for three years.  I cover the
use of bytearray(n) in my classes and not a single person out of 3000+
engineers have had a problem with it.


This is less about bytearray() than bytes(), IMO. bytearray() is 
sufficiently specialized that only experienced people will encounter it.


And while preallocating a bytearray of a certain size makes sense, it's 
completely pointless for a bytes object.


Regards

Antoine.



Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Nick Coghlan

On 18 Aug 2014 03:07, Raymond Hettinger raymond.hettin...@gmail.com
wrote:


 On Aug 17, 2014, at 1:41 AM, Nick Coghlan ncogh...@gmail.com wrote:

 If I see bytearray(10) there is nothing there that suggests this
 creates an array of length 10 and initialises it to zero to me. I'd
 be more inclined to guess it would be equivalent to bytearray([10]).

 bytearray.zeros(10), on the other hand, is relatively clear,
 independently of user expectations.


 Zeros would have been great but that should have been done originally.
 The time to get API design right is at inception.
 Now, you're just breaking code and invalidating any published examples.

I'm fine with postponing the deprecation elements indefinitely (or just
deprecating bytes(int) and leaving bytearray(int) alone).



 Another thought is that the core devs should be very reluctant to deprecate
 anything we don't have to while the 2 to 3 transition is still in progress.
 Every new deprecation of APIs that existed in Python 2.7 just adds another
 obstacle to converting code.  Individually, the differences are trivial.
 Collectively, they present a good reason to never migrate code to Python 3.


 This is actually one of the inconsistencies between the Python 2 and 3
 binary APIs:


 However, bytearray(n) is the same in both Python 2 and Python 3.
 Changing it in Python 3 increases the gulf between the two.

 The further we let Python 3 diverge from Python 2, the less likely that
 people will convert their code and the harder you make it to write code
 that runs under both.

 FWIW, I've been teaching Python full time for three years.  I cover the
 use of bytearray(n) in my classes and not a single person out of 3000+
 engineers have had a problem with it.   I seriously question the PEP's
 assertion that there is a real problem to be solved (i.e. that people
 are baffled by bytearray(bufsiz)) and that the problem is sufficiently
 painful to warrant the headaches that go along with API changes.

Yes, I'd expect engineers and networking folks to be fine with it. It isn't
how this mode of the constructor *works* that worries me, it's how it
*fails* (i.e. silently producing unexpected data rather than a type error).

Purely deprecating the bytes case and leaving bytearray alone would likely
address my concerns.


 The other proposal to add bytearray.byte(3) should probably be named
 bytearray.from_byte(3) for clarity.  That said, I question whether there is
 actually a use case for this.   I have never seen code that has a
 need to create a byte array of length one from a single integer.
 For the most part, the API will be easiest to learn if it matches what
 we do for lists and for array.array.

This part of the proposal came from a few things:

* many of the bytes and bytearray methods only accept bytes-like objects,
but iteration and indexing produce integers
* to mitigate the impact of the above, some (but not all) bytes and
bytearray methods now accept integers in addition to bytes-like objects
* ord() in Python 3 is only documented as accepting length 1 strings, but
also accepts length 1 bytes-like objects

Adding bytes.byte() makes it practical to document the binary half of ord's
behaviour, and eliminates any temptation to expand the "also accepts
integers" behaviour out to more types.

bytes.byte() thus becomes the binary equivalent of chr(), just as Python 2
had both chr() and unichr().

I don't recall ever needing chr() in a real program either, but I still
consider it an important part of clearly articulating the data model.
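
A short sketch of the asymmetry described above, using only behaviour that exists in current Python 3. The proposed bytes.byte() is not used, since it is not available; bytes([n]) stands in for it.

```python
# Text domain: chr() and ord() are inverses of each other.
assert chr(65) == "A"
assert ord("A") == 65

# Binary domain: ord() also accepts a length-1 bytes object, but the
# inverse currently has to be spelled with a one-element iterable.
assert ord(b"A") == 65
assert bytes([65]) == b"A"

# Some bytes methods accept an integer as well as a bytes-like object.
assert 98 in b"abc"
assert b"b" in b"abc"
assert b"abc".find(98) == b"abc".find(b"b") == 1
```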

 Sorry Nick, but I think you're making the API worse instead of better.
 This API isn't perfect but it isn't flat-out broken either.   There is some
 unfortunate asymmetry between bytes() and bytearray() in Python 2,
 but that ship has sailed.  The current API for Python 3 is pretty good
 (though there is still a tension between wanting to be like lists and like
 strings both at the same time).

Yes. It didn't help that the docs previously expected readers to infer the
behaviour of the binary sequence methods from the string documentation -
while the new docs could still use some refinement, I've at least addressed
that part of the problem.

Cheers,
Nick.

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Antoine Pitrou



On 16/08/2014 01:17, Nick Coghlan wrote:


* Deprecate passing single integer values to ``bytes`` and ``bytearray``


I'm neutral. Ideally we wouldn't have made that mistake at the beginning.


* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
   ``memoryview.iterbytes`` alternative iterators


+0.5. iterbytes isn't really great as a name.

Regards

Antoine.



Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Raymond Hettinger


On Aug 17, 2014, at 4:08 PM, Nick Coghlan ncogh...@gmail.com wrote:

 Purely deprecating the bytes case and leaving bytearray alone would likely 
 address my concerns.

That is good progress.  Thanks :-)

Would a warning for the bytes case suffice, or do you need an actual deprecation?

 bytes.byte() thus becomes the binary equivalent of chr(), just as Python 2 
 had both chr() and unichr().
 
 I don't recall ever needing chr() in a real program either, but I still 
 consider it an important part of clearly articulating the data model.
 
 


"I don't recall having ever needed this" greatly weakens the premise that this
is needed :-)

The APIs have been around since 2.6 and AFAICT there has been zero demonstrated
need for a special case for a single byte.  We already have a perfectly good
spelling:

   NUL = bytes([0])

The Zen tells us we really don't need a second way to do it (actually a third,
since you can also write b'\x00') and it suggests that this special case isn't
special enough.

I encourage restraint against adding an unneeded class method that has no
parallel elsewhere.  Right now, the learning curve is mitigated because bytes
is very str-like and because bytearray is list-like (i.e. the method names have
been used elsewhere and likely already learned before encountering bytes() or
bytearray()).  Putting in a new, rarely used, funky method adds to the learning
burden.

If you do press forward with adding it (and I don't see why), then as an
alternate constructor, the name should be from_int() or some such to avoid
ambiguity and to make clear that it is a class method.

 iterbytes() isn't especially attractive as a method name, but it's far more
 explicit about its purpose.

I concur.  In this case, explicitness matters.


Raymond



Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Nick Coghlan

On 18 Aug 2014 09:41, Raymond Hettinger raymond.hettin...@gmail.com
wrote:


 I encourage restraint against adding an unneeded class method that has no
 parallel elsewhere.  Right now, the learning curve is mitigated because bytes
 is very str-like and because bytearray is list-like (i.e. the method names
 have been used elsewhere and likely already learned before encountering
 bytes() or bytearray()).  Putting in new, rarely used funky method adds to
 the learning burden.

 If you do press forward with adding it (and I don't see why), then as an
 alternate constructor, the name should be from_int() or some such to avoid
 ambiguity and to make clear that it is a class method.

If I remember the sequence of events correctly, I thought of
map(bytes.byte, data) first, and then Guido suggested a dedicated
iterbytes() method later.

The step I hadn't taken (until now) was realising that the new
memoryview(data).iterbytes() capability actually combines with the existing
(bytes([b]) for b in data) to make the original bytes.byte idea unnecessary.

Cheers,
Nick.

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Antoine Pitrou



On 17/08/2014 19:41, Raymond Hettinger wrote:


The APIs have been around since 2.6 and AFAICT there have been zero
demonstrated need for a special case for a single byte.  We already have a
perfectly good spelling:
NUL = bytes([0])


That is actually a very cumbersome spelling. Why should I first create a 
one-element list in order to create a one-byte bytes object?



The Zen tells us we really don't need a second way to do it (actually a
third since you can also write b'\x00') and it suggests that this special
case isn't special enough.


b'\x00' is obviously the right way to do it in this case, but we're 
concerned about the non-constant case.


The reason to instantiate bytes from a non-constant integer comes from the
unfortunate indexing and iteration behaviour of bytes objects.


Regards

Antoine.



Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Ethan Furman


On 08/17/2014 02:19 PM, Raymond Hettinger wrote:

On Aug 17, 2014, at 11:33 AM, Ethan Furman wrote:


I've had many of the problems Nick states and I'm also +1.


There are two code snippets below which were taken from the standard library.


[...]

My issues are with 'bytes', not 'bytearray'.  'bytearray(10)' actually makes
sense.  I certainly have no problem with bytearray and bytes not being exactly
the same.


My primary issue with bytes is not being able to do b'abc'[2] == b'c', and not
being able to do x = b'abc'[2]; y = bytes(x); assert y == b'c'.
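
Spelled out as a runnable sketch, the two complaints look like this in current Python 3. Slicing and bytes([x]) are shown as the spellings that do work today; nothing here relies on the proposed new methods.

```python
data = b"abc"

x = data[2]                # indexing yields an int, not a bytes object
assert x == 99
assert data[2] != b"c"     # so comparing the indexed item to b'c' is False
assert data[2:3] == b"c"   # slicing yields a length-1 bytes object

y = bytes(x)               # integer argument: 99 zero bytes, not b'c'
assert len(y) == 99
assert y != b"c"
assert bytes([x]) == b"c"  # the current spelling of the round trip
```

Note that bytes(x) raises no error here, which is the silent-failure mode raised elsewhere in the thread.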


And because of the backwards compatibility issues I would deprecate, because we
have a new 'better' way, but not remove, the current functionality.


I pretty much agree exactly with what Donald Stufft said about it.

--
~Ethan~

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Ethan Furman


On 08/17/2014 04:08 PM, Nick Coghlan wrote:


I'm fine with postponing the deprecation elements indefinitely (or just 
deprecating bytes(int) and leaving
bytearray(int) alone).


+1 on both pieces.

--
~Ethan~

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Ian Cordasco

On Sun, Aug 17, 2014 at 8:52 PM, Ethan Furman et...@stoneleaf.us wrote:
 On 08/17/2014 04:08 PM, Nick Coghlan wrote:


 I'm fine with postponing the deprecation elements indefinitely (or just
 deprecating bytes(int) and leaving
 bytearray(int) alone).


 +1 on both pieces.

Perhaps postpone the deprecation to Python 4000 ;)

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Alex Gaynor

Donald Stufft donald at stufft.io writes:

 
 
 
 For the record I’ve had all of the problems that Nick states and I’m
 +1 on this change.
 
 
 ---
 Donald Stufft
 PGP: 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
 

I've hit basically every problem everyone here has stated, and in no uncertain
terms am I completely opposed to deprecating anything. The Python 2 to 3
migration is already hard enough, and already proceeding far too slowly for
many of our tastes. Making that migration even more complex would drive me to
the point of giving up.

Alex


Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-17 Thread Devin Jeanpierre

On Sun, Aug 17, 2014 at 7:14 PM, Alex Gaynor alex.gay...@gmail.com wrote:
 I've hit basically every problem everyone here has stated, and in no uncertain
 terms am I completely opposed to deprecating anything. The Python 2 to 3
 migration is already hard enough, and already proceeding far too slowly for
 many of our tastes. Making that migration even more complex would drive me to
 the point of giving up.

Could you elaborate what problems you are thinking this will cause for you?

It seems to me that avoiding a bug-prone API is not particularly
complex, and moving it back to its 2.x semantics or making it not work
entirely, rather than making it work differently, would make porting
applications easier. If, during porting to 3.x, you find a deprecation
warning for bytes(n), then rather than being annoying code churny
extra changes, this is actually a bug that's been identified. So it's
helpful even during the deprecation period.

-- Devin

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-15 Thread Guido van Rossum

This feels chatty. I'd like the PEP to call out the specific proposals and
put the more verbose motivation later. It took me a long time to realize
that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3). Also
your mention of bytes.byte() as the counterpart to ord() confused me -- I
think it's more similar to chr(). I don't like iterbytes as a builtin,
let's keep it as a method on affected types.


On Thu, Aug 14, 2014 at 10:50 PM, Nick Coghlan ncogh...@gmail.com wrote:

 I just posted an updated version of PEP 467 after recently finishing
 the updates to the Python 3.4+ binary sequence docs to decouple them
 from the str docs.

 Key points in the proposal:

 * deprecate passing integers to bytes() and bytearray()
 * add bytes.zeros() and bytearray.zeros() as a replacement
 * add bytes.byte() and bytearray.byte() as counterparts to ord() for
 binary data
 * add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()

 As far as I am aware, that last item poses the only open question,
 with the alternative being to add an iterbytes builtin with a
 definition along the lines of the following:

 def iterbytes(data):
     try:
         getiter = type(data).__iterbytes__
     except AttributeError:
         iter = map(bytes.byte, data)
     else:
         iter = getiter(data)
     return iter
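
The definition above depends on the proposed bytes.byte(), which does not exist in released Python. A runnable approximation, assuming a small stand-in helper (`_byte`) and a hypothetical `Packet` class that opts in to the proposed `__iterbytes__` hook, might look like this:

```python
def _byte(value):
    # Hypothetical stand-in for the proposed bytes.byte(): int -> length-1 bytes.
    return bytes([value])


def iterbytes(data):
    """Iterate over a bytes-like object, yielding length-1 bytes objects."""
    try:
        # __iterbytes__ is the hook named in the proposal; no builtin type defines it.
        getiter = type(data).__iterbytes__
    except AttributeError:
        return map(_byte, data)
    return getiter(data)


class Packet:
    # Hypothetical type that implements the hook itself.
    def __init__(self, payload):
        self.payload = payload

    def __iterbytes__(self):
        return iter([self.payload[i:i + 1] for i in range(len(self.payload))])


print(list(iterbytes(b"hi")))          # [b'h', b'i']
print(list(iterbytes(Packet(b"ok"))))  # [b'o', b'k']
```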

 Regards,
 Nick.

 PEP URL: http://www.python.org/dev/peps/pep-0467/

 Full PEP text:
 =
 PEP: 467
 Title: Minor API improvements for bytes and bytearray
 Version: $Revision$
 Last-Modified: $Date$
 Author: Nick Coghlan ncogh...@gmail.com
 Status: Draft
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 2014-03-30
 Python-Version: 3.5
 Post-History: 2014-03-30 2014-08-15


 Abstract
 ========

 During the initial development of the Python 3 language specification, the
 core ``bytes`` type for arbitrary binary data started as the mutable type
 that is now referred to as ``bytearray``. Other aspects of operating in
 the binary domain in Python have also evolved over the course of the Python
 3 series.

 This PEP proposes a number of small adjustments to the APIs of the ``bytes``
 and ``bytearray`` types to make it easier to operate entirely in the binary
 domain.


 Background
 ==========

 To simplify the task of writing the Python 3 documentation, the ``bytes``
 and ``bytearray`` types were documented primarily in terms of the way they
 differed from the Unicode based Python 3 ``str`` type. Even when I
 `heavily revised the sequence documentation
 <http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained
 that simplifying shortcut.

 However, it turns out that this approach to the documentation of these
 types had a problem: it doesn't adequately introduce users to their hybrid
 nature, where they can be manipulated *either* as a sequence of integers
 type, *or* as ``str``-like types that assume ASCII compatible data.

 That oversight has now been corrected, with the binary sequence types now
 being documented entirely independently of the ``str`` documentation in
 `Python 3.4+
 <https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__

 The confusion isn't just a documentation issue, however, as there are also
 some lingering design quirks from an earlier pre-release design where there
 was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
 was mutable (with no immutable counterpart).

 Finally, additional experience with using the existing Python 3 binary
 sequence types in real world applications has suggested it would be
 beneficial to make it easier to convert integers to length 1 bytes objects.


 Proposals
 =========

 As a consistency improvement proposal, this PEP is actually about a few
 smaller micro-proposals, each aimed at improving the usability of the
 binary data model in Python 3. Proposals are motivated by one of two main
 factors:

 * removing remnants of the original design of ``bytes`` as a mutable type
 * allowing users to easily convert integer values to a length 1 ``bytes``
   object


 Alternate Constructors
 ----------------------

 The ``bytes`` and ``bytearray`` constructors currently accept an integer
 argument, but interpret it to mean a zero-filled object of the given
 length. This is a legacy of the original design of ``bytes`` as a mutable
 type, rather than a particularly intuitive behaviour for users. It has
 become especially confusing now that some other ``bytes`` interfaces treat
 integers and the corresponding length 1 bytes instances as equivalent input.
 Compare::

     >>> b"\x03" in bytes([1, 2, 3])
     True
     >>> 3 in bytes([1, 2, 3])
     True

     >>> bytes(b"\x03")
     b'\x03'
     >>> bytes(3)
     b'\x00\x00\x00'

 This PEP proposes that the current handling of integers in the bytes and
 bytearray constructors be deprecated in Python 3.5 and targeted for
 removal in Python 3.7, being replaced by two more

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-15 Thread Serhiy Storchaka


15.08.14 08:50, Nick Coghlan wrote:

* add bytes.zeros() and bytearray.zeros() as a replacement


b'\0' * n and bytearray(b'\0') * n look like good replacements to me. No need
to learn a new method. And it works right now.
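
As a quick check of that suggestion, the repetition spellings produce the same objects as the integer constructors in current Python 3 (the value of `n` is arbitrary):

```python
n = 8

# Sequence repetition gives the same zero-filled results as bytes(n) / bytearray(n).
assert b"\0" * n == bytes(n)
assert bytearray(b"\0") * n == bytearray(n)

# The result type follows the operand, so the bytearray version stays mutable.
assert isinstance(bytearray(b"\0") * n, bytearray)
```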



* add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()


What are the use cases for this? I suppose that the main use case may be
writing code compatible with both 2.7 and 3.x. But in that case you need a
wrapper (because these types in 2.7 have no iterbytes() method). And
how large would the advantage of this method be over
``map(bytes.byte, data)``?




Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-15 Thread Victor Stinner

2014-08-15 21:54 GMT+02:00 Serhiy Storchaka storch...@gmail.com:
 15.08.14 08:50, Nick Coghlan wrote:
 * add bytes.zeros() and bytearray.zeros() as a replacement

 b'\0' * n and bytearray(b'\0') * n look good replacements to me. No need to
 learn new method. And it works right now.

FYI there is a pending patch for bytearray(int) to use calloc()
instead of malloc(). It's faster for buffers larger than 1 MB:
http://bugs.python.org/issue21644

I'm not sure that the optimization is really useful.

Victor

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-15 Thread Victor Stinner

2014-08-15 7:50 GMT+02:00 Nick Coghlan ncogh...@gmail.com:
 As far as I am aware, that last item poses the only open question,
 with the alternative being to add an iterbytes builtin (...)

Do you have examples of use cases for a builtin function? I only found
5 usages of the bytes((byte,)) constructor in the standard library:

$ grep -E 'bytes\(\([^)]+, *\)\)' $(find . -name '*.py')
./Lib/quopri.py:c = bytes((c,))
./Lib/quopri.py:c = bytes((c,))
./Lib/base64.py:b32tab = [bytes((i,)) for i in _b32alphabet]
./Lib/base64.py:_a85chars = [bytes((i,)) for i in range(33, 118)]
./Lib/base64.py:_b85chars = [bytes((i,)) for i in _b85alphabet]

bytes.iterbytes() can be used in 4 cases out of 5. Adding a new builtin
for a single line in the whole standard library doesn't look right.

Victor

Re: [Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-15 Thread Nick Coghlan

On 16 August 2014 03:48, Guido van Rossum gu...@python.org wrote:
 This feels chatty. I'd like the PEP to call out the specific proposals and
 put the more verbose motivation later.

I realised that some of that history was actually completely
irrelevant now, so I culled a fair bit of it entirely.

 It took me a long time to realize
 that you don't want to deprecate bytes([1, 2, 3]), but only bytes(3).

I've split out the four subproposals into their own sections, so
hopefully this is clearer now.

 Also
 your mention of bytes.byte() as the counterpart to ord() confused me -- I
 think it's more similar to chr().

This was just a case of me using the wrong word - I meant inverse
rather than counterpart.

 I don't like iterbytes as a builtin, let's
 keep it as a method on affected types.

Done. I also added an explanation of the benefits it offers over the
more generic map(bytes.byte, data), as well as more precise
semantics for how it will work with memoryview objects.

New draft is live at http://www.python.org/dev/peps/pep-0467/, as well
as being included inline below.

Regards,
Nick.

===

PEP: 467
Title: Minor API improvements for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan ncogh...@gmail.com
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15 2014-08-16


Abstract
========


During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes four small adjustments to the APIs of the ``bytes``,
``bytearray`` and ``memoryview`` types to make it easier to operate entirely
in the binary domain:

* Deprecate passing single integer values to ``bytes`` and ``bytearray``
* Add ``bytes.zeros`` and ``bytearray.zeros`` alternative constructors
* Add ``bytes.byte`` and ``bytearray.byte`` alternative constructors
* Add ``bytes.iterbytes``, ``bytearray.iterbytes`` and
  ``memoryview.iterbytes`` alternative iterators


Proposals
=========

Deprecation of current zero-initialised sequence behaviour
------------------------------------------------------------


Currently, the ``bytes`` and ``bytearray`` constructors accept an integer
argument and interpret it as meaning to create a zero-initialised sequence
of the given size::

    >>> bytes(3)
    b'\x00\x00\x00'
    >>> bytearray(3)
    bytearray(b'\x00\x00\x00')

This PEP proposes to deprecate that behaviour in Python 3.5, and remove it
entirely in Python 3.6.

No other changes are proposed to the existing constructors.
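
The snippet below, which runs on current Python 3 releases where the integer
form is still accepted, illustrates the distinction this deprecation is
about: an integer argument is read as a length, while an iterable of integers
supplies the byte values.

```python
# An integer argument is treated as a length...
print(bytes(3))           # b'\x00\x00\x00'

# ...whereas an iterable of integers supplies the byte values.
print(bytes([3]))         # b'\x03'
print(bytes([1, 2, 3]))   # b'\x01\x02\x03'

# Only the first form is covered by the proposed deprecation.
print(len(bytes(3)), len(bytes([3])))   # 3 1
```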


Addition of explicit zero-initialised sequence constructors
------------------------------------------------------------

To replace the deprecated behaviour, this PEP proposes the addition of an
explicit ``zeros`` alternative constructor as a class method on both
``bytes`` and ``bytearray``::

    >>> bytes.zeros(3)
    b'\x00\x00\x00'
    >>> bytearray.zeros(3)
    bytearray(b'\x00\x00\x00')

It will behave just as the current constructors behave when passed a single
integer.

The specific choice of ``zeros`` as the alternative constructor name is taken
from the corresponding initialisation function in NumPy (although, as these
are 1-dimensional sequence types rather than N-dimensional matrices, the
constructors take a length as input rather than a shape tuple).
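
Since ``bytes.zeros`` and ``bytearray.zeros`` do not exist in released Python,
the intended behaviour can only be approximated today. The sketch below uses
a hypothetical free function ``zeros(kind, length)`` as a stand-in for the
proposed class methods; the name, signature and error message are assumptions
made for illustration, not part of the proposal.

```python
def zeros(kind, length):
    """Stand-in for the proposed ``kind.zeros(length)`` class method.

    ``kind`` is expected to be ``bytes`` or ``bytearray``. Illustrative
    helper only, not the PEP's implementation.
    """
    if length < 0:
        # Mirror the existing constructors, which reject negative sizes.
        raise ValueError("negative count")
    return kind(b'\x00') * length


print(zeros(bytes, 3))        # b'\x00\x00\x00'
print(zeros(bytearray, 3))    # bytearray(b'\x00\x00\x00')
```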


Addition of explicit single byte constructors
---------------------------------------------

As binary counterparts to the text ``chr`` function, this PEP proposes the
addition of an explicit ``byte`` alternative constructor as a class method
on both ``bytes`` and ``bytearray``::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')

These methods will only accept integers in the range 0 to 255 (inclusive)::

    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

    >>> bytes.byte(1.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'float' object cannot be interpreted as an integer

The documentation of the ``ord`` builtin will be updated to explicitly note
that ``bytes.byte`` is the inverse operation for binary data, while ``chr``
is the inverse operation for text data.

Behaviourally, ``bytes.byte(x)`` will be equivalent to the current
``bytes([x])`` (and similarly for ``bytearray``). The new spelling is
expected to be easier to discover and easier to read (especially when used
in conjunction with indexing operations on binary sequence types).

As a separate method, the new spelling will also work better with higher
order functions like ``map``.
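
Because the text defines ``bytes.byte(x)`` as equivalent to ``bytes([x])``,
the proposed behaviour can be approximated with a small helper. The function
``byte`` below and its ``kind`` parameter are hypothetical stand-ins for the
proposed class methods, shown here mainly to illustrate use with ``map``.

```python
def byte(value, kind=bytes):
    """Stand-in for the proposed ``kind.byte(value)`` class method."""
    # Wrapping the value in a list reuses the existing range and type checks.
    return kind([value])


print(byte(3))                    # b'\x03'
print(byte(3, bytearray))         # bytearray(b'\x03')

# A single-argument callable composes directly with map().
print(list(map(byte, b'abc')))    # [b'a', b'b', b'c']
```

Out-of-range integers raise ``ValueError`` and floats raise ``TypeError``
here as well, since those checks come from the existing constructors.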


Addition of optimised iterator methods that produce ``bytes`` objects
----------------------------------------------------------------------

[Python-Dev] PEP 467: Minor API improvements for bytes bytearray

2014-08-14 Thread Nick Coghlan

I just posted an updated version of PEP 467 after recently finishing
the updates to the Python 3.4+ binary sequence docs to decouple them
from the str docs.

Key points in the proposal:

* deprecate passing integers to bytes() and bytearray()
* add bytes.zeros() and bytearray.zeros() as a replacement
* add bytes.byte() and bytearray.byte() as counterparts to ord() for binary data
* add bytes.iterbytes(), bytearray.iterbytes() and memoryview.iterbytes()

As far as I am aware, that last item poses the only open question,
with the alternative being to add an iterbytes builtin with a
definition along the lines of the following:

    def iterbytes(data):
        try:
            getiter = type(data).__iterbytes__
        except AttributeError:
            iter = map(bytes.byte, data)
        else:
            iter = getiter(data)
        return iter
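
The definition above depends on ``bytes.byte``, which is only proposed, so it
cannot be run as written. A runnable approximation of the same dispatch idea
might look like the following, where ``_byte`` stands in for ``bytes.byte``
and the ``Packet`` class is a made-up type used only to exercise the
hypothetical ``__iterbytes__`` hook.

```python
def _byte(value):
    # Stand-in for the proposed bytes.byte(), which does not exist today.
    return bytes([value])


def iterbytes(data):
    """Prefer a type-level __iterbytes__ hook, else map over the integers."""
    getiter = getattr(type(data), '__iterbytes__', None)
    if getiter is None:
        return map(_byte, data)
    return getiter(data)


class Packet:
    """Toy type providing the hypothetical __iterbytes__ hook."""

    def __init__(self, payload):
        self.payload = payload

    def __iterbytes__(self):
        return (self.payload[i:i + 1] for i in range(len(self.payload)))


print(list(iterbytes(b'ab')))            # [b'a', b'b']
print(list(iterbytes(Packet(b'cd'))))    # [b'c', b'd']
```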

Regards,
Nick.

PEP URL: http://www.python.org/dev/peps/pep-0467/

Full PEP text:
=
PEP: 467
Title: Minor API improvements for bytes and bytearray
Version: $Revision$
Last-Modified: $Date$
Author: Nick Coghlan ncogh...@gmail.com
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2014-03-30
Python-Version: 3.5
Post-History: 2014-03-30 2014-08-15


Abstract
========


During the initial development of the Python 3 language specification, the
core ``bytes`` type for arbitrary binary data started as the mutable type
that is now referred to as ``bytearray``. Other aspects of operating in
the binary domain in Python have also evolved over the course of the Python
3 series.

This PEP proposes a number of small adjustments to the APIs of the ``bytes``
and ``bytearray`` types to make it easier to operate entirely in the binary
domain.


Background
==========

To simplify the task of writing the Python 3 documentation, the ``bytes``
and ``bytearray`` types were documented primarily in terms of the way they
differed from the Unicode based Python 3 ``str`` type. Even when I
`heavily revised the sequence documentation
<http://hg.python.org/cpython/rev/463f52d20314>`__ in 2012, I retained that
simplifying shortcut.

However, it turns out that this approach to the documentation of these types
had a problem: it doesn't adequately introduce users to their hybrid nature,
where they can be manipulated *either* as sequences of integers,
*or* as ``str``-like types that assume ASCII compatible data.

That oversight has now been corrected, with the binary sequence types now
being documented entirely independently of the ``str`` documentation in
`Python 3.4+
<https://docs.python.org/3/library/stdtypes.html#binary-sequence-types-bytes-bytearray-memoryview>`__.

The confusion isn't just a documentation issue, however, as there are also
some lingering design quirks from an earlier pre-release design where there
was *no* separate ``bytearray`` type, and instead the core ``bytes`` type
was mutable (with no immutable counterpart).

Finally, additional experience with using the existing Python 3 binary
sequence types in real world applications has suggested it would be
beneficial to make it easier to convert integers to length 1 bytes objects.
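
For context, these are some of the spellings already available for turning a
single integer into a length 1 ``bytes`` object. The snippet uses only
existing standard library calls; the variable names are illustrative.

```python
import struct

value = 65

# Three existing ways to build a length-1 bytes object from one integer.
via_list = bytes([value])
via_to_bytes = value.to_bytes(1, 'big')
via_struct = struct.pack('B', value)

print(via_list, via_to_bytes, via_struct)   # b'A' b'A' b'A'

# Indexing yields an int, while slicing yields a length-1 bytes object.
data = b'ABC'
print(data[0], data[0:1])                   # 65 b'A'
```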


Proposals
=========

As a consistency improvement proposal, this PEP is actually about a few
smaller micro-proposals, each aimed at improving the usability of the binary
data model in Python 3. Proposals are motivated by one of two main factors:

* removing remnants of the original design of ``bytes`` as a mutable type
* allowing users to easily convert integer values to a length 1 ``bytes``
  object


Alternate Constructors
----------------------

The ``bytes`` and ``bytearray`` constructors currently accept an integer
argument, but interpret it to mean a zero-filled object of the given length.
This is a legacy of the original design of ``bytes`` as a mutable type,
rather than a particularly intuitive behaviour for users. It has become
especially confusing now that some other ``bytes`` interfaces treat integers
and the corresponding length 1 bytes instances as equivalent input.
Compare::

    >>> b'\x03' in bytes([1, 2, 3])
    True
    >>> 3 in bytes([1, 2, 3])
    True

    >>> bytes(b'\x03')
    b'\x03'
    >>> bytes(3)
    b'\x00\x00\x00'

This PEP proposes that the current handling of integers in the bytes and
bytearray constructors be deprecated in Python 3.5 and targeted for
removal in Python 3.7, being replaced by two more explicit alternate
constructors provided as class methods. The initial python-ideas thread
[ideas-thread1]_ that spawned this PEP was specifically aimed at deprecating
this constructor behaviour.

Firstly, a ``byte`` constructor is proposed that converts integers
in the range 0 to 255 (inclusive) to a ``bytes`` object::

    >>> bytes.byte(3)
    b'\x03'
    >>> bytearray.byte(3)
    bytearray(b'\x03')
    >>> bytes.byte(512)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: bytes must be in range(0, 256)

One specific use case for this alternate constructor is
