On Mon, Jan 13, 2014 at 12:31 PM, Antoine Pitrou <solip...@pitrou.net> wrote:
> On Mon, 13 Jan 2014 08:36:05 -0800
> Ethan Furman <et...@stoneleaf.us> wrote:
>
> > On 01/13/2014 08:09 AM, Antoine Pitrou wrote:
> > > On Mon, 13 Jan 2014 07:59:10 -0800
> > > Guido van Rossum <gu...@python.org> wrote:
> > >> On Mon, Jan 13, 2014 at 3:41 AM, Antoine Pitrou <solip...@pitrou.net> wrote:
> > >>> What is the use case for embedding a quoted ASCII-encoded representation
> > >>> in a byte stream?
> > >>
> > >> It doesn't crash but produces undesired output (always, not only when
> > >> the data is non-ASCII) that gives the developer a hint to think about
> > >> encoding to bytes.
> > >
> > > But why is it better to give a hint by producing undesired output (which
> > > may actually go unnoticed for some time and produce issues down the
> > > road), rather than simply by raising TypeError?
> >
> > You mean crash all the time? I'd be fine with that for both the str case
> > and the bytes case. But's probably too late
> > to change the str case, and the bytes case should mirror what str does.
>
> Let me add something else: str and bytes don't have to be symmetrical.
> In Python 2, str and unicode were symmetrical, they allowed exactly the
> same operations and were composable.
> In Python 3, str and bytes are different beasts; they have different
> operations *and* different semantics (for example, bytes interoperates
> with bytearray and memoryview, while str doesn't).
>

This is also why the int type doesn't have a __bytes__ method (ignoring what bytes() does when passed an integer): it's universally defined what str(10) should return, but who knows what you want when you ask for the bytes of 10 (e.g. base-2, ASCII, UTF-16, etc.).

>
> So bytes formatting really needn't (and shouldn't, IMO) mirror str
> formatting.
>

I think one of the things about Guido's proposal that bugs me is that it breaks the mental model of the .format() method from str in terms of how the mini-language works. For str.format() you have the conversion and the format spec (e.g. "{!r}" and "{:d}", respectively). You apply the conversion by calling the appropriate built-in, e.g. 'r' calls repr(). The format spec semantically gets passed with the object to format(), which calls the object's __format__() method: ``format(number, 'd')``.

Now Guido's suggestion has two parts that affect the mini-language for .format(). One is that for bytes.format() the default conversion is bytes() instead of str(), which is fine (we probably want to add 'b' as a conversion value as well to be consistent). But the other bit is that the format spec goes from semantically meaning ``format(thing, format_spec)`` to ``format(thing, format_spec).encode('ascii', 'strict')`` for at least numbers. That implicitness bugs me, as I have always thought of format specs as just leading to a call to format().

I think I can live with it, though, as long as two things hold. First, it is **consistently** applied across the board for bytes.format(): every use of a format spec leads to calling ``format(thing, format_spec).encode('ascii', 'strict')`` no matter what type 'thing' is. Second, it is clearly documented that this is done to ease porting and handle the common case. This even gives people in-place ASCII encoding for strings by always using '{:s}' with text, which they can do when they port their code to run under both Python 2 and 3. So you should be able to do ``b'Content-Type: {:s}'.format('image/jpeg')`` and have it give ASCII. If you want a different encoding, such as latin-1, then you need to encode explicitly and not rely on the mini-language to do tricks for you.

IOW I want to treat the format mini-language as a language and thus not have any special-casing or massive shifts in meaning between str.format() and bytes.format(), so my mental model doesn't have to contort based on whether it's str or bytes. My preference is not to have any, but if Guido is going to say PBP here then I want absolute consistency across the board in how bytes.format() tweaks things.
As for %s for the % operator calling ascii(), I think that will be a porting nightmare: you have to find out why your bytes suddenly stopped being formatted properly and then crawl through all of your code for that one use of %s which is getting bytes in. By raising a TypeError you will very easily detect where your screw-up occurred thanks to the traceback; doing otherwise feels too much like implicit type conversion, and ask any JavaScript developer how that can be a bad thing.

-Brett

>
> (the only reason I used "%s" in PEP 460 is to allow a migration path
> from 2.x bytes-formatting to 3.x bytes-formatting; in a really "pure"
> proposal it would have been called something else)
>
> Regards
>
> Antoine.
>
>
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe:
> https://mail.python.org/mailman/options/python-dev/brett%40python.org
>