On Jan 13, 2014, at 1:45 PM, Daniel Holth <dho...@gmail.com> wrote:

> On Mon, Jan 13, 2014 at 12:42 PM, R. David Murray <rdmur...@bitdance.com> wrote:
>> On Mon, 13 Jan 2014 12:41:18 +0100, Antoine Pitrou <solip...@pitrou.net> wrote:
>>> On Sun, 12 Jan 2014 18:11:47 -0800
>>> Guido van Rossum <gu...@python.org> wrote:
>>>> On Sun, Jan 12, 2014 at 5:27 PM, Ethan Furman <et...@stoneleaf.us> wrote:
>>>>> On 01/12/2014 04:47 PM, Guido van Rossum wrote:
>>>>>> %s seems the trickiest: I think with a bytes argument it should just
>>>>>> insert those bytes (and the padding modifiers should work too), and
>>>>>> for other types it should probably work like %a, so that it works as
>>>>>> expected for numeric values, and with a string argument it will return
>>>>>> the ascii()-variant of its repr(). Examples:
>>>>>> 
>>>>>> b'%s' % 42 == b'42'
>>>>>> b'%s' % 'x' == b"'x'" (i.e. the three-byte string containing an 'x'
>>>>>> enclosed in single quotes)
>>>>> 
>>>>> I'm not sure about the quotes.  Would anyone ever actually want those in
>>>>> the byte stream?
>>>> 
>>>> Perhaps not, but it's a hint that you should probably think about an
>>>> encoding. It's symmetric with how '%s' % b'x' returns "b'x'". Think of
>>>> it as payback time. :-)
>>> 
>>> What is the use case for embedding a quoted ASCII-encoded representation
>>> in a byte stream?
>> 
>> There is no use case in the sense you are asking, just like there is no
>> real use case for '%s' % b'x' producing "b'x'".  But the real use case
>> is exactly the same: to let you know your code is screwed up without
>> actually blowing up with an encoding exception.
>> 
>> For the record, I like Guido's logic and proposal.  I don't understand
>> Nick's objection, since I don't see the difference between the situation
>> here where a string gets interpolated into bytes as 'xxx' and the
>> corresponding situation where bytes gets interpolated into a string
>> as b'xxx'.  Why struggle to keep bytes interpolation "pure" if string
>> interpolation isn't?
>> 
>> Guido's proposal makes the language more symmetric, and thus more
>> consistent and less surprising.  Exactly the hallmarks of Python's design
>> sense, IMO.  (Big surprise, right? :)
>> 
>> Of course, this point of view *is* based on the idea that when you are
>> doing interpolation using %/.format, you are in fact primarily concerned
>> with ASCII compatible byte streams.  This is a Practicality sort of
>> argument.  It is, after all, by far the most common use case when
>> doing interpolation[*].
>> 
>> If you wanted to do a purist version of this symmetry, you'd have bytes(x)
>> calling __bytes__ if it was defined and falling back to calling a
>> __brepr__ otherwise.
>> 
>> But what would __brepr__ implement?  The variety of format codes in
>> the struct module argues that there is no "one obvious" binary
>> repr for most types.  (Those that have one would implement __bytes__).
>> And what would be the __brepr__ of an arbitrary 'object'?
>> 
>> Faced with the impracticality of defining __brepr__ usefully in any "pure
>> bytes" form, it seems sensible to admit that the most useful __brepr__
>> is the ascii() encoding of the __repr__.  Which naturally produces 'xxx'
>> as the __brepr__ of a string.
>> 
>> This does cause things to get a little un-pretty when you are operating
>> at the python prompt:
>> 
>>    >>> b'%s' % object
>>    b'"<class \\\'object\\\'>"'
>> 
>> But then again that is most likely really not what you mean to do, so
>> it becomes a big red flag...just like b'xxx' is a small red flag when
>> you accidentally interpolate unencoded bytes into a string.
>> 
>> --David
>> 
>> PS: When I first read Guido's remark that the result of interpolating a
>> string should be 'xxx', I went Wah?  I had to reason my way through to
>> it as above, but to him it was just the natural answer.  Guido isn't
>> always right, but this kind of automatic language design consistency
>> is one reason he's the BDFL.
>> 
>> [*] I still think that you mostly want to design your library so that
>> you are handling the text parts as text and the bytes parts as bytes,
>> and encoding/gluing them as appropriate at the IO boundary.  But if Guido
>> says his real code would benefit by being able to interpolate ASCII into
>> bytes at certain points, I'll believe him.
> 
> <elided rant/>
> 
> If you think corrupted data is easier or more pleasant to track down
> than encoding exceptions then I think you are strange. It makes
> porting really difficult while you are still trying to figure out
> where the bytes/str boundaries are. I am now deeply suspicious of all
> % formatting.
> _______________________________________________
> Python-Dev mailing list
> Python-Dev@python.org
> https://mail.python.org/mailman/listinfo/python-dev
> Unsubscribe: 
> https://mail.python.org/mailman/options/python-dev/donald%40stufft.io

For the record, I think %d, %f, and similar codes, where the RHS is guaranteed
to produce only a certain set of ASCII-compatible “characters”, are fine, and
it’s perfectly acceptable to have an implicit ASCII encode for them. The %s
code I’m not sure of. I think trying to ASCII-encode the argument (just using
encode()) is dangerous, and I think that using ascii() and adding quotes is
never what anyone is going to want.

Given that, I think it’d be far better to blow up if you’re using %s (or at
least using %s on a str object rather than as an alias for %b) than to
implicitly encode (given that we don’t know what the RHS can contain) or to
throw junk data into the bytes that we know pretty much nobody is ever going
to want.
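
To make the three options for %s concrete, here is a minimal sketch in plain
Python 3. The helper names (percent_s_ascii_repr, percent_s_implicit_encode,
percent_s_strict) are hypothetical and exist only for this illustration. Each
one emulates what a proposal would do with a single argument, and none of them
is the interpreter's actual behaviour. The strict variant rejects every
non-bytes argument, which is an assumption that goes slightly further than
"at least using %s on a str object".

```python
def percent_s_ascii_repr(arg):
    """Emulate the ascii()-based proposal: bytes pass through unchanged,
    anything else is inserted as the ASCII-encoded ascii() of the object."""
    if isinstance(arg, bytes):
        return arg
    return ascii(arg).encode("ascii")


def percent_s_implicit_encode(arg):
    """Emulate an implicit ASCII encode of non-bytes arguments.
    Whether this works depends on the data, not on the code."""
    if isinstance(arg, bytes):
        return arg
    return str(arg).encode("ascii")  # UnicodeEncodeError on non-ASCII text


def percent_s_strict(arg):
    """Emulate the 'blow up' option: only bytes are accepted (assumption:
    every other type is rejected, not just str)."""
    if isinstance(arg, bytes):
        return arg
    raise TypeError("%s in a bytes format needs bytes, not " + type(arg).__name__)


print(percent_s_ascii_repr(42))        # b'42'
print(percent_s_ascii_repr("x"))       # b"'x'"  (quotes end up in the bytes)
print(percent_s_implicit_encode("x"))  # b'x'    (fine only while data is ASCII)
```

With the strict variant, a stray str fails at the point of interpolation
instead of producing quoted or data-dependent output further down the line.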

-----------------
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
