Re: [Python-Dev] PEP 460: allowing %d and %f and mojibake

Ethan Furman Sun, 12 Jan 2014 20:47:46 -0800

On 01/12/2014 07:02 PM, Stephen J. Turnbull wrote:

[snip most of very eloquent reply]


Thank you, Stephen, for remaining calm despite my somewhat heated response.

A few comments in-line.

I now better understand your viewpoint about text always being unicode strings; 
I just happen to disagree.

Hopefully as some consolation I will be very vocal about using str unless bytes is necessary. Any application that uses text should be using str for it, and only using bytes, if necessary, on the back-end.

Ethan Furman writes:

In only one case did I use the word "text" loosely,


[...] Bytes are *never* Python 3 text in my terminology [...]
"ASCII-encoded text" as you call it [...] and want to manipulate
using str-like methods on bytes

The part that you don't seem to acknowledge (sorry if I missed it) is that there are str-like methods already on bytes. While the actual implementation of isupper (your example from below) may be done using integer methods, it only makes semantic sense if interpreted as ASCII-encoded text.
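A minimal sketch of the point about str-like methods on bytes, assuming a current Python 3 interpreter. The variable names are illustrative only. Indexing shows that the object holds integers, while upper() and isupper() only give a meaningful answer if those integers are read as ASCII letters.

```python
# Illustrative names; only built-in bytes methods are used.
word = b"Ethan"

# str-like methods that already exist on bytes
shouted = word.upper()      # treats the bytes 0x61-0x7a as ASCII lowercase letters
all_caps = word.isupper()   # False, because b"than" is lowercase ASCII

# The underlying items are plain integers, not characters
first_byte = word[0]        # the integer value of ASCII "E"

print(shouted, all_caps, first_byte)
```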

is *exactly* the Python 2 model of text.  But you deny that the
effect of your proposals (eg, b"%d" % (12,)) is to reintroduce Python
2's bytes/character confusion, don't you?

Given that the default (and only) text type in Py3 is str, which is unicode, I don't think any confusion will be as severe, but I acknowledge that there could be some.

I hardly think Nick is *lying*, any more than you are.  AFAICT, you're
*both* wrong.


LOL, well, at least I'm in good company, then!  :)

I think some of the misunderstanding (which you also seem to suffer
from) is that we (or at least I) /ever/ want a unicode string back
from bytes interpolation.  I don't!


Please tell me why you think I suffer from that misunderstanding.

I no longer recall, but whatever misapprehension I was suffering from you have alleviated. (That sentence would make my daughter proud! English major. ;)

But did you get that I'm worried that programmers in Omaha will use
that same functionality to communicate American English (for which it
is basically sufficient, and which also requires ASCII when bytes are
used for communication)?


Yes, I get that.  Hopefully their friends and neighbors will slap them with 
fishes if they do.

*My* definition is not ambiguous at all.  If this particular part
of the byte stream is defined to contain ASCII-encoded text, then I
can use the bytes text methods to work with it.


But how is Python supposed to know that?

Python doesn't need to. bytes is a low-level object -- it could contain music, movies, dbf data, pdf data, or my mother's cheesecake recipe (properly encoded, of course). Python can't protect me from treating a music file as if it were a movie file, or even just writing proper music info at the wrong place in the music file; all that is up to me, as the programmer, to get right, and to understand what is needed.

But under your definition, you need to make the decision, or
explicitly code the decision, on the basis of context.


Exactly so.  I even have to do that in Py2.

If that particular configuration of bytes is because it's
ASCII-encoded text, then sure.


Once again, you are advocating precisely the Python 2 model of text.

Not exactly, because what I get back is bytes, which cannot directly be mixed with unicode (str) as it was in Py2. I think this is a key difference.
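To illustrate that difference, here is a small sketch assuming Python 3. The names are made up for the example. Concatenating bytes with str raises TypeError instead of silently coercing, so the programmer has to encode explicitly at the point where the two meet.

```python
# Illustrative names; this only demonstrates built-in behaviour.
raw = b"age: "
text = "43"

try:
    raw + text              # Python 3 refuses to mix bytes and str
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# The explicit, programmer-chosen conversion
combined = raw + text.encode("ascii")

print(mixing_allowed, combined)
```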

To use, for example, bytes.__upper__ on data that wasn't
ASCII-encoded text (even if it happened to look like it was) would
be the height of stupidity.  Please don't include me in such
accusations.


I have no idea why you think I think anybody would be that stupid.
That never occurred to me.  It's precisely "magic numbers" that happen
to look like English words when interpreted as ASCII coded characters
that I don't want manipulated by str-like methods that interpret text
(such as full-featured format or %).

This confuses me somewhat. It's okay to use b'ethan'.upper(), which only makes semantic sense as ASCII-encoded text, but b'age: %d' % 43 isn't? (Aside, I'm perfectly comfortable with "ASCII-encoded text" because if you took u'ethan'.encode('ascii') you would get b'ethan'. If it was some other encoding, such as cp1251, I would call that particular byte stream "cp1251-encoded text". And if there were methods that worked directly on a cp1251-encoded byte stream I would not have any problem using them on cp1251-encoded text.)
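The three expressions mentioned above can be tried side by side. This is a sketch under one loud assumption: %-interpolation on bytes did not exist when this message was written and only arrived later, in Python 3.5 (PEP 461), so the second step needs 3.5 or newer. The cp1251 sample string is made up for illustration.

```python
# Requires Python 3.5+ for the bytes % step (added by PEP 461).
name = u"ethan".encode("ascii")      # ASCII-encoded text
shout = name.upper()                 # str-like method on bytes
line = b"age: %d" % 43               # bytes interpolation, result is bytes

# A cp1251-encoded byte stream (sample text is hypothetical).
cyr = u"Итан".encode("cp1251")
cyr_upper_bytes = cyr.upper()        # bytes.upper only touches ASCII letters
cyr_upper_text = cyr.decode("cp1251").upper().encode("cp1251")

print(name, shout, line, cyr_upper_bytes == cyr)
```

bytes.upper() leaves the cp1251 data untouched because none of its bytes are ASCII lowercase letters; upper-casing it as text requires decoding first.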

What Nick
means by a "boundary type" is a type that works seamlessly with the
types on each side of the boundary as a helper in the conversion.  So
when you use a struct to pack a bool, an int, and a date into a bytes,
the struct is the boundary type.  And if there's a helper type to work
with bytes and/or str simultaneously, that's a boundary type, eg,
asciistr.  But bytes itself is not a boundary type, it's just a type
with no internal structure, not even characters.
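A rough sketch of the struct example described above, using the standard library struct module as the helper at the boundary. The record layout, the function names, and the choice to store the date as its proleptic ordinal are all assumptions made for illustration.

```python
import struct
from datetime import date

# Hypothetical layout: little-endian, no padding,
# one bool, one signed 32-bit int, one unsigned 32-bit date ordinal.
RECORD = struct.Struct("<?iI")

def pack_record(flag, count, day):
    """Turn Python values into an unstructured bytes blob."""
    return RECORD.pack(flag, count, day.toordinal())

def unpack_record(blob):
    """Turn the blob back into Python values."""
    flag, count, ordinal = RECORD.unpack(blob)
    return flag, count, date.fromordinal(ordinal)

blob = pack_record(True, 43, date(2014, 1, 12))
restored = unpack_record(blob)
print(len(blob), restored)
```

The blob itself carries no structure; only the Struct object knows where the bool ends and the int begins.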


Hmmm.  I'll have to think about this.

Okay, I've thought somewhat. Under the definition above would it be fair to say that Db3Table (a class in my dbf module) is a boundary type? It sits between the actual file and the program, and transforms bytes into actual Python types.
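As a rough illustration of a class that sits between raw bytes and the program, here is a hypothetical sketch. It is not the dbf module's Db3Table and does not follow its API; the class name, field codes, and record layout are invented for the example, which simply decodes fixed-width ASCII fields into Python types.

```python
from datetime import date

def _to_date(field):
    # Field is assumed to hold ASCII digits in YYYYMMDD order.
    text = field.decode("ascii")
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

# Hypothetical one-letter field kinds and their converters.
CONVERTERS = {
    "C": lambda field: field.decode("ascii").rstrip(),  # character data
    "N": lambda field: int(field.decode("ascii")),      # whole number
    "D": _to_date,                                      # date
    "L": lambda field: field in (b"T", b"Y"),           # logical flag
}

class FixedWidthTable:
    """Hypothetical boundary helper: bytes in, Python values out."""

    def __init__(self, layout):
        self.layout = layout  # list of (name, kind, width) tuples

    def decode(self, raw):
        record, offset = {}, 0
        for name, kind, width in self.layout:
            record[name] = CONVERTERS[kind](raw[offset:offset + width])
            offset += width
        return record

table = FixedWidthTable(
    [("name", "C", 10), ("age", "N", 3), ("joined", "D", 8), ("active", "L", 1)]
)
raw = b"ethan".ljust(10) + b"43".rjust(3) + b"20140112" + b"T"
row = table.decode(raw)
print(row)
```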


--
~Ethan~