On Sat, Aug 31, 2019 at 09:31:15PM +1000, Chris Angelico wrote:
> On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano <st...@pearwood.info> wrote:
> > > So b"abc" should not be allowed?
> >
> > In what way are byte-STRINGS not strings? Unicode-strings and
> > byte-strings share a significant fraction of their APIs, and are so
> > similar that back in Python 2.2 the devs thought it was a good idea to
> > try automagically coercing from one to the other.
> >
> > I was careful to write *string* rather than *str*. Sorry if that wasn't
> > clear enough.
> >
> 
> We call it a string, but a bytes object has as much in common with
> bytearray and with a list of integers as it does with a text string.

I don't think that's true.

py> b'abc'.upper()
b'ABC'

py> [1, 2, 3].upper()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute 'upper'

Shall I beat this dead horse some more by listing the other 33 methods 
that byte-strings share with Unicode-strings but not lists?

Compare that to just two methods shared by all three of bytes, str and 
list (namely count() and index()), and *zero* methods shared by bytes 
and list but not str.
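These counts are easy to check on whatever interpreter you have. The sketch below treats "methods" as the public names returned by dir(). The exact size of the bytes-and-str set depends on the Python version, because methods such as isascii() (3.7) and removeprefix()/removesuffix() (3.9) were added to both types. The helper name public_methods is only for illustration.

```python
def public_methods(tp: type) -> set:
    """Names in dir(tp) that don't start with an underscore."""
    return {name for name in dir(tp) if not name.startswith("_")}

bytes_m, str_m, list_m = (public_methods(t) for t in (bytes, str, list))

shared_by_all = bytes_m & str_m & list_m          # count, index
bytes_and_str_only = (bytes_m & str_m) - list_m   # upper, split, strip, ...
bytes_and_list_only = (bytes_m & list_m) - str_m  # empty

print(sorted(shared_by_all))
print(len(bytes_and_str_only), sorted(bytes_and_str_only))
print(sorted(bytes_and_list_only))
```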

In Python 2, byte-strings and Unicode strings were both subclasses of 
type basestring. Although we have moved away from that shared base class 
in Python 3, it does demonstrate that conceptually bytes and str are 
closely related to each other.


> Is the contents of a MIDI file a "string"? I would say no, it's not -
> but it can *contain* strings, eg for metadata and lyrics.

Don't confuse *human-readable native language strings* with generic 
strings. "Hello world!" is a string, but so are '&w-8\x02^xs\0' and 
b'DEADBEEF'.


> You can't upper-case the
> variable-length-integer b"\xe7\x61" any more than you can upper-case
> the integer 13281.

Of course you can.

py> b"\xe7\x61".upper()
b'\xe7A'
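The quoted figure of 13281 checks out if b"\xe7\x61" is read as a MIDI-style variable-length quantity. In that encoding each byte carries 7 value bits, most significant group first, and the high bit is set on every byte except the last. The minimal decoder below is a sketch; the name decode_vlq is hypothetical. It shows both the original value and what .upper() does to it.

```python
def decode_vlq(data: bytes) -> int:
    """Decode a MIDI-style variable-length quantity.

    Each byte contributes its low 7 bits, most significant group first;
    a set high bit (0x80) means another byte follows.
    """
    value = 0
    for i, byte in enumerate(data):
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            if i != len(data) - 1:
                raise ValueError("trailing bytes after final VLQ byte")
            return value
    raise ValueError("truncated VLQ: last byte has continuation bit set")

print(decode_vlq(b"\xe7\x61"))          # 13281
print(decode_vlq(b"\xe7\x61".upper()))  # 13249
```

Calling .upper() on those bytes is perfectly legal. It just silently changes the encoded number from 13281 to 13249.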

Whether it is *meaningful* to do so is another question. But the same 
applies to str.upper: just because you can call the method doesn't mean 
that the result will be semantically valid.

    source = "def spam():\n\tpass\n"
    source = source.upper()  # no longer valid Python source code.
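The same point can be checked mechanically with the built-in compile(). The original text compiles, but the upper-cased copy raises SyntaxError, even though str.upper() itself ran without complaint.

```python
source = "def spam():\n\tpass\n"
compile(source, "<original>", "exec")  # compiles without error

shouted = source.upper()  # "DEF SPAM():\n\tPASS\n"
try:
    compile(shouted, "<shouted>", "exec")
    still_valid = True
except SyntaxError:  # also catches IndentationError, a subclass
    still_valid = False

print(still_valid)  # False
```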


> Those common methods are mostly built on the
> assumption that the string contains ASCII text.

And byte-strings often do contain ASCII text. When they don't, then 
don't call the text methods that make no sense in that context.

The same goes for Unicode strings: there are cases where text methods 
make no sense on them either. You wouldn't want to call .casefold() on 
a password, or .lstrip() on a line of Python source code.
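Here are two quick illustrations of those examples. The password and the source line are made up for the purpose.

```python
# casefold() is lossy: distinct inputs collapse to the same result,
# so a casefolded password check would accept "STRASSE" for "Straße".
assert "Straße".casefold() == "STRASSE".casefold() == "strasse"

# In Python source, leading whitespace is syntax; lstrip() throws it away.
line = "        return total\n"
assert line.lstrip() == "return total\n"
```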


[...]
> Bytes and text have a long relationship, and as such, there are
> special similarities. That doesn't mean that bytes ARE text, 

I didn't say that bytes are (human-readable) text. Although they can be: 
not every application needs Unicode strings, ASCII strings are still 
special, and there are still applications where one has to mix binary 
and ASCII text data.
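PNG files are a concrete case of that mixing. The file is binary, but each chunk inside it is tagged with a four-letter ASCII type code such as b"IHDR" or b"tEXt". The sketch below builds one chunk following the PNG layout: a 4-byte big-endian length of the data, the type, the data, then a CRC-32 over type plus data. The helper png_chunk is hypothetical and is not a full PNG writer.

```python
import struct
import zlib

def png_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, ASCII type, data, CRC-32(type + data)."""
    # Even validating the type code uses a str-like method on bytes.
    if len(chunk_type) != 4 or not chunk_type.isalpha():
        raise ValueError("chunk type must be 4 ASCII letters")
    crc = zlib.crc32(chunk_type + data)  # unsigned 32-bit in Python 3
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)

# tEXt data is an ASCII keyword, a NUL separator, then the text itself.
chunk = png_chunk(b"tEXt", b"Comment\x00made with Python")
```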

I said they were *strings*. Strings are not necessarily text, although 
they often are. Formally, a string is a finite sequence of symbols that 
are chosen from a set called an alphabet. See:

https://en.wikipedia.org/wiki/String_%28computer_science%29
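The formal definition maps directly onto Python. A bytes object is a finite sequence over the alphabet of the 256 byte values, and a str is a finite sequence over the Unicode code points. The helper is_string_over is hypothetical, written only for this illustration. The bytes checks are trivially true, and that is the point: every bytes object is a string over that alphabet.

```python
BYTE_ALPHABET = range(256)             # every possible byte value
CODE_POINT_ALPHABET = range(0x110000)  # every Unicode code point

def is_string_over(symbols, alphabet) -> bool:
    """True if every symbol in a finite sequence comes from the alphabet."""
    return all(symbol in alphabet for symbol in symbols)

# Iterating bytes yields ints in 0..255; iterating str yields 1-char strs,
# which ord() maps to code points.
assert is_string_over(b"DEADBEEF", BYTE_ALPHABET)
assert is_string_over(b"\xe7\x61", BYTE_ALPHABET)
assert is_string_over(map(ord, "&w-8\x02^xs\0"), CODE_POINT_ALPHABET)
```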



> I don't think it's necessary to be too adamant about "must be some
> sort of thing-we-call-string" here. Let practicality rule, since
> purity has already waved a white flag at us.

It is because of *practicality* that we should prefer that things that 
look similar should be similar. Code is read far more often than it is 
written, and when two pieces of code look similar, we should strongly 
prefer that they actually be similar.

Would you be happy with a Pythonesque language that used prefixed 
strings as the delimiter for arbitrary data types?

    mylist = L"1, 2, None, {}, L"", 99.5"

    mydict = D"key: value, None: L"", "abc": "xyz""

    myset = S"1, 2, None"


That's what this proposal wants: string syntax that can return arbitrary 
data types.
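By contrast, the same values can already be produced from text with an explicit call. There the reader can see that a function is turning a string into something else. The sketch below uses the standard library's ast.literal_eval; the helper names parse_list and parse_set are hypothetical.

```python
import ast

def parse_list(text: str) -> list:
    """Hypothetical helper: what L"..." would have to do, spelled as a call."""
    return list(ast.literal_eval("[" + text + "]"))

def parse_set(text: str) -> set:
    """Hypothetical helper: what S"..." would have to do, spelled as a call."""
    return set(ast.literal_eval("[" + text + "]"))

mylist = parse_list('1, 2, None, {}, "", 99.5')
myset = parse_set("1, 2, None")
```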

How about using quotes for function calls?

    assert chr"9" == "\t"

    assert ord"9" == 57

That's what this proposal wants: string syntax for a subset of function 
calls.
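Spelled as ordinary calls, the two examples also show that one prefix syntax would hide two different conversions. chr"9" only works if the literal is first turned into the integer 9, while ord"9" needs it left as the one-character string "9".

```python
# chr"9" would have to mean chr(int("9")): the literal is parsed as a number
assert chr(int("9")) == "\t"

# ord"9" would have to mean ord("9"): the literal is passed through as text
assert ord("9") == 57
```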

Don't say that this proposal won't be abused. Every one of the OP's 
motivating examples is an abuse of the syntax, returning non-strings 
from something that looks like a string.



-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BCIIWV2KMETDPB7M2OUMXRXK6A6CVHGJ/
Code of Conduct: http://python.org/psf/codeofconduct/
