On Sat, Aug 31, 2019 at 09:31:15PM +1000, Chris Angelico wrote:
> On Sat, Aug 31, 2019 at 8:44 PM Steven D'Aprano <st...@pearwood.info> wrote:
> > > So b"abc" should not be allowed?
> >
> > In what way are byte-STRINGS not strings? Unicode-strings and
> > byte-strings share a significant fraction of their APIs, and are so
> > similar that back in Python 2.2 the devs thought it was a good idea to
> > try automagically coercing from one to the other.
> >
> > I was careful to write *string* rather than *str*. Sorry if that wasn't
> > clear enough.
> >
>
> We call it a string, but a bytes object has as much in common with
> bytearray and with a list of integers as it does with a text string.
I don't think that's true.

    py> b'abc'.upper()
    b'ABC'
    py> [1, 2, 3].upper()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'list' object has no attribute 'upper'

Shall I beat this dead horse some more by listing the other 33 methods that byte-strings share with Unicode-strings but not with lists? Compare that to just two methods shared by all three of bytes, str and list (namely count() and index()), and *zero* methods shared by bytes and list but not str.

In Python 2, byte-strings and Unicode strings were both subclasses of type basestring. Although we moved away from that shared base class in Python 3, it does demonstrate that conceptually bytes and str are closely related to each other.

> Is the contents of a MIDI file a "string"? I would say no, it's not -
> but it can *contain* strings, eg for metadata and lyrics.

Don't confuse *human-readable native language strings* with generic strings. "Hello world!" is a string, but so are '&w-8\x02^xs\0' and b'DEADBEEF'.

> You can't upper-case the
> variable-length-integer b"\xe7\x61" any more than you can upper-case
> the integer 13281.

Of course you can.

    py> b"\xe7\x61".upper()
    b'\xe7A'

Whether it is *meaningful* to do so is another question. But the same applies to str.upper: just because you can call the method doesn't mean that the result will be semantically valid.

    source = "def spam():\n\tpass\n"
    source = source.upper()  # no longer valid Python source code.

> Those common methods are mostly built on the
> assumption that the string contains ASCII text.

As they often do. If they don't, then don't call the text methods which don't make sense in context, just as there are cases where text methods don't make sense on Unicode strings. You wouldn't want to call .casefold() on a password, or .lstrip() on a line of Python source code.

[...]

> Bytes and text have a long relationship, and as such, there are
> special similarities.
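A minimal sketch, written as a plain Python 3 script rather than an interactive session, that checks the two claims in this exchange: which public methods bytes shares with str but not with list, and what upper() does to the variable-length integer b"\xe7\x61". The helpers `public_methods` and `decode_vlq` are hypothetical, written only for illustration; the decoder assumes the usual MIDI variable-length quantity encoding (7 data bits per byte, high bit set on every byte except the last).

```python
def public_methods(t):
    """Return the names of a type's public (non-underscore) attributes."""
    return {name for name in dir(t) if not name.startswith("_")}

bytes_m, str_m, list_m = (public_methods(t) for t in (bytes, str, list))

shared_text = (bytes_m & str_m) - list_m      # text-like methods that list lacks
shared_all = bytes_m & str_m & list_m         # methods common to all three types
bytes_list_only = (bytes_m & list_m) - str_m  # bytes and list, but not str

def decode_vlq(data):
    """Hypothetical helper: decode one MIDI-style variable-length quantity.

    Each byte contributes its low 7 bits; a set high bit means more bytes follow.
    """
    value = 0
    for byte in data:
        value = (value << 7) | (byte & 0x7F)
        if not byte & 0x80:
            break
    return value

raw = b"\xe7\x61"
shouted = raw.upper()  # only ASCII a-z change: 0x61 'a' becomes 0x41 'A'

print(sorted(shared_all))            # ['count', 'index']
print(len(shared_text))              # exact count depends on the Python version
print(bytes_list_only)               # set()
print(decode_vlq(raw))               # 13281
print(shouted, decode_vlq(shouted))  # b'\xe7A' 13249
```

The call succeeds, but the decoded value changes from 13281 to 13249: that is the "callable but not necessarily meaningful" distinction, and it applies equally to str.upper on Python source.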
> That doesn't mean that bytes ARE text,

I didn't say that bytes are (human-readable) text. Although they can be: not every application needs Unicode strings, ASCII strings are still special, and there are still applications where one has to mix binary and ASCII text data.

I said they were *strings*. Strings are not necessarily text, although they often are. Formally, a string is a finite sequence of symbols that are chosen from a set called an alphabet. See:

https://en.wikipedia.org/wiki/String_%28computer_science%29

> I don't think it's necessary to be too adamant about "must be some
> sort of thing-we-call-string" here. Let practicality rule, since
> purity has already waved a white flag at us.

It is because of *practicality* that we should prefer that things that look similar should be similar. Code is read far more often than it is written, and if we read two pieces of code that look similar, we should strongly prefer that they actually be similar.

Would you be happy with a Pythonesque language that used prefixed strings as the delimiter for arbitrary data types?

    mylist = L"1, 2, None, {}, L"", 99.5"
    mydict = D"key: value, None: L"", "abc": "xyz""
    myset = S"1, 2, None"

That's what this proposal wants: string syntax that can return arbitrary data types.

How about using quotes for function calls?

    assert chr"9" == "\t"
    assert ord"9" == 57

That's what this proposal wants: string syntax for a subset of function calls.

Don't say that this proposal won't be abused. Every one of the OP's motivating examples is an abuse of the syntax, returning non-strings from something that looks like a string.
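For contrast, a short sketch of how current Python behaves: every existing string-literal prefix (r, b, rb, f, u) still produces a str or bytes, so code that looks like a string is a string. The hypothetical chr"9" and ord"9" examples are spelled below as the ordinary calls they would stand in for. This only illustrates today's behaviour; it is not a model of what the proposal would implement.

```python
# Each existing literal prefix yields str or bytes, never some other type.
literal_values = [
    r"\t",       # raw str: backslash and 't' kept as two characters
    b"abc",      # bytes
    rb"\d",      # raw bytes
    f"{1 + 1}",  # f-string: still a str, here "2"
    u"abc",      # u prefix: plain str in Python 3
]
result_types = {type(value) for value in literal_values}
print(result_types == {str, bytes})  # True

# The post's hypothetical chr"9" and ord"9", written as ordinary function calls.
assert chr(int("9")) == "\t"
assert ord("9") == 57
```

The point of the comparison is that today the quote characters alone tell the reader the result is text or bytes; only an explicit call can return something else.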
-- 
Steven
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BCIIWV2KMETDPB7M2OUMXRXK6A6CVHGJ/