As has been said, a builtin *could* be written that would be "friendly to
subclassing", by the definition in this thread. (I'll stay out of the
argument, for the moment, as to whether that would be better.)
I suspect that the reason str acts like it does is that it was originally
written a LONG time ago, when you couldn't subclass basic built-in types at
all.
Secondarily, it could be a performance tweak -- minimal memory and peak
performance are pretty critical for strings.
But collections.UserString does exist -- so if you want to subclass, and
performance isn't critical, then use that. Steven A pointed out that
UserStrings are not instances of str though. I think THAT is a bug. And
it's probably that way because, with the magic of duck typing, no one cared
-- but with all the static type hinting going on now, that is a bigger
liability than it used to be. Also because, when it was written, you
couldn't subclass str.
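A quick check of that point, as a minimal sketch using only the standard
library: an instance of collections.UserString fails isinstance(..., str),
while an instance of a direct str subclass passes. MyStr is a hypothetical
name used only for comparison.

```python
from collections import UserString


class MyStr(str):
    """Hypothetical direct subclass of str, for comparison only."""


wrapped = UserString("spam")   # wraps a plain str in its .data attribute
direct = MyStr("spam")         # actually is a str

# UserString is a Sequence, but it is not a str
userstring_is_str = isinstance(wrapped, str)
# a direct subclass is a str
subclass_is_str = isinstance(direct, str)

print(userstring_is_str, subclass_is_str)
```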
Though I will note that run-time type checking of strings is relatively
common compared to other types, because of the whole
a-str-is-a-sequence-of-str issue: distinguishing between a sequence of
strings and a single string is sometimes needed. And str is rarely duck
typed.
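To illustrate the kind of run-time check meant here, a minimal sketch:
as_list_of_str is a hypothetical helper (not from any library) that accepts
either one string or an iterable of strings. It also shows the liability
with the current UserString: because it is not a str, it falls through to
the "iterable" branch and gets split into characters.

```python
from collections import UserString


def as_list_of_str(value):
    """Return a list of strings from one string or an iterable of strings.

    Hypothetical helper: a lone str is wrapped in a list rather than
    being split into its characters.
    """
    if isinstance(value, str):
        return [value]
    return list(value)


single = as_list_of_str("ab")                 # treated as one string
several = as_list_of_str(["a", "b"])          # treated as a sequence
surprise = as_list_of_str(UserString("ab"))   # not a str, so it is iterated

print(single, several, len(surprise))
```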
If anyone actually has a real need for this I'd post an issue -- it'd be
interesting to see whether the core devs consider this a bug or a feature
(well, probably not a feature, but maybe a missing feature).
OK -- I got distracted and tried it out -- it was pretty easy to update
UserString to be a subclass of str. I suspect it isn't done that way now
because it was originally written back when you could not subclass str --
so it stored an internal str instead.
The really hacky part of my prototype is this:
    # self.data is the original attribute for storing the string internally.
    # Partly to prevent my having to re-write all the other methods, and
    # partly because you get recursion if you try to use the methods on self
    # when overriding them ...
    @property
    def data(self):
        return "".join(self)
The "".join is because it was the only way I quickly thought of to make a
native string without invoking the __str__ method and other initialization
machinery. I wonder if there is another way? Certainly there is in C, but
in pure Python?
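On the question of another way in pure Python: one candidate, offered as a
sketch rather than a tested answer, is to call the base-class method
directly as str.__str__(self), which bypasses any override on the subclass.
In CPython this appears to return an exact str rather than the subclass
instance; that other implementations behave the same way is an assumption.
The Wrapped class is hypothetical, for illustration only.

```python
class Wrapped(str):
    """Hypothetical str subclass that overrides __str__, like the prototype."""

    def __str__(self):
        return "overridden"

    @property
    def data(self):
        # call the base-class implementation directly, skipping the override
        return str.__str__(self)


w = Wrapped("spam")
joined = "".join(w)   # the approach used in the prototype
direct = w.data       # the alternative sketched here

# check, rather than assume, that both give a plain str with the same value
assert type(joined) is str and type(direct) is str
assert joined == direct == "spam"
```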
Anyway, after I did that and wrote a __new__ -- the rest of it "just
worked".
    def __new__(cls, s):
        return super().__new__(cls, s)
UserString and its subclasses return instances of themselves, and instances
are instances of str.
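For contrast, a minimal sketch of the default behaviour the prototype works
around: with a bare str subclass, inherited methods and operators hand back
plain str, not the subclass. PlainSubclass is a hypothetical name used only
for illustration.

```python
class PlainSubclass(str):
    """Hypothetical subclass that overrides nothing, for illustration only."""


s = PlainSubclass("something")

# the instance itself is a PlainSubclass ...
instance_type = type(s)
# ... but results of inherited operations are plain str
upper_type = type(s.upper())
slice_type = type(s[:4])
concat_type = type(s + "!")

print(instance_type.__name__, upper_type.__name__,
      slice_type.__name__, concat_type.__name__)
```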
Code with a couple asserts in the __main__ block enclosed.
Enjoy!
-CHB
NOTE: VERY minimally tested :-)
On Tue, Dec 20, 2022 at 4:17 PM Chris Angelico <[email protected]> wrote:
> On Wed, 21 Dec 2022 at 09:30, Cameron Simpson <[email protected]> wrote:
> >
> > On 19Dec2022 22:45, Chris Angelico <[email protected]> wrote:
> > >On Mon, 19 Dec 2022 at 22:37, Steven D'Aprano <[email protected]>
> wrote:
> > >> > But this much (say with a better validator) gets you static type
> checking,
> > >> > syntax highlighting, and inherent documentation of intent.
> > >>
> > >> Any half-way decent static type-checker will immediately fail as soon
> as
> > >> you call a method on this html string, because it will know that the
> > >> method returns a vanilla string, not a html string.
> > >
> > >But what does it even mean to uppercase an HTML string? Unless you
> > >define that operation specifically, the most logical meaning is
> > >"convert it into a plain string, and uppercase that".
> >
> > Yes, this was my thought. I've got a few subclasses of builtin types.
> > They are not painless.
> >
> > For HTML "uppercase" is a kind of ok notion because the tags are case
> > insensitive.
>
> Tag names are, but their attributes might not be, so even that might
> not be safe.
>
> > Not the case with, say, XML - my personal nagging example is
> > from KML (Google map markup dialect) where IIRC a "ScreenOverlay" and a
> > "screenoverlay" both existing with different semantics. Ugh.
>
> Ugh indeed. Why? Why? Why?
>
> > So indeed, I'd probably _want_ .upper to return a plain string and have
> > special methods to do more targeted things as appropriate.
> >
>
> Agreed.
>
> ChrisA
> _______________________________________________
> Python-ideas mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/T7FZ3FIA6INMHQIRVZ3ZZJC6UAQQCFOI/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
"""
A UserString implementation that subclasses from str
so instances of it and its subclasses are instances of string
-- could be handy for using with static typing.
NOTE: this could probably be cleaner code, but this was done with
an absolute minimum of changes from what's in the standard library
"""
import sys as _sys


class UserString(str):

    def __new__(cls, s):
        return super().__new__(cls, s)

    # There's no need for this logic in __init__
    # def __init__(self, seq):
    #     if isinstance(seq, str):
    #         self.data = seq
    #     elif isinstance(seq, UserString):
    #         self.data = seq.data[:]
    #     else:
    #         self.data = str(seq)

    @property
    def data(self):
        return "".join(self)

    def __str__(self):
        return str(self.data)

    def __repr__(self):
        return repr(self.data)

    def __int__(self):
        return int(self.data)

    def __float__(self):
        return float(self.data)

    def __complex__(self):
        return complex(self.data)

    def __hash__(self):
        return hash(self.data)

    def __getnewargs__(self):
        return (self.data[:],)

    def __eq__(self, string):
        if isinstance(string, UserString):
            return self.data == string.data
        return self.data == string

    def __lt__(self, string):
        if isinstance(string, UserString):
            return self.data < string.data
        return self.data < string

    def __le__(self, string):
        if isinstance(string, UserString):
            return self.data <= string.data
        return self.data <= string

    def __gt__(self, string):
        if isinstance(string, UserString):
            return self.data > string.data
        return self.data > string

    def __ge__(self, string):
        if isinstance(string, UserString):
            return self.data >= string.data
        return self.data >= string

    def __contains__(self, char):
        if isinstance(char, UserString):
            char = char.data
        return char in self.data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        return self.__class__(self.data[index])

    def __add__(self, other):
        if isinstance(other, UserString):
            return self.__class__(self.data + other.data)
        elif isinstance(other, str):
            return self.__class__(self.data + other)
        return self.__class__(self.data + str(other))

    def __radd__(self, other):
        if isinstance(other, str):
            return self.__class__(other + self.data)
        return self.__class__(str(other) + self.data)

    def __mul__(self, n):
        return self.__class__(self.data * n)

    __rmul__ = __mul__

    def __mod__(self, args):
        return self.__class__(self.data % args)

    def __rmod__(self, template):
        return self.__class__(str(template) % self)

    # the following methods are defined in alphabetical order:
    def capitalize(self):
        return self.__class__(self.data.capitalize())

    def casefold(self):
        return self.__class__(self.data.casefold())

    def center(self, width, *args):
        return self.__class__(self.data.center(width, *args))

    def count(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.count(sub, start, end)

    def removeprefix(self, prefix, /):
        if isinstance(prefix, UserString):
            prefix = prefix.data
        return self.__class__(self.data.removeprefix(prefix))

    def removesuffix(self, suffix, /):
        if isinstance(suffix, UserString):
            suffix = suffix.data
        return self.__class__(self.data.removesuffix(suffix))

    def encode(self, encoding='utf-8', errors='strict'):
        encoding = 'utf-8' if encoding is None else encoding
        errors = 'strict' if errors is None else errors
        return self.data.encode(encoding, errors)

    def endswith(self, suffix, start=0, end=_sys.maxsize):
        return self.data.endswith(suffix, start, end)

    def expandtabs(self, tabsize=8):
        return self.__class__(self.data.expandtabs(tabsize))

    def find(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.find(sub, start, end)

    def format(self, /, *args, **kwds):
        return self.data.format(*args, **kwds)

    def format_map(self, mapping):
        return self.data.format_map(mapping)

    def index(self, sub, start=0, end=_sys.maxsize):
        return self.data.index(sub, start, end)

    def isalpha(self):
        return self.data.isalpha()

    def isalnum(self):
        return self.data.isalnum()

    def isascii(self):
        return self.data.isascii()

    def isdecimal(self):
        return self.data.isdecimal()

    def isdigit(self):
        return self.data.isdigit()

    def isidentifier(self):
        return self.data.isidentifier()

    def islower(self):
        return self.data.islower()

    def isnumeric(self):
        return self.data.isnumeric()

    def isprintable(self):
        return self.data.isprintable()

    def isspace(self):
        return self.data.isspace()

    def istitle(self):
        return self.data.istitle()

    def isupper(self):
        return self.data.isupper()

    def join(self, seq):
        return self.data.join(seq)

    def ljust(self, width, *args):
        return self.__class__(self.data.ljust(width, *args))

    def lower(self):
        return self.__class__(self.data.lower())

    def lstrip(self, chars=None):
        return self.__class__(self.data.lstrip(chars))

    maketrans = str.maketrans

    def partition(self, sep):
        return self.data.partition(sep)

    def replace(self, old, new, maxsplit=-1):
        if isinstance(old, UserString):
            old = old.data
        if isinstance(new, UserString):
            new = new.data
        return self.__class__(self.data.replace(old, new, maxsplit))

    def rfind(self, sub, start=0, end=_sys.maxsize):
        if isinstance(sub, UserString):
            sub = sub.data
        return self.data.rfind(sub, start, end)

    def rindex(self, sub, start=0, end=_sys.maxsize):
        return self.data.rindex(sub, start, end)

    def rjust(self, width, *args):
        return self.__class__(self.data.rjust(width, *args))

    def rpartition(self, sep):
        return self.data.rpartition(sep)

    def rstrip(self, chars=None):
        return self.__class__(self.data.rstrip(chars))

    def split(self, sep=None, maxsplit=-1):
        return self.data.split(sep, maxsplit)

    def rsplit(self, sep=None, maxsplit=-1):
        return self.data.rsplit(sep, maxsplit)

    def splitlines(self, keepends=False):
        return self.data.splitlines(keepends)

    def startswith(self, prefix, start=0, end=_sys.maxsize):
        return self.data.startswith(prefix, start, end)

    def strip(self, chars=None):
        return self.__class__(self.data.strip(chars))

    def swapcase(self):
        return self.__class__(self.data.swapcase())

    def title(self):
        return self.__class__(self.data.title())

    def translate(self, *args):
        return self.__class__(self.data.translate(*args))

    def upper(self):
        return self.__class__(self.data.upper())

    def zfill(self, width):
        return self.__class__(self.data.zfill(width))


if __name__ == "__main__":
    # make sure it works, at least a little
    us = UserString("something")

    assert isinstance(us, UserString)
    assert isinstance(us, str)

    us_upper = us.upper()
    assert isinstance(us_upper, UserString)
    assert isinstance(us_upper, str)

    # try subclassing
    class SpecialString(UserString):
        def special(self):
            return "Special" + self

    ss = SpecialString("something")

    assert isinstance(ss, SpecialString)
    assert isinstance(ss, UserString)
    assert isinstance(ss, str)

    ss_upper = ss.upper()
    assert isinstance(ss_upper, SpecialString)
    assert isinstance(ss_upper, UserString)
    assert isinstance(ss_upper, str)