[Python-ideas] Re: Idea: Tagged strings in python
collections.UserString can take away a lot of this boilerplate pain from
user defined str subclasses.

On Sun, Dec 18, 2022 at 7:28 PM Steven D'Aprano wrote:

> On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:
>
> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> >
> > #2
> >
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
>     class mystr(str):
>         def method(self):
>             return 1234
>
>     s = mystr("hello")
>     print(s.method())  # This is fine.
>     print(s.upper().method())  # This is not.
>
> To be useable, we have to override every string method that returns a
> string. Including dunders. So your class becomes full of tedious boiler
> plate:
>
>     def upper(self):
>         return type(self)(super().upper())
>     def lower(self):
>         return type(self)(super().lower())
>     def casefold(self):
>         return type(self)(super().casefold())
>     # Plus another 29 or so methods
>
> This is not just tedious and error-prone, but it is inefficient: calling
> super returns a regular string, which then has to be copied as a
> subclassed string and the original garbage collected.
> --
> Steve

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/4RIQ65SHYK3T2KZ2XKOPD45KH2SOFQFI/
Code of Conduct: http://python.org/psf/codeofconduct/
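As a rough illustration of the UserString suggestion in the message above, here is a minimal sketch. The class name `html` and the sample markup are made up for the example. It only aims to show that `collections.UserString` methods re-wrap their results in the subclass, and that the price is no longer being a real `str`.

```python
from collections import UserString


class html(UserString):
    """Hypothetical tagged-string type built on UserString instead of str."""

    def method(self):
        return 1234


s = html("<p>hello</p>")
upper = s.upper()

# UserString re-wraps results in the subclass, so the type survives.
print(type(upper).__name__)   # html
print(upper.method())         # 1234

# The trade-off: a UserString is not a str subclass.
print(isinstance(s, str))     # False
```

Whether that trade-off is acceptable probably depends on how much of the surrounding code checks `isinstance(x, str)`.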
[Python-ideas] Re: Idea: Tagged strings in python
On Sun, Dec 18, 2022 at 8:29 PM Steven D'Aprano wrote:

> > However, if you want to allow these types to possibly *do* something with
> > the strings inside (validate them, canonicalize them, do a security check,
> > etc), I think I like the other way:
> >
> > class html(str): pass
> > class css(str): pass
>
> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
>     class mystr(str):
>         def method(self):
>             return 1234
>
>     s = mystr("hello")
>     print(s.method())  # This is fine.
>     print(s.upper().method())  # This is not.

I'd agree to "limited", but not "hostile." Look at the suggestions I
mentioned: validate, canonicalize, security check. All of those are
perfectly fine in `.__new__()`. E.g.:

    In [1]: class html(str):
       ...:     def __new__(cls, s):
       ...:         if not "<" in s:
       ...:             raise ValueError("That doesn't look like HTML")
       ...:         return str.__new__(cls, s)

    In [2]: html("Hello")

    In [3]: html("Hello")
    ---
    ValueError                    Traceback (most recent call last)
    ----> 1 html("Hello")

    in __new__(cls, s)
          2     def __new__(cls, s):
          3         if not "<" in s:
    ----> 4             raise ValueError("That doesn't look like HTML")
          5

    ValueError: That doesn't look like HTML

I readily acknowledge that's not a very thorough validator :-).

But this much (say with a better validator) gets you static type checking,
syntax highlighting, and inherent documentation of intent.

I know that lots of things one can do with a str subclass wind up producing
a str instead. But if the thing you do is just "make sure it is created as
the right kind of thing for static checking and editor assistance," I don't
care about any of that falling back.

--
Keeping medicines from the bloodstreams of the sick; food from the bellies
of the hungry; books from the hands of the uneducated; technology from the
underdeveloped; and putting advocates of freedom in prisons. Intellectual
property is to the 21st century what the slave trade was to the 16th.
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/AEQCVTJ2ABFQSQHWM62JOJQJI6UU675Y/
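The `__new__` validation idea from the message above can be sketched as a runnable script. The validator is the same deliberately naive one, and the sample inputs are assumptions chosen for this illustration. The last lines show the fallback to plain `str` that the thread is arguing about.

```python
class html(str):
    def __new__(cls, s):
        # Deliberately naive check, as in the message above.
        if "<" not in s:
            raise ValueError("That doesn't look like HTML")
        return str.__new__(cls, s)


ok = html("<b>Hello</b>")      # hypothetical input containing markup
print(type(ok).__name__)       # html

rejected = None
try:
    html("Hello")              # no "<", so the validator refuses it
except ValueError as exc:
    rejected = str(exc)
print(rejected)                # That doesn't look like HTML

# Derived values fall back to plain str, so they are not re-validated.
shouted = ok.upper()
print(type(shouted).__name__)  # str
```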
[Python-ideas] Re: Idea: Tagged strings in python
On Mon, 19 Dec 2022 at 12:29, Steven D'Aprano wrote:

> The problem with this is that the builtins are positively hostile to
> subclassing. The issue is demonstrated with this toy example:
>
>     class mystr(str):
>         def method(self):
>             return 1234
>
>     s = mystr("hello")
>     print(s.method())  # This is fine.
>     print(s.upper().method())  # This is not.

"Hostile"? I dispute that. Are you saying that every method on a string
has to return something of the same type as self, rather than a vanilla
string? Because that would be far MORE hostile to other types of string
subclass:

    >>> import dataclasses
    >>> from enum import StrEnum
    >>> class Demo(StrEnum):
    ...     x = "eggs"
    ...     m = "ham"
    ...
    >>> Demo.x
    <Demo.x: 'eggs'>
    >>> isinstance(Demo.x, str)
    True
    >>> Demo.x.upper()
    'EGGS'
    >>> Demo.m + " and " + Demo.x
    'ham and eggs'

Demo.x is a string. Which means that, unless there's good reason to do
otherwise, it should behave as a string. So it should be possible to use
it as if it were the string "eggs", including appending it to something,
appending something to it, uppercasing it, etc, etc, etc.

So what should happen if you do these kinds of manipulations? Should
attempting to use a string in a normal string context raise ValueError?

    >>> Demo("ham and eggs")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.12/enum.py", line 726, in __call__
        return cls.__new__(cls, value)
               ^^^
      File "/usr/local/lib/python3.12/enum.py", line 1121, in __new__
        raise ve_exc
    ValueError: 'ham and eggs' is not a valid Demo

I would say that *that* would count as "positively hostile to
subclassing".

ChrisA

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HCXWIKZ47LI7UIESEYAP63TP2CGWHR5O/
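The enum argument above can be reproduced as a short script. This sketch uses the older `class Demo(str, Enum)` spelling instead of `StrEnum`, purely as an assumption so that it also runs on Python versions before 3.11. The behaviour being illustrated is the same: string operations hand back plain `str`, and the enum constructor rejects anything that is not a member value.

```python
from enum import Enum


class Demo(str, Enum):
    x = "eggs"
    m = "ham"


# Members behave as ordinary strings, and the results are plain str.
combined = Demo.m + " and " + Demo.x
print(combined)                        # ham and eggs
print(type(Demo.x.upper()).__name__)   # str

# If str methods returned type(self), they would have to go through
# this constructor, which refuses values that are not members.
failed = False
try:
    Demo(combined)
except ValueError:
    failed = True
print(failed)                          # True
```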
[Python-ideas] Re: Idea: Tagged strings in python
On Sun, Dec 18, 2022 at 07:38:06PM -0500, David Mertz, Ph.D. wrote:

> However, if you want to allow these types to possibly *do* something with
> the strings inside (validate them, canonicalize them, do a security check,
> etc), I think I like the other way:
>
> #2
>
> class html(str): pass
> class css(str): pass

The problem with this is that the builtins are positively hostile to
subclassing. The issue is demonstrated with this toy example:

    class mystr(str):
        def method(self):
            return 1234

    s = mystr("hello")
    print(s.method())  # This is fine.
    print(s.upper().method())  # This is not.

To be useable, we have to override every string method that returns a
string. Including dunders. So your class becomes full of tedious boiler
plate:

    def upper(self):
        return type(self)(super().upper())
    def lower(self):
        return type(self)(super().lower())
    def casefold(self):
        return type(self)(super().casefold())
    # Plus another 29 or so methods

This is not just tedious and error-prone, but it is inefficient: calling
super returns a regular string, which then has to be copied as a
subclassed string and the original garbage collected.

--
Steve

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/O7PU5FLLGNR7IR2V667LDPBBOEXF5NFU/
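One way the per-method boilerplate described above might be reduced is to generate the wrappers in a loop. This is only a sketch: the `preserve_type` decorator and the `STR_RETURNING` list are hypothetical names invented here, the list is deliberately incomplete, and the approach does nothing about the extra copy the message mentions.

```python
# Hypothetical, deliberately incomplete list of str methods to re-wrap.
STR_RETURNING = ("upper", "lower", "casefold", "strip", "title", "__add__")


def preserve_type(cls):
    """Class decorator: make the listed str methods return `cls` instances."""

    def make_wrapper(name):
        original = getattr(str, name)

        def wrapper(self, *args, **kwargs):
            result = original(self, *args, **kwargs)
            # Re-wrap only real strings; pass NotImplemented etc. through.
            if isinstance(result, str):
                return type(self)(result)
            return result

        wrapper.__name__ = name
        return wrapper

    for name in STR_RETURNING:
        setattr(cls, name, make_wrapper(name))
    return cls


@preserve_type
class mystr(str):
    def method(self):
        return 1234


s = mystr("hello")
print(s.upper().method())             # 1234
print(type(s + " world").__name__)    # mystr
```

Any method missing from the list still returns a plain `str`, so the sketch trades tedium for the risk of a silently incomplete list.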
[Python-ideas] Re: Idea: Tagged strings in python
Using a typing approach sounds like a fantastic idea. Moreover, as Stephen
showed, it's easy to make Emacs utilize that, and as I showed, it's easy to
make vim follow that. I've only written one tiny VS Code extension, but it
wouldn't be hard there either. I'm not sure how one adds stuff to PyCharm
and other editors, but I have to believe it's possible.

So I see two obvious approaches, both of which 100% fulfill Emil's hope
without new syntax:

#1

    from typing import NewType

    html = NewType("html", str)
    css = NewType("css", str)

    a: html = html("Hello world")
    b: css = css("h1 { color: #99; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)  # <- good
    combine(b, a)  # <- bad

However, if you want to allow these types to possibly *do* something with
the strings inside (validate them, canonicalize them, do a security check,
etc), I think I like the other way:

#2

    class html(str): pass
    class css(str): pass

    a: html = html("Hello world")
    b: css = css("h1 { color: #99; }")

    def combine(h: html, c: css):
        print(f"Combined page elements: {h} | {c}")

    combine(a, b)
    combine(b, a)

The type annotations in the assignment lines are optional, but if you're
doing something other than just creating an instance of the (pseudo-)type,
they might add something. They might also be what your text editor decides
to use as its marker.

For either version, type analysis will find a problem. If I hadn't matched
the types in the assignment, it would detect extra problems:

    (py3.11) 1310-scratch % mypy tagged_types1.py
    tagged_types1.py:13: error: Argument 1 to "combine" has incompatible type "css"; expected "html"  [arg-type]
    tagged_types1.py:13: error: Argument 2 to "combine" has incompatible type "html"; expected "css"  [arg-type]
    Found 2 errors in 1 file (checked 1 source file)

typing.Annotated can also be used, but it solves a slightly different
problem.
On Sun, Dec 18, 2022 at 5:24 PM Paul Moore wrote:

> On Sun, 18 Dec 2022 at 21:42, Christopher Barker wrote:
>
>> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. wrote:
>>
>>> In general, I find any proposal to change Python "because then my text
>>> editor would need to change to accommodate the language" to be
>>> unconvincing.
>>
>> Personally, I’m skeptical of any proposal to change Python to make it
>> easier for IDEs.
>>
>> But there *may* be other good reasons to do something like this. I’m not
>> a static typing guy, but it seems to me that it could be useful to
>> subtype strings:
>>
>> This function expects an SQL string.
>>
>> This function returns an SQL string.
>>
>> Maybe not worth the overhead, but worth more than giving IDEs hints as to
>> what to do.
>
> I believe typing has annotated types that could do this.
> Paul

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/XCACWMITDR5YNBICCNONLUGZUYC3NFRV/
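The practical difference between approaches #1 and #2 in the message above shows up at runtime rather than in the type checker. In the sketch below, the names `HtmlTag` and `HtmlStr` and the sample markup are invented for the comparison; they are not from the thread.

```python
from typing import NewType

# Approach #1: the distinct type exists only for the static checker.
HtmlTag = NewType("HtmlTag", str)


# Approach #2: a real runtime subclass that could carry validation.
class HtmlStr(str):
    pass


a = HtmlTag("<h1>Hello world</h1>")
b = HtmlStr("<h1>Hello world</h1>")

print(type(a).__name__)  # str
print(type(b).__name__)  # HtmlStr
print(a == b)            # True
```

So `NewType` costs nothing at runtime but cannot validate anything, while the subclass can hook `__new__` at the price of an extra object type.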
[Python-ideas] Re: Idea: Tagged strings in python
On Sun, 18 Dec 2022 at 21:42, Christopher Barker wrote:

> On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. wrote:
>
>> In general, I find any proposal to change Python "because then my text
>> editor would need to change to accommodate the language" to be
>> unconvincing.
>
> Personally, I’m skeptical of any proposal to change Python to make it
> easier for IDEs.
>
> But there *may* be other good reasons to do something like this. I’m not a
> static typing guy, but it seems to me that it could be useful to subtype
> strings:
>
> This function expects an SQL string.
>
> This function returns an SQL string.
>
> Maybe not worth the overhead, but worth more than giving IDEs hints as to
> what to do.

I believe typing has annotated types that could do this.

Paul

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/U22UUM7J22IKDQCQTMHW27AISQ2H2YOY/
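For readers unfamiliar with the annotated types mentioned above, here is a minimal sketch using `typing.Annotated` (Python 3.9 or later). The `"sql"` marker and the `run_query` function are assumptions made up for this example. As far as the typing documentation describes it, static checkers treat `Annotated[str, ...]` as plain `str`, so the metadata is mainly useful to other tools such as editors or runtime inspectors.

```python
from typing import Annotated, get_type_hints

# "sql" is an arbitrary marker chosen for this sketch, not a standard tag.
SQL = Annotated[str, "sql"]


def run_query(query: SQL) -> None:
    """Hypothetical function that expects an SQL string."""


# Tools that want the tag can ask for the extras explicitly.
hints = get_type_hints(run_query, include_extras=True)
print(hints["query"].__metadata__)          # ('sql',)

# Without the extras, the annotation is just str.
print(get_type_hints(run_query)["query"])   # <class 'str'>
```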
[Python-ideas] Re: Idea: Tagged strings in python
On Sun, Dec 18, 2022 at 9:48 AM David Mertz, Ph.D. wrote:

> In general, I find any proposal to change Python "because then my text
> editor would need to change to accommodate the language" to be
> unconvincing.

Personally, I’m skeptical of any proposal to change Python to make it
easier for IDEs.

But there *may* be other good reasons to do something like this. I’m not a
static typing guy, but it seems to me that it could be useful to subtype
strings:

This function expects an SQL string.

This function returns an SQL string.

Maybe not worth the overhead, but worth more than giving IDEs hints as to
what to do.

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/UUDANPFKWV66IN3DXGTS3VQ6A7XY6YIX/
[Python-ideas] Re: Idea: Tagged strings in python
Well, obviously I have to come to the defense of vim as well :-). I'm not
sure what year vim got the capability, but I suspect around as long as
emacs.

This isn't for exactly the same language use case, but finding a quick
example on the internet:

    unlet b:current_syntax
    syntax include @srcBash syntax/bash.vim
    syntax region srcBashHi start="..." end="..." keepend contains=@srcBash

    unlet b:current_syntax
    syntax include @srcHTML syntax/html.vim
    syntax region srcHTMLHi start="^...$" end="^...$" keepend contains=@srcHTML

This is easy to adapt to either the named function convention:
`html('Hello')` or to the standardized-comment convention.

In general, I find any proposal to change Python "because then my text
editor would need to change to accommodate the language" to be
unconvincing.

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/6PMUCHFX6FG2IT2VHANPGSPX4GNBJAII/
[Python-ideas] Idea: Tagged strings in python
e...@emilstenstrom.se writes:

 > Seems simple enough, right? The problem is: There's no syntax
 > highlighting in my code editor for the three other languages.

Then you're not using Emacs's mmm-mode, which has been available for a
couple of decades. Now, mmm-mode doesn't solve the whole problem -- it
doesn't know anything about how the languages are tagged. But this isn't a
problem for an Emacs shop, the team decides on a convention (or recognizes
a third party's convention), and somebody will code up the 5-line function
that font-lock (syntax highlighter in Emacs) uses to dispatch to the
appropriate syntax highlighting mode.

AFAICS this requires either all editors become Emacs ;-) or all editor
maintainers get together and agree on the tags (this will need to be
extensible, there are a lot of languages out there, and some editors will
want to distinguish languages by version to flag syntax invalid in older
versions). Is this really going to happen? Just for Python? When the
traditional solution of separating different languages into different
files is almost always acceptable?

There are other uses proposed for tagged strings. In combination, perhaps
this feature is worthwhile. But I think that on its own the multiple
language highlighting application is pretty dubious given the limited
benefit vs. the amount of complexity it will introduce not only in Python,
but in editors as well.

 > This makes for a horrible developer experience, where you
 > constantly have to hunt for characters inside of strings.

If this were a feature anyway, it would be very useful in certain
situations (for example dynamic web pages), no question about it. But
mixed-language files are not something I want to see in projects I work
on -- and remember, I use Emacs, I have mmm-mode already.

 > If I instead use separate files, I get syntax highlighting and
 > auto-completion for each file, because editors set language based
 > on file type.

This is problematic for your case. This means that the editor needs to
change how it dispatches to syntax highlighting. Emacs, no problem, it
already dispatches highlighting based on tagged regions of text. But are
other editors going to *change* to do that?

 > But should I really have to choose?

Most of the time, I'd say "yes", and you should choose multiple files. ;-)
YMMV of course, but I really appreciate the separation of concerns that is
provided by separate files for Python code, HTML templates, and (S)CSS
presentation.

 > *Do we need a python language solution to this?*
 > Could the code editors fix this? There's a long issue thread for
 > vscode where this is discussed:
 > https://github.com/Microsoft/vscode/issues/1751 - The reasoning
 > (reasonable imho) is that this is not something that can be done
 > generally, but that it needs to be handled at the python vscode
 > extension level. Makes sense.

Makes sense, yes -- that's how Emacs does it, but Emacs is *already*
fundamentally designed on a model of implicitly tagged text. Parsing
strings is already relatively hard because the begin marker is the same as
the end marker. Now you need to tie it to the syntax highlighting mode,
which may change over large regions of text every time you insert or
delete a quotation mark or comment delimiter. You *can't* just hand it off
to the Python highlighter, *every* syntax highlighter that might be used
inside a Python string at least needs to know how to hand control back to
Python. For one thing, they all need to learn about all four of Python's
string delimiters.

And it gets worse. I wonder how you end up with CSS and HTML inside Python
strings? Yup, the CSS is inside a
[Python-ideas] Re: Idea: Tagged strings in python
dn wrote:

> Is this a problem with Python, or with the tool?
> «
> Language injections
> Last modified: 14 December 2022
> Language injections let you work with pieces of code in other languages
> embedded in your code. When you inject a language (such as HTML, CSS,
> XML, RegExp, and so on) into a string literal, you get comprehensive
> code assistance for editing that literal.
> ...
> »
> https://www.jetbrains.com/help/pycharm/using-language-injections.html
> Contains a specific example for Django scripters.
> (sadly as an image - probably wouldn't be handled by this ListServer)

I touched upon this solution in the original post. If all editors could
agree to use # language=html it would be an ok solution. That API creates
lots of ambiguity around what the comment should apply to. Some examples
which are non-obvious imho:

    "" # language=html " # language=html "" # language=html process_html("") # language=html concat_html("", "")

> > If I instead use separate files, I get syntax highlighting and
> > auto-completion for each file, because editors set language based on file
> > type. But should I really have to choose?
>
> In other situations where files need to be collected together, a
> data-archive may be used (not to be confused with any historical
> context, nor indeed with data-compression).

The point here is to have everything in one file, editable and syntax
highlighted in that same file. I don't think this tip applies to that?

Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TX35CCY4YLJEGWCODYHTWXWDM2SSANE4/
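The ambiguity of a trailing `# language=html` comment can be made concrete with the standard `tokenize` module. This is only a sketch of the problem, not of how any editor actually resolves it, and the `concat_html` call with its markup arguments is a hypothetical line of user code.

```python
import io
import tokenize

# Hypothetical line of user code with a trailing language comment.
source = 'page = concat_html("<p>", "</p>")  # language=html\n'

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
strings = [t.string for t in tokens if t.type == tokenize.STRING]
comments = [t.string for t in tokens if t.type == tokenize.COMMENT]

# One comment, two string literals: nothing in the token stream says
# which literal (or both) the comment is meant to tag.
print(comments)  # ['# language=html']
print(strings)   # ['"<p>"', '"</p>"']
```

Any comment-based convention would need an extra rule, for example "applies to every literal on the line", to settle cases like this.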