On Sat, Sep 19, 2020 at 12:10 PM Wes Turner <wes.tur...@gmail.com> wrote:
> Regex uses the ? symbol to indicate that something is a "non-greedy" match
> (to default to "shortest match")

Exactly -- Regex was designed to be a parsing language; format specifiers
were not.

I'm quite surprised by how little the parse package has had to adapt the
format language to a parsing language, but it has indeed adapted it.

I'm honestly not sure how confusing it would be to have a built-in parsing
language that looks like the format one but behaves differently. I suspect
it's particularly an issue if we did assigning to f-strings, and less so if
it were a string method or stand-alone function.

Trying parse with my earlier example in this thread:

In [1]: x, y, z = 23, 45, 67

In [2]: a_string = f"{x}{y}{z}"

In [3]: a_string
Out[3]: '234567'

In [4]: from parse import parse

In [5]: parse("{x}{y}{z}", a_string)
Out[5]: <Result () {'x': '2', 'y': '3', 'z': '4567'}>

In [6]: parse("{x:d}{y:d}{z:d}", a_string)
Out[6]: <Result () {'x': 2345, 'y': 6, 'z': 7}>

So that's interesting -- a different level of "greediness" for strings than
for integers.

In [7]: parse("{x:2d}{y:2d}{z:2d}", a_string)
Out[7]: <Result () {'x': 23, 'y': 45, 'z': 67}>

And now we get back what we started with -- not bad.

I'm liking this -- I think it would be good to have parse, or something like
it, in the stdlib, maybe as a string method.

Then maybe consider some auto-assigning behavior -- though I'm pretty
sceptical of that, and Wes' point about debugging is a good one. It would
create a real debugging / testing nightmare to have stuff auto-assigned into
locals.

-CHB

> import re
> str_ = "a:b:c"
> assert re.match(r'(.*):(.*)', str_).groups() == ("a:b", "c")
> assert re.match(r'(.*?):(.*)', str_).groups() == ("a", "b:c")
>
> Typically, debugging parsing issues involves testing the output of a
> function (not changes to locals()).
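
For comparison, the three parse results shown earlier in this message
(Out[5] to Out[7]) can be reproduced with plain regular expressions. This is
only an analogy using the stdlib re module; it is an assumption that these
patterns behave like the ones parse generates, not a description of how parse
is implemented.

```python
import re

a_string = "234567"

# Non-greedy groups take as little as possible, left to right;
# the final greedy group absorbs whatever is left.
lazy = re.fullmatch(r"(.+?)(.+?)(.+)", a_string).groups()

# Greedy digit groups: the first takes as much as it can while
# still leaving at least one digit for each later group.
greedy = re.fullmatch(r"(\d+)(\d+)(\d+)", a_string).groups()

# Fixed widths remove the ambiguity altogether.
fixed = re.fullmatch(r"(\d{2})(\d{2})(\d{2})", a_string).groups()

print(lazy)    # ('2', '3', '4567')
print(greedy)  # ('2345', '6', '7')
print(fixed)   # ('23', '45', '67')
```

The regex groups are all strings; parse additionally converts the `:d`
fields to int.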
>
> Parse defaults to (case-insensitive) non-greedy/shortest-match:
>
> > parse() will always match the shortest text necessary (from left to
> > right) to fulfil the parse pattern, so for example:
> >
> > >>> pattern = '{dir1}/{dir2}'
> > >>> data = 'root/parent/subdir'
> > >>> sorted(parse(pattern, data).named.items())
> > [('dir1', 'root'), ('dir2', 'parent/subdir')]
> >
> > So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit
> > the pattern, the actual match represents the shortest successful match for
> > dir1.
>
> https://github.com/r1chardj0n3s/parse#potential-gotchas
>
> https://github.com/r1chardj0n3s/parse#format-specification :
>
> > Note: attempting to match too many datetime fields in a single parse()
> > will currently result in a resource allocation issue. A TooManyFields
> > exception will be raised in this instance. The current limit is about 15.
> > It is hoped that this limit will be removed one day.
>
>
> On Sat, Sep 19, 2020, 1:00 PM Rob Cliffe via Python-ideas <
> python-ideas@python.org> wrote:
>
>> Parsing can be ambiguous:
>>     f"{x}:{y}" = "a:b:c"
>> Does this set
>>     x = "a"
>>     y = "b:c"
>> or
>>     x = "a:b"
>>     y = "c"
>> Rob Cliffe
>>
>> On 17/09/2020 05:52, Dennis Sweeney wrote:
>> > TL;DR: I propose the following behavior:
>> >
>> >     >>> s = "She turned me into a newt."
>> >     >>> f"She turned me into a {animal}." = s
>> >     >>> animal
>> >     'newt'
>> >
>> >     >>> f"A {animal}?" = s
>> >     Traceback (most recent call last):
>> >         File "<pyshell#2>", line 1, in <module>
>> >             f"A {animal}?" = s
>> >     ValueError: f-string assignment target does not match 'She turned me into a newt.'
>> >
>> >     >>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59"
>> >     >>> hh, mm, ss
>> >     (11, 59, 59)
>> >
>> > === Rationale ===
>> >
>> > Part of the reason I like f-strings so much is that they reduce the
>> > cognitive overhead of reading code: they allow you to see *what* is
>> > being inserted into a string in a way that also effortlessly shows
>> > *where* in the string the value is being inserted. There is no need to
>> > "paint-by-numbers" and remember which variable is {0} and which is {1}
>> > in an unnecessary extra layer of indirection. F-strings allow string
>> > formatting that is not only intelligible, but *locally* intelligible.
>> >
>> > What I propose is the inverse feature, where you can assign a string
>> > to an f-string, and the interpreter will maintain an invariant kept
>> > in many other cases:
>> >
>> >     >>> a[n] = 17
>> >     >>> a[n] == 17
>> >     True
>> >
>> >     >>> obj.x = "foo"
>> >     >>> obj.x == "foo"
>> >     True
>> >
>> >     # Proposed:
>> >     >>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
>> >     >>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
>> >     True
>> >     >>> hh
>> >     '11'
>> >
>> > This could be thought of as analogous to the C language's scanf
>> > function, something I've always felt was just slightly lacking in
>> > Python. I think such a feature would more clearly allow readers of
>> > Python code to answer the question "What kinds of strings are allowed
>> > here?". It would add certainty to programs that accept strings,
>> > confirming early that the data you have is the data you want.
>> > The code reads like a specification that beginners can understand in
>> > a blink.
>> >
>> >
>> > === Existing way of achieving this ===
>> >
>> > As of now, you could achieve the behavior with regular expressions:
>> >
>> >     >>> import re
>> >     >>> pattern = re.compile(r'It is (.+):(.+) (.+)')
>> >     >>> match = pattern.fullmatch("It is 11:45 PM")
>> >     >>> hh, mm, am_or_pm = match.groups()
>> >     >>> hh
>> >     '11'
>> >
>> > But this suffers from the same paint-by-numbers, extra-indirection
>> > issue that old-style string formatting runs into, an issue that
>> > f-strings improve upon.
>> >
>> > You could also do a strange mishmash of built-in str operations, like
>> >
>> >     >>> s = "It is 11:45 PM"
>> >     >>> empty, rest = s.split("It is ")
>> >     >>> assert empty == ""
>> >     >>> hh, rest = rest.split(":")
>> >     >>> mm, am_or_pm = rest.split(" ")
>> >     >>> hh
>> >     '11'
>> >
>> > But this is 5 different lines to express one simple idea.
>> > How many different times have you written a micro-parser like this?
>> >
>> >
>> > === Specification (open to bikeshedding) ===
>> >
>> > In general, the goal would be to pursue the assignment-becomes-equal
>> > invariant above. By default, assignment targets within f-strings would
>> > be matched as strings. However, adding in a format specifier would
>> > allow the matches to be evaluated as different data types, e.g.
>> > f'{foo:d}' = "1" would make foo become the integer 1. If a more complex
>> > format specifier was added that did not match anything that the
>> > f-string could produce as an expression, then we'd still raise a
>> > ValueError:
>> >
>> >     >>> f"{x:.02f}" = "0.12345"
>> >     Traceback (most recent call last):
>> >         File "<pyshell#2>", line 1, in <module>
>> >             f"{x:.02f}" = "0.12345"
>> >     ValueError: f-string assignment target does not match '0.12345'
>> >
>> > If we're feeling adventurous, one could turn the !r repr flag in a
>> > match into an eval() of the matched string.
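
The behavior specified above can be approximated today as an ordinary
function that returns a dict instead of assigning into locals. What follows
is a minimal sketch, not the proposal's implementation and not the parse
package's API: the name scan() and the _FIELD pattern are hypothetical, only
plain {name} and {name:d} fields are handled, and string fields use greedy
`.+` like the regular expression in the "Existing way" example.

```python
import re

# Hypothetical helper for illustration only.
# Matches "{name}" or "{name:spec}" inside a template.
_FIELD = re.compile(r"\{([A-Za-z_]\w*)(?::([^}]*))?\}")


def scan(template, text):
    """Return a dict of matched fields, or raise ValueError on mismatch."""
    parts = []
    converters = {}
    pos = 0
    for field in _FIELD.finditer(template):
        # Literal text between fields must match exactly.
        parts.append(re.escape(template[pos:field.start()]))
        name, spec = field.group(1), field.group(2)
        if not spec:
            # No format spec: match as a (greedy) string.
            parts.append(rf"(?P<{name}>.+)")
            converters[name] = str
        elif spec == "d":
            # Integer spec: match digits and convert to int.
            parts.append(rf"(?P<{name}>[-+]?\d+)")
            converters[name] = int
        else:
            raise ValueError(f"unsupported format spec {spec!r}")
        pos = field.end()
    parts.append(re.escape(template[pos:]))

    match = re.fullmatch("".join(parts), text)
    if match is None:
        raise ValueError(f"template does not match {text!r}")
    return {name: converters[name](value)
            for name, value in match.groupdict().items()}


print(scan("It is {hh}:{mm} {am_or_pm}", "It is 11:45 PM"))
# {'hh': '11', 'mm': '45', 'am_or_pm': 'PM'}
print(scan("{hh:d}:{mm:d}:{ss:d}", "11:59:59"))
# {'hh': 11, 'mm': 59, 'ss': 59}
```

Because the string fields are greedy here, scan("{x}:{y}", "a:b:c") returns
x = 'a:b' and y = 'c', which is one of the two readings in Rob Cliffe's
ambiguity example; switching to `.+?` would give the other.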
>> >
>> > The f-string would match with the same eager semantics as regular
>> > expressions, backtracking when a match is not made on the first
>> > attempt.
>> >
>> > Let me know what you think!
>> > _______________________________________________
>> > Python-ideas mailing list -- python-ideas@python.org
>> > To unsubscribe send an email to python-ideas-le...@python.org
>> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> > Message archived at
>> > https://mail.python.org/archives/list/python-ideas@python.org/message/JEGSKODAK5MCO2HHUF4555JZPZ6SKNEC/
>> > Code of Conduct: http://python.org/psf/codeofconduct/
>> _______________________________________________
>> Python-ideas mailing list -- python-ideas@python.org
>> To unsubscribe send an email to python-ideas-le...@python.org
>> https://mail.python.org/mailman3/lists/python-ideas.python.org/
>> Message archived at
>> https://mail.python.org/archives/list/python-ideas@python.org/message/CVPRH5MEEUV2HPP4QOSZQDGQ6CWAXCY7/
>> Code of Conduct: http://python.org/psf/codeofconduct/
>>
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at
> https://mail.python.org/archives/list/python-ideas@python.org/message/HFNRY3HB4CJXPKOX6ZXBPZ7V2TZ3O4FY/
> Code of Conduct: http://python.org/psf/codeofconduct/
>

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/BUZPGEC4EESBBVBAIV5G4RJ7SUED4XCX/