In regular expressions, a ? placed after a quantifier (as in .*?) makes it "non-greedy", so it takes the shortest match instead of the longest:

    import re
    str_ = "a:b:c"
    assert re.match(r'(.*):(.*)', str_).groups() == ("a:b", "c")
    assert re.match(r'(.*?):(.*)', str_).groups() == ("a", "b:c")

Typically, debugging parsing issues involves testing the output of a function (not changes to locals()).

Parse defaults to (case-insensitive) non-greedy/shortest-match:

> parse() will always match the shortest text necessary (from left to right) to fulfil the parse pattern, so for example:
>
> >>> pattern = '{dir1}/{dir2}'
> >>> data = 'root/parent/subdir'
> >>> sorted(parse(pattern, data).named.items())
> [('dir1', 'root'), ('dir2', 'parent/subdir')]
>
> So, even though {'dir1': 'root/parent', 'dir2': 'subdir'} would also fit the pattern, the actual match represents the shortest successful match for dir1.

https://github.com/r1chardj0n3s/parse#potential-gotchas

https://github.com/r1chardj0n3s/parse#format-specification :

> Note: attempting to match too many datetime fields in a single parse() will currently result in a resource allocation issue. A TooManyFields exception will be raised in this instance. The current limit is about 15. It is hoped that this limit will be removed one day.

On Sat, Sep 19, 2020, 1:00 PM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:

> Parsing can be ambiguous:
>     f"{x}:{y}" = "a:b:c"
> Does this set
>     x = "a"
>     y = "b:c"
> or
>     x = "a:b"
>     y = "c"
> Rob Cliffe
>
> On 17/09/2020 05:52, Dennis Sweeney wrote:
> > TL;DR: I propose the following behavior:
> >
> >     >>> s = "She turned me into a newt."
> >     >>> f"She turned me into a {animal}." = s
> >     >>> animal
> >     'newt'
> >
> >     >>> f"A {animal}?" = s
> >     Traceback (most recent call last):
> >         File "<pyshell#2>", line 1, in <module>
> >             f"A {animal}?" = s
> >     ValueError: f-string assignment target does not match 'She turned me into a newt.'
> >
> >     >>> f"{hh:d}:{mm:d}:{ss:d}" = "11:59:59"
> >     >>> hh, mm, ss
> >     (11, 59, 59)
> >
> > === Rationale ===
> >
> > Part of the reason I like f-strings so much is that they reduce the
> > cognitive overhead of reading code: they allow you to see *what* is
> > being inserted into a string in a way that also effortlessly shows
> > *where* in the string the value is being inserted. There is no need to
> > "paint-by-numbers" and remember which variable is {0} and which is {1}
> > in an unnecessary extra layer of indirection. F-strings allow string
> > formatting that is not only intelligible, but *locally* intelligible.
> >
> > What I propose is the inverse feature, where you can assign a string
> > to an f-string, and the interpreter will maintain an invariant kept
> > in many other cases:
> >
> >     >>> a[n] = 17
> >     >>> a[n] == 17
> >     True
> >
> >     >>> obj.x = "foo"
> >     >>> obj.x == "foo"
> >     True
> >
> >     # Proposed:
> >     >>> f"It is {hh}:{mm} {am_or_pm}" = "It is 11:45 PM"
> >     >>> f"It is {hh}:{mm} {am_or_pm}" == "It is 11:45 PM"
> >     True
> >     >>> hh
> >     '11'
> >
> > This could be thought of as analogous to the c language's scanf
> > function, something I've always felt was just slightly lacking in
> > Python. I think such a feature would more clearly allow readers of
> > Python code to answer the question "What kinds of strings are allowed
> > here?". It would add certainty to programs that accept strings,
> > confirming early that the data you have is the data you want.
> > The code reads like a specification that beginners can understand in
> > a blink.
> >
> >
> > === Existing way of achieving this ===
> >
> > As of now, you could achieve the behavior with regular expressions:
> >
> >     >>> import re
> >     >>> pattern = re.compile(r'It is (.+):(.+) (.+)')
> >     >>> match = pattern.fullmatch("It is 11:45 PM")
> >     >>> hh, mm, am_or_pm = match.groups()
> >     >>> hh
> >     '11'
> >
> > But this suffers from the same paint-by-numbers, extra-indirection
> > issue that old-style string formatting runs into, an issue that
> > f-strings improve upon.
> >
> > You could also do a strange mishmash of built-in str operations, like
> >
> >     >>> s = "It is 11:45 PM"
> >     >>> empty, rest = s.split("It is ")
> >     >>> assert empty == ""
> >     >>> hh, rest = rest.split(":")
> >     >>> mm, am_or_pm = rest.split(" ")
> >     >>> hh
> >     '11'
> >
> > But this is 5 different lines to express one simple idea.
> > How many different times have you written a micro-parser like this?
> >
> >
> > === Specification (open to bikeshedding) ===
> >
> > In general, the goal would be to pursue the assignment-becomes-equal
> > invariant above. By default, assignment targets within f-strings would
> > be matched as strings. However, adding in a format specifier would
> > allow the matches to be evaluated as different data types, e.g.
> > f'{foo:d}' = "1" would make foo become the integer 1. If a more complex
> > format specifier was added that did not match anything that the
> > f-string could produce as an expression, then we'd still raise a
> > ValueError:
> >
> >     >>> f"{x:.02f}" = "0.12345"
> >     Traceback (most recent call last):
> >         File "<pyshell#2>", line 1, in <module>
> >             f"{x:.02f}" = "0.12345"
> >     ValueError: f-string assignment target does not match '0.12345'
> >
> > If we're feeling adventurous, one could turn the !r repr flag in a
> > match into an eval() of the matched string.
> >
> > The f-string would match with the same eager semantics as regular
> > expressions, backtracking when a match is not made on the first
> > attempt.
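
Inline, on the matching semantics just described: here is a rough sketch of how a template could be matched with regex backtracking, assuming every {name} field becomes one named group and everything else is literal text. The names template_to_regex and match_template are hypothetical (they are not part of the proposal or of any library), and the sketch ignores format specifiers, conversions like !r, and escaped braces. The greedy flag exists only to show that both readings of "a:b:c" are reachable, which is why the choice has to be specified.

```python
import re

# Hypothetical helpers for illustration only; not part of the proposal.
# A field is a bare identifier in braces, e.g. {hh}. Format specifiers,
# !r conversions and escaped braces ({{ }}) are deliberately not handled.
FIELD = re.compile(r"\{([A-Za-z_]\w*)\}")


def template_to_regex(template, greedy=False):
    """Build a regex with one named group per {name} field."""
    quantifier = ".+" if greedy else ".+?"
    parts = []
    pos = 0
    for field in FIELD.finditer(template):
        # Literal text between fields has to match exactly.
        parts.append(re.escape(template[pos:field.start()]))
        parts.append(f"(?P<{field.group(1)}>{quantifier})")
        pos = field.end()
    parts.append(re.escape(template[pos:]))
    return re.compile("".join(parts))


def match_template(template, text, greedy=False):
    """Return {field name: matched text}; raise ValueError on no match."""
    match = template_to_regex(template, greedy).fullmatch(text)
    if match is None:
        raise ValueError(f"template does not match {text!r}")
    return match.groupdict()


shortest = match_template("{x}:{y}", "a:b:c")
longest = match_template("{x}:{y}", "a:b:c", greedy=True)
clock = match_template("It is {hh}:{mm} {am_or_pm}", "It is 11:45 PM")
```

With the non-greedy default, x is "a" and y is "b:c"; with greedy=True, x is "a:b" and y is "c". Returning a dict (rather than binding names) also keeps the result easy to test.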
> >
> > Let me know what you think!
> > _______________________________________________
> > Python-ideas mailing list -- python-ideas@python.org
> > To unsubscribe send an email to python-ideas-le...@python.org
> > https://mail.python.org/mailman3/lists/python-ideas.python.org/
> > Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/JEGSKODAK5MCO2HHUF4555JZPZ6SKNEC/
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
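
On the "paint-by-numbers" objection to the regex version quoted above: named groups remove most of the positional bookkeeping, and wrapping the match in a function gives you an output you can test instead of watching names appear in locals(). A minimal sketch, assuming the same pattern as in the quoted example; parse_time is a hypothetical name chosen for illustration.

```python
import re

# Same pattern as the quoted regex example, but with named groups so that
# each field is labelled where it appears in the pattern.
TIME_PATTERN = re.compile(r"It is (?P<hh>.+):(?P<mm>.+) (?P<am_or_pm>.+)")


def parse_time(text):
    """Return the named fields as a dict, or None if text does not match."""
    match = TIME_PATTERN.fullmatch(text)
    if match is None:
        return None
    return match.groupdict()


fields = parse_time("It is 11:45 PM")
no_match = parse_time("She turned me into a newt.")
```

Here fields["hh"] is the string "11", and a non-matching input returns None rather than raising.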
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/HFNRY3HB4CJXPKOX6ZXBPZ7V2TZ3O4FY/
Code of Conduct: http://python.org/psf/codeofconduct/