Steven D'Aprano writes:

> On Sat, Jan 07, 2023 at 10:48:48AM -0800, Peter Ludemann wrote:
>
> > You can get almost the same result using pattern matching. For
> > example, your
> >     "foo:bar;baz".partition(":", ";")
> > can be done by a well-known matching idiom:
> >     re.match(r'([^:]*):([^;]*);(.*)', 'foo:bar;baz').groups()
>
> "Well-known" he says :-)

It *is* well-known to those who know.  Just because you don't like
regex doesn't mean it's not well-known.  I wouldn't use that idiom
though; I'd use an explicit character class in most cases I encounter.

> I think that the regex solution is also wrong because it requires you
> to know *exactly* what order the separators are found in the source
> string.

But that's characteristic of many examples.  In "structured" mail
headers like Content-Type, you want the separators to come in the
order ':', '=', ';'.  In a URI scheme with an authority component, you
want them in the order '@', ':'.

Except that you don't, in both those examples.  In Content-Type, the
'=' is optional, and there may be multiple ';'.  In authority, the
existing ':' is optional, and there's an optional ':' to separate
password from username before the '@'.  And it gets worse: in the
authority case, the username is optional.  In the common case of
anonymous access, the username is omitted, so

    user, _, domain = "example.com".partition('@')

does the wrong thing!

> If we swap the semi-colon and the colon in the source, but not
> the pattern, the idiom fails:
>
> >>> re.match(r'([^:]*):([^;]*);(.*)', 'foo;bar:baz').groups()
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'NoneType' object has no attribute 'groups'
>
> So that makes it useless for the case where you want to split of any of
> a number of separators, but don't know which order they occur in.

Examples where the order of separators doesn't matter?  In most of the
examples I need, swapping order is a parse error.

> You call it "almost the same result" but it is nothing like the result
> from partition. The separators are lost,

Trivial to fix, just add parens, in the simpler grouping form as a
bonus!  I'm not asking you to like the resulting regexp better, just
pointing out that your dislike of regex is driving the discussion in
unprofitable directions.
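
One reading of "just add parens", as a minimal sketch: the pattern below is
the idiom from the quoted message with each separator wrapped in its own
capturing group.  The name PATTERN is made up for illustration, and this is
only one of several ways to write it.

```python
import re

# Hypothetical variant of the quoted idiom: the separators get their own
# capturing groups, so they show up in the result the way they do with
# str.partition.
PATTERN = re.compile(r'([^:]*)(:)([^;]*)(;)(.*)')

m = PATTERN.match('foo:bar;baz')
parts = m.groups() if m else None
print(parts)  # ('foo', ':', 'bar', ';', 'baz')

# With the separators swapped in the input there is no match at all,
# which is the desired outcome when the wrong order is a parse error.
swapped = PATTERN.match('foo;bar:baz')
print(swapped)  # None
```

Checking the match object for None before calling .groups() avoids the
AttributeError shown in the quoted traceback.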

> and it splits the string all at once instead of one split per call.

So does the original proposal, that's part of the point of it, I
think.

I really don't see any of the variations on the proposal as a
particularly valuable addition.  It's already easy to screw up your
parse with str.partition (the authority example: although you can fix
the order problem with '@' by using str.rpartition, the multiple
optional ':' mean that whichever r?partition you use, you can get it
wrong unless you check the order of '@' and ':', so you have to use a
recursive parse, not a sequential parse).  But you can write a regex
version of authority to give a sequence of tokens rather than a parse,
and you convert that into a parse by checking each element of the
sequence for None in a deterministic order.  I prefer the latter
approach (Emacs user since Emacs was programmed in TECO), but as long
as you allow me to use regex for character classes and sequences, I
can live with restrictions on use of regex in the style guide.

Parsing is hard.  Both regex and r?partition are best used as
low-level tools for tokenizing, and you're asking for trouble if you
try to use them for parsing past a certain point.  My breaking point
for regex is somewhere around the authority example, but I wouldn't
push back if my project's style guide said to break that up.

I *would* however often prefer regexp to r?partition because it would
allow character classes, and in most of the areas I work with (mail,
URIs, encodings) being able to detect lexical errors by using
character classes is helpful.  And I would prefer "one bite per call"
partition to a partition at multiple points.  Where I'm being pretty
fuzzy, the .split methods are fine.
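
The "regex version of authority" mentioned above might look roughly like
the following sketch.  Everything in it is an assumption made for
illustration: the names AUTHORITY, tokenize_authority and describe, and the
simplified grammar [user[:password]@]host[:port], which ignores bracketed
IPv6 hosts and percent-encoding.

```python
import re

# Hypothetical tokenizer for a simplified authority component:
#   [user[:password]@]host[:port]
# The character classes reject a stray '@', ':' or '/' inside a field.
AUTHORITY = re.compile(
    r"""
    (?: (?P<user>[^:@/]*)               # optional username
        (?: : (?P<password>[^@/]*) )?   # optional password
        @
    )?
    (?P<host>[^:@/]+)                   # host is required
    (?: : (?P<port>[0-9]+) )?           # optional numeric port
    """,
    re.VERBOSE,
)


def tokenize_authority(text):
    """Return (user, password, host, port); absent fields are None."""
    m = AUTHORITY.fullmatch(text)
    if m is None:
        raise ValueError(f"not a valid authority: {text!r}")
    return m.group("user", "password", "host", "port")


def describe(text):
    """Turn the token sequence into a parse by checking for None in a fixed order."""
    user, password, host, port = tokenize_authority(text)
    result = {"host": host}
    if user is not None:
        result["user"] = user
        if password is not None:
            result["password"] = password
    if port is not None:
        result["port"] = int(port)
    return result


# The anonymous case that trips up str.partition:
print("example.com".partition("@"))  # ('example.com', '', '')
print(describe("example.com"))       # {'host': 'example.com'}
```

A second '@', or a non-numeric port, makes fullmatch fail, so lexical errors
surface as a ValueError instead of as fields silently landing in the wrong
variables.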

--
Yet another Steve

Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/VDZQVHGUPAOUCPL4HPAXFTQPNAHNJZIK/