This being in `os.path` module suggests that the main intent is to find a common prefix of a `path`.
If this is the case, maybe it would be worth instead of: ``` if not isinstance(m[0], (list, tuple)): m = tuple(map(os.fspath, m)) ``` have ``` if not isinstance(m[0], (list, tuple)): m = [os.fspath(el).split(SEP) for el in m] … as now (from tests): ``` commonprefix([b"home:swenson:spam", b"home:swen:spam”]) -> b"home:swen" ``` , which is not a common prefix of a path. If my suggestion above is valid and it was intended to be used on parts of `path`, then it is the right place for it. But it was made too flexible, which makes it suitable for general string problems, but error prone for path problems. The way I see it, ideally there should be: 1. string method 2. sequence method 3. path utility Current `commonprefix` is doing 1 and 2 very well, but 3 is error-prone. Regards, dg > On 13 Jun 2024, at 05:32, Tim Peters <tim.pet...@gmail.com> wrote: > > [Rob Cliffe] > > The os.path.commonprefix function basically returns the common initial > > characters (if any) shared by a sequence of strings, e.g. > > ... > > It seems to me that this function (or something similar) should not be > > in os.path, but somewhere else where it would be more visible. > > It's certainly in a strange place ;-) > > > (1) a str.commonprefix function. Unfortunately it would have to be > > used like this > > str.commonprefix(<sequence of strings>) > > which would make it [I think] the only str function which > > couldn't be called as a string method. > > Sure it could. like > > astr.commonprefix(*strs) > > to find the longest common prefix among `astr` and the (zero or more) > optional arguments. > > > ... > > One wrinkle: os.patch.commonprefix, if passed an empty > > sequence, returns an empty STRING. > > If this function were designed from scratch, it should > > probably return None. > > Don't think so. An empty string is a valid string too, and is the "obviously > correct" common prefix of "abc" and "xyz". > > > I also suspect that this function could be made more efficient. It > > sorts the sequence. While sorting is very fast (thanks, Uncle Tim!) it > > seems a bit OTT in this case. > > It doesn't sort. It finds the min and the max of all the inputs, as separate > operations, and finds the longest common prefix of those teo alone. Apart > from the initial min/max calls, the other inputs are never looked at again. > It's a matter of logical deduction that if S <= L have K initial characters > in common, then every string T with S <= T <= L must have the same K initial > characters. > > As a micro-optimization, you might think the min'max could be skipped if > there were only two input strings. But then > > for i, c in enumerate(s1): > if c != s2[i]: > return s1[:i] > > could blow up with an IndexError if len(s2) < len(s1). As is, that can't > happen, because s1 <= s2 is known. At worst, the `for` loop can run to > exhaustion (in which case s2.startswith(s1)). > > So. to my eye, there are surprisingly few possibilities for speeding this. > > _______________________________________________ > Python-ideas mailing list -- python-ideas@python.org > To unsubscribe send an email to python-ideas-le...@python.org > https://mail.python.org/mailman3/lists/python-ideas.python.org/ > Message archived at > https://mail.python.org/archives/list/python-ideas@python.org/message/6AJN3FXTZMKSBF6KDQYSDSPO2GJOBP6S/ > Code of Conduct: http://python.org/psf/codeofconduct/
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-le...@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7K56LVEZLNZMS3DH7S32KY7FWEEZFXOS/ Code of Conduct: http://python.org/psf/codeofconduct/