[Python-ideas] Re: commonprefix

Dom Grigonis Wed, 12 Jun 2024 21:32:41 -0700

This being in `os.path` module suggests that the main intent is to find a 
common prefix of a `path`.


If this is the case, maybe it would be worth instead of:
```
if not isinstance(m[0], (list, tuple)):
        m = tuple(map(os.fspath, m))
```
have
```
if not isinstance(m[0], (list, tuple)):
        m = [os.fspath(el).split(SEP) for el in m]
…

as now (from tests):
```
commonprefix([b"home:swenson:spam", b"home:swen:spam”]) -> b"home:swen"
```
, which is not a common prefix of a path.

If my suggestion above is valid and it was intended to be used on parts of 
`path`, then it is the right place for it. But it was made too flexible, which 
makes it suitable for general string problems, but error prone for path 
problems.

The way I see it, ideally there should be:
1. string method
2. sequence method
3. path utility

Current `commonprefix` is doing 1 and 2 very well, but 3 is error-prone.

Regards,
dg

> On 13 Jun 2024, at 05:32, Tim Peters <tim.pet...@gmail.com> wrote:
> 
> [Rob Cliffe]
> > The os.path.commonprefix function basically returns the common initial
> > characters (if any) shared by a sequence of strings, e.g.
> >  ...
> > It seems to me that this function (or something similar) should not be
> > in os.path, but somewhere else where it would be more visible.
> 
> It's certainly in a strange place ;-)
> 
> >      (1) a str.commonprefix function.  Unfortunately it would have to be
> > used like this
> >              str.commonprefix(<sequence of strings>)
> >          which would make it [I think] the only str function which
> > couldn't be called as a string method.
> 
> Sure it could. like
> 
>     astr.commonprefix(*strs)
> 
> to find the longest common prefix among `astr` and the (zero or more) 
> optional arguments.
> 
> > ...
> >            One wrinkle: os.patch.commonprefix, if passed an empty
> > sequence, returns an empty STRING.
> >                If this function were designed from scratch, it should
> > probably return None.
> 
> Don't think so. An empty string is a valid string too, and is the "obviously 
> correct" common prefix of "abc" and "xyz".
> 
> > I also suspect that this function could be made more efficient.  It
> > sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) it
> > seems a bit OTT in this case.
> 
> It doesn't sort. It finds the min and the max of all the inputs, as separate 
> operations, and finds the longest common prefix of those teo alone. Apart 
> from the initial min/max calls, the other inputs are never looked at again. 
> It's a matter of logical deduction that if S <= L have K initial characters 
> in common, then every string T with S <= T <= L must have the same K initial 
> characters.
> 
> As a micro-optimization, you might think the min'max could be skipped if 
> there were only two input strings. But then
> 
>     for i, c in enumerate(s1):
>         if c != s2[i]:
>             return s1[:i]
> 
> could blow up with an IndexError if len(s2) < len(s1). As is, that can't 
> happen, because s1 <= s2 is known. At worst, the `for` loop can run to 
> exhaustion (in which case s2.startswith(s1)).
> 
> So. to my eye, there are surprisingly few possibilities for speeding this.
> 
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/6AJN3FXTZMKSBF6KDQYSDSPO2GJOBP6S/
> Code of Conduct: http://python.org/psf/codeofconduct/

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7K56LVEZLNZMS3DH7S32KY7FWEEZFXOS/
Code of Conduct: http://python.org/psf/codeofconduct/

[Python-ideas] Re: commonprefix

Reply via email to