[Python-ideas] Re: commonprefix

2024-06-17 Thread Rob Cliffe via Python-ideas



On 13/06/2024 02:20, Rob Cliffe wrote:
The os.path.commonprefix function basically returns the common initial 
characters (if any) shared by a sequence of strings, e.g.
    os.path.commonprefix(("Python is great!", "Python is not bad", 
"Python helps")) # returns "Python "


The following was wrong (os.path.commonprefix uses min and max; it does 
not sort).  I withdraw it.
I also suspect that this function could be made more efficient. It 
sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) 
it seems a bit OTT in this case.


Thoughts?
Best wishes
Rob Cliffe
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/BCJWR24K34TJNQJC7SZCWUQDPBUSHEBB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: commonprefix

2024-06-17 Thread Rob Cliffe via Python-ideas
The os.path.commonprefix function basically returns the common initial 
characters (if any) shared by a sequence of strings, e.g.
    os.path.commonprefix(("Python is great!", "Python is not bad", 
"Python helps")) # returns "Python "


The following was wrong.  os.path.commonprefix uses min and max, not 
sort.  I withdraw it:
I also suspect that this function could be made more efficient.  It 
sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) it 
seems a bit OTT in this case.


Thoughts?
Best wishes
Rob Cliffe
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/C7S5C6YISAVYFVLBAQRCS6TS2BDJZANZ/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: commonprefix

2024-06-12 Thread Dom Grigonis
This being in `os.path` module suggests that the main intent is to find a 
common prefix of a `path`.

If this is the case, maybe it would be worth instead of:
```
if not isinstance(m[0], (list, tuple)):
m = tuple(map(os.fspath, m))
```
have
```
if not isinstance(m[0], (list, tuple)):
m = [os.fspath(el).split(SEP) for el in m]
…

as now (from tests):
```
commonprefix([b"home:swenson:spam", b"home:swen:spam”]) -> b"home:swen"
```
, which is not a common prefix of a path.

If my suggestion above is valid and it was intended to be used on parts of 
`path`, then it is the right place for it. But it was made too flexible, which 
makes it suitable for general string problems, but error prone for path 
problems.

The way I see it, ideally there should be:
1. string method
2. sequence method
3. path utility

Current `commonprefix` is doing 1 and 2 very well, but 3 is error-prone.

Regards,
dg

> On 13 Jun 2024, at 05:32, Tim Peters  wrote:
> 
> [Rob Cliffe]
> > The os.path.commonprefix function basically returns the common initial
> > characters (if any) shared by a sequence of strings, e.g.
> >  ...
> > It seems to me that this function (or something similar) should not be
> > in os.path, but somewhere else where it would be more visible.
> 
> It's certainly in a strange place ;-)
> 
> >  (1) a str.commonprefix function.  Unfortunately it would have to be
> > used like this
> >  str.commonprefix()
> >  which would make it [I think] the only str function which
> > couldn't be called as a string method.
> 
> Sure it could. like
> 
> astr.commonprefix(*strs)
> 
> to find the longest common prefix among `astr` and the (zero or more) 
> optional arguments.
> 
> > ...
> >One wrinkle: os.patch.commonprefix, if passed an empty
> > sequence, returns an empty STRING.
> >If this function were designed from scratch, it should
> > probably return None.
> 
> Don't think so. An empty string is a valid string too, and is the "obviously 
> correct" common prefix of "abc" and "xyz".
> 
> > I also suspect that this function could be made more efficient.  It
> > sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) it
> > seems a bit OTT in this case.
> 
> It doesn't sort. It finds the min and the max of all the inputs, as separate 
> operations, and finds the longest common prefix of those teo alone. Apart 
> from the initial min/max calls, the other inputs are never looked at again. 
> It's a matter of logical deduction that if S <= L have K initial characters 
> in common, then every string T with S <= T <= L must have the same K initial 
> characters.
> 
> As a micro-optimization, you might think the min'max could be skipped if 
> there were only two input strings. But then
> 
> for i, c in enumerate(s1):
> if c != s2[i]:
> return s1[:i]
> 
> could blow up with an IndexError if len(s2) < len(s1). As is, that can't 
> happen, because s1 <= s2 is known. At worst, the `for` loop can run to 
> exhaustion (in which case s2.startswith(s1)).
> 
> So. to my eye, there are surprisingly few possibilities for speeding this.
> 
> ___
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-le...@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at 
> https://mail.python.org/archives/list/python-ideas@python.org/message/6AJN3FXTZMKSBF6KDQYSDSPO2GJOBP6S/
> Code of Conduct: http://python.org/psf/codeofconduct/

___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/7K56LVEZLNZMS3DH7S32KY7FWEEZFXOS/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-ideas] Re: commonprefix

2024-06-12 Thread Tim Peters
[Rob Cliffe]
> The os.path.commonprefix function basically returns the common initial
> characters (if any) shared by a sequence of strings, e.g.
>  ...
> It seems to me that this function (or something similar) should not be
> in os.path, but somewhere else where it would be more visible.

It's certainly in a strange place ;-)

>  (1) a str.commonprefix function.  Unfortunately it would have to be
> used like this
>  str.commonprefix()
>  which would make it [I think] the only str function which
> couldn't be called as a string method.

Sure it could. like

astr.commonprefix(*strs)

to find the longest common prefix among `astr` and the (zero or more)
optional arguments.

> ...
>One wrinkle: os.patch.commonprefix, if passed an empty
> sequence, returns an empty STRING.
>If this function were designed from scratch, it should
> probably return None.

Don't think so. An empty string is a valid string too, and is the
"obviously correct" common prefix of "abc" and "xyz".

> I also suspect that this function could be made more efficient.  It
> sorts the sequence.  While sorting is very fast (thanks, Uncle Tim!) it
> seems a bit OTT in this case.

It doesn't sort. It finds the min and the max of all the inputs, as
separate operations, and finds the longest common prefix of those teo
alone. Apart from the initial min/max calls, the other inputs are never
looked at again. It's a matter of logical deduction that if S <= L have K
initial characters in common, then every string T with S <= T <= L must
have the same K initial characters.

As a micro-optimization, you might think the min'max could be skipped if
there were only two input strings. But then

for i, c in enumerate(s1):
if c != s2[i]:
return s1[:i]

could blow up with an IndexError if len(s2) < len(s1). As is, that can't
happen, because s1 <= s2 is known. At worst, the `for` loop can run to
exhaustion (in which case s2.startswith(s1)).

So. to my eye, there are surprisingly few possibilities for speeding this.
___
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/6AJN3FXTZMKSBF6KDQYSDSPO2GJOBP6S/
Code of Conduct: http://python.org/psf/codeofconduct/