On 4/1/19 9:34 PM, David Mertz wrote:
On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <st...@pearwood.info> wrote:

The point I am making is not that we must not ever support multiple
affixes, but that we shouldn't rush that decision. Let's pick the
low-hanging fruit, and get some real-world experience with the function
before deciding how to handle the multiple affix case.


There are exactly two string methods that currently deal specifically
with affixes: startswith and endswith. Both of those allow specifying
multiple affixes. That's pretty strong real-world experience, and breaking
the symmetry for no reason is merely confusing, especially since the
consistent behavior would obviously be just as commonly useful.
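
For reference, a minimal sketch of the existing behavior being cited:
str.startswith and str.endswith each accept either a single string or a
tuple of strings. The filename and the affix values below are made up
for illustration.

```python
# str.startswith / str.endswith take one string or a tuple of strings.
filename = "photo.jpg"  # illustrative value only

# True if the string ends with any one of the listed suffixes.
has_image_suffix = filename.endswith((".png", ".jpg"))

# True if the string starts with any one of the listed prefixes.
has_known_prefix = filename.startswith(("img_", "photo"))

# No listed suffix matches here, so the result is False.
is_archive = filename.endswith((".tar", ".gz"))
```

Note that both methods return only a bool; they do not report which of
the affixes matched.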

My imagination is failing me:  for multiple affixes
(affices?), what is a use case for removing one, but not
having the function return which one?  In other words,
shouldn't a function that removes multiple affixes also
return which one(s) were removed?  I think I'm agreeing
with Steven:  take the low hanging fruit now, and worry
about complexification later (because I'm not sure that the
existing API is good when removing multiple affixes).
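
To make the question concrete, here is a rough sketch of what "remove
one of several suffixes and say which one" might look like.  The name
strip_one_suffix and its return shape are my own assumptions for
illustration, not anything that has been proposed or that exists on str.

```python
from typing import Iterable, Optional, Tuple


def strip_one_suffix(text: str, suffixes: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: remove at most one suffix and report which one.

    Returns (stripped_text, removed_suffix); removed_suffix is None when
    no candidate matched.  Candidates are tried in the order given.
    """
    for suffix in suffixes:
        # Skip empty suffixes: text[:-0] would wrongly yield an empty string.
        if suffix and text.endswith(suffix):
            return text[: -len(suffix)], suffix
    return text, None
```

With this shape, strip_one_suffix("photo.jpg", (".png", ".jpg")) gives
back both the stem and ".jpg", so the caller can act on the suffix that
was actually removed.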

Stemming is hard, because a lot of words begin/end with
common affixes, but that string of letters isn't always an
affix.  For example, removing common prefixes from "relay"
leaves "lay," but that's not the root; similarly with "relax"
and "area."  If my algorithm is "look for the word in a list
of known words, if it's there then great, but if it's not
then remove one affix and try again," then I don't want to
remove all the affixes at once.
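
The lookup-then-strip loop described above could be sketched roughly as
follows.  The word list, the prefix list, and the function name are toy
assumptions; a real stemmer would need far more than this.

```python
# Toy data, assumed purely for illustration.
KNOWN_WORDS = {"relay", "relax", "area", "play", "do"}
PREFIXES = ("re", "un", "a")


def find_known_word(word, known_words=KNOWN_WORDS, prefixes=PREFIXES):
    """Look the word up first; only if that fails, strip ONE prefix and retry.

    Returns the first known form found, or None if no prefix applies
    and the word is still unknown.
    """
    candidate = word
    while True:
        if candidate in known_words:
            return candidate
        for prefix in prefixes:
            # Strip a single prefix, and never reduce the word to nothing.
            if candidate.startswith(prefix) and len(candidate) > len(prefix):
                candidate = candidate[len(prefix):]
                break
        else:
            # No prefix could be removed; give up.
            return None
```

Because the lookup happens before any stripping, "relay" is returned
untouched rather than being reduced to "lay", while "replay" loses one
prefix and comes back as "play".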

When removing extensions from filenames, all of my use cases
involve removing one at a time and acting on the one that
was removed.  For example, decompressing foo.tar.gz into
foo.tar, and then untarring foo.tar into foo.  I suppose I
can imagine removing .tar.gz and then decompressing and
untarring in one step, but again, then I have to know which
suffixes were removed.  Or maybe I could process foo.tar.gz
and want to end up with foo.norm (bonus points for
recognizing the XKCD reference), but my personal preference
would still be to produce foo.tar.gz.norm by default and let
the user specify the ultimate filename if they want something
else.
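
A small sketch of the one-at-a-time approach, using os.path.splitext
from the standard library.  The generator name peel_extensions is made
up for this example, and the actual decompress/untar actions are left
out.

```python
import os.path


def peel_extensions(filename):
    """Yield (remaining_name, removed_extension) pairs, one extension per step."""
    name = filename
    while True:
        # splitext removes only the last extension, e.g. ".gz" from "foo.tar.gz".
        name, ext = os.path.splitext(name)
        if not ext:
            break
        yield name, ext


# Each step exposes the extension that was removed, so the caller can
# choose the matching action (decompress for ".gz", untar for ".tar").
steps = list(peel_extensions("foo.tar.gz"))
```

Here steps comes out as [("foo.tar", ".gz"), ("foo", ".tar")], which is
the order in which I would want to act on the file.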

So I've seen someone (likely David Mertz?) ask for something
like filename.strip_suffix(('.png', '.jpg')).  What is the
context?  Is it strictly a filename processing program?  Do
you subsequently have to determine the suffix(es) at hand?
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
