On Mon, Apr 1, 2019, 8:54 PM Steven D'Aprano <st...@pearwood.info> wrote:
> The point I am making is not that we must not ever support multiple
> affixes, but that we shouldn't rush that decision. Let's pick the
> low-hanging fruit, and get some real-world experience with the function
> before deciding how to handle the multiple affix case.
>

There are exactly two string methods that currently deal specifically with affixes: startswith and endswith. Both of those allow specifying multiple affixes. That's pretty strong real-world experience, and breaking the symmetry for no reason is merely confusing. Especially since the multiple-affix form would obviously be just as commonly useful here.

Now look, the sky won't fall if a single-affix-only method is added. For that matter, it won't if nothing is added. In fact, the single-affix version makes it a little bit easier to write a custom function handling multiple affixes. And the sky won't fall if the remove-just-one semantics are used rather than remove-from-class.

But adding methods with sneakily helpful capabilities often helps users greatly. A lot of folks in this thread didn't even know about passing a tuple to str.startswith() a few days ago. I'm pretty sure that capability was added by Raymond, who has an amazingly good sense of what little tricks can prove really powerful. Apologies to a different developer if it wasn't him, but congrats and thanks to you if so.

> Somebody (I won't name names, but they know who they are) wrote to me
> off-list some time ago and accused me of being arrogant and thinking I know
> more than everyone else. Well perhaps I am, but I'm not so arrogant as to
> think that I can choose the right behaviour for clashing affixes for other
> people when my own use-cases don't have clashing affixes.
>

That could be me... Unless it's someone else :-). I think my intent was a bit different than you characterize, but I'm very guilty of presuming too much also. So mea culpa.
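To make the startswith point concrete, here is a minimal sketch of the tuple form that already exists, plus a multi-affix wrapper built on top of a single-affix helper. The names cut_prefix and cut_any_prefix and the example.org URL are made up for illustration; they are not the proposed method, just one way the "remove-just-one" semantics could look.

```python
def cut_prefix(s, prefix):
    """Hypothetical single-affix helper: drop `prefix` once if present."""
    if prefix and s.startswith(prefix):
        return s[len(prefix):]
    return s


def cut_any_prefix(s, prefixes):
    """Hypothetical multi-affix wrapper with remove-just-one semantics.

    Tries the prefixes in the order given and stops at the first match.
    """
    for p in prefixes:
        stripped = cut_prefix(s, p)
        if stripped != s:
            return stripped
    return s


url = "https://example.org/page"  # illustrative URL only

# str.startswith already accepts a tuple of candidate prefixes.
print(url.startswith(("http://", "https://")))       # True
print(cut_any_prefix(url, ("http://", "https://")))  # example.org/page
```

Note that the wrapper has to pick a policy (here: first match in the order given), which is exactly the decision being argued about in this thread.
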
> > Sure, but I've often wanted to do something like "strip off a prefix
> > of http:// or https://", or something else that doesn't have a
> > semantic that's known to the stdlib.
>
> I presume there's a reason you aren't using urllib.parse and you just need
> a string without the leading scheme. If you're doing further parsing, the
> stdlib has the right batteries for that.
>

I know there are lots of specialized string manipulations in the stdlib. Yeah, I could use os.path.splitext, and os.path.split, and urllib.parse.something, and lots of other things I rarely use. A lot of us like to manipulate strings in generically stringy ways.

> But not until we had a couple of releases of experience with them:
>
> https://docs.python.org/2.7/library/stdtypes.html#str.endswith

Ok. Fair point. I used Python 2.4 without the multiple affix option.

> Here's a partial list of English prefixes that somebody doing text
> processing might want to remove to get at the root word:
>
>     a an ante anti auto circum co com con contra contro de dis
>     en ex extra hyper il im in ir inter intra intro macro micro
>     mono non omni post pre pro sub sym syn tele un uni up
>
> I count fourteen clashes:
>
>     a: an ante anti
>     an: ante anti
>     co: com con contra contro
>     ex: extra
>     in: inter intra intro
>     un: uni
>

This seems like a good argument for remove-all-from-class. :-)

    stem = word.lstrip(prefix_tup)

But then we really need 'word.porter_stemmer()' as a built-in method.
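For anyone who wants to see the clashes in action, here is a rough sketch comparing three possible policies on a subset of the quoted prefix list. All three function names are hypothetical, and none of this is existing str behaviour: today str.lstrip takes a string of characters (or None), so the one-liner above would raise TypeError if run as written.

```python
# Subset of the quoted prefix list, kept short for illustration.
PREFIXES = ("a", "an", "ante", "anti", "co", "com", "con")


def strip_first_match(word, prefixes):
    """Remove-just-one: the first prefix in tuple order that matches."""
    for p in prefixes:
        if p and word.startswith(p):
            return word[len(p):]
    return word


def strip_longest_match(word, prefixes):
    """Remove-just-one: the longest matching prefix wins."""
    matches = [p for p in prefixes if p and word.startswith(p)]
    if not matches:
        return word
    return word[len(max(matches, key=len)):]


def strip_all_from_class(word, prefixes):
    """Remove-all-from-class: keep stripping until no prefix matches."""
    changed = True
    while changed:
        changed = False
        for p in prefixes:
            if p and word.startswith(p):
                word = word[len(p):]
                changed = True
                break
    return word


print(strip_first_match("antisocial", PREFIXES))    # ntisocial
print(strip_longest_match("antisocial", PREFIXES))  # social
print(strip_all_from_class("coauthor", PREFIXES))   # uthor
```

The first-match result depends on tuple order, longest-match gives the answer a stemmer would likely want, and remove-all-from-class over-strips "coauthor". So the policy choice does change results on real words.
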
_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/