On 01May2021 05:30, David Mertz <me...@gnosis.cx> wrote:
>I was actually thinking about this before the recent "string comprehension"
>thread. I wasn't really going to post the idea, but it's similar enough
>that I am nudged to. Moreover, since PEP 616 added str.removeprefix() and
>str.removesuffix(), this feels like a natural extension of that.
>
>I find myself very often wanting to remove several substrings of similar
>lines to get at "the good bits" for my purpose. Log files are a good
>example of this, but it arises in lots of other contexts I encounter.
>Let's take a not-absurd hypothetical:
>
>GET [http://example.com/picture] 200 image/jpeg
>POST [http://nowhere.org/data] 200 application/json
>PUT [https://example.org/page] 200 text/html
>
>For each of these lines, I'd like to see the URL and the MIME type only.
>The new str.removeprefix() helps some, but not as much as I would like
>since the "remove a tuple of prefixes" idea was rejected for PEP 616. But
>even past that, very often much of what I want to remove is in the middle,
>not at the start or the end.

This is not a good way to tidy up log lines. Try parsing each line into
fields:

    PUT http://example.com/picture 200 image/jpeg

and then only looking at the fields you care about.

>I know I can use regular expressions here. However, they are definitely a
>higher cognitive burden, and especially so for those who haven't taught
>them and written about them a lot, as I have. Even for me, I'd rather not
>think about regexen if I don't *have to*.

Though for this, they are ok. Or even just:

    method, _url_, code, mimetype = line.split(None, 3)

There shouldn't be any whitespace in a log line URL - it should be
percent encoded.

>So probably I'll do something
>like this:
>
>for line in lines:
>    for noise in ('GET', 'POST', 'PUT', '200', '[', ']'):
>        line = line.replace(noise, '')

This is a very bad way to do this. What about the URL
"http://example.com/foo/PUT/bah"? Badness ensues. It's worse than using
a well-written regexp.

>    process_line(line)
>
>That's not horrible, but it would be nicer to write:
>
>for line in lines:
>    process_line(line.remove(('GET', 'POST', 'PUT', '200', '[', ']')))

I'm -1 on this idea. As you note, str.replace already exists and does
what your line.remove does, just on a single substring basis. It's a
trivial exercise to write an mreplace(s, substrs) function. Just do it
and put it in your personal kit, and import it.

>Of course, if I really needed this as much as I seem to be suggesting, I
>know how to write a function `remove_strings()`... and I confess I have not
>done that. Or at least I haven't done it in some standard "my_utils" module
>I always import. Nonetheless, a string method would feel even more natural
>than a function taking the string as an argument.

A method is almost always "easier/more natural", but how many do we
really want? If you really want this, write a StrMixin with a bunch of
nice methods, subclass str, and promote your lines to your new subclass.
Methods managed!

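To make the suggestions above concrete, here is a rough sketch, assuming log lines shaped like the three samples quoted earlier. The names parse_log_line, NOISE and LogLine are made up for illustration, and mreplace and StrMixin are only one guess at what such helpers might look like; none of them exist in the standard library.

```python
NOISE = ('GET', 'POST', 'PUT', '200', '[', ']')


def parse_log_line(line):
    """Return (method, url, code, mimetype) from one whitespace-separated line."""
    method, url, code, mimetype = line.split(None, 3)
    # The sample lines wrap the URL in square brackets; drop them if present.
    # The last field keeps any trailing whitespace or newline, so strip it.
    return method, url.strip('[]'), code, mimetype.strip()


def mreplace(s, substrs, replacement=''):
    """Apply str.replace once per substring, in the order given."""
    for substr in substrs:
        s = s.replace(substr, replacement)
    return s


class StrMixin:
    """Hypothetical mixin holding extra string methods."""

    def remove(self, substrs):
        # Re-wrap the result so chained calls keep the subclass type.
        return type(self)(mreplace(self, substrs))


class LogLine(StrMixin, str):
    """A str promoted to carry the mixin's methods."""


lines = [
    'GET [http://example.com/picture] 200 image/jpeg',
    'POST [http://nowhere.org/data] 200 application/json',
    'PUT [https://example.org/page] 200 text/html',
]

for line in lines:
    method, url, code, mimetype = parse_log_line(line)
    print(url, mimetype)

# Blind replacement damages a URL that happens to contain a "noise" word:
tricky = LogLine('GET [http://example.com/foo/PUT/bah] 200 text/html')
print(tricky.remove(NOISE))  # the path comes out as /foo//bah
```

The field-based version only ever touches the pieces it names, which is why it survives the "/foo/PUT/bah" case that the replace loop mangles.
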
Cheers,
Cameron Simpson <c...@cskk.id.au>
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/NIOOPW2LJ754EVTAVEDQWFW2RTCD2CH7/