On 2016-10-26 23:17, Chris Barker wrote:
I've lost track of what (if anything) is actually being proposed here...
so I'm going to try a quick summary:


1) an easy way to spell "remove all the characters other than these"

I think that's a good idea. What with unicode having an enormous number
of code points, it really does make sense to have a way to specify only
what you want, rather than what you don't want.

Back in the good old days of 1-byte chars, it wasn't hard to build up a
full 256 element translate table -- not so much anymore. And one of the
whole points of str.translate() is good performance.

 a) a new method:

   str.remove_all_but(sequence_of_chars)
  (naming TBD)

b) a new flag in translate (kind of like the decode keywords)

  str.translate(table, missing='ignore'|'remove')

c) pass a function that returns the replacement:

    def replace(c):
        return c.upper() if c.isalpha() else ''

    str.translate(replace)

The replacement function could be called only on distinct codepoints.
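
For what it's worth, something close to (c) can be approximated today,
because str.translate() accepts any object that supports lookup by
ordinal. Below is a minimal sketch, assuming a hypothetical FuncTable
adapter (not an existing API) that calls the function once per distinct
code point and caches the answer:

```python
class FuncTable(dict):
    """Hypothetical adapter: wraps a per-character function as a translate table."""

    def __init__(self, func):
        super().__init__()
        self.func = func
        self.calls = 0  # counts how often func is really invoked

    def __missing__(self, codepoint):
        # Reached only for code points not cached yet.
        self.calls += 1
        result = self.func(chr(codepoint))
        self[codepoint] = result  # cache, so each code point is computed once
        return result


def replace(c):
    return c.upper() if c.isalpha() else ''


table = FuncTable(replace)
result = "hello, hello world!".translate(table)
print(result)       # HELLOHELLOWORLD
print(table.calls)  # 10 distinct code points in the input
```

The caching lives in the adapter here; a built-in version could of
course do it differently.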


(b) has the advantage of adding translation and removal in one fell
swoop -- but if you only want to remove, then you have to make a
translation table of 1:1 mappings -- not hard, but a bit annoying:

table = {c:c for c in sequence_of_chars}

I'm on the fence about what I personally prefer.
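
As a point of comparison, both (a) and (b) can be emulated with the
current str.translate(): a dict subclass whose __missing__ returns None
makes every unmapped code point get deleted. This is only a sketch;
RemoveMissing and remove_all_but are hypothetical names, not proposed
spellings. Note that today's tables are keyed by ordinals, which
str.maketrans() takes care of:

```python
class RemoveMissing(dict):
    """Hypothetical table: code points without an entry are removed."""

    def __missing__(self, codepoint):
        return None  # None tells str.translate() to delete the character


def remove_all_but(s, chars):
    # Roughly option (a): keep only the characters in `chars`.
    # str.maketrans() converts the 1:1 mapping to ordinal keys.
    table = RemoveMissing(str.maketrans({c: c for c in chars}))
    return s.translate(table)


print(remove_all_but("Hello, World! 123", "0123456789"))  # 123

# Roughly option (b) with missing='remove': translate and remove in one pass.
table = RemoveMissing(str.maketrans("abc", "xyz"))
print("aabbccdd".translate(table))  # xxyyzz
```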

2) (in another thread, but similar enough) being able to pass in more
than one string to replace:

str.replace( old=seq_of_strings, new=seq_of_strings )

I know I've wanted this a lot, and certainly from a performance
perspective, it could be a nice bonus.

But: it overlaps a lot with str.translate -- at least for single
character replacements. So really, why? It would really only make
sense if it supported multi-char strings:

str.replace(old=("aword", "another_word"), new=("something", "something else"))

However: a string IS a sequence of strings, so we'd have confusion about
that:

str.replace("this", "four")

Does the user want the word "this" replaced with the word "four" -- or
do they want each character replaced?

Maybe we'd need a .replace_many() method? ugh!

There are also other issues with what to do with repeated / overlapping
characters:

str.replace( ("aaa", "a", "b"), ("b", "bbb", "a") )

and all sorts of other complications!

Possible choices are:

1) Use the given order.

2) Check from the longest to the shortest.

If you're going to pick choice 2, does it have to be 2 tuples/lists? Why not a dict instead?
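
The two choices can be compared in a single left-to-right pass built on
re alternation, which tries the alternatives in the order they are
listed. This is a rough sketch only: replace_many and its longest_first
flag are hypothetical, it takes a dict as suggested above, and it
assumes at least one key and no empty keys. The keys are deliberately
listed with "a" before "aaa" so that the two choices give different
results:

```python
import re


def replace_many(s, mapping, longest_first=False):
    """Hypothetical helper: replace every key of `mapping` in one pass."""
    olds = list(mapping)  # dicts keep insertion order: the "given order"
    if longest_first:
        olds.sort(key=len, reverse=True)  # stable sort, ties keep given order
    # Alternation tries the candidates left to right at each position.
    pattern = re.compile("|".join(re.escape(old) for old in olds))
    return pattern.sub(lambda m: mapping[m.group(0)], s)


pairs = {"a": "bbb", "aaa": "b", "b": "a"}

given = replace_many("aaab", pairs)
longest = replace_many("aaab", pairs, longest_first=True)
print(given)    # bbbbbbbbba  ("a" wins three times, "aaa" never matches)
print(longest)  # ba          ("aaa" is tried first)
```

Because replaced text is never rescanned, the "b" produced from "aaa" is
not turned into "a" afterwards.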

THAT I think could be nailed down by defining the "order of operations":
does it loop through the entire string for each item, or through each
item for each point in the string? Note that if you loop through the
entire string for each item, you might as well have written the loop
yourself:

for old, new in zip(old_list, new_list):
    s = s.replace(old, new)

and at least if the length of the string is long-ish, and the number of
replacements short-ish -- performance would be fine.
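
One thing worth spelling out about that loop: each pass sees the output
of the previous one, so earlier replacements can be replaced again. A
small sketch using the overlapping example from above
(replace_sequential is just a hypothetical wrapper around the loop):

```python
def replace_sequential(s, old_list, new_list):
    # One full pass over the string per (old, new) pair, in the given order.
    for old, new in zip(old_list, new_list):
        s = s.replace(old, new)
    return s


result = replace_sequential("aaab", ("aaa", "a", "b"), ("b", "bbb", "a"))
# "aaab" -> "bb" (aaa -> b), no "a" left for the second pair,
# then both "b"s, including the one just produced, become "a".
print(result)  # aa
```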


*** So the question is -- is there support for these enhancements? If
so, then it would be worth hashing out the details.

But the next question is -- does anyone care enough to manage that
process -- it'll be a lot of work!

NOTE: there has also been a fair bit of discussion in this thread about
ordinals vs characters, and unicode itself -- I don't think any of that
resulted in any possible proposals...

[snip]

_______________________________________________
Python-ideas mailing list
Python-ideas@python.org
https://mail.python.org/mailman/listinfo/python-ideas
Code of Conduct: http://python.org/psf/codeofconduct/
