On 20 July 2013 22:56, Dave Angel <da...@davea.name> wrote:
> On 07/20/2013 02:37 PM, Joshua Landau wrote:
>>
>> The problem can be solved, I'd imagine, for builtin types. Just build
>> an internal representation upon calling .translate that's faster. It's
>> especially easy in the list case
>
> What "list case"? list doesn't have a replace() method or translate()
> method.

I mean some_str.translate(some_list).

>> -- just build a C array¹ at the start
>> mapping int -> int and then have really fast C mapping speeds.
>
>
> As long as you can afford to have a list with a billion or so entries in it.
> We are talking about strings and version 3.3, aren't we? Of course, one
> could always examine the mapping object (table) and see what the max value
> was, and only build a "C array" if it was smaller than say 50,000.

When talking about some_str.translate(some_list), this doesn't apply
very much -- they've already gotten a much bigger Python list.

In the dict case² I don't actually want to jump to the conclusion that
one should do array-based mappings because I can see the obvious
downsides and it's obviously not good to have 100 cases in there,
*but* I still think that there's a solution. Here are some ideas:

 · Latin and ASCII can obviously be done with a C array, and I imagine
   that covers at least a fair portion of use-cases.

 · If you only have a few characters in the mapping (so sys.getsizeof
   is small) then it'll be a lot faster to just iterate through a C
   list instead of checking the dict.

 · Other cases are:

    · Full-character-set or equiv. mappings, which are already faster
      than .replace(). Those should really be re-made into lists so
      that the list optimisation can take place, and lists are much
      faster even in versions without these hypothetical
      optimizations, too.

    · Custom objects. There's nothing we can do here.

I realise that this is a lot more code, so it's not something I'm
going to try to force. However, I think it's useful if it stops people
using .replace in a loop ;).

² some_str.translate(some_dict)

-- 
http://mail.python.org/mailman/listinfo/python-list
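
To make the "look at the max key, then build an array" idea concrete, here is a rough pure-Python sketch. It is only an illustration, not how CPython implements str.translate: the helper names to_list_table and translate_fast are made up for this example, the 50,000 cut-off is simply the figure suggested in the quoted message, and the mapping is assumed to have non-negative integer keys (as produced by str.maketrans).

```python
def to_list_table(mapping, max_size=50_000):
    """Build a list-based table equivalent to `mapping`, or return None.

    Hypothetical helper: `mapping` is assumed to be a dict with
    non-negative integer (code point) keys. None is returned when the
    largest key is at or above `max_size`, so a huge list is never built.
    """
    if not mapping:
        return []
    top = max(mapping)
    if top >= max_size:
        return None
    # Start from the identity mapping; an int value is read by
    # str.translate as a code point.
    table = list(range(top + 1))
    for code_point, replacement in mapping.items():
        table[code_point] = replacement
    return table


def translate_fast(text, mapping):
    """Translate via a list table when it is small enough, else via the dict."""
    table = to_list_table(mapping)
    # Code points past the end of the list raise IndexError inside
    # str.translate, which leaves those characters unchanged.
    return text.translate(mapping if table is None else table)


small = str.maketrans({"a": "1", "b": None, "c": "xyz"})
print(translate_fast("abcabc d", small))

# A single astral-plane key would need a list of over 100,000 entries,
# so this mapping falls back to the plain dict.
large = {0x1F600: ":)"}
print(translate_fast("hi \U0001F600", large))
```

Whether the list form is actually quicker depends on the interpreter version, so it is worth timing both forms on real data before relying on this.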