On 9/28/2011 5:28 AM, Xah Lee wrote:
> curious question.
>
> suppose you have 300 different strings and they all need to be
> replaced with, say, "aaa".
>
> is it faster to replace each one sequentially (i.e. replace the first
> string with aaa, then do the 2nd, 3rd, ...), or is it faster to “or”
> them all together in one regex and do the replacement in one shot?
> (i.e. "1ststr|2ndstr|3rdstr|..." -> aaa)

Here the problem is replacing multiple arbitrary substrings with a single replacement string that could itself create new matches. I would start with the re 'or' solution.
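
A minimal sketch of that one-shot approach in Python, with a made-up three-string target list standing in for the 300:

import re

# Hypothetical target strings; the real list would have ~300 entries.
targets = ['foo', 'bar', 'quux']

# Escape each string so regex metacharacters are taken literally, and
# sort longest-first so overlapping alternatives match correctly.
pattern = re.compile('|'.join(
    re.escape(s) for s in sorted(targets, key=len, reverse=True)))

text = 'foo, bar, and quux walk into a bar'
print(pattern.sub('aaa', text))   # single pass over the input

Since sub() continues scanning after each replacement rather than rescanning it, the inserted "aaa" can never be matched again, which is why the one-shot approach sidesteps the new-match problem.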

> btw, the origin of this question is about writing an emacs lisp
> function that replaces ~250 html named entities with unicode chars.

As you noted, this is a different problem in that there is a different replacement for each substring. Also, the substrings being searched for are not arbitrary but have a distinct and easily recognized structure. The replacement cannot create a new match, so the multiple-scan approach *could* work.
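
A sketch of that multiple-scan idea, with a made-up three-entry table standing in for the ~250 entity definitions (the full table is available as htmlentitydefs in Python 2, html.entities in Python 3):

# Three sample entries standing in for the full entity table.
entities = {u'&amp;': u'&', u'&lt;': u'<', u'&copy;': u'\u00a9'}

def replace_entities(text):
    # One full scan of the text per entity name; each .replace() call
    # is a separate pass over the whole string.
    for name, char in entities.items():
        text = text.replace(name, char)
    return text

print(replace_entities(u'&copy; 2011 &amp; beyond'))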

It is not specified whether the input is unicode or ascii bytes. If the latter, I might copy it to a bytearray (mutable), scan forward, replace entity definitions with the utf-8 encoding of the corresponding unicode character (via a dict lookup; I assume the replacement is *always* fewer chars), and shift the other chars back to close any gaps created.
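
A rough sketch of that, assuming ascii-bytes input and the same made-up three-entry table (a real version would load the full table and cope with malformed references):

entities = {b'&amp;': u'&', b'&lt;': u'<', b'&copy;': u'\u00a9'}

def expand(data):
    buf = bytearray(data)
    read = write = 0
    n = len(buf)
    while read < n:
        if buf[read:read+1] == b'&':
            end = buf.find(b';', read)
            if end != -1 and bytes(buf[read:end+1]) in entities:
                rep = entities[bytes(buf[read:end+1])].encode('utf-8')
                buf[write:write+len(rep)] = rep   # assumed <= entity length
                write += len(rep)
                read = end + 1
                continue
        buf[write] = buf[read]   # shift ordinary bytes back over the gap
        write += 1
        read += 1
    del buf[write:]              # drop the unused tail
    return bytes(buf)

print(expand(b'&copy; 2011 &amp; beyond'))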

If the input is unicode, I might do the same with array.array (which is where bytearray came from). Or I might use the standard idiom of constructing a list of pieces of the original, with replacements substituted in, and joining them with ''.join() at the end.
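
A sketch of that pieces-and-join idiom for unicode input, again with the sample table; the simple regex here is an assumption standing in for a proper entity grammar:

import re

entities = {u'&amp;': u'&', u'&lt;': u'<', u'&copy;': u'\u00a9'}
entity_re = re.compile(u'&[A-Za-z][A-Za-z0-9]*;')

def expand(text):
    pieces = []
    pos = 0
    for m in entity_re.finditer(text):
        pieces.append(text[pos:m.start()])   # unchanged run before the match
        # Unknown names are left exactly as found.
        pieces.append(entities.get(m.group(), m.group()))
        pos = m.end()
    pieces.append(text[pos:])                # tail after the last match
    return u''.join(pieces)

print(expand(u'&copy; 2011 &amp; beyond'))

entity_re.sub(lambda m: entities.get(m.group(), m.group()), text) amounts to the same thing with less code.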

--
Terry Jan Reedy

