Hi!

I stumbled upon a "problem" with the function strtr() the other day. I
noticed a very long-running PHP script and tried to reproduce the
behaviour.

I traced it down to a single call of strtr doing text replacements using a
search => replace array.
It wasn't quite obvious why the call would take that long, so I started to
investigate the issue.

The docs say that strtr "... will be the most efficient when all
the keys have the same size".
My test case showed that I was in fact using keys of very different lengths
(they are determined automatically, so there's no fixed list).

I wrote a simple script to reproduce this behaviour:

<?php

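// Input text that contains neither of the search keys.
$text = str_repeat( 'm', 2000 );

// Two search keys of very different lengths: 1 and 1500 characters.
$short_from_a = str_repeat( 'a', 1 );
$long_from_x = str_repeat( 'x', 1500 );

$replacements = array(
  $short_from_a => 'b',
  $long_from_x => 'y'
);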

$start = microtime( true );
$result_1 = strtr( $text, $replacements );
echo "strtr: " . number_format( microtime( true ) - $start, 4 ) . "\n";

$start = microtime( true );
$result_2 = str_replace(
  array_keys( $replacements ),
  array_values( $replacements ),
  $text
);
echo "str_replace: " . number_format( microtime( true ) - $start, 4 ) . "\n";

echo $result_1 === $result_2 ? "results match!\n": "no match!\n";

?>

On my box, this reports 2.5 seconds for strtr and 0.0 seconds for
str_replace. As far as I know, the only difference between str_replace and
strtr is that strtr does not replace text inside parts of the string that
have already been replaced. I might be wrong here, though.
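
To illustrate the difference I mean, here is a minimal sketch. The
two-entry map is made up for this example and is not taken from my real
script:

```php
<?php
// Illustrative map only: 'a' becomes 'b', and 'b' becomes 'c'.
$pairs = array( 'a' => 'b', 'b' => 'c' );

// strtr() scans the input once and does not rescan replaced text.
$once = strtr( 'ab', $pairs );

// str_replace() applies the pairs one after another, so the 'b' produced
// by the first pair is replaced again by the second pair.
$chained = str_replace( array_keys( $pairs ), array_values( $pairs ), 'ab' );

echo "strtr: $once\n";          // prints "strtr: bc"
echo "str_replace: $chained\n"; // prints "str_replace: cc"
```

In my test script above neither key occurs in the text, so both functions
return the input unchanged and the results match.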

If I adjust the str_repeat for "m" from 2000 to 20000, the runtime is 45
seconds for strtr and 0.0001 seconds for str_replace.

I might have chosen the wrong tool for what I'm trying to achieve in the
first place, but can anyone comment on the algorithmic complexity of strtr?
This is definitely not what I would expect for such small inputs. Since
the inputs varied and the keys were determined automatically in my
original script, I was confronted with runtimes of several hours compared
to just a few seconds with str_replace.
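
To make the question more concrete, here is a rough sketch of one naive
way a longest-match-first replacement could work. This is purely an
assumption for illustration; I have not checked it against the actual
strtr() source. naive_strtr() and its $probes counter are hypothetical
names. If something like this happens internally, every position in the
text is probed once per length between the longest and the shortest key,
which would explain why a wide spread of key lengths hurts so much:

```php
<?php
// Hypothetical helper, NOT PHP's real strtr() implementation: a naive
// longest-match-first replacement that counts its substring probes.
function naive_strtr( $text, array $pairs, &$probes = 0 ) {
  if ( count( $pairs ) === 0 ) {
    return $text;
  }
  $lengths = array_map( 'strlen', array_keys( $pairs ) );
  $min = max( 1, min( $lengths ) ); // an empty key is never matched
  $max = max( $lengths );
  $n = strlen( $text );
  $out = '';
  $pos = 0;
  while ( $pos < $n ) {
    $matched = false;
    // Try every candidate length at this position, longest first.
    for ( $len = min( $max, $n - $pos ); $len >= $min; $len-- ) {
      $probes++;
      $candidate = substr( $text, $pos, $len ); // copies $len bytes
      if ( isset( $pairs[$candidate] ) ) {
        $out .= $pairs[$candidate];
        $pos += $len;
        $matched = true;
        break;
      }
    }
    if ( !$matched ) {
      // No key matches here: keep the character and move on.
      $out .= $text[$pos];
      $pos++;
    }
  }
  return $out;
}

// Smaller sizes than in my test script, so this finishes quickly.
$text = str_repeat( 'm', 200 );
$mixed = array( 'a' => 'b', str_repeat( 'x', 150 ) => 'y' );
$same = array( 'a' => 'b', 'x' => 'y' );

$probes_mixed = 0;
$result = naive_strtr( $text, $mixed, $probes_mixed );
$probes_same = 0;
naive_strtr( $text, $same, $probes_same );

echo "probes with mixed key lengths: $probes_mixed\n";
echo "probes with equal key lengths: $probes_same\n";
```

With keys of equal length the sketch probes each position once; with
mixed lengths the probe count grows with the gap between the shortest and
the longest key, and each probe also copies up to the longest key length.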

If this is the expected behaviour, at least the documentation should be
adjusted to state that this function is very inefficient when the key
lengths differ widely...

Greetings

Nico
