Hi Andrei,

Pardon my ignorance, as I have not even looked at the Unicode code yet, but based on what you wrote, what about always allocating two UChars per codepoint? It would take more space (twice as much for text that otherwise needs only one UChar per codepoint), but then random-offset indexing is fast and easy: the codepoint would always start at UChar offset "index << 1".
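
To make the idea concrete, here is a minimal sketch in plain PHP. It is only an illustration: the array of integers stands in for a UChar* buffer, the function names encode_fixed() and codepoint_at() are made up, and the padding scheme (high 16 bits in the first slot, low 16 bits in the second, which is effectively UTF-32) is my assumption rather than anything in the current implementation.

```php
<?php
// Hypothetical encoder: store every codepoint in exactly two 16-bit slots
// (high half first, then low half). This layout is an assumption made for
// illustration; it is effectively UTF-32 split into 16-bit units.
function encode_fixed(array $codepoints): array
{
    $units = [];
    foreach ($codepoints as $cp) {
        $units[] = ($cp >> 16) & 0xFFFF; // high 16 bits (0 for BMP codepoints)
        $units[] = $cp & 0xFFFF;         // low 16 bits
    }
    return $units;
}

// Random access: codepoint number $index always starts at slot $index << 1,
// so no scan from the beginning of the buffer is needed.
function codepoint_at(array $units, int $index): int
{
    $slot = $index << 1;
    return ($units[$slot] << 16) | $units[$slot + 1];
}

// "a", U+10201, "b", "c", "ß": the five codepoints from the benchmark string.
$units = encode_fixed([0x61, 0x10201, 0x62, 0x63, 0xDF]);
printf("U+%04X\n", codepoint_at($units, 1));
```

The lookup is constant-time, but the five-codepoint sample above occupies ten slots, where a variable-width encoding would need six.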


Regards,

Jessie Hernandez


Andrei Zmievski wrote:
You probably saw that I have committed an initial implementation of TextIterator. The impetus for this is that direct indexing of Unicode strings via the [] operator is slow, very slow, at least currently. The reason is that [] cannot simply perform random-offset indexing into UChar* strings. It needs to start from the beginning of the string and iterate forward until it reaches the desired offset, because our default unit is a codepoint, which can take up 1 or 2 UChars.
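
The forward scan described above can be sketched roughly as follows. This is plain PHP for illustration, not the engine's C code: the array of integers stands in for a UChar* buffer of UTF-16 code units, and codepoint_offset() is a hypothetical helper name.

```php
<?php
// Hypothetical helper (not part of PHP or ICU): returns the code-unit offset
// at which codepoint number $index starts, or -1 if $index is out of range.
// $units is a list of 16-bit UTF-16 code units, standing in for UChar*.
function codepoint_offset(array $units, int $index): int
{
    $n = count($units);
    $offset = 0;
    for ($cp = 0; $offset < $n; $cp++) {
        if ($cp === $index) {
            return $offset;
        }
        $u = $units[$offset];
        // A lead surrogate followed by a trail surrogate encodes one
        // codepoint in two code units; everything else takes one unit.
        $isPair = $u >= 0xD800 && $u <= 0xDBFF
            && $offset + 1 < $n
            && $units[$offset + 1] >= 0xDC00 && $units[$offset + 1] <= 0xDFFF;
        $offset += $isPair ? 2 : 1;
    }
    return -1;
}

// "a", U+10201 (surrogate pair D800 DE01), "b"
$units = [0x0061, 0xD800, 0xDE01, 0x0062];
var_dump(codepoint_offset($units, 2)); // int(3): "b" starts at code unit 3
```

Every lookup walks the buffer from offset 0, so indexing each position of an n-codepoint string in a loop costs on the order of n squared steps in total, while an iterator can keep its position between steps.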

So here are some (rough) numbers on the relative performance of TextIterator vs. []. The script I used was a simple one (attached after the signature). Each test was 10000 runs over a 500-character string.

[] operator:  27.16373 s
TextIterator: 1.89697 s (!)

For comparison, running the same [] operator test on a 500-character binary (old-style) string gives me 9.11334 s. Quite interesting, I'd say.

I am not sure how we can optimize [] to be faster than the iterator approach. Food for thought?

- Andrei

<?php
$a = str_repeat('a\U010201bcß', 100);
var_dump($a);

/* warm up the engine */
for ($x = 0; $x < 100; $x++) {
    foreach (new TextIterator($a) as $c) {
    }
}

/* measure [] */
$start = microtime(true);
for ($x = 0; $x < 10000; $x++) {
    $len = strlen($a);
    for ($i = 0; $i < $len; $i++) {
        $c = $a[$i];
    }
}
$end = microtime(true);

printf("[] run time: %.5f\n", $end - $start);

/* measure iterator */
$start = microtime(true);
for ($x = 0; $x < 10000; $x++) {
    foreach (new TextIterator($a) as $c) {
    }
}
$end = microtime(true);

printf("iterator run time: %.5f\n", $end - $start);
?>

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php
