Re: Repeated Loopy Variable Width String Character Access is Slooooow-ish

Patrick R. Michaud Sat, 05 Jan 2008 09:34:27 -0800

On Sat, Jan 05, 2008 at 12:17:00PM +0100, Cosimo Streppone wrote:
> Patrick wrote:
> >
> >[...] I also improved utf8_set_position
> >a bit so that it doesn't always have to restart position
> >counting from the beginning of the string.  As a result,
> >compiling the actions.pl script on my machine goes from 39s to
> >a little over 28s -- about a 25% speed increase.
> 
> [... /me reads again the diff ...]
> 
> I realized while writing this that if `i->charpos > pos',
> you simply end up re-scanning the string from the start.
> Is that correct?


Correct.

> Maybe it could be an idea to scan backwards in that case?
> Please don't yell at me, I'm just trying to follow up :)

It's a very good question.  My suspicion is that we rarely
"scan backwards".  The utf8_set_position function actually
manipulates a String_iter struct, as opposed to the string
itself, and from the cases I've examined, it appears that
utf8_set_position is used to set the iterator to a known
offset after the iterator is created.  After that, all
subsequent scanning with the iterator tends to go forward
(using the get_and_advance and set_and_advance methods).

So, scanning backwards feels a lot like a premature optimization
to me -- i.e., we could implement it, but there's a good chance
it never comes up in real use.  In fact, I just did a quick
check of this by adding a print statement to utf8_set_position
that reports whenever a backwards scan is encountered, and
the only times it occurs, we're moving from offset 1 to
offset 0 -- i.e., "scanning backwards" would actually take longer
than the current algorithm, since restarting from the beginning
reaches offset 0 without scanning anything.  (I'm also a little
curious about the conditions under which we'd be moving from
offset 1 to offset 0, but will check that later.)

Thanks for the question and for reviewing the code!  Based on this,
I'm removing the "XXX" note about scanning in both directions,
and I also see where I forgot to properly cast the initialization
of u8ptr.

Pm
