Luke Palmer wrote:
>
> Benjamin Goldberg writes:
> > Actually, these are mostly questions about the string_str_index
> > function.
>
> Uh oh...
>
> > I've some questions about bufstart, strstart, bufused, strlen and
> > encoding->characters?
> >
> > In string_str_index_multibyte, the lastmatch variable is calculated as:
> >
> >    const void* const lastmatch =
> >        str->encoding->skip_backward((char*)str->strstart + str->strlen,
> >            find->encoding->characters(find, find->strlen));
> >
> > There seems to be quite a bit of confusion on this line about bytes and
> > characters... the goal here seems to be to find a pointer to the last
> > place where it would be possible to begin a match.
>
> Yep.
>
> You're right, there is a bit of confusion about characters and bytes
> in this statement -- mostly because I'm confused about characters and
> bytes in Parrot.  So... str->strlen is the number of *characters* in
> the string?  Hmm.. that changes things.
>
> Maybe someone else should fix this -- who knows what they're doing :-)
> Do we have tests for multibyte string operations in the test suite?
>
> > What's with find and ->characters?  Shouldn't find->strlen be
> > sufficient, without all that other stuff around it?  Next...
>
> If find->strlen represents the number of characters as you say, then
> yes.
>
> > If these weren't multibyte strings, then this would be (str->strstart +
> > str->strlen - find->strlen), right?  Or, translating that literally (and
> > doing the subtraction first):
> >
> >    const void* const lastmatch = str->encoding->skip_forward(
> >        str->strstart, str->strlen - find->strlen );
>
> Yeah, the thing about that is, for strings in UTF formats,
> skip_forward is a linear time operation, which is pretty expensive
> when there's a lot of data.  That's why I used pointers in this
> function instead of string_index as the previous implementation did.

Except, of course, that the pointer arithmetic version was wrong :(

> > Or, if we can do that trick for finding the end of a string:
> >
> >    const void* const lastmatch = str->encoding->skip_backward(
> >        (char*)str->bufstart + str->bufused, find->strlen );
> >
> > Similarly, the lastfind variable should either be:
> >
> >    const void* const lastfind = find->encoding->skip_forward(
> >        find->strlen );
>
> skip_forward takes 2 args, I assume you mean:
>
>    const void* const lastfind = find->encoding->skip_forward(
>        find, find->strlen);

Actually, I think that I meant:

   const void* const lastfind = find->encoding->skip_forward(
       find->strstart, find->strlen );

Since I assume that the functions of encoding objects operate on pointers
into a buffer's data area, *not* on STRING* objects.

> Again, that's linear time.  But usually the string to find won't be
> that long, so it's not so important in this case.  But your shortcut
> would still be faster.
>
> > Or:
> >
> >    const void* const lastfind = (char*)find->bufstart + find->bufused;

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
);{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
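
To make the bytes-versus-characters point concrete, here is a minimal
sketch of the two ways of computing lastmatch discussed in this thread,
assuming UTF-8 data.  This is standalone C, not Parrot code:
utf8_skip_forward, utf8_skip_backward and last_match_start are
hypothetical stand-ins for the encoding's skip functions, and they take
explicit buffer bounds, which the real skip_forward/skip_backward may
not.

```c
#include <stddef.h>

/* Hypothetical stand-in for an encoding's skip_forward on UTF-8 data:
 * advance n characters from ptr, never going past end. */
const unsigned char *
utf8_skip_forward(const unsigned char *ptr, const unsigned char *end, size_t n)
{
    while (n > 0 && ptr < end) {
        ++ptr;                                  /* step over the lead byte */
        while (ptr < end && (*ptr & 0xC0) == 0x80)
            ++ptr;                              /* and its 10xxxxxx continuation bytes */
        --n;
    }
    return ptr;
}

/* Hypothetical stand-in for skip_backward: step back n characters from
 * ptr, never going before start. */
const unsigned char *
utf8_skip_backward(const unsigned char *start, const unsigned char *ptr, size_t n)
{
    while (n > 0 && ptr > start) {
        --ptr;                                  /* last byte of the previous character */
        while (ptr > start && (*ptr & 0xC0) == 0x80)
            --ptr;                              /* back up to its lead byte */
        --n;
    }
    return ptr;
}

/* Last position at which a match of find_chars characters could begin.
 * The end of the data is a *byte* offset (bufstart + bufused in the
 * thread's terms); the distance stepped back is a *character* count. */
const unsigned char *
last_match_start(const unsigned char *buf, size_t bytes_used, size_t find_chars)
{
    return utf8_skip_backward(buf, buf + bytes_used, find_chars);
}
```

For the 13-byte, 11-character UTF-8 string "héllo wörld" and a
4-character needle, both last_match_start(buf, 13, 4) and
utf8_skip_forward(buf, buf + 13, 11 - 4) land on byte offset 8, the
start of "ö".  Adding the character count 11 to the start pointer as if
it were a byte count would instead place the "end" two bytes short.  The
backward version only walks over the needle's length, which is the
saving the shortcut is after.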