Re: string.c questions

Benjamin Goldberg Sun, 03 Aug 2003 17:34:58 -0700


Luke Palmer wrote:
> 
> Benjamin Goldberg writes:
> > Actually, these are mostly questions about the string_str_index
> > function.
> 
> Uh oh...
> 
> > I have some questions about bufstart, strstart, bufused, strlen, and
> > encoding->characters.
> >
> > In string_str_index_multibyte, the lastmatch variable is calculated as:
> >
> >     const void* const lastmatch =
> >        str->encoding->skip_backward((char*)str->strstart + str->strlen,
> >           find->encoding->characters(find, find->strlen));
> >
> > There seems to be quite a bit of confusion on this line about bytes and
> > characters... the goal here seems to be to find a pointer to the last
> > place where it would be possible to begin a match.
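For a fixed-width (one byte per character) string, that last starting position is plain arithmetic: with n characters in the string and m in the pattern, no match can begin after index n - m. Here is a minimal sketch over plain byte buffers, assuming single-byte data. `last_match_index` and `naive_index` are hypothetical helpers, not functions from string.c:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: last index at which an m-character match can
 * begin in an n-character string, assuming one byte per character.
 * Returns -1 if the pattern is longer than the string. */
long last_match_index(size_t n, size_t m)
{
    return m > n ? -1 : (long)(n - m);
}

/* Naive search that never tries a starting position past lastmatch,
 * so memcmp never reads beyond the end of str. */
long naive_index(const char *str, size_t n, const char *find, size_t m)
{
    long last = last_match_index(n, m);
    for (long i = 0; i <= last; i++) {
        if (memcmp(str + i, find, m) == 0)
            return i;
    }
    return -1;
}
```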
> 
> Yep.
> 
> You're right, there is a bit of confusion about characters and bytes
> in this statement -- mostly because I'm confused about characters and
> bytes in Parrot.  So... str->strlen is the number of *characters* in
> the string?  Hmm... that changes things.
> 
> Maybe someone else should fix this -- who knows what they're doing :-)
> Do we have tests for multibyte string operations in the test suite?
> 
> > What's with find and ->characters?  Shouldn't find->strlen be
> > sufficient, without all that other stuff around it?  Next...
> 
> If find->strlen represents the number of characters as you say, then
> yes.
> 
> > If these weren't multibyte strings, then this would be (str->strstart +
> > str->strlen - find->strlen), right?  Or, translating that literally (and
> > doing the subtraction first):
> >
> >     const void* const lastmatch = str->encoding->skip_forward(
> >        str->strstart, str->strlen - find->strlen );
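For variable-width data, that translation can be sketched with a UTF-8 walker standing in for `encoding->skip_forward`. This is a sketch under assumptions: the input is well-formed UTF-8, `strlen` counts characters, and `utf8_skip_forward` / `last_match_forward` are hypothetical names, not Parrot's actual encoding code:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical UTF-8 stand-in for encoding->skip_forward: advance n
 * characters from ptr, using each lead byte to get the character width.
 * Assumes well-formed UTF-8 starting at a character boundary. */
const void *utf8_skip_forward(const void *ptr, size_t n)
{
    const unsigned char *p = ptr;
    assert(p != NULL);
    while (n-- > 0) {
        unsigned char c = *p;
        if (c < 0x80)      p += 1;   /* 0xxxxxxx */
        else if (c < 0xE0) p += 2;   /* 110xxxxx */
        else if (c < 0xF0) p += 3;   /* 1110xxxx */
        else               p += 4;   /* 11110xxx */
    }
    return p;
}

/* lastmatch as proposed: skip (strlen - findlen) characters forward
 * from strstart. Both lengths are character counts. */
const void *last_match_forward(const void *strstart, size_t strlen_chars,
                               size_t findlen_chars)
{
    assert(findlen_chars <= strlen_chars);
    return utf8_skip_forward(strstart, strlen_chars - findlen_chars);
}
```

The walk visits every character before the last possible match position, which is the linear-time cost raised in the reply that follows.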
> 
> Yeah, the thing about that is, for strings in UTF formats,
> skip_forward is a linear time operation, which is pretty expensive
> when there's a lot of data.  That's why I used pointers in this
> function instead of string_index as the previous implementation did.


Except, of course, that the pointer arithmetic version was wrong :(
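To make the error concrete: the original line adds a character count to a byte pointer, which only lands at the end of the data when every character is one byte wide. A small sketch assuming UTF-8 data; `utf8_chars` and `end_by_char_count` are hypothetical helpers:

```c
#include <stddef.h>

/* Count UTF-8 characters by counting bytes that are not continuation
 * bytes (10xxxxxx). Assumes well-formed UTF-8. */
size_t utf8_chars(const char *s, size_t bytes)
{
    size_t n = 0;
    for (size_t i = 0; i < bytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            n++;
    }
    return n;
}

/* What (char*)str->strstart + str->strlen computes: a character count
 * used as a byte offset. Correct only for one-byte-per-character data. */
const char *end_by_char_count(const char *strstart, size_t strlen_chars)
{
    return strstart + strlen_chars;
}
```

For the UTF-8 string "aéb€c" (5 characters, 8 bytes), `end_by_char_count(s, 5)` points at byte offset 5, inside the three-byte euro sign, rather than at offset 8.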


> > Or, if we can do that trick for finding the end of a string:
> >
> >     const void* const lastmatch = str->encoding->skip_backward(
> >        (char*)str->bufstart + str->bufused, find->strlen );
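Backing up from the end costs time proportional to the pattern length rather than the string length. A sketch assuming well-formed UTF-8 and assuming that `bufstart + bufused` is one past the last byte of the string's data (the "if we can do that trick" condition). `utf8_skip_backward` and `last_match_backward` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical UTF-8 stand-in for encoding->skip_backward: step back n
 * characters from ptr, which must sit on a character boundary. The
 * caller must ensure at least n characters precede ptr (no bounds check). */
const void *utf8_skip_backward(const void *ptr, size_t n)
{
    const unsigned char *p = ptr;
    assert(p != NULL);
    while (n-- > 0) {
        do {
            p--;
        } while ((*p & 0xC0) == 0x80);  /* skip continuation bytes */
    }
    return p;
}

/* lastmatch computed from the end of the used buffer: back up
 * findlen_chars characters from bufstart + bufused. */
const void *last_match_backward(const void *bufstart, size_t bufused,
                                size_t findlen_chars)
{
    return utf8_skip_backward((const char *)bufstart + bufused,
                              findlen_chars);
}
```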
> >
> > Similarly, the lastfind variable should either be:
> >
> >     const void* const lastfind = find->encoding->skip_forward(
> >        find->strlen );
> 
> skip_forward takes 2 args; I assume you mean:
> 
>     const void* const lastfind = find->encoding->skip_forward(
>         find, find->strlen);

Actually, I think that I meant:

     const void* const lastfind = find->encoding->skip_forward(
        find->strstart, find->strlen );

That's because I assume the functions of encoding objects operate on
pointers into a buffer's data area, *not* on STRING* objects.
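If that assumption holds, an encoding is essentially a table of functions that take a raw data pointer plus a count. A minimal sketch of that shape with a single-byte implementation follows; the struct and function names (`ENCODING_SKETCH`, `STRING_SKETCH`, `sb_skip_forward`, `last_find`) are hypothetical and only mirror the fields named in this thread, not Parrot's real headers:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding vtable: every entry takes a raw pointer into a
 * buffer's data area, never a STRING*. Illustrative signatures only. */
typedef struct {
    const char *name;
    size_t (*characters)(const void *ptr, size_t bytes);
    const void *(*skip_forward)(const void *ptr, size_t n);
} ENCODING_SKETCH;

/* Fixed-width single-byte implementation. */
static size_t sb_characters(const void *ptr, size_t bytes)
{
    (void)ptr;          /* one byte per character: count equals bytes */
    return bytes;
}

static const void *sb_skip_forward(const void *ptr, size_t n)
{
    assert(ptr != NULL);
    return (const char *)ptr + n;
}

static const ENCODING_SKETCH singlebyte = {
    "singlebyte", sb_characters, sb_skip_forward
};

/* Minimal stand-in for STRING with just the fields discussed. */
typedef struct {
    void *bufstart;
    size_t bufused;
    void *strstart;
    size_t strlen;                     /* length in characters */
    const ENCODING_SKETCH *encoding;
} STRING_SKETCH;

/* lastfind with the data pointer passed in, not the STRING itself. */
const void *last_find(const STRING_SKETCH *find)
{
    return find->encoding->skip_forward(find->strstart, find->strlen);
}
```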

> Again, that's linear time.  But usually the string to find won't be
> that long, so it's not so important in this case.  But your shortcut
> would still be faster.
> 
> > Or:
> >
> >     const void* const lastfind = (char*)find->bufstart + find->bufused;
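The two lastfind candidates should agree whenever the pattern's bytes end exactly at `bufstart + bufused`; the difference is that the walk is linear in the pattern length while the byte addition is constant time. A sketch assuming well-formed UTF-8; `lastfind_by_walk` and `lastfind_by_bytes` are hypothetical:

```c
#include <stddef.h>

/* Walk find_chars UTF-8 characters forward from strstart, using the lead
 * byte to get each character's width. Linear in find_chars. */
const void *lastfind_by_walk(const void *strstart, size_t find_chars)
{
    const unsigned char *p = strstart;
    while (find_chars-- > 0) {
        unsigned char c = *p;
        p += c < 0x80 ? 1 : c < 0xE0 ? 2 : c < 0xF0 ? 3 : 4;
    }
    return p;
}

/* Add the byte count to the buffer start. Constant time, but only correct
 * if the pattern's data ends at bufstart + bufused. */
const void *lastfind_by_bytes(const void *bufstart, size_t bufused)
{
    return (const char *)bufstart + bufused;
}
```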

