On Sat, 12 May 2007 12:05:26 -0700
Allison Randal (via RT) <[EMAIL PROTECTED]> wrote:
> On x86 Linux (Ubuntu), this configuration fails 2 tests:
> 
> t/library/string_utils.t    0   134    29    4  13.79%  28-29
> t/op/stringu.t              2   512    25    2   8.00%  1 19
> 
> Both tests are failing with the error:
> 
> parrot: src/encodings/utf8.c:271: utf8_encode_and_advance: Assertion 
> `i->bytepos <= (s)->obj.u._b._buflen' failed.

Reproduced on Gentoo.  Before patch, results are as above.

After patch:

t/library/string_utils....ok
t/op/stringu..............ok

The code in utf8_encode_and_advance is beautiful.  It basically says,
add a utf8 character to the buffer.  Ok, now did we overrun the buffer?
CRASH!

It seems safer to check the buffer size *before* writing to it, so
here's a patch to do so.  Is it the right fix?  I thought so when I
was doing it, but now I'm not so sure; it does introduce a const
warning.  Maybe we can resolve that with a cast; maybe it's the wrong
solution to the problem.  Please provide guidance.
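For illustration, here is the same check-before-write order as a standalone C sketch outside Parrot. Every name in it is hypothetical:

* `buf` stands in for the STRING buffer.
* `utf8_len()` stands in for UNISKIP(), assuming UNISKIP(c) returns the encoded length of c, as the patch's use of it suggests.
* `buf_reserve()` stands in for Parrot_reallocate_string(), with plain realloc() underneath.
* `buf_append_utf8()` stands in for utf8_encode_and_advance().

Like the patch, it reserves room for a NUL terminator. After writing, it asserts an invariant similar to the one Parrot's assertion checks.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a Parrot STRING buffer. */
typedef struct {
    unsigned char *data;
    size_t len;   /* bytes in use (plays the role of i->bytepos) */
    size_t cap;   /* bytes allocated (plays the role of PObj_buflen) */
} buf;

/* Bytes needed to encode code point c in UTF-8 (the role UNISKIP plays in
   the patch). No validation of surrogates or values above 0x10FFFF. */
static size_t utf8_len(unsigned long c)
{
    if (c < 0x80)    return 1;
    if (c < 0x800)   return 2;
    if (c < 0x10000) return 3;
    return 4;
}

/* Make sure at least `need` bytes are allocated.
   Returns 0 on success, -1 if realloc() fails. */
static int buf_reserve(buf *b, size_t need)
{
    unsigned char *p;
    if (need <= b->cap)
        return 0;
    p = realloc(b->data, need);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = need;
    return 0;
}

/* Append one code point: grow first, then encode. */
static int buf_append_utf8(buf *b, unsigned long c)
{
    size_t n = utf8_len(c);
    unsigned char *pos;

    /* Same condition as the patch: bytepos + n >= buflen means no room
       for the character plus a NUL, so grow to bytepos + n + 1. */
    if (buf_reserve(b, b->len + n + 1) != 0)
        return -1;

    pos = b->data + b->len;   /* computed only after any realloc */
    switch (n) {
    case 1:
        pos[0] = (unsigned char)c;
        break;
    case 2:
        pos[0] = (unsigned char)(0xC0 | (c >> 6));
        pos[1] = (unsigned char)(0x80 | (c & 0x3F));
        break;
    case 3:
        pos[0] = (unsigned char)(0xE0 | (c >> 12));
        pos[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        pos[2] = (unsigned char)(0x80 | (c & 0x3F));
        break;
    default:
        pos[0] = (unsigned char)(0xF0 | (c >> 18));
        pos[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        pos[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        pos[3] = (unsigned char)(0x80 | (c & 0x3F));
        break;
    }
    b->len += n;
    b->data[b->len] = '\0';

    /* With the check done up front, this can no longer fire. */
    assert(b->len < b->cap);
    return 0;
}
```

`pos` is derived only after the reserve step because realloc() may move the data. The patch does the same thing by computing pos from s->strstart after the reallocation call.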

It might be worth preallocating 8 bytes or so of slack, to avoid
reallocating as often if this gets called a lot.  Currently the patch
reallocates only the minimum needed to fit the existing string, the
new character and a null terminator.
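A rough sketch of two possible growth policies follows. Both helpers and the slack constant are hypothetical and are not part of Parrot. The first adds a fixed slack along the lines of the "8 bytes or so" idea. The second doubles the capacity. Fixed slack only reduces the number of reallocs by a constant factor. Doubling keeps the amortized cost of each append constant, at the price of more unused space.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical slack; "8 bytes or so" covers one 4-byte UTF-8 character
   plus room for the next few appends. */
#define UTF8_GROW_SLACK 8

/* Fixed slack: if `need` bytes (including the NUL) don't fit in `cap`,
   ask for need + slack; otherwise keep the current capacity. */
static size_t grow_capacity(size_t cap, size_t need)
{
    if (need <= cap)
        return cap;
    return need + UTF8_GROW_SLACK;
}

/* Geometric growth: double the capacity, but never return less than
   `need`. Falls back to `need` if doubling would overflow size_t. */
static size_t grow_capacity_doubling(size_t cap, size_t need)
{
    size_t doubled;
    if (need <= cap)
        return cap;
    if (cap > SIZE_MAX / 2)
        return need;
    doubled = cap * 2;
    return doubled > need ? doubled : need;
}
```

Either value would be passed as the size argument of the reallocation call in place of the exact bytepos + UNISKIP(c) + 1.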

Mark
=== src/encodings/utf8.c
==================================================================
--- src/encodings/utf8.c	(revision 20520)
+++ src/encodings/utf8.c	(local)
@@ -264,6 +264,9 @@
     const STRING *s = i->str;
     unsigned char *new_pos, *pos;
 
+    if(i->bytepos + UNISKIP(c) >= PObj_buflen(s)) {
+        Parrot_reallocate_string(interp, i->str, i->bytepos + UNISKIP(c) + 1);
+    }
     pos = (unsigned char *)s->strstart + i->bytepos;
     new_pos = utf8_encode(pos, c);
     i->bytepos += (new_pos - pos);

Reply via email to