On Sat, 12 May 2007 12:05:26 -0700
Allison Randal (via RT) <[EMAIL PROTECTED]> wrote:

> On x86 Linux (Ubuntu), this configuration fails 2 tests:
>
> t/library/string_utils.t    0   134    29    4  13.79%  28-29
> t/op/stringu.t              2   512    25    2   8.00%  1 19
>
> Both tests are failing with the error:
>
> parrot: src/encodings/utf8.c:271: utf8_encode_and_advance: Assertion
> `i->bytepos <= (s)->obj.u._b._buflen' failed.

Reproduced on Gentoo. Before the patch, the results are as above. After the patch:

t/library/string_utils....ok
t/op/stringu..............ok

The code in utf8_encode_and_advance is beautiful. It basically says: add a
UTF-8 character to the buffer. OK, now did we overrun the buffer? CRASH!

It seems safer to check the buffer size *before* writing to it, so here's a
patch to do so. Is it the right fix? I thought so when I was doing it, but
now I'm not so sure; it does introduce a const warning. Maybe we can resolve
that with a cast; maybe it's the wrong solution to the problem. Please
provide guidance.

It might be worth prereserving 8 bytes or so, to avoid having to realloc as
often, if this will be called a lot. Currently it just reallocs the minimum
necessary to fit the existing string, the new character and a null
terminator.

Mark

=== src/encodings/utf8.c
==================================================================
--- src/encodings/utf8.c	(revision 20520)
+++ src/encodings/utf8.c	(local)
@@ -264,6 +264,9 @@
     const STRING *s = i->str;
     unsigned char *new_pos, *pos;
 
+    if(i->bytepos + UNISKIP(c) >= PObj_buflen(s)) {
+        Parrot_reallocate_string(interp, i->str, i->bytepos + UNISKIP(c) + 1);
+    }
     pos = (unsigned char *)s->strstart + i->bytepos;
     new_pos = utf8_encode(pos, c);
     i->bytepos += (new_pos - pos);
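
For discussion, here is a rough standalone sketch of the "check before writing, and reserve a few spare bytes" idea mentioned above. It is not Parrot code: demo_buf, utf8_len, demo_append and DEMO_SLACK are made-up names for illustration, plain realloc stands in for Parrot_reallocate_string, and utf8_len only plays a role similar to UNISKIP for the standard UTF-8 range (up to 4 bytes). The slack of 8 bytes is just the figure floated above, not a measured value.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a string buffer; NOT Parrot's STRING. */
typedef struct {
    unsigned char *data;    /* start of the buffer                 */
    size_t         buflen;  /* bytes currently allocated           */
    size_t         bytepos; /* offset where the next write lands   */
} demo_buf;

/* Extra bytes reserved on each growth (assumed value, see lead-in). */
#define DEMO_SLACK 8

/* Bytes needed to encode c as UTF-8; assumes c <= 0x10FFFF. */
static size_t utf8_len(unsigned long c)
{
    if (c < 0x80)    return 1;
    if (c < 0x800)   return 2;
    if (c < 0x10000) return 3;
    return 4;
}

/* Append one code point. Returns 0 on success, -1 if realloc fails. */
static int demo_append(demo_buf *b, unsigned long c)
{
    size_t n    = utf8_len(c);
    size_t need = b->bytepos + n + 1;   /* +1 for the null terminator */
    unsigned char *out;

    /* Grow BEFORE writing, so the write can never overrun the buffer. */
    if (need > b->buflen) {
        size_t newlen = need + DEMO_SLACK;
        unsigned char *p = realloc(b->data, newlen);
        if (p == NULL)
            return -1;
        b->data   = p;
        b->buflen = newlen;
    }

    /* Compute the write position only after any reallocation. */
    out = b->data + b->bytepos;
    if (n == 1) {
        out[0] = (unsigned char)c;
    } else if (n == 2) {
        out[0] = (unsigned char)(0xC0 | (c >> 6));
        out[1] = (unsigned char)(0x80 | (c & 0x3F));
    } else if (n == 3) {
        out[0] = (unsigned char)(0xE0 | (c >> 12));
        out[1] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (c & 0x3F));
    } else {
        out[0] = (unsigned char)(0xF0 | (c >> 18));
        out[1] = (unsigned char)(0x80 | ((c >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((c >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (c & 0x3F));
    }

    b->bytepos += n;
    b->data[b->bytepos] = '\0';

    /* The invariant now holds by construction rather than by luck. */
    assert(b->bytepos < b->buflen);
    return 0;
}
```

Starting from an empty demo_buf ({NULL, 0, 0}), the first append allocates once and the next few appends fit in the slack without another realloc. Whether that trade of memory for fewer reallocations is worthwhile depends on how often the function is called.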