Jakub Wilk <jw...@debian.org> writes:

>>The reason is the following (see
>>https://github.com/pediapress/pyfribidi/issues/2):
>>
>>  fribidi_utf8_to_unicode consumes at most 3 bytes for a single
>>  unicode character, i.e. it does not handle unicode character above
>>  0xffff.
>
> As far as I can see this is not true. In Debian, we allocate 4 bytes
> per characters. (An upstream version, which the Debian package is
> based on, is completely broken in this respect: it allocates a buffer
> of static size. See bug #570068)

Upstream is pretty much dead in this case. I've published our version on
PyPI. However, I didn't ask or inform the original authors about that.

>
>> For a 4 byte utf-8 sequence it will generate 2 unicode characters,
>> which overflows the logical buffer.
>
> I'm confused. What is "it" in your sentence? Why 2 Unicode characters?

"it" refers to the 4 byte utf-8 sequence. Here's the inner loop of
"fribidi_utf8_to_unicode" from fribidi-char-sets-utf8.c:

,----
|   length = 0;
|   while ((FriBidiStrIndex) (s - t) < len)
|     {
|       register unsigned char ch = *s;
|       if (ch <= 0x7f)           /* one byte */
|         {
|           *us++ = *s++;
|         }
|       else if (ch <= 0xdf)      /* 2 byte */
|         {
|           *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
|           s += 2;
|         }
|       else                      /* 3 byte */
|         {
|           *us++ =
|             ((int) (*s & 0x0f) << 12) +
|             ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
|           s += 3;
|         }
|       length++;
|     }
`----

Assume you have a 4-byte utf-8 sequence. One loop step consumes at most
3 bytes of that 4-byte sequence (there's no "4 byte" case), leaving
1 byte of that sequence for further processing. This 1 byte will
generate another unicode character. pyfribidi uses the length of the
python unicode string as the buffer size, which is less than the number
of characters fribidi_utf8_to_unicode generates, and there you have
your buffer overflow.

To confirm the issue, you can add an assert and check that
fribidi_utf8_to_unicode's return value (the length of the string) equals
unicode_length.

>
> Anyway I tried to double the buffer size (8 bytes per characters of
> original string) but this didn't fix the crash. So likely the problem
> lies somewhere else.

I'm pretty sure my analysis is correct, and I'm not quite sure what
you did here.

-- 
Cheers
  Ralf
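
The miscount described above can be reproduced without linking against
fribidi. The following is only a sketch: decode_3case is a hypothetical
stand-in that copies the three branches of the quoted loop, and
count_code_points is a hypothetical helper that models a caller sizing
its output buffer by the number of code points (an assumption about how
the python string length is derived). Neither function is part of
fribidi or pyfribidi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the quoted loop: the same three branches,
 * with no 4-byte case. Returns the number of characters written to us.
 * Note: for a stray continuation byte the 2-byte branch reads s[1],
 * which may lie one byte past len, so callers of this demo should pass
 * NUL-terminated input. */
int decode_3case(const unsigned char *s, int len, uint32_t *us)
{
    const unsigned char *t = s;
    int length = 0;

    assert(s != NULL && us != NULL);
    while ((int) (s - t) < len) {
        unsigned char ch = *s;
        if (ch <= 0x7f) {               /* one byte */
            *us++ = *s++;
        } else if (ch <= 0xdf) {        /* 2 byte (also hit by stray 0x80..0xbf) */
            *us++ = ((*s & 0x1f) << 6) + (*(s + 1) & 0x3f);
            s += 2;
        } else {                        /* 3 byte (also hit by 0xf0..0xf7 lead bytes) */
            *us++ = ((int) (*s & 0x0f) << 12) +
                    ((*(s + 1) & 0x3f) << 6) + (*(s + 2) & 0x3f);
            s += 3;
        }
        length++;
    }
    return length;
}

/* Hypothetical helper: counts code points in well-formed UTF-8 by
 * skipping continuation bytes (those of the form 10xxxxxx). */
int count_code_points(const unsigned char *s, int len)
{
    int n = 0;
    for (int i = 0; i < len; i++) {
        if ((s[i] & 0xc0) != 0x80)
            n++;
    }
    return n;
}
```

For the four bytes F0 90 80 80 (U+10000), count_code_points returns 1
while decode_3case returns 2: the first step eats F0 90 80 through the
3-byte branch, and the leftover 80 falls into the 2-byte branch and
produces a second character. An output buffer sized for one character
is therefore written twice.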