> Correct, utf7/8 are otherwise escaped.

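It's stricter than that, at least for UTF-8, UTF-16 and UTF-32 (I haven't
checked UTF-7): they never use code-unit values < 0x80 except to represent
the characters that are the same in 7-bit ASCII. This means that, given any
of the byte-oriented encodings { ASCII, ISO-8859-x, UTF-8 }, you can safely
memchr(buf, '/', size) and rely on the result without back-tracking. For
UTF-16 and UTF-32 the same holds per code unit, not per byte: individual
bytes of a non-ASCII character can be < 0x80 (U+012F is 0x2F 0x01 in
UTF-16LE), so you have to compare whole 16- or 32-bit units instead.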
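A minimal C sketch of the byte-versus-code-unit distinction, assuming little-endian UTF-16. find_slash_bytes and find_slash_utf16le are hypothetical helper names for illustration, not APR functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte-level scan: offset of the first '/' byte, or -1 if none.
 * Safe for ASCII, ISO-8859-x and UTF-8, where a byte < 0x80 never
 * appears inside a multi-byte character. */
static long find_slash_bytes(const unsigned char *buf, size_t size)
{
    const unsigned char *p = memchr(buf, '/', size);
    return p ? (long)(p - buf) : -1;
}

/* Code-unit scan for UTF-16LE: only a whole 16-bit unit equal to
 * U+002F is a slash. Returns the byte offset, or -1 if none. */
static long find_slash_utf16le(const unsigned char *buf, size_t size)
{
    assert(size % 2 == 0); /* buffer must hold whole code units */
    for (size_t i = 0; i + 1 < size; i += 2) {
        unsigned unit = buf[i] | (unsigned)buf[i + 1] << 8;
        if (unit == 0x002F)
            return (long)i;
    }
    return -1;
}
```

With "é/" in UTF-8 (C3 A9 2F) the byte scan returns 2, as expected. With "į/" in UTF-16LE (2F 01 2F 00) the byte scan returns 0, which is the low byte of U+012F, while the code-unit scan correctly returns 2.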

FWIW, all those encodings also have the nice property that you can find the
number of bytes used to encode any character by examining only its first
code unit (the first byte, for the 8-bit encodings). That property is
helpful, for example, when writing lexers.
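For UTF-8 the rule reduces to a few range checks on the lead byte; for UTF-16 only a high surrogate (0xD800-0xDBFF) signals a four-byte pair. A minimal sketch with hypothetical helper names; the counting loop trusts the lead byte and does not validate continuation bytes, which a real lexer would still need to do:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Total length in bytes of the UTF-8 sequence starting with lead byte b,
 * or 0 if b cannot start a sequence. */
static int utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;   /* 0xxxxxxx: ASCII */
    if (b < 0xC2) return 0;   /* continuation byte, or overlong C0/C1 */
    if (b < 0xE0) return 2;   /* 110xxxxx */
    if (b < 0xF0) return 3;   /* 1110xxxx */
    if (b < 0xF5) return 4;   /* 11110xxx, up to U+10FFFF */
    return 0;                 /* F5-FF never appear in UTF-8 */
}

/* Length in bytes of the UTF-16 character whose first code unit is u,
 * or 0 for a stray low surrogate. */
static int utf16_seq_len(uint16_t u)
{
    if (u >= 0xD800 && u <= 0xDBFF) return 4;  /* high surrogate: pair */
    if (u >= 0xDC00 && u <= 0xDFFF) return 0;  /* low surrogate first */
    return 2;
}

/* Count characters by hopping from lead byte to lead byte.
 * Returns -1 on an invalid lead byte or a truncated sequence. */
static long utf8_count_chars(const unsigned char *s, size_t n)
{
    assert(s != NULL || n == 0);
    long count = 0;
    size_t i = 0;
    while (i < n) {
        int len = utf8_seq_len(s[i]);
        if (len == 0 || i + (size_t)len > n)
            return -1;
        i += (size_t)len;
        count++;
    }
    return count;
}
```

For "aé€😀" (61 C3 A9 E2 82 AC F0 9F 98 80) the loop visits four lead bytes and never has to look inside a character.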

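You're right that Shift-JIS in particular needs attention paid to it: its
trail bytes can fall in the ASCII range 0x40-0x7E (0x5C, '\', for example),
so a plain byte search can match the second half of a two-byte character.
Locally, I try very hard not to support any non-Unicode character set, but I
understand that's a luxury that APR does not have.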
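A minimal sketch of a lead-byte-aware search, assuming the common Shift-JIS/CP932 lead-byte ranges 0x81-0x9F and 0xE0-0xFC (sjis_is_lead, sjis_find_ascii and naive_find are hypothetical names). Unlike memchr, it has to walk forward from a known character boundary:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed CP932 lead-byte ranges; half-width katakana (0xA1-0xDF)
 * are single bytes and are not leads. */
static int sjis_is_lead(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Naive byte search, for comparison: may land on a trail byte. */
static long naive_find(const unsigned char *buf, size_t size, unsigned char c)
{
    const unsigned char *p = memchr(buf, c, size);
    return p ? (long)(p - buf) : -1;
}

/* Find ASCII character c, stepping over two-byte characters so a trail
 * byte is never mistaken for c. Returns the byte offset, or -1. */
static long sjis_find_ascii(const unsigned char *buf, size_t size, unsigned char c)
{
    assert(c < 0x80);
    size_t i = 0;
    while (i < size) {
        if (sjis_is_lead(buf[i])) {
            i += 2;           /* skip lead byte and its trail byte */
            continue;
        }
        if (buf[i] == c)
            return (long)i;
        i++;
    }
    return -1;
}
```

For 表 followed by a real backslash (95 5C 5C), naive_find returns 1, the trail byte of 表, while sjis_find_ascii returns 2. The cost is that the scan can no longer start at an arbitrary offset, which is exactly the back-tracking problem the UTF encodings avoid.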

Wes

-- 
Wesley W. Garland
Director, Product Development
PageMail, Inc.
+1 613 542 2787 x 102
