> 
> However I'm not sure your patch does the right thing re UTF-8, unless there's
> some magic involved that I'm not seeing :-/ I'm no expert on how to deal with
> UTF-8 in C (or even in Perl) but it looks like you're only addressing 8bit
> encodings.


ok, after some to and fro with robin over on #modperl it looks like we discovered a few
things...

first, Apache::Util is not UTF-8 compliant, since it currently mangles C strings
byte-by-byte, which introduces the possibility that all or part of a multi-byte
character could be mangled.

second, the patch follows suit and expands the range of 1-byte characters it mangles,
which makes it even less UTF-8 friendly.

so, basically what we're thinking is that the new Apache::Util is more secure for
non-UTF-8 encodings, while more broken for UTF-8.  but UTF-8 is unusable with
Apache::Util in either case, so the patch is probably a good thing.
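
to make the byte-by-byte problem concrete, here is a minimal C sketch.  it is not
the Apache::Util source: escape_high_bytes() is a hypothetical function, and turning
every byte >= 0x80 into a decimal entity is only an assumption about the general
shape of a byte-wise escaper, used here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical byte-wise escaper (NOT the Apache::Util code).
 * Assumption for illustration: every byte >= 0x80 is rewritten as "&#NNN;",
 * all other bytes are copied unchanged.
 * Returns the length of the escaped string, or -1 if out is too small.
 */
int escape_high_bytes(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outlen == 0)
        return -1;
    out[0] = '\0';

    for (const unsigned char *p = (const unsigned char *)in; *p != '\0'; p++) {
        int n;

        /* Each byte is judged on its own; the code has no idea whether
         * it is part of a longer UTF-8 sequence. */
        if (*p >= 0x80)
            n = snprintf(out + used, outlen - used, "&#%u;", (unsigned)*p);
        else
            n = snprintf(out + used, outlen - used, "%c", *p);

        /* snprintf reports the length it wanted; treat truncation as failure. */
        if (n < 0 || (size_t)n >= outlen - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

with ISO-8859-1 input, e-acute is the single byte 0xE9 and comes out as &#233;,
which is the right entity for that character.  with UTF-8 input the same character
is the two bytes 0xC3 0xA9, which come out as &#195;&#169;, i.e. two unrelated
characters instead of one.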

other ideas/eyeballs are welcome here, since we've just been going over the spec and
making some conjectures - neither of us is an expert here by any means.

once other people chime in, we can whip up a doc patch for Apache::Util as well.

thanks

--Geoff
