> However I'm not sure your patch does the right thing re UTF-8, unless there's
> some magic involved that I'm not seeing :-/ I'm no expert on how to deal with
> UTF-8 in C (or even in Perl) but it looks like you're only addressing 8bit
> encodings.
OK, after some back and forth with Robin over on #modperl, it looks like we've worked out a few things.

First, Apache::Util is not UTF-8 compliant. It currently mangles C strings byte by byte, so any single byte of a multi-byte character (UTF-8 uses two to four bytes for anything outside ASCII) can be mangled on its own, corrupting all or part of that character.

Second, the patch follows suit and expands the range of 1-byte characters it mangles, which makes it even less UTF-8 friendly.

So, basically, what we're thinking is that the new Apache::Util is more secure for non-UTF-8 (8-bit) encodings but more broken for UTF-8. Since UTF-8 is unusable with Apache::Util in either case, the patch is probably a good thing.

Other ideas and eyeballs are welcome here, since we've just been going over the spec and making some conjectures; neither of us is an expert here by any means. Once other people chime in, we can whip up a doc patch for Apache::Util as well.

Thanks,

--Geoff