Stas Bekman wrote:
>
> Geoffrey Young wrote:
>
> >>However I'm not sure your patch does the right thing re UTF-8, unless there's
> >>some magic involved that I'm not seeing :-/ I'm no expert on how to deal with
> >>UTF-8 in C (or even in Perl), but it looks like you're only addressing 8-bit
> >>encodings.
> >
> > ok, after some to and fro with robin over on #modperl, it looks like we discovered a few
> > things...
> >
> > first, Apache::Util is not UTF-8 compliant, since it currently mangles C strings
> > byte-by-byte, which means that all or part of a multi-byte character
> > could be mangled.
> >
> > second, the patch follows suit and expands the range of 1-byte characters it mangles,
> > which makes it even less UTF-8 friendly.
> >
> > so, basically what we're thinking is that the new Apache::Util is more secure for
> > non-UTF-8 encodings, while more broken for UTF-8. but UTF-8 is unusable with Apache::Util
> > in either case, so the patch is probably a good thing.
> >
> > other ideas/eyeballs are welcome here, since we've just been going over the spec and
> > making some conjectures - neither of us is an expert here by any means.
> >
> > once other people chime in, we can whip up a doc patch for Apache::Util as well.
>
> Apache::Util hasn't been ported to mod_perl 2.0 yet, and I was thinking of doing
> that at some point. So we can work on Apache::Util for 2.0 and
> then backport it to 1.x. Sounds like a more promising scenario.
however it comes about is fine, I guess. however, if Apache::Util in 1.3 is left unpatched, then we're kinda giving the false impression that calling Apache::Util::escape_html() is sufficient to thwart cross-site scripting (CSS) attacks, when it really only keeps all but the most clever away.

>
> So what spec are you working with?

robin and I were reading http://www.cl.cam.ac.uk/~mgk25/unicode.html but there may be others.

>
> Can we just reap the functionality from some Perl core module in
> bleadperl that does it right?

well, the problem that robin and I were contemplating is that Apache::Util is supposed to be fast because it uses XS. if we went to a pure-Perl implementation, we would lose the speed and duplicate something like HTML::Entities (although it would be easier to solve the problem).

that said, perhaps there is C code in utf8.c (or wherever) that we can steal to make life easier. we probably need to get someone involved who understands the issues better than I do :)

--Geoff
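For illustration only, here is a minimal C sketch of the byte-by-byte problem discussed above. It is not the Apache::Util source and not the patch. Both functions are hypothetical. `escape_bytewise` assumes an escaper that, besides the ASCII metacharacters, turns every byte >= 0x80 into its own numeric entity. That is correct for Latin-1 but wrong for UTF-8. `escape_utf8` decodes each UTF-8 sequence first and emits one entity per code point. Note that `<`, `>`, `&` and `"` can never occur inside a multi-byte UTF-8 sequence, because every lead and continuation byte is >= 0x80. So in this sketch the damage comes only from the high-byte handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy s to out and return the new end of out. No bounds checks in this
 * sketch: callers must size out for the worst case (8 * strlen(in) + 1). */
static char *append(char *out, const char *s)
{
    size_t n = strlen(s);
    memcpy(out, s, n);
    return out + n;
}

/* HTML metacharacters are all ASCII; returns NULL for any other byte. */
static const char *ascii_entity(unsigned char c)
{
    switch (c) {
    case '<': return "&lt;";
    case '>': return "&gt;";
    case '&': return "&amp;";
    case '"': return "&quot;";
    default:  return NULL;
    }
}

/* Hypothetical byte-by-byte escaper: each byte >= 0x80 becomes its own
 * &#N; entity, which is only right for a single-byte encoding (Latin-1). */
void escape_bytewise(const char *in, char *out)
{
    char buf[16];
    const unsigned char *p;

    assert(in != NULL && out != NULL);
    for (p = (const unsigned char *)in; *p; p++) {
        const char *ent = ascii_entity(*p);
        if (ent) {
            out = append(out, ent);
        } else if (*p >= 0x80) {
            snprintf(buf, sizeof buf, "&#%u;", (unsigned)*p);
            out = append(out, buf);
        } else {
            *out++ = (char)*p;
        }
    }
    *out = '\0';
}

/* Hypothetical UTF-8-aware escaper: decodes each multi-byte sequence and
 * emits one &#N; entity per code point. Malformed or truncated sequences
 * become U+FFFD; overlong forms and surrogates are NOT rejected here, which
 * a production version would need to do. */
void escape_utf8(const char *in, char *out)
{
    char buf[16];
    const unsigned char *p = (const unsigned char *)in;

    assert(in != NULL && out != NULL);
    while (*p) {
        const char *ent = ascii_entity(*p);
        unsigned long cp;
        int extra, i;

        if (ent) { out = append(out, ent); p++; continue; }
        if (*p < 0x80) { *out++ = (char)*p++; continue; }

        /* The lead byte tells how many continuation bytes follow. */
        if ((*p & 0xE0) == 0xC0)      { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else                          { cp = 0xFFFD;    extra = 0; }
        p++;

        for (i = 0; i < extra; i++) {
            /* Continuation bytes look like 10xxxxxx; the NUL terminator
             * also fails this test, so truncated input stops here. */
            if ((*p & 0xC0) != 0x80) { cp = 0xFFFD; break; }
            cp = (cp << 6) | (*p & 0x3F);
            p++;
        }
        snprintf(buf, sizeof buf, "&#%lu;", cp);
        out = append(out, buf);
    }
    *out = '\0';
}
```

Take the input `"<b>caf\xC3\xA9</b>"`, where "é" (U+00E9) is the two UTF-8 bytes 0xC3 0xA9. The byte-wise version produces `&lt;b&gt;caf&#195;&#169;&lt;/b&gt;`, which a browser renders as "cafÃ©". The UTF-8-aware version produces `&lt;b&gt;caf&#233;&lt;/b&gt;`. The cost of the second version is a small decode loop per high byte, which should still be cheap enough for XS.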