Stas Bekman wrote:
> 
> Geoffrey Young wrote:
> 
> >>However I'm not sure your patch does the right thing re UTF-8, unless there's
> >>some magic involved that I'm not seeing :-/ I'm no expert on how to deal with
> >>UTF-8 in C (or even in Perl) but it looks like you're only addressing 8bit
> >>encodings.
> >>
> >
> >
> > ok, after some to and fro with robin over on #modperl it looks like we discovered
> > a few things...
> >
> > first, Apache::Util is not UTF-8 compliant, since it currently mangles C strings
> > byte-by-byte, which introduces the possibility that all or part of a 2-byte
> > character could be mangled.
> >
> > second, the patch follows suit and expands the range of 1-byte characters it
> > mangles, which makes it even less UTF-8 friendly.
> >
> > so, basically what we're thinking is that the new Apache::Util is more secure for
> > non-UTF-8 encodings, while more broken for UTF-8.  but UTF-8 is unusable with
> > Apache::Util in either case, so the patch is probably a good thing.
> >
> > other ideas/eyeballs are welcome here, since we've just been going over the spec
> > and making some conjectures - neither of us is an expert here by any means.
> >
> > once other people chime in, we can whip up a doc patch for Apache::Util as well.
> 
> Apache::Util hasn't been ported to mod_perl 2.0 yet, and I was thinking of
> doing that at some point. So we can work on Apache::Util for 2.0 and
> then backport it to 1.x. Sounds like a more promising scenario.

however it comes about is fine, I guess.  however, if Apache::Util in 1.3 is left
un-patched then we're kinda giving a false impression that calling
Apache::Util::escape_html() is sufficient to thwart cross-site scripting (CSS)
attacks when it really only keeps all but the most clever away.

> 
> So what spec are you working with?

robin and I were reading

http://www.cl.cam.ac.uk/~mgk25/unicode.html

but there may be others.

> 
> Can we just reap the functionality from some Perl core module in
> bleadperl that does it right?

well, the problem that robin and I were contemplating is that Apache::Util is supposed
to be fast because it uses XS.  if we went to a pure perl implementation we would lose
the speed and duplicate something like HTML::Entities (although it would be easier to
solve the problem).

that said, perhaps there is C code in utf8.c (or wherever) that we can steal to make
life easier.  we probably need to get someone involved who understands the issues
better than I do :)
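
for illustration only, here is a minimal C sketch of a byte-by-byte escape that at
least does not split UTF-8 sequences.  it is not the Apache::Util code and not the
patch under discussion; the function name escape_html_ascii_only is hypothetical.
it relies on one property of UTF-8: every byte of a multi-byte character has the
high bit set, so the four ASCII metacharacters can never appear inside one.
whether escaping only these four is enough for the cases the patch targets in
other encodings is a separate question.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (an assumption, not an Apache::Util function):
 * escape only the ASCII metacharacters < > & " and copy every other
 * byte through unchanged, so UTF-8 multi-byte sequences stay intact.
 * Returns a malloc'd string the caller must free, or NULL on failure. */
static char *escape_html_ascii_only(const char *s)
{
    size_t len = strlen(s);
    /* worst case: every input byte expands to "&quot;" (6 bytes) */
    char *out = malloc(len * 6 + 1);
    char *p = out;

    if (out == NULL)
        return NULL;

    for (; *s != '\0'; s++) {
        switch (*s) {
        case '<':  memcpy(p, "&lt;", 4);   p += 4; break;
        case '>':  memcpy(p, "&gt;", 4);   p += 4; break;
        case '&':  memcpy(p, "&amp;", 5);  p += 5; break;
        case '"':  memcpy(p, "&quot;", 6); p += 6; break;
        default:   *p++ = *s;              break; /* includes bytes >= 0x80 */
        }
    }
    *p = '\0';
    return out;
}
```

by contrast, a routine that rewrites individual bytes >= 0x80 as entities treats
each byte of a multi-byte UTF-8 character as its own character, which is the
mangling described above.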

--Geoff
