On Fri, Nov 22, 2002 at 03:03:08PM -0800, Ian Clarke wrote:
> > Are they? The safest thing is certainly to block anything we don't
> > understand.
> 
> True, ideally we should be using something like JTidy to parse the HTML 
> to XML, then filter it, then spit it out to the browser.  The JTidy jar 
> is 142k, but this will slow things down.  Additionally, I think JTidy 
> relies on the XML stuff in post-1.1 versions of Java.
I did ask around for a good HTML parser months ago but got no response.
Can we bundle JTidy? Does it do CSS as well?
> 
> Basically, to be 100% safe, any given piece of HTML should be assumed 
> *insecure* unless we can affirm that it isn't.  Easier said than done 
> though.
Yeah. That's the point. It _should_ be simple enough, if tedious without
existing code to build on, to just parse the HTML and only let through
what is known good. BUT we have to be really careful with I18N; a lot
of products have been caught out by that. So if we just want to
ignore I18N, we need to block all high characters in what should be the
main text. And if we want to support it, we need to parse it really
carefully, i.e. decode it all to UCS4/wchars before trying to
parse the HTML at all... but we would also need to make sure the output
is not a threat to non-i18n-aware browsers, so... could be a lot of work,
for little obvious benefit, but it absolutely must be done before 1.0.
Volunteers with experience in this area would be greatly appreciated;
otherwise I'll end up doing it at some point.
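
To make the "ignore I18N" option concrete, here is a rough sketch of the
decode-first, let-through-only-known-good idea. Everything in it is an
assumption for illustration: the class name AllowlistFilterSketch, the
tag list, and the choice of UTF-8 as the input encoding are all made up
here, and Java strings are UTF-16 rather than UCS4. It is not a real
HTML parser and not what Freenet does. It only shows the order of
operations: decode strictly, then keep bare allowlisted tags, and
replace anything above ASCII in the text.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.Set;

class AllowlistFilterSketch {
    // Hypothetical allowlist, for illustration only.
    private static final Set<String> ALLOWED_TAGS =
            Set.of("b", "i", "em", "strong", "p", "br");

    // Decode strictly (assumed UTF-8): malformed bytes, including
    // overlong sequences, are rejected instead of being guessed at.
    static String decodeStrict(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("input is not valid UTF-8", e);
        }
    }

    // Returns a normalized tag if the tag is a bare allowlisted name,
    // or null if it is unknown or carries attributes.
    static String normalizeTag(String inner) {
        boolean closing = inner.startsWith("/");
        String name = (closing ? inner.substring(1) : inner)
                .toLowerCase(Locale.ROOT);
        if (!ALLOWED_TAGS.contains(name)) {
            return null;
        }
        return closing ? "</" + name + ">" : "<" + name + ">";
    }

    static String filter(byte[] raw) {
        String text = decodeStrict(raw); // decode BEFORE looking for tags
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '<') {
                int end = text.indexOf('>', i);
                if (end < 0) {
                    break; // unterminated tag: drop the rest
                }
                String tag = normalizeTag(text.substring(i + 1, end));
                if (tag != null) {
                    out.append(tag);
                }
                // Unknown tags are dropped; their text content is kept.
                // A real filter would also drop script/style bodies.
                i = end + 1;
            } else {
                if (c == '&') {
                    out.append("&amp;"); // entities are not interpreted here
                } else if (c == '>') {
                    out.append("&gt;");
                } else if ((c >= 0x20 && c <= 0x7E)
                        || c == '\n' || c == '\r' || c == '\t') {
                    out.append(c);
                } else {
                    out.append('?'); // block high and control characters
                }
                i++;
            }
        }
        return out.toString();
    }
}
```

With this sketch, filter() on the bytes of "<b>hi</b><script>x</script>"
keeps the bold tags and the text but drops the script tags, and a tag
with any attribute at all is dropped, since only exact bare names match.
Supporting I18N properly would mean replacing the '?' branch with
something much more careful, which is the part that needs the work.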
> 
> Ian.
> 
> -- 
> Ian Clarke                ian@[freenetproject.org|locut.us|cematics.com]
> Latest Project                                 http://cematics.com/kanzi
> Personal Homepage                                     http://locut.us/
> 

-- 
Matthew Toseland
toad at amphibian.dyndns.org
amphibian at users.sourceforge.net
Freenet/Coldstore open source hacker.
Employed full time by Freenet Project Inc. from 11/9/02 to 11/1/03
http://freenetproject.org/