Kjetil Kjernsmo requested a front end to HTML::StripScripts that,
instead of returning HTML text, would return a LibXML Document or
DocumentFragment (i.e. a DOM tree).

I have released this as HTML::StripScripts::LibXML:
http://search.cpan.org/~drtech/HTML-StripScripts-LibXML-0.10/LibXML.pm

It handles messy HTML, strips out XSS, and gives you fine-grained
control of the HTML/XML nodes that are returned.

If you are interested, please give it a try and send me feedback
on how to improve it, which options to add, etc.

The main question mark I have is what to do with encoding - suggestions
welcome.

Also see my question at Perl Monks:
http://www.perlmonks.org/index.pl?node_id=624334

thanks

Clint

On Tue, 2007-06-26 at 16:34 +0200, Kjetil Kjernsmo wrote:
> On Tuesday 26 June 2007 16:22, Clinton Gormley wrote:
> >  - used to strip XSS scripting from user submitted HTML
> 
> Ooooh, cool! I haven't found any modules that do that well enough.
> 
> >  - outputs valid HTML (cleans up nesting, context of tags etc)
> >
> >  - handles the exploits listed at http://ha.ckers.org/xss.html
> 
> 
> Great!
> 
> > I hope this helps others, and if anybody has any suggestions, please
> > feed them back to me
> 
> Actually, something I feel would be very useful is if it could
> return an XML::LibXML::DocumentFragment object.
> 
> I tend to use XML::LibXML to parse user input and insert it in the
> document, which then goes through some XSLT, and since you've
> already parsed stuff, it seems like a waste to parse again.
> 
> So that's my feature request! :-) 
> 
> Cheers,
> 
> Kjetil
