Hi,

lxml parses the html into an xml ElementTree structure. It is also a validating parser, so a restrictive DTD could be provided to reject scripts. Or the tree could just be searched.
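This is a minimal sketch of the "just search the tree" option. It uses the standard library's xml.etree.ElementTree so it runs without extra installs. lxml.etree exposes a largely compatible ElementTree API, so the same walk should carry over, but that is untested here. Everything in it is hypothetical: the names sanitise_xhtml and BLOCKED_TAGS, and the choice of what to strip. It also assumes the input is well-formed XHTML. A blocklist like this is not a complete sanitiser, and an allowlist of permitted tags and attributes is the safer design.

```python
import xml.etree.ElementTree as ET

# Hypothetical blocklist for this sketch; a real sanitiser should prefer an
# allowlist of permitted tags and attributes instead.
BLOCKED_TAGS = {"script", "iframe", "object", "embed", "style"}


def _local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix, e.g. from xmlns-qualified XHTML."""
    return tag.rsplit("}", 1)[-1].lower()


def _remove_keep_tail(parent: ET.Element, child: ET.Element) -> None:
    """Remove child but keep the text that followed it (its .tail)."""
    tail = child.tail or ""
    siblings = list(parent)
    index = siblings.index(child)
    if index > 0:
        prev = siblings[index - 1]
        prev.tail = (prev.tail or "") + tail
    else:
        parent.text = (parent.text or "") + tail
    parent.remove(child)


def sanitise_xhtml(fragment: str) -> str:
    """Search the parsed tree and drop script-like elements and attributes.

    Assumes well-formed XHTML: malformed input raises ET.ParseError, so it
    is rejected rather than repaired.
    """
    # Wrap the fragment so the parser sees a single root element.
    root = ET.fromstring("<div>" + fragment + "</div>")

    # Pass 1: remove blocked elements (snapshot the tree before mutating it).
    for parent in list(root.iter()):
        for child in list(parent):
            if _local_name(child.tag) in BLOCKED_TAGS:
                _remove_keep_tail(parent, child)

    # Pass 2: strip event handlers (onclick=...) and javascript: URLs.
    for element in root.iter():
        for name, value in list(element.attrib.items()):
            if (_local_name(name).startswith("on")
                    or value.strip().lower().startswith("javascript:")):
                del element.attrib[name]

    out = ET.tostring(root, encoding="unicode")
    # Strip the wrapper again; an empty result serialises as "<div />".
    if out == "<div />":
        return ""
    return out[len("<div>"):-len("</div>")]
```

For example, sanitise_xhtml('<p onclick="steal()">Hi<script>alert(1)</script></p>') returns '<p>Hi</p>'. Rejecting malformed input outright, instead of guessing at a repair, keeps the check simple. That fits the goal of generating properly formed XHTML pages.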
Lenard

Quoting René Dudfield <ren...@gmail.com>:

> yeah should be mostly simple...
>
> the website also uses some stuff to filter out things like javascript.
> Hopefully there is something similar available for python now. Does lxml
> support that? Failing that, will have to convert one of the ones from php.
> feedparser in python is pretty good for that... however it still has some
> problems.
>
> It's a must for user submitted website content, no matter the markup
> language.
>
> cu,
>
> On Mon, Apr 27, 2009 at 7:28 AM, Lenard Lindstrom <le...@telus.net> wrote:
>
> > Sanitising will be simple. I have tried lxml. Of course there is also
> > beautifulsoup. Another issue is maintaining consistently across pages.
> > Using <h..> tags doesn't work. Remembering what header level to use when
> > is bothersome. If new, more descriptive, header tags could be added that
> > would be great. And a preview function.
> >
> > Lenard
> >
> > René Dudfield wrote:
> >
> >> Hi,
> >>
> >> I suggest using the current one - rewritten in python, and fixing that
> >> bug. I think that's the only code mangling bug it has?
> >>
> >> Yeah, the code in the wiki is probably best described as non-strict
> >> html... or just html... which is not strict itself. The wiki does some
> >> sanitising on the html after entry. It's only a few lines of code to add
> >> a gui editor like tinymce... so we could add that for those who don't
> >> want to use markup.
> >>
> >> cheers,
> >>
> >> On Mon, Apr 27, 2009 at 3:52 AM, Lenard Lindstrom <le...@telus.net> wrote:
> >>
> >> Hi René,
> >>
> >> I don't know about Trac's tracking system but I find bugzilla
> >> difficult as it requires report generation. How to get a listing
> >> of recent bugs is not obvious.
> >>
> >> The html markup in the current wiki is not strict XHTML. We do
> >> want the new site to generate properly formed XHTML pages, or am I
> >> mistaken.
> >> Also Python code gets mangled, '<' replaced with '&lt;'
> >> for <code> sections. This is probably a data entry problem though.
> >> But whatever wiki engine is chosen it has to handle this properly.
> >> Trac does. Do any of the html tag wikis handle it right? What
> >> alternate wiki do you suggest?
> >>
> >> Lenard
> >>
> >> René Dudfield wrote:
> >>
> >> hi,
> >>
> >> the main way we do bugs with pygame is through the mailing
> >> list. The internet is a bug tracker.
> >>
> >> I wrote a blog post about the reasons why the mailing list is
> >> good, and what 'the internet is a bug tracker' means:
> >> http://renesd.blogspot.com/2008/02/bugs-search-not-categorise.html
> >>
> >> I personally think trac is a bit rubbish, and have been happy
> >> with James Paige hosting bugzilla for us.
> >>
> >> The current pygame wiki just uses simple html. So should be
> >> fairly straight forward to convert... or we could just leave
> >> it in html. Since most programmers know html anyway... way
> >> more than trac markup.

-- 
Lenard Lindstrom <le...@telus.net>
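The quoted thread mentions mangled '<' characters in <code> sections. This small sketch shows one way that symptom can arise. It assumes the code text gets HTML-escaped twice somewhere between entry and display, which the thread itself does not establish. html.escape and html.unescape are standard library functions. html.unescape stands in here for the browser decoding entities once, and the sample source is made up.

```python
import html

# Made-up Python snippet a user might paste into a <code> section.
source = "if x < 10:\n    print(x)"

# Escaping once is what the page's HTML needs: '<' becomes '&lt;'.
once = html.escape(source, quote=False)   # 'if x &lt; 10:\n    print(x)'
# Escaping again (the assumed bug) turns '&lt;' into '&amp;lt;'.
twice = html.escape(once, quote=False)    # 'if x &amp;lt; 10:\n    print(x)'

# A browser decodes entities once; html.unescape imitates that step.
shown_once = html.unescape(once)    # the original source, '<' intact
shown_twice = html.unescape(twice)  # shows a literal '&lt;' instead of '<'
```

Under that assumption, the fix is to store the raw source and escape it exactly once, at render time.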