On Fri, 2002-04-19 at 18:55, James Strachan wrote:
> From: "Donald Ball" <[EMAIL PROTECTED]>
> > this is a slightly strange request, but bear with me. in our app, we're
> > letting users enter a subset of x/html (no script, embed, etc.). we're
> > parsing using the dom4j SAXReader with validation turned on. it all
> > works very well, thanks for the great tools. however, we'd now like to
> > relax the rules a bit and let users enter a subset of html.
> 
> i.e. you want to allow malformed XML? like
> 
> <html>
>     <body>
>         <p>
>         hello
>         <p>
>         <br>
>     </body>
> </html>

exactly

> There have been some developments lately of parsers that can accept HTML as
> input but behave like XML parsers and balance tags and so forth. So they
> behave just like a regular SAX parser.
> 
> A promising example is here:-
> 
> http://hotsax.sourceforge.net/

seems pretty dead to me, actually. that's a shame.

> Also Andy Clark from the Xerces team has put together an HTML parser called
> NekoHTML which looks really cool (and could well be a great event-based
> replacement for JTidy).
> 
> http://www.apache.org/~andyc/
> 
> I think it's moving into the Xerces codebase soon.

this looks more promising, but we haven't tested our app with xerces-2
yet. i think this will be the long term solution though.

thanks for the tips. i've got things working fairly well with jtidy now.
if anyone needs sample code for doing this, i'm happy to post it.
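
for anyone wondering what "balancing tags" means for the example quoted
above, here is a rough sketch. it is not the jtidy-based code mentioned
above and not how jtidy or nekohtml work internally; TagBalancer is a
hypothetical toy class using only the JDK. it assumes a new <p> closes an
open <p>, treats br/hr/img as empty elements, and drops stray close tags.
it does not escape text or quote attribute values, so real input still
needs a proper parser.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy class: turns simple tag soup into balanced markup.
public class TagBalancer {
    // Assumption: only these elements are treated as always-empty.
    private static final Set<String> VOID = Set.of("br", "hr", "img");
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)([^>]*)>");

    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // stack of open elements
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy text between tags as-is
            pos = m.end();
            boolean closing = !m.group(1).isEmpty();
            String name = m.group(2).toLowerCase();
            // Drop a trailing "/" so <br/> does not become <br //>.
            String attrs = m.group(3).replaceAll("\\s*/\\s*$", "");
            if (VOID.contains(name)) {
                // Emit empty elements self-closed; ignore </br> and friends.
                if (!closing) out.append('<').append(name).append(attrs).append("/>");
            } else if (closing) {
                if (open.contains(name)) {
                    // Close everything opened after 'name', then 'name' itself.
                    String top;
                    do {
                        top = open.pop();
                        out.append("</").append(top).append('>');
                    } while (!top.equals(name));
                }
                // A close tag with no matching open tag is dropped.
            } else {
                // A new <p> implicitly ends a <p> that is still open.
                if (name.equals("p") && "p".equals(open.peek())) {
                    open.pop();
                    out.append("</p>");
                }
                open.push(name);
                out.append('<').append(name).append(attrs).append('>');
            }
        }
        out.append(html.substring(pos));
        // Close whatever is still open at end of input.
        while (!open.isEmpty()) out.append("</").append(open.pop()).append('>');
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p>hello<p><br>"));
    }
}
```

with this sketch, balance("<p>hello<p><br>") returns
<p>hello</p><p><br/></p>, which an ordinary XML parser will accept.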

- donald

_______________________________________________
dom4j-user mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-user
