"Paul Green" <[EMAIL PROTECTED]> wrote on 10/05/2005 08:14:33 AM:
> Hi,
>
> I read recently (in Elliotte Rusty Harold's "Processing XML with Java")
> that Xerces-J is capable of parsing an HTML document into a DOM tree.
> Xerces-J 1.4.4 does indeed contain an "html" package with all the required
> interfaces to represent an HTML document in DOM form. However, I have been
> unable to determine how to set up the DOM parser to create such a document,
> despite an extensive search. I would be grateful if someone could point me
> at any documentation, and particularly code examples, describing how to do
> this. Alternatively, if I'm barking up the wrong tree, which tree should I
> go and bark up?

If you're just starting, I would avoid the "html.dom" package altogether.
Some users report that the implementation [1] is horribly broken. I haven't
verified that, but what I do know is that the code hasn't had an active
maintainer in years.

Try having a look at NekoHTML [2]. It's an HTML parser, and it plugs into
Xerces-J 2.x, allowing you to use XML APIs (SAX and DOM) for reading HTML
documents.

> Regards,
>
> Paul Green

[1] http://issues.apache.org/jira/browse/XERCESJ-890
[2] http://www.apache.org/~andyc/neko/doc/index.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]
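
P.S. Since code examples were asked for, here is a minimal sketch of the NekoHTML route through SAX. It assumes the NekoHTML and Xerces-J 2.x jars are on the classpath and that the SAX driver class is org.cyberneko.html.parsers.SAXParser, as shipped in the NekoHTML distribution. The names HtmlLinks, LinkCollector and extractLinks are made up for illustration. The driver is loaded by class name, so the code compiles against the JDK alone; treat it as a starting point rather than a definitive setup.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

class HtmlLinks {

    // Assumed SAX driver class name from the NekoHTML distribution.
    static final String NEKO_SAX_DRIVER = "org.cyberneko.html.parsers.SAXParser";

    /** SAX handler that records the href of every anchor element it sees. */
    static class LinkCollector extends DefaultHandler {
        final List<String> links = new ArrayList<>();

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes atts) {
            // Compare case-insensitively: an HTML parser may report element
            // names in upper case, an XML parser reports them as written.
            if ("a".equalsIgnoreCase(qName)) {
                String href = atts.getValue("href");
                if (href != null) {
                    links.add(href);
                }
            }
        }
    }

    /** Parses the markup with the given SAX reader and returns the hrefs found. */
    static List<String> extractLinks(XMLReader reader, String markup)
            throws Exception {
        LinkCollector collector = new LinkCollector();
        reader.setContentHandler(collector);
        reader.parse(new InputSource(new StringReader(markup)));
        return collector.links;
    }

    public static void main(String[] args) throws Exception {
        XMLReader reader;
        try {
            // Load the HTML parser by class name; this fails if the
            // NekoHTML jar is not on the classpath.
            reader = XMLReaderFactory.createXMLReader(NEKO_SAX_DRIVER);
        } catch (SAXException e) {
            System.err.println("NekoHTML not available: " + e.getMessage());
            return;
        }
        // Deliberately sloppy HTML: unclosed <p>, missing </html>.
        String html = "<html><body><p>See <a href=\"http://example.org/\">this</a></body>";
        System.out.println(extractLinks(reader, html));
    }
}
```

The handler only depends on the standard SAX interfaces, so the same code works with any XMLReader. If you prefer a DOM tree, NekoHTML also ships org.cyberneko.html.parsers.DOMParser, which you can use like the Xerces DOMParser: call parse() and then getDocument().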
