"Paul Green" <[EMAIL PROTECTED]> wrote on 10/05/2005 08:14:33 AM:

> Hi,
> 
> I read recently (in Elliotte Rusty Harold's "Processing XML with Java")
> that Xerces-J is capable of parsing an HTML document into a DOM tree.
> Xerces-J 1.4.4 does indeed contain an "html" package with all the
> required interfaces to represent an HTML document in DOM form. However,
> I have been unable to determine how to set up the DOM parser to create
> such a document, despite an extensive search. I would be grateful if
> someone could point me at any documentation, and particularly code
> examples, describing how to do this. Alternatively, if I'm barking up
> the wrong tree, which tree should I go and bark up?

If you're just starting, I would avoid the "html.dom" package altogether.
Some users report that the implementation [1] is horribly broken. I
haven't verified that, but I do know that the code hasn't had an
active maintainer in years. Try having a look at NekoHTML [2]. It's an
HTML parser that plugs into Xerces-J 2.x, allowing you to use XML APIs
(SAX and DOM) for reading HTML documents.
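
As a rough sketch of what that could look like (not taken from the NekoHTML
documentation, so treat the details as assumptions): NekoHTML ships a DOM
parser class, org.cyberneko.html.parsers.DOMParser, which extends the
Xerces-J DOM parser and so offers the usual parse(InputSource) and
getDocument() methods. The class name HtmlLinks and the helpers parseHtml
and linkTargets below are made up for illustration. The parser is loaded
reflectively only so the sketch compiles without the NekoHTML and Xerces
jars; with both jars on the classpath you would normally write
"new org.cyberneko.html.parsers.DOMParser()" directly.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HtmlLinks {

    // Parses (possibly malformed) HTML text into a DOM Document using
    // NekoHTML's DOM parser. Assumes nekohtml.jar and a Xerces-J 2.x jar
    // are on the classpath at run time; throws ClassNotFoundException
    // otherwise.
    static Document parseHtml(String html) throws Exception {
        Class<?> parserClass =
            Class.forName("org.cyberneko.html.parsers.DOMParser");
        Object parser = parserClass.getDeclaredConstructor().newInstance();
        // Equivalent to: parser.parse(new InputSource(...));
        parserClass.getMethod("parse", InputSource.class)
                   .invoke(parser, new InputSource(new StringReader(html)));
        // Equivalent to: return parser.getDocument();
        return (Document) parserClass.getMethod("getDocument").invoke(parser);
    }

    // Collects the href value of every anchor element using only the
    // standard DOM API. Tag names are compared case-insensitively because
    // an HTML parser may normalize the case of element names.
    static List<String> linkTargets(Document doc) {
        List<String> targets = new ArrayList<>();
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.getTagName().equalsIgnoreCase("a") && e.hasAttribute("href")) {
                targets.add(e.getAttribute("href"));
            }
        }
        return targets;
    }

    public static void main(String[] args) throws Exception {
        // Deliberately sloppy HTML: unquoted attribute, unclosed tags.
        Document doc = parseHtml(
            "<p>See <a href=page.html>this page</a><br>"
            + "and <a href='other.html'>that one");
        for (String target : linkTargets(doc)) {
            System.out.println(target);
        }
    }
}
```

Once you have the Document, everything after parseHtml is ordinary
org.w3c.dom code, so the same traversal works on a tree produced by any
other DOM parser.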

> Regards,
> 
> Paul Green

[1] http://issues.apache.org/jira/browse/XERCESJ-890
[2] http://www.apache.org/~andyc/neko/doc/index.html

Michael Glavassevich
XML Parser Development
IBM Toronto Lab
E-mail: [EMAIL PROTECTED]

