Andreas Funke wrote:
Hello,

I'm pretty new to using Xerces-C.
I'd like to use it in a project that should be able to handle XML files as well as HTML files.
Xerces-C is an XML parser, and many HTML documents are not well-formed XML.

Can anybody tell me what has to be done for this, or point me to a reference site where I can find samples for using Xerces to parse HTML files? The standard Xerces samples that come with the installation only seem to handle XML. Is it a good choice to parse HTML files with a DOM parser at all, or would it be better to use SAX for that? I know there is this problem with the well-formedness of HTML, so I wonder whether I should use another, more tolerant parser.
You need an HTML parser, or something like NekoHTML (http://people.apache.org/~andyc/neko/doc/html/), which attempts to turn HTML into well-formed XML.

Conforming XML parsers are not allowed to ignore well-formedness errors.
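
To make the well-formedness point concrete, here is a minimal sketch in standard C++ of just one of the rules a conforming XML parser enforces: every start tag must be closed by a matching end tag, in the right order. The function name tagsAreBalanced is a hypothetical helper made up for this illustration; it is not part of Xerces-C and it is nowhere near a real parser (it assumes there are no comments, CDATA sections, or '>' characters inside attribute values).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a Xerces-C API): returns true only if every
// start tag in `text` is closed by a matching end tag in the right order.
// Assumes no comments, CDATA sections, or '>' inside attribute values.
bool tagsAreBalanced(const std::string& text) {
    std::vector<std::string> open;  // names of elements that are still open
    std::size_t pos = 0;
    while ((pos = text.find('<', pos)) != std::string::npos) {
        const std::size_t end = text.find('>', pos);
        if (end == std::string::npos) return false;  // unterminated tag
        const std::string tag = text.substr(pos + 1, end - pos - 1);
        pos = end + 1;
        if (tag.empty()) return false;               // "<>" is not a tag
        if (tag[0] == '?' || tag[0] == '!') continue;  // skip <?xml ...?> and <!DOCTYPE ...>

        const bool isEndTag = (tag[0] == '/');
        const bool isEmptyElement = (tag.back() == '/');  // e.g. <br/>

        // The element name runs up to the first whitespace or '/'.
        std::size_t i = isEndTag ? 1 : 0;
        std::string name;
        while (i < tag.size() && tag[i] != '/' &&
               !std::isspace(static_cast<unsigned char>(tag[i]))) {
            name += tag[i++];
        }
        if (name.empty()) return false;

        if (isEndTag) {
            // An end tag must close the most recently opened element.
            if (open.empty() || open.back() != name) return false;
            open.pop_back();
        } else if (!isEmptyElement) {
            open.push_back(name);
        }
    }
    return open.empty();  // anything still open was never closed
}
```

With this sketch, tagsAreBalanced("<p>Hello<br/></p>") returns true, while ordinary HTML such as "<p>Hello<br></p>" or "<ul><li>one<li>two</ul>" returns false. A conforming XML parser has to treat input like the last two as a fatal error, which is why a tolerant HTML parser or an HTML-to-XML cleanup step is needed first.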

Dave
