If you want a full SAX interface for parsing HTML, I should have a working product
next week. The HTML parser I have now supports a SAX-like interface, and I've been
converting it to true SAX. It will be free for non-commercial use, with a fee for
commercial use. I'll announce it here when it's available.

You can also run your documents through Tidy or JTidy to clean up the HTML and
convert it to XHTML, then parse the result with a standard SAX XML parser. The only
advantage my HTML parser with a SAX interface has over this approach is speed.
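
To illustrate the second half of the Tidy approach, here is a minimal sketch that
assumes the cleanup step has already run and produced well-formed XHTML. It uses
only the JAXP SAX parser that ships with the JDK. The class name TidiedHtmlSax,
the paragraphs() helper, and the sample markup are made up for illustration; the
Tidy/JTidy call itself is not shown. Real Tidy output usually carries a DOCTYPE,
which may make the parser try to load the DTD; the sample omits it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class TidiedHtmlSax {

    // Hypothetical helper: collects the text of every <p> element found in a
    // well-formed XHTML string (for example, the output of Tidy or JTidy).
    static List<String> paragraphs(String xhtml) throws Exception {
        List<String> result = new ArrayList<>();
        // The default factory is not namespace-aware, so qName is the plain tag name.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xhtml)), new DefaultHandler() {
            private StringBuilder current; // non-null only while inside a <p>

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("p".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Text inside nested inline tags such as <b> is appended too.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("p".equals(qName) && current != null) {
                    result.add(current.toString().trim());
                    current = null;
                }
            }
        });
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for tidied output; assumed, not produced by Tidy here.
        String xhtml = "<html><head><title>Report</title></head>"
                + "<body><p>First paragraph.</p>"
                + "<p>Second <b>bold</b> one.</p></body></html>";
        System.out.println(paragraphs(xhtml));
    }
}
```

Feeding the same method raw Word-generated HTML (unclosed tags, unquoted
attributes) would make the parser throw a SAXParseException, which is why the
cleanup step has to come first.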

  - Dennis

"Ryan, Jeff J (Comm Lines, IT)" wrote:

> Hi,
>
> I've tried using a SAX Parser to pull some pertinent information from a HTML
> file and run into trouble because the file was generated from Microsoft Word
> and is not well formed.
>
> I'd prefer to use SAX if possible.  Any ideas on how to make it more
> forgiving?  I can't guarantee that the files will be well formed because
> they are user generated.
>
> My second choice would be to use a HTML parser or to write something custom.
> Any ideas/experience with commercial HTML parsers?
>
> Thanks, Jeff
>
> "Men occasionally stumble over the truth, but most of them pick themselves
> up and hurry off as if nothing had happened." - Winston Churchill
>
> Jeff Ryan
> Hartford Financial Services
> [EMAIL PROTECTED]
> (860)547-4237
>
> ===========================================================================
> To unsubscribe: mailto [EMAIL PROTECTED] with body: "signoff JSP-INTEREST".
> For digest: mailto [EMAIL PROTECTED] with body: "set JSP-INTEREST DIGEST".
> Some relevant FAQs on JSP/Servlets can be found at:
>
>  http://java.sun.com/products/jsp/faq.html
>  http://www.esperanto.org.nz/jsp/jspfaq.html
>  http://www.jguru.com/jguru/faq/faqpage.jsp?name=JSP
>  http://www.jguru.com/jguru/faq/faqpage.jsp?name=Servlets

