At 11:02 AM 9/12/2002 -0700, you wrote:
>
>Hi,
>I need to use an HTML parser in Java. I found APIs at SUN's website for 
>such a parser at:
> 
>http://java.sun.com/j2se/1.3/docs/api/javax/swing/text/html/parser/package-summary.html
>
>Unfortunately, I did not find any examples on how to use these APIs. Could 
>anyone point to some
>documentation for the above parser, or recommend another one?
>
>Thanks,
>Raj



I've never used the Sun HTML parser.  And I'm not quite sure how you would 
use it.

At first glance, it seems like you would create an instance and then 
call parser.parse(someReader).  But actually you would need to 
subclass their parser in order to receive callbacks while the parsing is 
going on.  That's a really poor design, by the way.  It would be much better 
if they just had you supply some object that implements a callback interface.

But regardless of that, the thing that threw me is that you need to supply 
a DTD object when you construct the parser.  I'm not sure why this is 
necessary, or how you should construct the DTD object it needs.  There 
didn't seem to be any predefined ones in the JDK.  (At least not that I 
could see in the documentation.)
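
For what it's worth, the same package also contains a ParserDelegator 
class, which accepts an HTMLEditorKit.ParserCallback object and sets up a 
default DTD on its own, so it may sidestep both issues.  Below is a minimal 
sketch along those lines, not a tested recipe for any particular JDK 
release.  The LinkLister class and its extractLinks method are hypothetical 
names made up for illustration; only the javax.swing classes come from the 
JDK.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, named here only for illustration.
class LinkLister {

    // Collects the href value of every <a> start tag the parser reports.
    static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<String>();

        // The callback object receives parse events; no Parser subclass needed.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        // ParserDelegator supplies its own default DTD.
        // The last argument (true) tells it to ignore charset directives.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        String page = "<html><body><a href=\"http://example.com/\">x</a></body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

The callback class also has handleText, handleEndTag and handleSimpleTag 
methods you can override in the same way if you need more than links.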


So personally I wouldn't use this class.  There are several 
alternatives you could use:

1) Use JTidy (http://sourceforge.net/projects/jtidy).  Although it was 
originally written to tidy up HTML, they now have support to build a DOM 
tree from HTML (while it tidies) as well as SAX support, which gets around 
the memory hogging of DOM.  I did a project with Cocoon a while back, and 
Cocoon uses JTidy to parse HTML.  Works really nicely.

2) HTMLStreamTokenizer (http://www.do.org/products/parser/).

3) Use the JavaCC parser generator with an HTML grammar 
(http://www.cobase.cs.ucla.edu/pub/javacc/#Hsection) to generate an HTML 
parser.


I'd try option 1 if I were you.  It probably has the smallest learning curve.


HTH.

DR


