Thanks for your help! I will give it a try. -Raj

-----Original Message-----
From: David Rosenstrauch [mailto:[EMAIL PROTECTED]]
Sent: Thursday, September 12, 2002 11:19 AM
To: JDJList
Subject: [jdjlist] Re: HTML Parser in Java

At 11:02 AM 9/12/2002 -0700, you wrote:
>
>Hi,
>I need to use an HTML parser in Java. I found APIs at SUN's website for
>such a parser at:
>
>http://java.sun.com/j2se/1.3/docs/api/javax/swing/text/html/parser/package-summary.html
>
>Unfortunately, I did not find any examples on how to use these APIs. Could
>anyone point to some
>documentation for the above parser, or recommend another one?
>
>Thanks,
>Raj

I've never used the Sun HTML parser. And I'm not quite sure how you would use it. At first glance, it seems like you would just create an instance and then just call parser.parse(someReader). But actually you would need to subclass their parser in order to receive callbacks as the parsing is going on. That's a really poor design by the way. It would be much better if they just had you supply some object that implements a callback interface.

But regardless of that, the thing that threw me is that you need to supply a DTD object when you construct the parser. I'm not sure why this is necessary, or how you should construct the DTD object it needs. There didn't seem to be any predefined ones in the JDK. (At least not that I could see in the documentation.)

So personally I wouldn't use this class. And there's several other alternatives you could use:

1) Use JTidy (http://sourceforge.net/projects/jtidy). Although it was originally written to tidy up HTML, they now have support to build a DOM tree from HTML (while it tidies) as well as SAX support, which gets around the memory hogging of DOM. I did a project with Cocoon a while back, and Cocoon uses JTidy to parse HTML. Works really nicely.

2) HTMLStreamTokenizer (http://www.do.org/products/parser/).

3) Use the JavaCC parser generator with an HTML grammar (http://www.cobase.cs.ucla.edu/pub/javacc/#Hsection) to generate an HTML parser.

I'd try option 1 if I were you. Probably has the smallest learning curve.

HTH.

DR

To change your JDJList options, please visit: http://www.sys-con.com/java/list.cfm
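
On the DTD question raised in the message above: the same javax.swing.text.html.parser package also has a ParserDelegator class. It loads a default DTD itself and reports parse events to an HTMLEditorKit.ParserCallback object that you pass in, so you may not need to subclass the parser or build a DTD by hand. Below is a minimal sketch of that route, not a recommendation over the alternatives listed above. The class name LinkExtractor, the method extractLinks, and the sample HTML string are made up for illustration, and the code uses current Java syntax (generics, @Override) rather than what J2SE 1.3 accepted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical example class: collects the href attribute of every <a> tag.
final class LinkExtractor {

    public static List<String> extractLinks(Reader html) throws IOException {
        final List<String> links = new ArrayList<>();

        // ParserCallback is a class with no-op handlers; override only what you need.
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.A) {
                    Object href = attrs.getAttribute(HTML.Attribute.HREF);
                    if (href != null) {
                        links.add(href.toString());
                    }
                }
            }
        };

        // ParserDelegator sets up the default DTD internally.
        // The third argument (true) tells it to ignore any charset declared in the document.
        new ParserDelegator().parse(html, callback, true);
        return links;
    }

    public static void main(String[] args) throws IOException {
        // Sample input invented for this sketch.
        String page = "<html><body><p>See "
                + "<a href=\"http://sourceforge.net/projects/jtidy\">JTidy</a>"
                + "</p></body></html>";
        System.out.println(extractLinks(new StringReader(page)));
    }
}
```

To handle text or end tags as well, override handleText or handleEndTag on the same callback object.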
