James Strachan wrote:
> Rather than using JTidy to parse HTML (which builds a DOM) you could use
> NekoHTML, which is a SAX parser that can handle HTML. Then you don't need
> to use a DOM.
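
For anyone who hasn't tried it, NekoHTML exposes an ordinary SAX XMLReader, so you hand it a ContentHandler and no tree is ever built. A minimal sketch (the handler body and the URL are just placeholders):

    import org.cyberneko.html.parsers.SAXParser;
    import org.xml.sax.Attributes;
    import org.xml.sax.InputSource;
    import org.xml.sax.helpers.DefaultHandler;

    public class NekoDemo {
        public static void main(String[] args) throws Exception {
            SAXParser parser = new SAXParser();
            // SAX events stream straight from the (possibly messy) HTML;
            // no DOM is materialised along the way.
            parser.setContentHandler(new DefaultHandler() {
                public void startElement(String uri, String local,
                                         String qName, Attributes atts) {
                    System.out.println("start: " + qName);
                }
            });
            parser.parse(new InputSource("http://example.org/page.html"));
        }
    }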
Sorry to hijack a thread like this, but I was curious -- if you're building an in-memory representation of an XML document, is there still a compelling reason to use a SAX parser? Or should you just use DOM in that case?
James can probably give you a pretty definitive answer to this question, but here's my two penneth.
I think the answer depends on what in-memory representation you want. DOM is a generic representation: different kinds of XML (e.g. documents with different schemas) are represented using the same objects. This may be good or bad depending on the circumstances. If you're interested in general XML then a general representation is best, but there's more than DOM out there: there are several general representations (e.g. dom4j) which offer more Java-friendly APIs.
Even when you're dealing with general representations, SAX (and therefore Digester) can have advantages over DOM. With SAX it is easy to filter, so that only the part of the object model you're interested in is created. Digester has a rule (NodeCreateRule) that creates a partial DOM for just a subtree, and it can be used in exactly this way.
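
To illustrate the filtering point, here's a minimal sketch of a SAX handler that creates objects only for the elements it cares about and lets everything else stream past (the <title> element is invented for the example):

    import java.util.ArrayList;
    import java.util.List;
    import org.xml.sax.Attributes;
    import org.xml.sax.helpers.DefaultHandler;

    // Collects only the text of <title> elements; nothing is
    // allocated for the rest of the document.
    public class TitleFilter extends DefaultHandler {
        private final List<String> titles = new ArrayList<String>();
        private StringBuilder current;

        public void startElement(String uri, String local,
                                 String qName, Attributes atts) {
            if ("title".equals(qName)) current = new StringBuilder();
        }

        public void characters(char[] ch, int start, int length) {
            if (current != null) current.append(ch, start, length);
        }

        public void endElement(String uri, String local, String qName) {
            if ("title".equals(qName)) {
                titles.add(current.toString());
                current = null;
            }
        }

        public List<String> getTitles() { return titles; }
    }

With DOM you'd pay to build the whole tree first and then walk it; here the unwanted parts of the document never become objects at all.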
On the other hand, a very common use case is having a particular object model in mind, represented by strongly typed Java beans. In that case, even though the mapping is to an in-memory object model, there is a considerable performance benefit (in both speed and memory) in using SAX rather than DOM, because no intermediate DOM tree has to be built and then thrown away. A number of technologies do this (e.g. Castor, JAXB, Betwixt), and Digester is also commonly used for this purpose.
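
To make the bean-mapping style concrete, here's roughly what it looks like with Digester (the Book/Catalog beans, the patterns, and catalog.xml are all made up for the example):

    import java.io.File;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.commons.digester.Digester;

    public class CatalogLoader {
        // Hypothetical strongly typed beans for the example.
        public static class Book {
            private String title;
            public void setTitle(String title) { this.title = title; }
            public String getTitle() { return title; }
        }
        public static class Catalog {
            private final List<Book> books = new ArrayList<Book>();
            public void addBook(Book book) { books.add(book); }
            public List<Book> getBooks() { return books; }
        }

        public static void main(String[] args) throws Exception {
            Digester digester = new Digester();
            digester.push(new Catalog());                         // root of the stack
            digester.addObjectCreate("catalog/book", Book.class); // new Book per <book>
            digester.addBeanPropertySetter("catalog/book/title", "title");
            digester.addSetNext("catalog/book", "addBook");       // Catalog.addBook(Book)

            Catalog catalog = (Catalog) digester.parse(new File("catalog.xml"));
            System.out.println(catalog.getBooks().size() + " books loaded");
        }
    }

The beans are plain Java objects with no DOM types anywhere; Digester drives a SAX parser underneath and populates them directly as the events arrive.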
- robert