You are correct, mostly. In DOM, the parser parses the entire document into memory and hands you back a bulkier representation of it (XML elements replaced by objects holding objects). Then, if you need all the fields, you have to navigate through the DOM tree yourself, which can be thought of as a second parse, if that is a bit oversimplified.
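To make the "double parse" concrete, here is a minimal sketch in Python, using the standard library's xml.dom.minidom as a stand-in for a Java DOM parser. The sample document, its element names, and the extract_fields helper are all made up for illustration; none of it comes from Axis or any SOAP toolkit.

```python
from xml.dom import minidom

# Hypothetical sample document; the element names are assumptions.
XML = "<order><id>42</id><customer>Ricky</customer><total>9.99</total></order>"


def extract_fields(xml_text):
    """Return a dict mapping each child element name to its text content."""
    # Pass 1: the parser reads the whole document and builds a tree in memory.
    doc = minidom.parseString(xml_text)

    # Pass 2: the application walks that tree again to pull out each field.
    fields = {}
    for node in doc.documentElement.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            fields[node.tagName] = "".join(
                child.data
                for child in node.childNodes
                if child.nodeType == child.TEXT_NODE
            )
    return fields


if __name__ == "__main__":
    print(extract_fields(XML))
```

Nothing is available to the application until parseString has consumed the whole input, and every field is then touched a second time by the loop.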
In SAX, you skip the double parse, because you tell SAX what you need, and SAX simply calls you back when it reaches certain elements. However, SAX is a one-time deal, so you have to set up triggers up front for all of the fields you MIGHT be interested in. For a full parse of the document, you scan it once, and there is no in-memory representation unless you create one.

In XPP (XML Pull Parser), the idea is like SAX: no in-memory model is kept, and the code is basically scanning through the XML. However, as Anne stated, you can stop in the middle of a parse in XPP and continue later, or start over, or whatever. Also, most XPP parsers throw away some of the XML information, like extra whitespace, in order to gain more performance. If you have to scan the entire XML feed, XPP is still faster, because it throws away information, but your most pronounced speedup in XPP comes when you do a conditional partial parse of the doc. (It's much harder, if possible at all, to do a conditional partial parse using SAX.)

So:

DOM is memory and 1+ scans, all XML entities (once by the parser, more by your app)
SAX is no memory and 1 scan, all XML entities
XPP is no memory and 0-1 scans, not all XML entities (your app scans as it desires)

Examples of things XPP throws away:

<jim>
  <brain>Hi there</brain>
</jim>

In DOM and SAX, the whitespace between jim and brain is represented in the model, because it might be necessary, but in XPP, the document gets represented as:

<jim><brain>Hi there</brain></jim>

Reason: XPP is tuned for SOAP and structured XML work, where the whitespace and CRLF marks can be assumed to be there for prettiness only, and have no code value.

Jim

Jim Brain, [EMAIL PROTECTED]
"Researching tomorrow's decisions today."
(319) 369-2070 (work)
SYSTEMS ARCHITECT, ITS, AEGON FINANCIAL PARTNERS

-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 20, 2003 9:57 AM
To: Anne Thomas Manes
Cc: [EMAIL PROTECTED]
Subject: RE: Why Pull-Parser faster ?
Thanks Annie, but it is still unclear to me why Pull-Parser is faster when the application takes control. Is it because ...

1) Less work being done, or
2) Same work being done using a more efficient mechanism

After reading the article, I don't think "Pull" is using a more efficient mechanism than "SAX". The only possibility is potentially less work being done because the application is in control. In other words, the application can decide to stop after parsing the information it needs, so it can skip the scanning of later elements.

Is this the only reason? I know the results show that. But the article hasn't explained the theory behind it.

Best regards,
Ricky

At 12:35 AM 2/11/2003 -0500, Anne Thomas Manes wrote:
>The main difference between SAX and Pull is in who controls the process. SAX
>is event driven; Pull is application driven.
>
>This article goes into much more detail:
>http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-xmljava3.html?
>
>Anne
>
> > -----Original Message-----
> > From: Ricky Ho [mailto:[EMAIL PROTECTED]]
> > Sent: Monday, February 10, 2003 10:02 PM
> > To: Anne Thomas Manes
> > Subject: RE: Why Pull-Parser faster ?
> >
> > I don't see they are giving an introduction of how Pull Parser works.
> > The programming model seems to be similar to a SAX parser except the
> > application make an explicit "next()" call to get a token by token.
> > Why is this faster than a SAX parser ?
> >
> > Best regards,
> > Ricky
> >
> > At 02:04 PM 2/10/2003 -0500, you wrote:
> > >Here's a really nice, simple description of pull parsing:
> > >http://www.extreme.indiana.edu/xgws/xsoap/xpp/
> > >
> > >This article compares the various types of parsing:
> > >http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html
> > >
> > >Pull parsing essentially tokenizes the XML stream. Then you can just grab
> > >what you need when you need it.
> > >
> > >Systinet developed its own pull parser.
> > >They started with XPP, but found
> > >that it didn't do what they needed, so they developed their own from
> > >scratch. Zdenek would be happy to provide you with more information, I'm
> > >sure.
> > >
> > >Anne
> > >
> > > > -----Original Message-----
> > > > From: Ricky Ho [mailto:[EMAIL PROTECTED]]
> > > > Sent: Monday, February 10, 2003 12:45 PM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: Why Pull-Parser faster ?
> > > >
> > > > Anne,
> > > >
> > > > I understand DOM parser which read the whole XML into a memory and
> > > > construct a Tree that you can manipulate. The downside is the
> > > > application have to wait until the whole XML string is digested. Also
> > > > the whole tree can take up a lot of memory.
> > > >
> > > > I also understand SAX which treat the XML document as a character
> > > > stream. Once it hit certain recognized "string patterns", it will
> > > > callback the application code. The downside is it only scan the XML
> > > > document once. If you want to rescan the string multiple times.
> > > >
> > > > I heard about Pull-Parser but haven't look into any detail. Can you
> > > > give me a summary intro on what is "pull parser", how it works and
> > > > why is it faster ?
> > > >
> > > > A common technique that I use is to use SAX to construct a highly
> > > > condensed Tree (filter out all unneeded elements). And then manipulate
> > > > this much smaller tree. How is this compared with Pull Parser ?
> > > >
> > > > Best regards,
> > > > Ricky
> > > >
> > > > At 11:07 AM 2/10/2003 -0500, you wrote:
> > > > >Both sides. (you have to parse the message on both sides) There are
> > > > >other issues that affect performance and (even more so) scalability --
> > > > >especially on the server -- such as lifecycle management. But these
> > > > >other performance issues are negligible next to parsing.
> > > > >
> > > > >We had another discussion on this list [1] recently about performance.
> > > > >The JAX-RPC spec forces the use of SAX, which isn't the most efficient
> > > > >way to parse structured messages.
> > > > >
> > > > >[1] http://marc.theaimsgroup.com/?l=axis-user&m=104429792424850&w=2
> > > > >
> > > > >Anne
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Luís Fraga [mailto:[EMAIL PROTECTED]]
> > > > > > Sent: Monday, February 10, 2003 10:16 AM
> > > > > > To: [EMAIL PROTECTED]
> > > > > > Subject: Re: Axis performance in compare with XRPC (reference
> > > > > > implementation from SUN)!
> > > > > >
> > > > > > Hi Anne!
> > > > > >
> > > > > > The issues you are referring to concern mainly server side,
> > > > > > client side or both?
> > > > > >
> > > > > > Thanks for any comments,
> > > > > > Luís
> > > > > >
> > > > > > Anne Thomas Manes wrote:
> > > > > >
> > > > > > >A lot of the performance differences come from the parsing
> > > > > > >technology used. GLUE uses Electric XML, which is a highly
> > > > > > >optimized JDOM-like parser. WASP uses a pull parser.
> > > > > > >
> > > > > > >>-----Original Message-----
> > > > > > >>From: Luís Fraga [mailto:[EMAIL PROTECTED]]
> > > > > > >>Sent: Monday, February 10, 2003 7:20 AM
> > > > > > >>To: [EMAIL PROTECTED]
> > > > > > >>Subject: Re: Axis performance in compare with XRPC (reference
> > > > > > >>implementation from SUN)!
> > > > > > >>
> > > > > > >>10-15x faster than Axis!!??? I will have to check that!
> > > > > > >>What are your thoughts regarding these performance issues?
> > > > > > >>
> > > > > > >> Luís
> > > > > > >>
> > > > > > >>Anne Thomas Manes wrote:
> > > > > > >>
> > > > > > >>>You'll find Axis performance much faster than Sun's JAX-RPC RI.
> > > > > > >>>It also provides much easier tools.
> > > > > > >>>But the commercial implementations are much
> > > > > > >>>faster and easier still. There are free versions available for
> > > > > > >>>both GLUE and WASP. GLUE Standard is always free. The footprint
> > > > > > >>>is tiny, too. See http://www.themindelectric.com. WASP is always
> > > > > > >>>free for development, and you can get a free deployment license
> > > > > > >>>for a single CPU (multi-CPUs require payment). See
> > > > > > >>>http://www.systinet.com. These two implementations offer the
> > > > > > >>>best performance (10-15x faster than Axis) and the best tools.
> > > > > > >>>
> > > > > > >>>Anne
> > > > > > >>>
> > > > > > >>>>-----Original Message-----
> > > > > > >>>>From: Armond Avanes [mailto:[EMAIL PROTECTED]]
> > > > > > >>>>Sent: Saturday, February 08, 2003 2:45 AM
> > > > > > >>>>To: [EMAIL PROTECTED]
> > > > > > >>>>Subject: Axis performance in compare with XRPC (reference
> > > > > > >>>>implementation from SUN)!
> > > > > > >>>>
> > > > > > >>>>Hi SOAP Folks,
> > > > > > >>>>
> > > > > > >>>>Anyone has compared the performance of these two implementations
> > > > > > >>>>(Apache's Axis and Sun's XRPC) in a real environment?!
> > > > > > >>>>
> > > > > > >>>>FYI, I'm in the phase of replacing the communication layer (which
> > > > > > >>>>is reference implementation of SUN) of the application, I'm
> > > > > > >>>>working on, with Axis. Sun's implementation generates so many
> > > > > > >>>>classes and uses many libraries so causes the whole result
> > > > > > >>>>(application jars, ear's, war's, etc) to be very huge.
> > > > > > >>>>Another side effect is the build time of the
> > > > > > >>>>project, which is really much!
> > > > > > >>>>
> > > > > > >>>>I need all your ideas/suggestions/comments in this regard.
> > > > > > >>>>What problems may I get into with Axis? How's the performance in
> > > > > > >>>>compare with other implementations? Is there any better
> > > > > > >>>>alternative than Axis (free for sure!) And so on...
> > > > > > >>>>
> > > > > > >>>>Best Regards,
> > > > > > >>>>Armond
