Just throw a SAXException from your handler when you want the parse to stop.
Wrap an Exception in it to differentiate the termination signal from legitimate parse errors.
You get the same effect.
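A minimal sketch of that approach, assuming the JAXP SAX parser that ships with the JDK. The names StopSaxEarly, StopSignal and firstText are made up for illustration, and the sketch ignores nested elements with the same name.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class StopSaxEarly {
    /** Hypothetical marker; wrapped in a SAXException it means "stop", not "error". */
    static class StopSignal extends Exception {}

    /** Returns the text of the first element named 'wanted', or null if there is none. */
    static String firstText(String xml, final String wanted) throws Exception {
        final StringBuilder text = new StringBuilder();
        final String[] result = new String[1];
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(wanted)) inside = true;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) text.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                if (inside && qName.equals(wanted)) {
                    result[0] = text.toString();
                    // We have what we need: abort the rest of the parse.
                    throw new SAXException(new StopSignal());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Swallow only our own signal; real parse errors still propagate.
            if (!(e.getException() instanceof StopSignal)) throw e;
        }
        return result[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstText("<a><b>hi</b><c>later</c></a>", "b"));
    }
}
```

The check on getException() is what keeps the stop signal apart from a genuine SAXParseException, which carries no wrapped exception of that type.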
On Thursday, February 20, 2003, at 08:11 AM, Brain, Jim wrote:
You are correct, mostly.
In DOM, the parser parses the entire document into memory, handing you back
a bulkier representation of the document (XML elements replaced by
objects holding objects). Then, if you need all the fields, you have to
navigate through the DOM tree again, which can be thought of as a double
parse, even if that is a bit oversimplified.
In SAX, you skip the double parse, because you tell SAX what you need, and
SAX simply calls you when it reaches certain elements. However, SAX is
a one-time deal, so you have to set up triggers up front for all of the
fields you MIGHT be interested in. For a full parse of the document, you scan
it once, with no in-memory representation unless you create one.
In XPP (XML Pull Parser), the idea is like SAX: no in-memory model is
kept, and the code basically scans through the XML. However, as Anne
stated, you can stop in the middle of a parse in XPP, and continue later, or
start over, or whatever. Also, most XPP parsers throw away some of the XML
information, like extra whitespace, in order to gain more performance. If
you have to scan the entire XML feed, XPP is still faster, because it throws
away information, but the most pronounced speedup in XPP comes when you do a
conditional partial parse of the doc. (It's much harder, if possible at all,
to do a conditional partial parse using SAX.)
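To make the conditional partial parse concrete, here is a rough sketch. It does not use XPP itself: it assumes the StAX pull API (javax.xml.stream) that is bundled with later JDKs, as a stand-in with the same "application calls next()" model. The names PullPartialParse and firstText are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class PullPartialParse {
    /** Returns the text of the first element named 'wanted', or null if there is none. */
    static String firstText(String xml, String wanted) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        try {
            // The application drives the loop, so it can simply stop asking for tokens.
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(wanted)) {
                    // getElementText() assumes the element holds text only.
                    return reader.getElementText();
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(firstText("<a><b>hi</b><c>later</c></a>", "b"));
    }
}
```

Once the method returns, nothing after the wanted element is ever requested from the reader, which is the "0-1 scan" case.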
So,
DOM is memory and 1+ scan, all XML entities (once by the parser, more by
your app)
SAX is no memory and 1 scan, all XML entities
XPP is no memory and 0-1 scan, not all XML entities (your app scans as it
desires)
Examples of things XPP throws away:
<jim>
<brain>Hi there</brain>
</jim>
In DOM and SAX, the whitespace between jim and brain is represented in the
model, because it might be necessary, but in XPP, the document gets
represented as:
<jim><brain>Hi there</brain></jim>
Reason: XPP is tuned for SOAP and structured XML work, where the whitespace
and CRLF marks can be assumed to be there for prettiness only, and have no
code value.
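The DOM half of this is easy to check. A small sketch, assuming the default non-validating JAXP DOM parser from the JDK (the names DomWhitespace and childCount are made up for illustration): the pretty-printed document gives jim three child nodes, because the two line breaks become text nodes, while the compact form gives it one.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomWhitespace {
    /** Counts the direct child nodes of the document's root element. */
    static int childCount(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        // Whitespace between the tags is kept as text nodes around <brain>.
        System.out.println(childCount("<jim>\n<brain>Hi there</brain>\n</jim>"));
        // No whitespace, so <brain> is the only child.
        System.out.println(childCount("<jim><brain>Hi there</brain></jim>"));
    }
}
```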
Jim
Jim Brain, [EMAIL PROTECTED]
"Researching tomorrow's decisions today."
(319) 369-2070 (work)
SYSTEMS ARCHITECT, ITS, AEGON FINANCIAL PARTNERS
-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 20, 2003 9:57 AM
To: Anne Thomas Manes
Cc: [EMAIL PROTECTED]
Subject: RE: Why Pull-Parser faster ?
Thanks Annie, but it is still unclear to me why Pull-Parser is faster when
the application takes control.
Is it because ...
1) Less work being done, or
2) Same work being done using a more efficient mechanism
After reading the article, I don't think "Pull" is using a more efficient
mechanism than "SAX".
The only possibility is potentially less work being done because the
application is in control. In other words, the application can decide to stop
after parsing the information it needs, so it can skip the scanning of later
elements.
Is this the only reason ?
I know the results show that. But the article hasn't explained the theory
behind it.
Best regards,
Ricky
At 12:35 AM 2/11/2003 -0500, Anne Thomas Manes wrote:
The main difference between SAX and Pull is in who controls the process.
SAX is event driven; Pull is application driven.
This article goes into much more detail:
http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-xmljava3.html
Anne
-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 10:02 PM
To: Anne Thomas Manes
Subject: RE: Why Pull-Parser faster ?
I don't see that they are giving an introduction to how Pull Parser works.
The programming model seems to be similar to a SAX parser, except that the
application makes an explicit "next()" call to get the input token by token.
Why is this faster than a SAX parser ?
Best regards,
Ricky
At 02:04 PM 2/10/2003 -0500, you wrote:
Here's a really nice, simple description of pull parsing:
http://www.extreme.indiana.edu/xgws/xsoap/xpp/
This article compares the various types of parsing:
http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html
Pull parsing essentially tokenizes the XML stream. Then you can just
grab what you need when you need it.
Systinet developed its own pull parser. They started with XPP, but found
that it didn't do what they needed, so they developed their own from
scratch. Zdenek would be happy to provide you with more information, I'm
sure.
Anne
-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 12:45 PM
To: [EMAIL PROTECTED]
Subject: Why Pull-Parser faster ?
Anne,
I understand the DOM parser, which reads the whole XML into memory and
constructs a tree that you can manipulate. The downside is that the
application has to wait until the whole XML string is digested. Also, the
whole tree can take up a lot of memory.
I also understand SAX, which treats the XML document as a character
stream. Once it hits certain recognized "string patterns", it calls back
the application code. The downside is that it scans the XML document only
once, which is a problem if you want to rescan the string multiple times.
I heard about Pull-Parser but haven't looked into any detail.
Can you give me a summary intro on what a "pull parser" is, how it works and
why it is faster ?
A common technique that I use is to use SAX to construct a highly condensed
tree (filtering out all unneeded elements), and then manipulate this much
smaller tree. How does this compare with Pull Parser ?
Best regards,
Ricky
At 11:07 AM 2/10/2003 -0500, you wrote:
Both sides. (You have to parse the message on both sides.)
There are other issues that affect performance and (even more so)
scalability -- especially on the server -- such as lifecycle management.
But these other performance issues are negligible next to parsing.
We had another discussion on this list [1] recently about performance. The
JAX-RPC spec forces the use of SAX, which isn't the most efficient way to
parse structured messages.
[1] http://marc.theaimsgroup.com/?l=axis-user&m=104429792424850&w=2
Anne
-----Original Message-----
From: Luís Fraga [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 10:16 AM
To: [EMAIL PROTECTED]
Subject: Re: Axis performance in compare with XRPC (reference implementation from SUN)!
Hi Anne!
Do the issues you are referring to concern mainly the server side, the
client side or both?
Thanks for any comments,
Luís
Anne Thomas Manes wrote:
A lot of the performance differences come from the parsing technology used.
GLUE uses Electric XML, which is a highly optimized JDOM-like parser. WASP
uses a pull parser.
-----Original Message-----
From: Luís Fraga [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 7:20 AM
To: [EMAIL PROTECTED]
Subject: Re: Axis performance in compare with XRPC (reference implementation from SUN)!
10-15x faster than Axis!!??? I will have to check that!
What are your thoughts regarding these performance issues?
Luís
Anne Thomas Manes wrote:
You'll find Axis performance much faster than Sun's JAX-RPC RI. It also
provides much easier tools. But the commercial implementations are much
faster and easier still. There are free versions available for both GLUE and
WASP. GLUE Standard is always free. The footprint is tiny, too. See
http://www.themindelectric.com. WASP is always free for development, and you
can get a free deployment license for a single CPU (multi-CPUs require
payment). See http://www.systinet.com. These two implementations offer the
best performance (10-15x faster than Axis) and the best tools.
Anne
-----Original Message-----
From: Armond Avanes [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 08, 2003 2:45 AM
To: [EMAIL PROTECTED]
Subject: Axis performance in compare with XRPC (reference implementation from SUN)!
Hi SOAP Folks,
Has anyone compared the performance of these two implementations
(Apache's Axis and Sun's XRPC) in a real environment?!
FYI, I'm in the phase of replacing the communication layer (which is the
reference implementation from SUN) of the application I'm working on
with Axis.
Sun's implementation generates so many classes and uses so many
libraries that the whole result (application jars, ears, wars,
etc.) becomes huge. Another side effect is the build time of the
project, which is really long!
I need all your ideas/suggestions/comments in this regard.
What problems may I get into with Axis? How's the performance in comparison
with other implementations? Is there any better alternative than Axis
(free for sure!)? And so on...
Best Regards,
Armond
