Re: Why Pull-Parser faster ?

Sasha Lerner Thu, 20 Feb 2003 10:47:47 -0800

If you want to do a partial parse with a SAX parser, just throw a
SAXException to signal when you want the parse to stop. Use a wrapped
exception to differentiate the termination signal from legitimate parse
errors. The effect is the same.
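
A minimal sketch of the wrapped-exception trick in Java, using only the SAX
classes that ship with the JDK. The names EarlyStopSax, StopSignal and
firstText are made up for illustration, and the handler assumes the wanted
element is not nested inside another element of the same name:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class EarlyStopSax {

    // Hypothetical marker type: wrapped inside a SAXException so the caller
    // can tell "stop, I have what I need" apart from a real parse error.
    static class StopSignal extends Exception {
    }

    // Returns the text of the first <name> element, then aborts the parse.
    static String firstText(String xml, final String name) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inside = false;

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (qName.equals(name)) {
                    inside = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (inside) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) throws SAXException {
                if (inside && qName.equals(name)) {
                    // Termination signal: a SAXException wrapping the marker.
                    throw new SAXException(new StopSignal());
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            // Swallow only our own signal; anything else is a genuine error.
            if (!(e.getException() instanceof StopSignal)) {
                throw new IllegalStateException("XML parse failed", e);
            }
        } catch (Exception e) {
            // ParserConfigurationException or IOException
            throw new IllegalStateException("XML parse failed", e);
        }
        return text.toString();
    }
}
```

Calling firstText(xml, "brain") on a document that contains
<brain>Hi there</brain> should hand back "Hi there" without the handler
seeing any element that follows it.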

On Thursday, February 20, 2003, at 08:11 AM, Brain, Jim wrote:

You are correct, mostly.

In DOM, the parser parses the entire document into memory, handing you back
a bulkier representation of the document (XML elements replaced by
objects holding objects). Then, if you need all the fields, you have to
navigate through the DOM tree again, which can be thought of as a double
parse, if that is a bit oversimplified.

In SAX, you skip the double parse, because you tell SAX what you need, and
SAX will simply call you when it reaches certain elements. However, SAX is
a one-time deal, so you have to set up triggers up front for all of the
fields you MIGHT be interested in. For a full parse of the document, you
scan it once, and there is no in-memory representation unless you create
one.

In XPP (XML Pull Parser), the idea is like SAX, where no in-memory model is
kept, and the code is basically scanning through the XML. However, as Anne
stated, you can stop in the middle of a parse in XPP, and continue later, or
start over, or whatever. Also, most XPP parsers throw away some of the XML
information, like extra whitespace, in order to gain more performance. If
you have to scan the entire XML feed, XPP is still faster, because it throws
away information, but your most pronounced speedup in XPP comes when you do
a conditional partial parse of the doc. (It's much harder, if possible at
all, to do a conditional partial parse using SAX.)
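
To make the pull style concrete, here is a minimal Java sketch. It uses StAX
(javax.xml.stream), the pull API bundled with the JDK, as a stand-in for XPP;
XPP's own classes and method names differ, so treat this as an illustration
of the control flow only. PullUntilFound and firstText are made-up names:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class PullUntilFound {

    // Pulls events one at a time and returns as soon as <name> has been read.
    // Returns null if no such element exists.
    static String firstText(String xml, String name) {
        try {
            XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // The application asks for the next event; nothing is pushed at it.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && reader.getLocalName().equals(name)) {
                        // Assumes the element holds text only. After this return
                        // the application requests no further events.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("XML parse failed", e);
        }
    }
}
```

The conditional partial parse is just a return from the loop: the
application stops calling next() once it has what it wants, and no exception
is needed to break out.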

So:
DOM is memory and 1+ scans, all XML entities (once by the parser, more by
your app)
SAX is no memory and 1 scan, all XML entities
XPP is no memory and 0-1 scans, not all XML entities (your app scans as it
desires)

Examples of things XPP throws away:

<jim>

<brain>Hi there</brain>

</jim>

In DOM and SAX, the whitespace between jim and brain is represented in the
model, because it might be necessary, but in XPP, the document gets
represented as:

<jim><brain>Hi there</brain></jim>

Reason: XPP is tuned for SOAP and structured XML work, where the whitespace
and CRLF marks can be assumed to be there for prettiness only, and have no
code value.
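
The SAX half of this is easy to check. A minimal sketch, assuming the JDK's
default non-validating parser and a document without a DTD; the class and
method names are made up for illustration. It collects every character the
parser reports directly inside the root element:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class WhitespaceSeenBySax {

    // Returns the character data that sits directly inside the root element,
    // i.e. between its child elements, exactly as SAX delivers it.
    static String textDirectlyInRoot(String xml) {
        final StringBuilder seen = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private int depth = 0; // 1 while directly inside the root element

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                depth++;
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                depth--;
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (depth == 1) {
                    seen.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
        return seen.toString();
    }
}
```

For the spaced-out <jim> document this should return only the line breaks
around <brain>, and an empty string for the compact one-line form.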

Jim

Jim Brain, [EMAIL PROTECTED]
"Researching tomorrow's decisions today."
(319) 369-2070 (work)
SYSTEMS ARCHITECT, ITS, AEGON FINANCIAL PARTNERS

-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Thursday, February 20, 2003 9:57 AM
To: Anne Thomas Manes
Cc: [EMAIL PROTECTED]
Subject: RE: Why Pull-Parser faster ?

Thanks Anne, but it is still unclear to me why Pull-Parser is faster when
the application takes control.

Is it because ...
1) Less work being done, or
2) Same work being done using a more efficient mechanism

After reading the article, I don't think "Pull" is using a more efficient
mechanism than "SAX".
The only possibility is that less work is potentially done because the
application is in control. In other words, the application can decide to
stop after parsing the information it needs, so it can skip the scanning of
later elements.

Is this the only reason ?

I know the results show that, but the article hasn't explained the theory
behind it.

Best regards,
Ricky

At 12:35 AM 2/11/2003 -0500, Anne Thomas Manes wrote:

The main difference between SAX and Pull is in who controls the process.
SAX is event driven; Pull is application driven.
This article goes into much more detail:
http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-xmljava3.html?

Anne
-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 10:02 PM
To: Anne Thomas Manes
Subject: RE: Why Pull-Parser faster ?


I don't see that they give an introduction to how a Pull Parser works. The
programming model seems to be similar to a SAX parser, except that the
application makes an explicit "next()" call to get the document token by
token. Why is this faster than a SAX parser?

Best regards,
Ricky

At 02:04 PM 2/10/2003 -0500, you wrote:
Here's a really nice, simple description of pull parsing:
http://www.extreme.indiana.edu/xgws/xsoap/xpp/

This article compares the various types of parsing:
http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html

Pull parsing essentially tokenizes the XML stream. Then you can just grab
what you need when you need it.

Systinet developed its own pull parser. They started with XPP, but found
that it didn't do what they needed, so they developed their own from
scratch. Zdenek would be happy to provide you with more information, I'm
sure.

Anne
-----Original Message-----
From: Ricky Ho [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 12:45 PM
To: [EMAIL PROTECTED]
Subject: Why Pull-Parser faster ?


Anne,

I understand the DOM parser, which reads the whole XML document into memory
and constructs a tree that you can manipulate. The downside is that the
application has to wait until the whole XML string is digested. Also, the
whole tree can take up a lot of memory.

I also understand SAX, which treats the XML document as a character
stream. Once it hits certain recognized "string patterns", it calls back
the application code. The downside is that it scans the XML document only
once, which is a problem if you want to rescan the string multiple times.

I have heard about Pull-Parser but haven't looked into it in any detail.
Can you give me a summary intro on what a "pull parser" is, how it works,
and why it is faster?

A common technique that I use is to use SAX to construct a highly condensed
tree (filtering out all unneeded elements), and then manipulate this much
smaller tree. How does this compare with a Pull Parser?
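
One way the condensed-tree technique might look in Java, as a sketch under
stated assumptions: the Node type, the condense method and the element names
are invented for illustration, and an element that is not on the keep list
is dropped together with everything nested inside it.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class CondensedTree {

    // Hypothetical minimal node type; a real application would use its own model.
    static class Node {
        final String name;
        final StringBuilder text = new StringBuilder();
        final List<Node> children = new ArrayList<Node>();

        Node(String name) {
            this.name = name;
        }
    }

    // Builds a tree holding only the elements named in keep.
    static Node condense(String xml, String... keep) {
        final Set<String> wanted = new HashSet<String>(Arrays.asList(keep));
        final Node root = new Node("#root"); // synthetic holder for the top-level element
        final Deque<Node> open = new ArrayDeque<Node>();
        open.push(root);

        DefaultHandler handler = new DefaultHandler() {
            private int skipDepth = 0; // greater than 0 while inside a skipped subtree

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (skipDepth > 0 || !wanted.contains(qName)) {
                    skipDepth++;
                    return;
                }
                Node node = new Node(qName);
                open.peek().children.add(node);
                open.push(node);
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (skipDepth > 0) {
                    skipDepth--;
                } else {
                    open.pop();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (skipDepth == 0) {
                    open.peek().text.append(ch, start, length);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("XML parse failed", e);
        }
        return root;
    }
}
```

The parser still scans the whole document once; the saving is in memory and
in the later traversal, since only the kept elements ever become objects.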

Best regards,
Ricky

At 11:07 AM 2/10/2003 -0500, you wrote:
Both sides. (You have to parse the message on both sides.) There are other
issues that affect performance and (even more so) scalability -- especially
on the server -- such as lifecycle management. But these other performance
issues are negligible next to parsing.

We had another discussion on this list [1] recently about performance. The
JAX-RPC spec forces the use of SAX, which isn't the most efficient way to
parse structured messages.

[1] http://marc.theaimsgroup.com/?l=axis-user&m=104429792424850&w=2

Anne
-----Original Message-----
From: Luís Fraga [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 10:16 AM
To: [EMAIL PROTECTED]
Subject: Re: Axis performance in compare with XRPC (reference implementation from SUN)!


Hi Anne!

Do the issues you are referring to mainly concern the server side, the
client side, or both?

Thanks for any comments,
    Luís

Anne Thomas Manes wrote:
A lot of the performance differences come from the parsing technology used.
GLUE uses Electric XML, which is a highly optimized JDOM-like parser. WASP
uses a pull parser.
-----Original Message-----
From: Luís Fraga [mailto:[EMAIL PROTECTED]]
Sent: Monday, February 10, 2003 7:20 AM
To: [EMAIL PROTECTED]
Subject: Re: Axis performance in compare with XRPC (reference implementation from SUN)!


10-15x faster than Axis!!??? I will have to check that!
What are your thoughts regarding these performance issues?

   Luís



Anne Thomas Manes wrote:
You'll find Axis performance much faster than Sun's JAX-RPC RI. It also
provides much easier tools. But the commercial implementations are much
faster and easier still. There are free versions available for both GLUE
and WASP. GLUE Standard is always free. The footprint is tiny, too. See
http://www.themindelectric.com. WASP is always free for development, and
you can get a free deployment license for a single CPU (multi-CPUs require
payment). See http://www.systinet.com. These two implementations offer the
best performance (10-15x faster than Axis) and the best tools.
Anne
-----Original Message-----
From: Armond Avanes [mailto:[EMAIL PROTECTED]]
Sent: Saturday, February 08, 2003 2:45 AM
To: [EMAIL PROTECTED]
Subject: Axis performance in compare with XRPC (reference implementation from SUN)!
Hi SOAP Folks,

Has anyone compared the performance of these two implementations
(Apache's Axis and Sun's XRPC) in a real environment?

FYI, I'm in the process of replacing the communication layer (currently
Sun's reference implementation) of the application I'm working on with
Axis. Sun's implementation generates so many classes and uses so many
libraries that the whole result (application jars, ears, wars, etc.) is
very large. Another side effect is the build time of the project, which is
really long!

I need all your ideas/suggestions/comments in this regard. What problems
may I run into with Axis? How does its performance compare with other
implementations? Is there any better alternative than Axis (free, for
sure!)? And so on...

Best Regards,
Armond
