Re: Why Pull-Parser faster ?

Dennis Sosnoski Thu, 20 Feb 2003 18:17:13 -0800

Hi Ricky,

This is getting pretty far off-topic for this list, but I'll give it one more shot. With a pull parser it's easy to create a wrapper method for handling elements that are similar. For instance, if you've got elements that have an int value as content, or other primitive types, or just plain text, you can define methods to take care of the details of processing these. Suppose B1 is an int in your example, and B2 is a String. Then using method wrappers for the pull parser interface (parseIntElement(name) to parse an element with the supplied name, returning the character data content as an int or throwing an exception on error; parseStringElement(name) to do the same thing with plain text content) I can write code like this to process the B element:

private XmlPullParser m_parser;

public B parseB() throws ParseException {
    B b = new B();
    b.b1 = parseIntElement("B1");
    b.b2 = parseStringElement("B2");
    return b;
}
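The parseIntElement and parseStringElement wrappers themselves might look roughly like the sketch below. It is only an illustration under assumptions: it uses the JDK's StAX XMLStreamReader (a pull-style API that ships with Java) rather than the XmlPullParser interface discussed here, it throws XMLStreamException instead of ParseException, and the class names B and BPullParser and the helper expectStart are made up for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical stand-in for the B class in the example above.
class B {
    int b1;
    String b2;
}

// Sketch only: StAX is swapped in for XmlPullParser so the code runs on a plain JDK.
class BPullParser {
    private final XMLStreamReader reader;

    BPullParser(String xml) throws XMLStreamException {
        reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
    }

    // Move to the next tag and insist that it is a start tag with the given name.
    private void expectStart(String name) throws XMLStreamException {
        reader.nextTag(); // skips whitespace and comments, rejects other content
        reader.require(XMLStreamConstants.START_ELEMENT, null, name);
    }

    // Parse <name>text</name> and return the text content.
    String parseStringElement(String name) throws XMLStreamException {
        expectStart(name);
        return reader.getElementText(); // leaves the reader on the end tag
    }

    // Parse <name>123</name> and return the content as an int.
    int parseIntElement(String name) throws XMLStreamException {
        String text = parseStringElement(name).trim();
        try {
            return Integer.parseInt(text);
        } catch (NumberFormatException e) {
            throw new XMLStreamException("Element " + name + " is not an int: " + text, e);
        }
    }

    // Same shape as parseB() above: the required element order is the code order.
    B parseB() throws XMLStreamException {
        expectStart("B");
        B b = new B();
        b.b1 = parseIntElement("B1");
        b.b2 = parseStringElement("B2");
        reader.nextTag();
        reader.require(XMLStreamConstants.END_ELEMENT, null, "B");
        return b;
    }

    public static void main(String[] args) throws XMLStreamException {
        B b = new BPullParser("<B><B1>42</B1><B2>hello</B2></B>").parseB();
        System.out.println(b.b1 + " " + b.b2);
    }
}
```

With this sketch, a document such as <B><B2>hello</B2></B> fails inside parseIntElement("B1"), because the next start tag is not B1.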

If parseB returns successfully I know I've got a good B element. If one of the components is missing or invalid I'll get an exception, and not try to process garbage. If you do the same thing with SAX you need to either turn on parser validation (slow) or keep track of the child elements you've seen while processing the B element (complex) - otherwise, if you're missing a B1 child element you'll never know it.
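For comparison, here is a rough sketch of the "keep track of the child elements you've seen" approach with SAX. It is an assumption-laden illustration, not code from any real deserializer: the class name BSaxHandler, its fields and the parse helper are invented for the example, and it only checks that B1 and B2 were present, not their order.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: all of the bookkeeping lives in fields instead of in the call stack.
class BSaxHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inB, sawB1, sawB2;
    int b1;
    String b2;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        text.setLength(0); // start collecting character data for this element
        if (qName.equals("B")) {
            inB = true;
            sawB1 = false;
            sawB2 = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (!inB) {
            return;
        }
        switch (qName) {
            case "B1":
                try {
                    b1 = Integer.parseInt(text.toString().trim());
                } catch (NumberFormatException e) {
                    throw new SAXException("B1 is not an int", e);
                }
                sawB1 = true;
                break;
            case "B2":
                b2 = text.toString();
                sawB2 = true;
                break;
            case "B":
                // Without this check a missing child would go unnoticed.
                if (!sawB1 || !sawB2) {
                    throw new SAXException("B is missing B1 or B2");
                }
                inB = false;
                break;
            default:
                break;
        }
    }

    // Convenience wrapper (hypothetical) that runs the JDK's default SAX parser.
    static BSaxHandler parse(String xml) throws Exception {
        BSaxHandler handler = new BSaxHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }
}
```

The flags inB, sawB1 and sawB2 are exactly the state that the pull version gets for free from the order of its method calls.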

Hope that helps clarify why I say it's simpler and faster to do this type of operation with a pull parser - it's not that the parser itself is necessarily faster, it's that the code that uses the parser can be simpler and faster.

- Dennis

Ricky Ho wrote:

Thanks Dennis, some more follow-up questions ....

Is it true that Pull Parser is also a forward-only iterator ? The only difference that I see is still that Pull Parser allows you to stop anytime. Maybe let's go through an example as follows ...

Look at the following XML element. Let's say I want to do some processing for the <B> tag and its child elements <B1>, <B2>

<A>
  <B>
    <B1> ... </B1>
    <B2> ... </B2>
  </B>
  <C>
    ....
  </C>
</A>

In SAX, you do it this way ...

class MyHandler {
    State state;

    void startElement(String elementName, Attributes attr) {
        if (elementName.equals("B")) {
            B objectB = new B(attr);
            state.save(objectB);
            return;
        }

        if (elementName.equals("B1")) {
            B1 objectB1 = new B1(attr);
            state.save(objectB1);
            return;
        }

        if (elementName.equals("B2")) {
            B2 objectB2 = new B2(attr);
            state.save(objectB2);
            return;
        }
    }

    void endElement(String elementName) {
        if (elementName.equals("B")) {
            process(state);
            return;
        }
    }
}

If I do it in XPP, I'll do it this way ...

class XmlPullHandler {
    State state;

    public void handle() {
        XmlPullParser xpp = factory.newPullParser();
        xpp.setInput(....);
        int eventType = xpp.getEventType();
        while (eventType != xpp.END_DOCUMENT) {
            if (eventType == xpp.START_TAG) {
                if (xpp.getName().equals("B")) {
                    B objectB = new B(xpp.getAttributes());
                    state.save(objectB);
                    xpp.next();
                    if (xpp.getName().equals("B1")) {
                        B1 objectB1 = new B1(xpp.getAttributes());
                        state.save(objectB1);
                    }
                    xpp.next();
                    if (xpp.getName().equals("B2")) {
                        B2 objectB2 = new B2(xpp.getAttributes());
                        state.save(objectB2);
                    }
                    process(state);
                    return;
                }
            }
            eventType = xpp.next();
        }
    }
}

Of course the code is structured differently and I agree that XPP is much easier to understand than SAX (which has code scattered).
But if you look at the run time thread execution, they are doing almost exactly the same thing, except that XPP stops after enough data is gathered. And this is the only difference that I see.

Best regards,
Ricky

At 09:46 AM 2/20/2003 -0800, Dennis Sosnoski wrote:
I'd argue that a pull parser is more efficient than a "push" (SAX) parser for Web services-type applications because it allows you to make use of the inherent ordering of elements in your code. When you're processing a document with a push parser you need to keep all kinds of state information (in the case of JAX-RPC this includes things such as the object currently under construction, the current open element, etc. - if you write a custom deserializer for Axis you'll see what I mean). With a pull parser the state information is built into your code: You know the required element order, so you can just grab the contents of one element after another and process it directly. This gives code that's both simpler and faster than the event-driven code used with a push parser.

Think of a pull parser as a (forward-only) iterator for moving through the document. For applications involved with turning XML into objects (the core of Web services) this iterator makes conversions really simple. The fact that you can stop parsing if you want is also useful at times, but not that big of a deal (you can always stop a SAX parse by throwing an exception from your handler, after all).
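As an illustration of that last point, a SAX handler can cut a parse short by throwing from a callback once it has what it needs. The sketch below is hypothetical (the class FirstB1Finder, its fields and the find helper are invented for the example) and uses the JDK's default SAX parser:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that grabs the first <B1> value and then aborts the parse.
class FirstB1Finder extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private boolean inB1;
    String value;      // content of the first B1 element, once found
    int elementsSeen;  // how many start tags were reported before stopping

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        elementsSeen++;
        if (qName.equals("B1")) {
            inB1 = true;
            text.setLength(0);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inB1) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        if (qName.equals("B1")) {
            value = text.toString();
            // Deliberate early exit: the rest of the document is never scanned.
            throw new SAXException("stop: B1 found");
        }
    }

    static FirstB1Finder find(String xml) throws Exception {
        FirstB1Finder handler = new FirstB1Finder();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (SAXException e) {
            if (handler.value == null) {
                throw e; // a genuine parse error, not our own stop signal
            }
        }
        return handler;
    }

    public static void main(String[] args) throws Exception {
        FirstB1Finder f = find("<A><B><B1>42</B1><B2>x</B2></B><C><D/></C></A>");
        System.out.println(f.value + " after " + f.elementsSeen + " start tags");
    }
}
```

It works, but using an exception for normal control flow is clumsier than simply not calling next() again on a pull parser.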

- Dennis

Ricky Ho wrote:
Thanks Annie, but it is still unclear to me why Pull-Parser is faster when the application takes control.

Is it because ...
1) Less work being done, or
2) Same work being done using a more efficient mechanism

After reading the article, I don't think "Pull" is using a more efficient mechanism than "SAX".
The only possibility is potentially less work being done because the application is in control. In other words, the application can decide to stop after parsing the information it needs, so it can skip the scanning of later elements.

Is this the only reason ?

I know the results show that. But the article hasn't explained the theory behind it.

Best regards,
Ricky

At 12:35 AM 2/11/2003 -0500, Anne Thomas Manes wrote:

The main difference between SAX and Pull is in who controls the process. SAX
is event driven; Pull is application driven.

This article goes into much more detail:
http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-xmljava3.html?

Anne

> -----Original Message-----
> From: Ricky Ho [mailto:[EMAIL PROTECTED]]
> Sent: Monday, February 10, 2003 10:02 PM
> To: Anne Thomas Manes
> Subject: RE: Why Pull-Parser faster ?
>
>
> I don't see that they give an introduction to how a Pull Parser works. The
> programming model seems to be similar to a SAX parser, except that the
> application makes an explicit "next()" call to get the document token by
> token. Why is this faster than a SAX parser ?
>
> Best regards,
> Ricky
>
> At 02:04 PM 2/10/2003 -0500, you wrote:
> >Here's a really nice, simple description of pull parsing:
> >http://www.extreme.indiana.edu/xgws/xsoap/xpp/
> >
> >This article compares the various types of parsing:
> >http://www-106.ibm.com/developerworks/xml/library/x-injava/index.html
> >
> >Pull parsing essentially tokenizes the XML stream. Then you can just grab
> >what you need when you need it.
> >
> >Systinet developed its own pull parser. They started with XPP, but found
> >that it didn't do what they needed, so they developed their own from
> >scratch. Zdenek would be happy to provide you with more information, I'm
> >sure.
> >
> >Anne
> >
> > > -----Original Message-----
> > > From: Ricky Ho [mailto:[EMAIL PROTECTED]]
> > > Sent: Monday, February 10, 2003 12:45 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Why Pull-Parser faster ?
> > >
> > >
> > > Anne,
> > >
> > > > I understand the DOM parser, which reads the whole XML into memory and
> > > > constructs a tree that you can manipulate. The downside is that the
> > > > application has to wait until the whole XML string is digested. Also,
> > > > the whole tree can take up a lot of memory.
> > > >
> > > > I also understand SAX, which treats the XML document as a character
> > > > stream. Once it hits certain recognized "string patterns", it calls back
> > > > the application code. The downside is that it only scans the XML document
> > > > once. If you want to rescan the string multiple times.
> > > >
> > > > I heard about Pull-Parser but haven't looked into any detail. Can you give
> > > > me a summary intro on what a "pull parser" is, how it works and why it is
> > > > faster ?
> > > >
> > > > A common technique that I use is to use SAX to construct a highly condensed
> > > > tree (filtering out all unneeded elements), and then manipulate this much
> > > > smaller tree. How does this compare with Pull Parser ?
> > >
> > > Best regards,
> > > Ricky
> > >
> > > At 11:07 AM 2/10/2003 -0500, you wrote:
> > > > >Both sides. (you have to parse the message on both sides) There are other
> > > > >issues that affect performance and (even more so) scalability -- especially
> > > > >on the server -- such as lifecycle management. But these other performance
> > > > >issues are negligible next to parsing.
> > > > >
> > > > >We had another discussion on this list [1] recently about performance. The
> > > > >JAX-RPC spec forces the use of SAX, which isn't the most efficient way to
> > > > >parse structured messages.
> > > >
> > > > >[1] http://marc.theaimsgroup.com/?l=axis-user&m=104429792424850&w=2
> > > >
> > > >Anne
> > > >
> > > > > -----Original Message-----
> > > > > From: Luís Fraga [mailto:[EMAIL PROTECTED]]
> > > > > Sent: Monday, February 10, 2003 10:16 AM
> > > > > To: [EMAIL PROTECTED]
> > > > > Subject: Re: Axis performance in compare with XRPC (reference
> > > > > implementation from SUN)!
> > > > >
> > > > >
> > > > > Hi Anne!
> > > > >
> > > > > > The issues you are referring to concern mainly server side, client side
> > > > > > or both?
> > > > >
> > > > > Thanks for any comments,
> > > > > Luís
> > > > >
> > > > > Anne Thomas Manes wrote:
> > > > >
> > > > > > >A lot of the performance differences come from the parsing technology used.
> > > > > > >GLUE uses Electric XML, which is a highly optimized JDOM-like parser. WASP
> > > > > > >uses a pull parser.
> > > > > >
> > > > > >
> > > > > >
> > > > > >>-----Original Message-----
> > > > > >>From: Luís Fraga [mailto:[EMAIL PROTECTED]]
> > > > > >>Sent: Monday, February 10, 2003 7:20 AM
> > > > > >>To: [EMAIL PROTECTED]
> > > > > >>Subject: Re: Axis performance in compare with XRPC (reference
> > > > > >>implementation from SUN)!
> > > > > >>
> > > > > >>
> > > > > >>10-15x faster than Axis!!??? I will have to check that!
> > > > > > >>What are your thoughts regarding these performance issues?
> > > > > >>
> > > > > >> Luís
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>Anne Thomas Manes wrote:
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >>>You'll find Axis performance much faster than Sun's JAX-RPC RI. It also
> > > > > > >>>provides much easier tools. But the commercial implementations are much
> > > > > > >>>faster and easier still. There are free versions available for both GLUE and
> > > > > > >>>WASP. GLUE Standard is always free. The footprint is tiny, too. See
> > > > > > >>>http://www.themindelectric.com. WASP is always free for development, and you
> > > > > > >>>can get a free deployment license for a single CPU (multi-CPUs require
> > > > > > >>>payment). See http://www.systinet.com. These two implementations offer the
> > > > > > >>>best performance (10-15x faster than Axis) and the best tools.
> > > > > >>>
> > > > > >>>Anne
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>>-----Original Message-----
> > > > > >>>>From: Armond Avanes [mailto:[EMAIL PROTECTED]]
> > > > > >>>>Sent: Saturday, February 08, 2003 2:45 AM
> > > > > >>>>To: [EMAIL PROTECTED]
> > > > > > >>>>Subject: Axis performance in compare with XRPC (reference implementation
> > > > > > >>>>from SUN)!
> > > > > > >>>>
> > > > > >>>>Hi SOAP Folks,
> > > > > >>>>
> > > > > > >>>>Has anyone compared the performance of these two implementations
> > > > > > >>>>(Apache's Axis and Sun's XRPC) in a real environment?!
> > > > > > >>>>
> > > > > > >>>>FYI, I'm in the phase of replacing the communication layer (which is
> > > > > > >>>>the reference implementation from SUN) of the application I'm working on
> > > > > > >>>>with Axis. Sun's implementation generates so many classes and uses so many
> > > > > > >>>>libraries that the whole result (application jars, ear's, war's,
> > > > > > >>>>etc) becomes very large. Another side effect is the build time of the
> > > > > > >>>>project, which is really long!
> > > > > > >>>>
> > > > > > >>>>I need all your ideas/suggestions/comments in this regard.
> > > > > > >>>>What problems may I get into with Axis? How's the performance in comparison
> > > > > > >>>>with other implementations? Is there any better alternative than Axis
> > > > > > >>>>(free for sure!) And so on...
> > > > > >>>>
> > > > > >>>>Best Regards,
> > > > > >>>>Armond
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > >
>
