On 7/12/06, James M Snell <[EMAIL PROTECTED]> wrote:
I can't reasonably envision any use of nonblocking I/O operations in an
XML parser; I'm not sure I've ever seen anyone do it before.

Well, I wouldn't expect that you'd put the nonblocking IO inside the
parser itself; rather, you'd use nonblocking IO to pull data off the
wire and then hand that data to a SAX-style parser as it arrives.
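To make that split concrete, here's a minimal sketch of the pattern: a
selector-driven nonblocking read loop that accumulates bytes, which are
then handed to a plain SAX parser. This isn't Abdera code; a
`java.nio.channels.Pipe` stands in for the network socket (in a real
crawler you'd register a `SocketChannel` with the same selector), and
the class and element names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class NonBlockingFeedSketch {
  public static void main(String[] args) throws Exception {
    // A Pipe stands in for the socket; in a real crawler this would be
    // a SocketChannel registered with the same Selector.
    Pipe pipe = Pipe.open();
    pipe.source().configureBlocking(false);
    pipe.sink().write(ByteBuffer.wrap(
        "<feed><title>example</title></feed>"
            .getBytes(StandardCharsets.UTF_8)));
    pipe.sink().close();

    Selector selector = Selector.open();
    pipe.source().register(selector, SelectionKey.OP_READ);

    // Nonblocking loop: drain whatever bytes are ready each time the
    // selector wakes up, instead of parking a thread on the read.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    ByteBuffer chunk = ByteBuffer.allocate(4096);
    boolean eof = false;
    while (!eof) {
      selector.select();
      selector.selectedKeys().clear();
      chunk.clear();
      int n = pipe.source().read(chunk);
      if (n < 0) {
        eof = true;          // sink closed: end of stream
      } else {
        buf.write(chunk.array(), 0, n);
      }
    }

    // Hand the accumulated bytes off to a SAX-style parser.
    SAXParserFactory.newInstance().newSAXParser().parse(
        new ByteArrayInputStream(buf.toByteArray()),
        new DefaultHandler() {
          @Override
          public void startElement(String uri, String local,
              String qName, Attributes atts) {
            System.out.println("start: " + qName);
          }
        });
  }
}
```

With a never-ending stream you'd of course feed the parser in chunks
rather than waiting for EOF, which is where a push-style parser earns
its keep.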

In any case, I figured you might find this entertaining:

http://www.snellspace.com/wp/?p=381

http://danga.com:8081/atom-stream.xml is a never-ending xml stream.

    URL url = new URL("http://danga.com:8081/atom-stream.xml");
    // we only care about the feed title and alternate link,
    // we'll ignore everything else
    ParseFilter filter = new WhiteListParseFilter();
    filter.add(new QName("atomStream"));
    filter.add(Constants.FEED);
    filter.add(Constants.TITLE);
    filter.add(Constants.LINK);
    ParserOptions options = Parser.INSTANCE.getDefaultParserOptions();
    options.setParseFilter(filter);
    Document doc = Parser.INSTANCE.parse(
      url.openStream(),(URI)null,options);
    Element el = doc.getRoot();
    // get the first feed in the stream, then continue to iterate
    // from there, printing the title and alt link to the console
    Feed feed = el.getFirstChild(Constants.FEED);
    while (feed != null) {
      System.out.println(
        feed.getTitle() + "\t" + feed.getAlternateLink().getHref());
      Feed next = feed.getNextSibling(Constants.FEED);
      feed.discard();
      feed = next;
    }

There are some memory-creep issues so I wouldn't recommend keeping this
running forever :-)

This is neat, but it's not really what I'm thinking of.  The use case
I'm more concerned about would be a crawler that's trying to pull
down a scary amount of data but doesn't want to devote a thread to
each connection, so it hands data off to a parser as it arrives.  Now,
practically speaking it's debatable whether you'd actually want to do
this; it might make more sense to spool the data off someplace and
parse the feed after you've got it all, unless of course you're
talking about a never-ending Atom feed ;-)

-garrett
