On 7/12/06, James M Snell <[EMAIL PROTECTED]> wrote:
I'm not sure I could reasonably envision any use of nonblocking I/O operations in an XML parser. I'm not sure I've ever seen anyone do it before.
Well, I wouldn't expect you to put the nonblocking IO inside the parser. It would be more like using nonblocking IO to pull data off the wire and then passing that data off to a SAX-style parser once you have it.
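A minimal single-threaded sketch of that split, using only java.nio and the JDK's SAX parser: one Selector watches several non-blocking channels, bytes are buffered per connection as they arrive, and each buffer is handed to a SAX handler once its channel reaches end-of-stream. The JDK's SAX parser pulls from an InputStream and cannot be fed chunk by chunk, so this sketch waits for the whole document; handing each chunk to the parser the moment it arrives would need a push-style XML parser, which the JDK does not provide. Pipes stand in for sockets here (a SocketChannel registers the same way), and the class, method and handler names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SelectableChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class NonBlockingHandoff {

    // Hypothetical SAX handler: collects the text of every <title> element.
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <title>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("title".equals(qName)) text = new StringBuilder();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (text != null) text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("title".equals(qName)) {
                titles.add(text.toString());
                text = null;
            }
        }
    }

    // Reads every channel on the calling thread; each channel must also be a
    // ReadableByteChannel (e.g. SocketChannel or Pipe.SourceChannel).
    static List<String> drainAndParse(List<? extends SelectableChannel> channels) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        List<String> titles = new ArrayList<>();
        try (Selector selector = Selector.open()) {
            for (SelectableChannel ch : channels) {
                ch.configureBlocking(false);
                // the per-connection spool buffer rides along as the key's attachment
                ch.register(selector, SelectionKey.OP_READ, new ByteArrayOutputStream());
            }
            ByteBuffer chunk = ByteBuffer.allocate(4096);
            int open = channels.size();
            while (open > 0) {
                selector.select(); // blocks until at least one channel is readable
                Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                while (it.hasNext()) {
                    SelectionKey key = it.next();
                    it.remove();
                    ReadableByteChannel ch = (ReadableByteChannel) key.channel();
                    ByteArrayOutputStream spool = (ByteArrayOutputStream) key.attachment();
                    chunk.clear();
                    int n = ch.read(chunk); // never blocks; 0 means nothing available yet
                    if (n > 0) {
                        spool.write(chunk.array(), 0, n);
                    } else if (n < 0) {
                        // end of stream: stop watching this channel and parse what arrived
                        ch.close(); // also cancels the key
                        open--;
                        TitleHandler handler = new TitleHandler();
                        factory.newSAXParser().parse(new ByteArrayInputStream(spool.toByteArray()), handler);
                        titles.addAll(handler.titles);
                    }
                }
            }
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        List<Pipe.SourceChannel> sources = new ArrayList<>();
        for (String title : new String[] {"First feed", "Second feed"}) {
            Pipe pipe = Pipe.open();
            byte[] xml = ("<feed><title>" + title + "</title></feed>").getBytes(StandardCharsets.UTF_8);
            pipe.sink().write(ByteBuffer.wrap(xml)); // tiny payload, fits in the pipe buffer
            pipe.sink().close();
            sources.add(pipe.source());
        }
        System.out.println(drainAndParse(sources)); // order depends on which channel finishes first
    }
}
```

The order of the results follows whichever channel hits end-of-stream first, which is the point: no connection holds a thread while it waits for data.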
In any case, I figured you might find this entertaining: http://www.snellspace.com/wp/?p=381

http://danga.com:8081/atom-stream.xml is a never-ending XML stream.

    URL url = new URL("http://danga.com:8081/atom-stream.xml");

    // we only care about the feed title and alternate link,
    // we'll ignore everything else
    ParseFilter filter = new WhiteListParseFilter();
    filter.add(new QName("atomStream"));
    filter.add(Constants.FEED);
    filter.add(Constants.TITLE);
    filter.add(Constants.LINK);
    ParserOptions options = Parser.INSTANCE.getDefaultParserOptions();
    options.setParseFilter(filter);

    Document doc = Parser.INSTANCE.parse(
        url.openStream(), (URI) null, options);
    Element el = doc.getRoot();

    // get the first feed in the stream, then continue to iterate
    // from there, printing the title and alt link to the console
    Feed feed = el.getFirstChild(Constants.FEED);
    while (feed != null) {
        System.out.println(
            feed.getTitle() + "\t" + feed.getAlternateLink().getHref());
        Feed next = feed.getNextSibling(Constants.FEED);
        feed.discard();
        feed = next;
    }

There are some memory-creep issues, so I wouldn't recommend keeping this running forever :-)
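For comparison, a JDK-only sketch of the same walk that swaps Abdera's whitelist parse filter for the standard StAX API (javax.xml.stream). It assumes, as the Abdera code does, that each Atom <feed> is a direct child of the root <atomStream> element, and it treats a <link> with no rel attribute as the alternate link, per RFC 4287. The sketch itself keeps nothing once a feed's end tag has been read, so it does not accumulate a tree the way a DOM-style parse does. The class name, the walk method and the sample input are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AtomStreamTitles {
    static final String ATOM_NS = "http://www.w3.org/2005/Atom";

    // Hypothetical sample shaped like the stream: feeds as direct children of <atomStream>.
    static final String SAMPLE =
        "<atomStream>"
      + "<feed xmlns='http://www.w3.org/2005/Atom'><title>One</title>"
      + "<link rel='self' href='http://a/self'/><link href='http://a/'/>"
      + "<entry><title>E</title><link href='http://e/'/></entry></feed>"
      + "<feed xmlns='http://www.w3.org/2005/Atom'><title>Two</title>"
      + "<link rel='alternate' href='http://b/'/></feed>"
      + "</atomStream>";

    // Emits "title<TAB>href" for each feed as soon as its </feed> is read.
    // Depth 1 is <atomStream>, 2 is <feed>, 3 is a direct child of <feed>;
    // entry titles and links (depth 4 and deeper) are ignored.
    static void walk(InputStream in, Consumer<String> out) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
        int depth = 0;
        boolean inTitle = false;
        StringBuilder title = new StringBuilder();
        String href = null;
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                depth++;
                boolean atom = ATOM_NS.equals(r.getNamespaceURI());
                String name = r.getLocalName();
                if (depth == 2 && atom && "feed".equals(name)) {
                    title.setLength(0); // start of a new feed: reset per-feed state
                    href = null;
                } else if (depth == 3 && atom && "title".equals(name)) {
                    inTitle = true;
                } else if (depth == 3 && atom && "link".equals(name) && href == null) {
                    String rel = r.getAttributeValue(null, "rel");
                    if (rel == null || "alternate".equals(rel)) { // missing rel means alternate
                        href = r.getAttributeValue(null, "href");
                    }
                }
            } else if (inTitle && (event == XMLStreamConstants.CHARACTERS
                                   || event == XMLStreamConstants.CDATA)) {
                title.append(r.getText());
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                if (depth == 3 && inTitle) {
                    inTitle = false;
                } else if (depth == 2 && ATOM_NS.equals(r.getNamespaceURI())
                           && "feed".equals(r.getLocalName())) {
                    out.accept(title + "\t" + href);
                }
                depth--;
            }
        }
        r.close();
    }

    public static void main(String[] args) throws XMLStreamException {
        InputStream in = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        walk(in, System.out::println); // prints "One\thttp://a/" then "Two\thttp://b/"
    }
}
```

Against the live stream you would pass url.openStream() instead of the sample; hasNext() then simply blocks until the next chunk of the never-ending document arrives.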
This is neat, but it's not really what I had in mind. The use case I was more concerned about is a crawler that's pulling down a scary amount of data but doesn't want to devote a thread to each feed, so it hands data off to a parser as the data arrives. Practically speaking, it's debatable whether you'd actually want to do this; it might make more sense to spool the data off someplace and parse the feed once you've got all of it, unless of course you're talking about a never-ending Atom feed ;-)

-garrett
