Re: High performance XML parser

Steven Schveighoffer Mon, 07 Feb 2011 04:41:01 -0800

On Fri, 04 Feb 2011 17:03:08 -0500, Simen kjaeraas <simen.kja...@gmail.com> wrote:

> Steven Schveighoffer <schvei...@yahoo.com> wrote:
>> Here is how I would approach it (without doing any research).
>> First, we need a buffered I/O system where you can easily access and manipulate the buffer. I proposed one in this NG a few months ago.
>> Second, I'd implement the XML lib as a range where "front()" gives you an XMLNode. If the XMLNode is an element, it will have eager access to the element tag, and lazy access to the attributes and the sub-nodes. Each XMLNode will provide a forward range for the child nodes.
>> Thus you can "skip" whole elements in the stream by popFront'ing a range, and dive deeper by accessing the nodes of the range.
>> I'm unsure how well this will work, or whether you can accomplish all of it without reallocation (in particular, you may need to store the element information, maybe via a specialized member function?).
> Question:
> For the lazily computed attributes and subnodes, will accessing one element cause all elements to be computed? The same goes for getting the number of elements.
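For illustration only, here is a minimal Python sketch of a buffered input whose buffer the parser can see and manipulate directly, in the spirit of the first step in the quoted plan. BufferedSource and its window/consume/fill methods are hypothetical names; this is not the API of the proposal mentioned in the quote. The idea it shows is that window() hands out a view into the one reusable buffer rather than a copy, so anything a parser keeps from it is only valid until the next fill().

```python
import io


class BufferedSource:
    """Hypothetical buffered input that exposes its internal buffer.

    window()   -> read-only view of the unconsumed bytes (no copy)
    consume(n) -> discard n bytes from the front of the window
    fill()     -> read more data, reusing the same storage where possible
    """

    def __init__(self, raw, capacity=4096):
        self._raw = raw                  # any object with readinto()
        self._buf = bytearray(capacity)  # storage reused for the whole parse
        self._start = 0                  # index of the first unconsumed byte
        self._end = 0                    # one past the last valid byte

    def window(self):
        # A view into the buffer, not a copy; its contents are stale after fill().
        return memoryview(self._buf)[self._start:self._end].toreadonly()

    def consume(self, n):
        if n > self._end - self._start:
            raise ValueError("consuming more than is buffered")
        self._start += n

    def fill(self):
        """Read more data; returns the number of new bytes (0 at EOF)."""
        pending = self._end - self._start
        if pending == len(self._buf):
            # Buffer is full of unconsumed data: allocate a larger one (rare).
            bigger = bytearray(2 * len(self._buf))
            bigger[:pending] = self._buf[self._start:self._end]
            self._buf = bigger
        else:
            # Slide unconsumed bytes to the front; the same storage is reused.
            self._buf[:pending] = self._buf[self._start:self._end]
        self._start, self._end = 0, pending
        n = self._raw.readinto(memoryview(self._buf)[self._end:]) or 0
        self._end += n
        return n
```

A parser built on this would scan src.window(), call consume() on whatever it has finished with, and call fill() when it needs more input, e.g. with src = BufferedSource(open("doc.xml", "rb")).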

The goal is to avoid double-buffering data, so you use the buffer of the input stream to hold all the data. That means advancing to the 'next' element/node/attribute invalidates the previous element/node/attribute (i.e. the buffer is reused).

The trick is to make it seem like the node is fully there without actually reading the stream until you need it (hence the lazy part), because reading the entire node means reading the entire file (in the case of the root element).
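A rough Python sketch of the lazy node range described above, under loud assumptions: the tokenizer is a toy regular expression that only understands start, end and self-closing tags plus text (attributes are matched but not exposed; no comments, CDATA, processing instructions or entities), the input is assumed well-formed, and it reads from a str rather than from a reused I/O buffer. Cursor, NodeRange and Element are hypothetical names, and Python's next() stands in for D's front/popFront. Only the tag is read eagerly, children() reads nothing until it is iterated, advancing a range skips whatever is left of the previous child's subtree, and a node handed out earlier becomes invalid once its range moves on, mirroring the buffer reuse described above.

```python
import re

# Toy tokenizer (an assumption, not a real XML lexer): start tags, end tags,
# self-closing tags and text only. Attributes are matched but ignored.
_TOKEN = re.compile(r"<(/?)([^\s/>]+)[^>]*?(/?)>|([^<]+)")


class Cursor:
    """Single forward-only position shared by every range and node."""

    def __init__(self, text):
        self._matches = _TOKEN.finditer(text)
        self.depth = 0  # number of currently open elements
        self.pos = 0    # tokens consumed so far; used to detect stale nodes

    def advance(self):
        m = next(self._matches, None)
        if m is None:
            return ("eof", None)
        self.pos += 1
        closing, name, selfclosing, text = m.groups()
        if text is not None:
            return ("text", text)
        if closing:
            self.depth -= 1
            return ("end", name)
        if selfclosing:
            return ("empty", name)
        self.depth += 1
        return ("start", name)


class Element:
    def __init__(self, cursor, tag, empty):
        self.tag = tag          # eager: known as soon as the start tag is read
        self._cursor = cursor
        self._empty = empty
        self._pos = cursor.pos  # valid only while the cursor has not moved on

    def children(self):
        # Lazy: nothing after the start tag has been read yet.
        if self._cursor.pos != self._pos:
            raise RuntimeError(f"<{self.tag}> was invalidated by advancing")
        return NodeRange(self._cursor, self._empty)


class NodeRange:
    """Forward range over the child nodes (Elements or text) of one element."""

    def __init__(self, cursor, empty=False):
        self._cursor = cursor
        self._level = cursor.depth  # depth at which this range's children live
        self._done = empty

    def __iter__(self):
        return self

    def __next__(self):
        if self._done:
            raise StopIteration
        cur = self._cursor
        # Skip whatever the caller left unread of the previous child's subtree.
        while cur.depth > self._level:
            if cur.advance()[0] == "eof":
                break
        kind, value = cur.advance()
        if kind in ("end", "eof"):  # the parent's end tag (or end of input)
            self._done = True
            raise StopIteration
        if kind == "text":
            return value
        return Element(cur, value, empty=(kind == "empty"))
```

Walking the top-level NodeRange and never calling children() skips whole elements while holding only the current token. A fuller version would also parse the attributes lazily from the start tag.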

> Also, can this be efficiently combined with mmapping? What I sorta imagine is a kind of lazy slice: it determines whether it ends within this page, and if not, does not progress past that page until asked to do so.

mmapping would make things more accessible, but mmap is not the common denominator. Supporting it as a special case might offer some interesting features, but something like mmap can't be done for, say, a network stream.
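One way to get the "lazy slice" behaviour while keeping mmap a special case is to write the scanning code against anything that supports find(sub, start, end): both mmap.mmap and bytes/bytearray do, so the same code runs on a mapped file or on an in-memory buffer read from a stream. The Python sketch below is a hypothetical illustration (LazySlice, end() and extend() are made-up names): it looks for the end of a token only within the pages it has been allowed to scan, and moves on to the next page only when extend() is called.

```python
import mmap


class LazySlice:
    """Hypothetical 'lazy slice' that never scans past its current page limit.

    `data` may be an mmap.mmap or any bytes-like object with find(sub, start, end).
    """

    def __init__(self, data, start, page=mmap.PAGESIZE):
        self._data = data
        self.start = start
        self._page = page
        # Initial limit: the end of the page that contains `start`.
        self._limit = min(len(data), (start // page + 1) * page)

    def end(self, delim=b">"):
        """Offset just past `delim` if it lies within the pages allowed so far,
        else None (the slice continues onto a page not yet touched).
        Re-scans from `start` each call, for simplicity."""
        i = self._data.find(delim, self.start, self._limit)
        return None if i < 0 else i + len(delim)

    def extend(self):
        """Allow scanning one more page; returns False at the end of the data."""
        if self._limit >= len(self._data):
            return False
        self._limit = min(len(self._data), self._limit + self._page)
        return True
```

A caller loops "while s.end() is None and s.extend(): pass", so pages after the one containing the delimiter are never touched.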


-Steve
