On Fri, 04 Feb 2011 17:03:08 -0500, Simen kjaeraas <simen.kja...@gmail.com> wrote:

Steven Schveighoffer <schvei...@yahoo.com> wrote:

Here is how I would approach it (without doing any research).

First, we need a buffered I/O system where you can easily access and manipulate the buffer. I proposed one a few months ago in this NG.

Second, I'd implement the XML lib as a range where "front()" gives you an XMLNode. If the XMLNode is an element, it will have eager access to the element tag, and lazy access to the attributes and the sub-nodes. Each XMLNode will provide a forward range for the child nodes.

Thus you can "skip" whole elements in the stream by popFront'ing a range, and dive deeper by accessing the nodes of the range.

I'm unsure how well this will work, or if you can accomplish all of it without reallocation (in particular, you may need to store the element information, maybe via a specialized member function?).
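
Very roughly, the shape I'm imagining is something like the sketch below. All of the names (XMLParser, XMLAttribute, the Children range, etc.) are made up for illustration, and the bodies are placeholders, not a real implementation:

// sketch only: hypothetical names, placeholder bodies
struct XMLAttribute
{
    const(char)[] name;   // slices into the input stream's buffer
    const(char)[] value;
}

struct XMLNode
{
    const(char)[] tag;    // eager: parsed as soon as the node is fronted

    // lazy forward range of child nodes; nothing past the start tag is
    // read from the stream until this range is iterated
    struct Children
    {
        XMLParser* parser;
        @property bool empty()    { return true; }          // placeholder
        @property XMLNode front() { return XMLNode.init; }  // placeholder
        void popFront()           { }                       // placeholder: skip current child
        @property Children save() { return this; }
    }
    Children children;

    // attributes would be a similar lazy forward range of XMLAttribute
}

// the parser itself is a forward range over the top-level nodes
struct XMLParser
{
    // the buffered input stream would live here
    @property bool empty()    { return true; }              // placeholder
    @property XMLNode front() { return XMLNode.init; }      // placeholder
    void popFront()           { }                           // placeholder: advance past current node
}

The important part is that front() and the Children range only parse as much of the stream as you actually touch; popFront'ing without looking at the children lets the parser skip the whole element.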

Question:

For the lazily computed attributes and subnodes, will accessing one element
cause all elements to be computed? Same goes for getting the number of
elements.

The goal is to avoid double-buffering data, so the buffer of the input stream is used to contain all the data. That means advancing to the 'next' element/node/attribute makes the previous element/node/attribute invalid (i.e. the buffer is reused).

The trick is to make it seem like the node is fully there without actually reading the stream until you need it (hence the lazy part), because reading the entire node means reading the entire file (in the case of the root element).
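
In usage terms, with the made-up names from the sketch above, the contract would look roughly like this:

void use(ref XMLParser parser)
{
    auto node = parser.front;

    const(char)[] tag = node.tag;   // a slice into the parser's buffer
    string kept = node.tag.idup;    // copy it if it must outlive the node

    parser.popFront();              // the buffer may now be reused, so
                                    // 'node' and 'tag' are no longer valid
}

Anything you want to keep past popFront has to be dup'ed out of the shared buffer; that's the price of not double-buffering.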

Also, can this be efficiently combined with mmapping? What I sorta imagine is a kind of lazy slice: it determines whether it ends within this page, and if not, does not progress past that page until asked to do so.

mmapping would make things more accessible, but the common denominator is not mmap. If it's supported as a special case, then maybe it can offer some interesting features, but something like mmap can't be done for, say, a network stream.
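
As a hand-wavy sketch (a hypothetical interface, not something that exists), the parser would only need to depend on something like this, which both an mmapped file and a socket reader could implement:

interface InputBuffer
{
    // the data currently available: a slice of the read buffer, or of the
    // mapped pages in an mmap-backed implementation
    const(char)[] window();

    // release 'used' bytes from the front of the window and try to make
    // at least 'need' more bytes available; returns false at end of input
    bool advance(size_t used, size_t need);
}

An mmap-backed implementation could then hand out very large windows cheaply, while a network-backed one would refill an ordinary read buffer.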

-Steve
