On Fri, 04 Feb 2011 17:03:08 -0500, Simen kjaeraas
<simen.kja...@gmail.com> wrote:
Steven Schveighoffer <schvei...@yahoo.com> wrote:
Here is how I would approach it (without doing any research).
First, we need a buffered I/O system where you can easily access and
manipulate the buffer. I proposed one a few months ago in this NG.
Second, I'd implement the XML lib as a range where "front()" gives you
an XMLNode. If the XMLNode is an element, it will have eager access to
the element tag, and lazy access to the attributes and the sub-nodes.
Each XMLNode will provide a forward range for the child nodes.
Thus you can "skip" whole elements in the stream by popFront'ing a
range, and dive deeper via accessing the nodes of the range.
I'm unsure how well this will work, or if you can accomplish all of it
without reallocation (in particular, you may need to store the element
information, maybe via a specialized member function?).
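To make the range idea above concrete, here is a rough sketch in Python rather than D. The method names mirror the range primitives (front, popFront, empty). Everything else (Cursor, Node, NodeRange, the regex tokenizer) is made up for illustration and is not a proposed API. It assumes well-formed input and attribute values that contain no '<' or '>'.

```python
import re

# Assumption: attribute values never contain '<' or '>'.
TOKEN = re.compile(r"<(/?)([A-Za-z_][\w.-]*)([^<>]*?)(/?)>|([^<]+)")
ATTR = re.compile(r'([\w:.-]+)\s*=\s*"([^"]*)"')


class Cursor:
    """Single shared read position; stands in for the buffered stream."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def next_token(self):
        m = TOKEN.match(self.text, self.pos)
        if m is None:
            return None  # end of input (or something this sketch cannot parse)
        self.pos = m.end()
        if m.group(5) is not None:
            return ("text", m.group(5), "")
        if m.group(1):
            return ("close", m.group(2), "")
        return ("empty" if m.group(4) else "open", m.group(2), m.group(3))


class Node:
    def __init__(self, cursor, tok):
        kind, value, raw_attrs = tok
        self.cursor = cursor
        self.tag = value if kind != "text" else None   # eager
        self.text = value if kind == "text" else None
        self._raw_attrs = raw_attrs                    # kept unparsed
        self._has_body = kind == "open"
        self._children = None

    @property
    def attributes(self):
        # Lazy: the attribute string is only parsed when asked for.
        return dict(ATTR.findall(self._raw_attrs))

    def children(self):
        # Lazy: nothing past the start tag is read until this is called.
        if self._children is None:
            self._children = NodeRange(self.cursor if self._has_body else None)
        return self._children


class NodeRange:
    def __init__(self, cursor):
        self.cursor = cursor
        self._advance()

    def _advance(self):
        tok = self.cursor.next_token() if self.cursor is not None else None
        # A close tag (or end of input) ends this level of the tree.
        self.front = None if tok is None or tok[0] == "close" else Node(self.cursor, tok)

    @property
    def empty(self):
        return self.front is None

    def pop_front(self):
        # Skip whatever is left of the current node's subtree, then move on.
        kids = self.front.children()
        while not kids.empty:
            kids.pop_front()
        self._advance()


doc = '<root><skip><deep>x</deep></skip><keep id="7">hi</keep></root>'
kids = NodeRange(Cursor(doc)).front.children()
first_tag = kids.front.tag   # tag of the first child of <root>
kids.pop_front()             # skips <skip> and everything inside it
second = kids.front
```

Because every range shares the one cursor, a node is only usable while its parent range is still positioned on it, which is the price of not storing the element information anywhere.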
Question:
For the lazily computed attributes and subnodes, will accessing one
element cause all elements to be computed? Same goes for getting the
number of elements.
The goal is to avoid double-buffering data, so the input stream's buffer
holds all of the data. Advancing to the 'next' element/node/attribute
therefore invalidates the previous element/node/attribute (i.e. the
buffer is reused).
The trick is to make it seem like the node is fully there without actually
reading the stream until you need it (hence the lazy part), because
reading the entire node means reading the entire file (in the case of the
root element).
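A small illustration of the buffer-reuse point, again as a Python sketch and not a proposed design. ReusedBuffer is a hypothetical name, and the 4-byte buffer size is only chosen so that each fill holds exactly one tag. A view into the buffer costs nothing, but it shows different bytes after the next fill; anything you want to keep has to be copied explicitly.

```python
import io


class ReusedBuffer:
    """Hypothetical fixed-size buffer that is refilled in place."""

    def __init__(self, stream, size):
        self.stream = stream
        self.buf = bytearray(size)

    def fill(self):
        # readinto overwrites the same storage on every call.
        n = self.stream.readinto(self.buf)
        return memoryview(self.buf)[:n]  # a view, not a copy


rb = ReusedBuffer(io.BytesIO(b"<a/><b/>"), 4)
first = rb.fill()
saved = bytes(first)          # explicit copy, survives the next fill
second = rb.fill()
stale = bytes(first)          # 'first' now aliases the second chunk
```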
Also, can this be efficiently combined with mmapping? What I sorta
imagine is a kind of lazy slice: It determines whether it ends within
this page, and if not, does not progress past that page until asked to
do so.
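One possible reading of such a lazy slice, sketched in Python on top of the standard mmap module. LazySlice and its advance method are hypothetical names, and "ends" is taken here to mean "a given terminator byte string has been found", which is an assumption about what was meant.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE


class LazySlice:
    """Hypothetical slice whose end is searched for one page at a time."""

    def __init__(self, mm, start, terminator):
        self.mm, self.start, self.terminator = mm, start, terminator
        self.scanned = start   # everything before this offset has been searched
        self.end = None        # unknown until the terminator is found

    def advance(self):
        """Search at most one more page; return True once the end is known."""
        if self.end is not None:
            return True
        page_end = min(len(self.mm), (self.scanned // PAGE + 1) * PAGE)
        # Back up a little so a terminator straddling the page boundary is seen.
        lo = max(self.start, self.scanned - len(self.terminator) + 1)
        hit = self.mm.find(self.terminator, lo, page_end)
        if hit != -1:
            self.end = hit + len(self.terminator)
        elif page_end == len(self.mm):
            self.end = len(self.mm)  # unterminated: the slice runs to the end
        self.scanned = page_end
        return self.end is not None


# Usage: an element whose close tag lies in the second page.
with tempfile.TemporaryFile() as f:
    f.write(b"<a>" + b"x" * PAGE + b"</a><b/>")
    f.flush()
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    s = LazySlice(mm, 0, b"</a>")
    known_after_first = s.advance()    # only the first page is searched
    known_after_second = s.advance()   # the second page is searched on request
    slice_end = s.end
    mm.close()
```

Nothing here touches the second page until the second advance call, so the operating system is not asked to fault it in any earlier.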
mmapping would make things more accessible, but the common denominator is
not mmap. If it's supported as a special case, then maybe it can offer
some interesting features, but something like mmap can't be done for, say,
a network stream.
-Steve