On Fri, 04 Feb 2011 17:36:50 -0500, Tomek Sowiński <j...@ask.me> wrote:
Steven Schveighoffer wrote:
Here is how I would approach it (without doing any research).
First, we need a buffered I/O system where you can easily access and
manipulate the buffer. I proposed one a few months ago in this NG.
Second, I'd implement the XML lib as a range where "front()" gives you an
XMLNode. If the XMLNode is an element, it will have eager access to the
element tag, and lazy access to the attributes and the sub-nodes. Each
XMLNode will provide a forward range for the child nodes.
Thus you can "skip" whole elements in the stream by popFront'ing a range,
and dive deeper by accessing the nodes of the range.
I'm unsure how well this will work, or if you can accomplish all of it
without reallocation (in particular, you may need to store the element
information, maybe via a specialized member function?).
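To make the range idea above concrete, here is a rough sketch. It is written in Python rather than D, works on an in-memory string instead of a buffered stream, and is deliberately non-compliant (no comments, CDATA, entities, or single-quoted attributes). All names (`NodeRange`, `XMLNode`, `pop_front`, `children`) are made up for illustration; it only shows the shape of the API: eager tag, lazy attributes, skipping a subtree by popping the front.

```python
import re

# One token: a tag (open, close, or self-closing) or a run of text.
_TOKEN = re.compile(r"<(/?)([A-Za-z_][\w:.\-]*)([^<>]*?)(/?)>|([^<]+)")
_ATTR = re.compile(r'([A-Za-z_][\w:.\-]*)\s*=\s*"([^"]*)"')


def _next_token(src, pos):
    """Return the next token match at or after pos, skipping blank text."""
    while pos < len(src):
        m = _TOKEN.match(src, pos)
        if m is None:
            raise ValueError(f"cannot parse at offset {pos}")
        if m.group(5) is None or m.group(5).strip():
            return m
        pos = m.end()  # whitespace-only text: skip it
    return None


class XMLNode:
    def __init__(self, src, m):
        self._src, self._m = src, m
        self.is_element = m.group(5) is None
        self.tag = m.group(2)   # available eagerly; None for text nodes
        self.text = m.group(5)  # None for elements

    @property
    def attributes(self):
        """Attributes are parsed only when somebody asks for them."""
        return dict(_ATTR.findall(self._m.group(3) or ""))

    def children(self):
        """Range over child nodes; empty for text and self-closing tags."""
        if not self.is_element or self._m.group(4):
            return NodeRange(self._src, len(self._src))
        return NodeRange(self._src, self._m.end())


class NodeRange:
    def __init__(self, src, pos=0):
        self._src = src
        self._tok = _next_token(src, pos)

    def empty(self):
        # A closing tag belongs to the parent, so it ends this range.
        return self._tok is None or self._tok.group(1) == "/"

    def front(self):
        assert not self.empty()
        return XMLNode(self._src, self._tok)

    def pop_front(self):
        """Skip the current node, including its whole subtree."""
        assert not self.empty()
        m, depth = self._tok, 0
        while True:
            if m.group(5) is None and not m.group(4):
                depth += -1 if m.group(1) else 1
            if depth == 0:
                break
            m = _next_token(self._src, m.end())
            if m is None:
                raise ValueError("unclosed element")
        self._tok = _next_token(self._src, m.end())
```

Usage would look like `kids = NodeRange(doc).front().children()`, then `kids.pop_front()` to step over an element you don't care about without ever looking at its attributes or descendants. A real implementation would have the parent and child ranges share one stream position instead of rescanning.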
Heh, yesterday when I couldn't sleep I was sketching the design. I
converged on pretty much the same concept, so your comment is reassuring :).
The design I'm thinking of is that the node iterator will own a buffer. One
consequence is that the fields of the current node will point into the
buffer, akin to foreach(line; File.byLine), so in order to keep the data
the user will have to dup it (or process the node in place). Since each new
node overwrites the same piece of memory, an important trait of
the design emerges: cache intensity. Because of XML namespaces I think
it is necessary for the buffer to contain the current node plus all its
parents.
That might not scale well. For instance, if you are accessing the 1500th
child element of a parent, doesn't that mean that the buffer must contain
the full text for the previous 1499 elements in order to also contain the
parent?
Maybe I'm misunderstanding what you mean.
I would start out with a non-compliant parser, but one that allocates
nothing beyond the I/O buffer, one that simply parses lazily and can be
used as well as a SAX parser. Then see how many extra allocations we need
to get it to be compliant. Then, one can choose the compliance level
based on what performance penalties one is willing to incur.
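As one example of what such an extra allocation might look like: namespace resolution arguably needs only the prefix bindings declared on the open ancestors, not their full text. Below is a small Python sketch of a scope stack under that assumption. `NamespaceScopes` and its methods are hypothetical names, and the attribute dict is assumed to come from whatever parser sits underneath. It resolves element names only (the default namespace does not apply to unprefixed attributes).

```python
class NamespaceScopes:
    """Stack of prefix -> URI bindings, one frame per open element."""

    def __init__(self):
        # The "xml" prefix is always bound, per the XML namespaces spec.
        self._frames = [{"xml": "http://www.w3.org/XML/1998/namespace"}]

    def push(self, attributes):
        """Call on each start tag; copies only the xmlns declarations."""
        frame = {}
        for name, value in attributes.items():
            if name == "xmlns":
                frame[""] = value          # default namespace
            elif name.startswith("xmlns:"):
                frame[name[6:]] = value    # prefixed namespace
        self._frames.append(frame)

    def pop(self):
        """Call on each end tag."""
        self._frames.pop()

    def resolve(self, qname):
        """Map an element's qualified name to (namespace URI, local name)."""
        prefix, _, local = qname.rpartition(":")
        for frame in reversed(self._frames):
            if prefix in frame:
                return frame[prefix], local
        if prefix:
            raise KeyError(f"unbound namespace prefix: {prefix}")
        return None, local
```

The cost is one small frame per nesting level, copied out of the I/O buffer, so the 1499 preceding siblings never need to stay in memory for this purpose.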
-Steve