Re: High performance XML parser

Steven Schveighoffer Wed, 09 Feb 2011 05:40:58 -0800

On Tue, 08 Feb 2011 19:16:37 -0500, Tomek Sowiński <j...@ask.me> wrote:

Steven Schveighoffer wrote:
> The design I'm thinking of is that the node iterator will own a buffer. One
> consequence is that the fields of the current node will point into the
> buffer, akin to foreach(line; File.byLine), so in order to lift the input
> the user will have to dup (or process the node in-place). As new nodes
> will overwrite the same piece of memory, an important trait of
> the design emerges: cache intensity. Because of XML namespaces I think
> it is necessary for the buffer to contain the current node plus all its
> parents.
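A minimal sketch of the byLine-style contract described above: node fields are views into one buffer owned by the iterator, so anything kept past the next step must be copied ("dup"). Python stands in for D here, with `memoryview` slices playing the role of D slices. The generator `reused_buffer_records` and its fixed `capacity` are hypothetical illustrations, not part of any proposed API.

```python
def reused_buffer_records(records, capacity=64):
    """Hypothetical sketch: yield each record as a view into ONE reused buffer."""
    buf = bytearray(capacity)  # owned by the iterator, overwritten for every record
    for rec in records:
        n = len(rec)
        if n > capacity:
            raise ValueError("record larger than buffer capacity")
        buf[:n] = rec              # same-length slice assignment: no resize, no realloc
        yield memoryview(buf)[:n]  # no copy: the view aliases the shared buffer


views = []
copies = []
for view in reused_buffer_records([b"<a>", b"<b>", b"<c2>"]):
    views.append(view)          # keeps only a view: later records overwrite it
    copies.append(bytes(view))  # the analogue of .dup: an independent copy
```

After the loop the retained views all read whatever the last record wrote into the shared memory, while the `bytes(view)` copies keep their original contents.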
That might not scale well. For instance, if you are accessing the 1500th
child element of a parent, doesn't that mean that the buffer must contain
the full text of the previous 1499 elements in order to also contain the
parent?

Maybe I'm misunderstanding what you mean.
Let's talk on an example:

<a name="value">
        <b>
                Some Text 1
                <c2>      
                Some text 2
                </c2>
                Some Text 3
        </b>
</a>

The buffer of the iterator positioned at <c2> would be:

[Node a | Node b | Node c2]
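One possible reading of the [Node a | Node b | Node c2] layout, sketched in Python under stated assumptions: each open element's start-tag text is appended when the iterator enters it and truncated away when that element closes. The class `NodeStackBuffer` and its methods are hypothetical and are not Tomek's implementation. Under this reading the buffer grows with nesting depth, not with the number of preceding siblings. It does, however, still copy node text out of the I/O buffer.

```python
class NodeStackBuffer:
    """Hypothetical sketch: one buffer holding the current node plus its open ancestors."""

    def __init__(self):
        self._buf = bytearray()
        self._starts = []  # offset in _buf where each open node's text begins

    def push(self, node_text: bytes) -> None:
        # Entering a child element: append its text right after its parent's.
        self._starts.append(len(self._buf))
        self._buf += node_text

    def pop(self) -> None:
        # Leaving an element: truncate, so the next sibling reuses the same space.
        del self._buf[self._starts.pop():]

    def nodes(self) -> list:
        # Copies of every node currently held, outermost (root) first.
        ends = self._starts[1:] + [len(self._buf)]
        return [bytes(self._buf[s:e]) for s, e in zip(self._starts, ends)]

    def size(self) -> int:
        return len(self._buf)


stack = NodeStackBuffer()
stack.push(b'<a name="value">')
stack.push(b"<b>")
stack.push(b"<c2>")
snapshot = stack.nodes()  # [b'<a name="value">', b'<b>', b'<c2>']

# Close <c2>, then walk 1500 siblings under <b>: only one child is ever held.
stack.pop()
peak = 0
for i in range(1500):
    stack.push(b"<item%d>" % i)
    peak = max(peak, stack.size())
    stack.pop()
```

Here `peak` is just the ancestors plus the longest single child (29 bytes), however many siblings came before. Namespace lookups would walk the ancestors held in the buffer.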

OK, so you mean a buffer other than the I/O buffer. This means double-buffering the data. I was thinking of a solution that allows simply using the I/O buffer for parsing. I think this is one of the keys to Tango's XML performance.
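Parsing directly out of the I/O buffer can be sketched as a tokenizer that yields slices of the input rather than copies. This is a simplified Python illustration. It assumes the whole document already sits in one buffer and ignores comments, CDATA and `>` inside attribute values. `slice_tokens` is a hypothetical name, and this is not Tango's actual implementation.

```python
def slice_tokens(io_buf):
    """Hypothetical sketch: yield ("tag" | "text", view) pairs that slice io_buf directly."""
    view = memoryview(io_buf)
    pos, end = 0, len(io_buf)
    while pos < end:
        if io_buf[pos] == ord("<"):
            close = io_buf.find(b">", pos)
            if close == -1:
                raise ValueError("unterminated tag")
            yield "tag", view[pos:close + 1]  # slice of the input, no copy
            pos = close + 1
        else:
            nxt = io_buf.find(b"<", pos)
            if nxt == -1:
                nxt = end
            yield "text", view[pos:nxt]  # character data, also a slice
            pos = nxt


data = bytearray(b'<a name="value"><b>Some Text 1</b></a>')
tokens = list(slice_tokens(data))

# The tokens alias the buffer: changing the input in place changes the token.
data[19:23] = b"SOME"
```

Because every token aliases the input, there is no per-node copy. The flip side is the same lifetime rule as byLine: a token is only valid until that region of the I/O buffer is refilled.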


-Steve
