Re: New XML parser written for D1 and D2.

Jeremie Pelletier Wed, 14 Oct 2009 17:25:17 -0700

Justin Johansson wrote:

Andrei Alexandrescu Wrote:

Saaa wrote:

Michael Rynn wrote
Where and to whom can I post the 56 KB source code zip?
Attaching it to an enhancement in bugzilla would be best, I think.

Yes please. Making the code work with ranges as input would be great.

Andrei


Hi Andrei,

Still being a D apprentice and not 100% conversant with D terminology yet,
I assume, and not wanting to make an *ass* out of *u* and *me* :-), that by
"ranges" you mean making use of D sub char[] arrays over the input so as to
minimize/obviate the need to allocate lots of small(er) strings to hold
element tag names, attribute names and values, text node contents and so on.

He meant range structs as found in std.range and their array wrappers in std.array.

This assumption being correct, can you confirm or otherwise that the
consequence of such a design would mean that by parsing, say, a 1MB XML
in-memory document, constructing a node tree from the same and having the
nodes directly referencing substrings in the input document via string
"ranges", the entire 1MB would be locked into memory by the GC and not
collectable until the node tree itself is done with?

That is not the goal of ranges; a memory-mapped file would be more efficient for what you describe.

A range is D's version of streams, so for example a simple reader might look like:


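import std.range; // isInputRange, plus empty/front/popFront for arrays

// Take the range by value: 'in' would make it const, and popFront() mutates it.
void read(T)(T range) if (isInputRange!T) {
        while (!range.empty) {
                auto elem = range.front;
                // process element
                range.popFront();
        }
}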

The range implementation can be a simple 'string', a 'char[]', or a custom network channel that blocks on front() if the data is still loading.
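
As a minimal sketch of that idea, here is one generic consumer working on a string, an int[] and a custom range. It assumes a current Phobos, where the primitives for arrays live in std.range.primitives. Countdown and countElements are hypothetical names made up for this example, not part of Phobos or of the parser under discussion.

```d
import std.range.primitives : isInputRange, empty, front, popFront;

// Hypothetical custom range yielding n, n-1, ..., 1. It stands in for any
// user-defined source, such as a network channel.
struct Countdown
{
    int n;
    @property bool empty() const { return n <= 0; }
    @property int front() const { return n; }
    void popFront() { --n; }
}

// Compile-time check that Countdown satisfies the input range interface.
static assert(isInputRange!Countdown);

// Generic consumer: counts the elements of any input range.
size_t countElements(R)(R range) if (isInputRange!R)
{
    size_t count = 0;
    while (!range.empty)
    {
        ++count;
        range.popFront();
    }
    return count;
}

void main()
{
    assert(countElements(Countdown(3)) == 3); // custom struct
    assert(countElements("abc") == 3);        // built-in string
    assert(countElements([1, 2]) == 2);       // built-in array
}
```

The consumer never names a concrete type; the template constraint only requires that empty, front and popFront exist.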

Now I might be completely off track; perhaps instead you are thinking of
SAX style parsing and passing arguments to the SAX event handling function
via the said ranges. In this scenario I guess the SAX client could decide
whether or not to .dup the ranges.

I think you confuse ranges with slices. Ranges are simply an interface for sequential or random data access. DOM trees and SAX callbacks are different methods of parsing the XML; a range is a method of accessing the data :)
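
For comparison, a slice is a view into existing memory rather than an access interface. The sketch below shows a slice sharing storage with the document it was cut from, and .dup producing an independent copy. firstTagName is a hypothetical helper written only for this illustration, and it assumes the input starts with "<name>". As long as such a slice is reachable, the garbage collector has to keep the buffer it points into alive, which is the effect described in the question; a .dup'ed copy does not hold on to the original buffer.

```d
// Hypothetical helper: returns the first tag name as a slice of the input.
// No allocation and no copy take place. Assumes doc starts with "<name>".
char[] firstTagName(char[] doc)
{
    assert(doc.length > 0 && doc[0] == '<');
    size_t end = 1;
    while (end < doc.length && doc[end] != '>')
        ++end;
    return doc[1 .. end];
}

void main()
{
    char[] doc = "<item>text</item>".dup;

    char[] view = firstTagName(doc);      // slice: points into doc's buffer
    char[] copy = firstTagName(doc).dup;  // independent copy

    assert(view == "item" && copy == "item");
    assert(view.ptr == doc.ptr + 1);      // same memory as the document
    assert(copy.ptr != view.ptr);         // .dup allocated new storage

    doc[1] = 'I';                         // mutate the document in place
    assert(view == "Item");               // the slice sees the change
    assert(copy == "item");               // the copy does not
}
```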

Speaking of SAX, do we have a D implementation yet? If not I could write one, it sounds fun.

Depending on your clarification, I may have further comment based upon my
practical experience in the XML domain.

Regards

Justin Johansson
