On 2009-05-24 20:31:05 -0400, Daniel Keep <daniel.keep.li...@gmail.com> said:

Michel Fortin wrote:
On 2009-05-24 12:51:43 -0400, Daniel Keep <daniel.keep.li...@gmail.com>
said:

(Cutting us mostly going back-and-forth on what a callback API would
look like.)

...

Like I said, this seems like a lot of work to bolt a callback interface
onto something a pull api is designed for.

...

Except of course that you now can't easily control the loop, nor can
you do fall-through on the cases.

Again, my definition of a callback API doesn't include an implicit loop,
just a callback. And I intend the callback to be a template argument so
it can be dispatched using function overloading and/or function
templates. So you'll have this instead:

// `continue` is a keyword in D, so the flag needs another name
bool more = true;
do
        more = pp.readNext!(callback)();
while (more);

void callback(OpenElementToken t) { blah(t.name); }
void callback(CloseElementToken t) { ... }
void callback(CharacterDataToken t) { ... }
...

No switch statement and no inversion of control.

Except that you can't define overloads of a function inside a function.

I didn't know that. Interesting point.

Perhaps that's just a bug in the compiler that we could get fixed, though. Any clue on that? I notice it also happens if you want to specialize a nested template function.


Which means you have to stuff all of your code in a set of increasingly
obtusely-named globals or private members.  Like elemAStart, elemAData,
elemAAttr, elemAClose, elemBStart, elemBData, elemBAttr, ...

But when inside a function you can still dispatch using a nested function template:

        void callback(T)(T t)
        {
                static if (is(T : OpenElementToken))
                {
                        blah(t.name);
                }
                static if (is(T : CloseElementToken))
                {
                        ...
                }
        }

It sure is a little less elegant, but you still skip a switch.
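
To make the dispatch above concrete, here is a minimal sketch of the whole round trip. Everything beyond the names already used in this thread is an assumption for illustration: the token fields, the ToyParser stand-in (which replays three fixed tokens instead of scanning XML), and the choice to write readNext as a free function template called through UFCS rather than as a member of the parser.

```d
// Hypothetical token types; the names follow this thread, the fields are assumed.
struct OpenElementToken   { string name; }
struct CloseElementToken  { string name; }
struct CharacterDataToken { string data; }

// Toy stand-in for the parser: it only remembers how many tokens it has
// handed out so far.
struct ToyParser { size_t pos; }

// Delivers one token to `cb` and returns false once the input is exhausted.
// A free function template (an assumption of this sketch, not the proposed
// API); UFCS still allows the pp.readNext!(callback)() call syntax.
bool readNext(alias cb)(ref ToyParser p)
{
    // This switch only fakes a token stream; the user code below has none.
    switch (p.pos++)
    {
        case 0:  cb(OpenElementToken("a"));      return true;
        case 1:  cb(CharacterDataToken("text")); return true;
        case 2:  cb(CloseElementToken("a"));     return true;
        default: return false;
    }
}

string[] collect()
{
    string[] log;

    // Nested function template: one body, specialized per token type at
    // compile time, with access to the enclosing function's locals.
    void callback(T)(T t)
    {
        static if (is(T : OpenElementToken))
            log ~= "open " ~ t.name;
        else static if (is(T : CloseElementToken))
            log ~= "close " ~ t.name;
        else static if (is(T : CharacterDataToken))
            log ~= "data " ~ t.data;
    }

    ToyParser pp;
    bool more = true;
    do
        more = pp.readNext!(callback)();
    while (more);
    return log;
}
```

Calling collect() should yield "open a", "data text", "close a" in that order. The caller owns the loop, so leaving early is just a matter of not calling readNext again.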


...
And at that point, I've just reinvented SAX.  Well, almost.  I have
control over the loop.  I still can't simply break out of it; I've got
to mess around with flags to get that done.

Meanwhile, if I write that code with a PullParser, it's just a
collection of normal functions, one per element type with all the
related code together in one place.  Or, if I don't want them all
bundled together, I can dispatch to smaller functions.

There's no way I'm not including a pull API, most likely implemented as a range.


I have a feeling you're going to head down this path irrespective, so
I'll just hope you can figure out a way to make the api not suck.

I want to offer at least two API options (so you can choose the most appropriate parser API for what you do), and I want all of them to share the same underlying parser (so I don't write two or three parsers) with no compromise on speed.

I'm now realizing that an inversion of control can increase the performance of the parser by not having to rebranch on the current state each time you ask for a new token. I don't want to force inversion of control on anyone, but surely an API with inversion of control should be possible at full speed, and it can't be built on top of a pull parser.

So basically, the way I see it, you'd have two APIs: the inversion of control callback parser (for which you can specify a stop criterion so that it saves its state and releases control) and the range parser. The range is built on top of the inversion of control parser with a stop criterion making it stop and save its state after each token. With inlining, both APIs should run at optimal speed.
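
As a rough sketch of that layering, and nothing more: the names PushParser, run, Token and TokenRange are hypothetical, and the "parser" below only splits a string on spaces so the example stays short. The point is the shape: one parsing loop, a stop criterion passed as a template argument, and a range that is just that loop stopped after every token.

```d
// Hypothetical single token type; a real parser would have several.
struct Token { string text; }

// The saved state of the toy parser is simply the unread input.
struct PushParser { string input; }

// Inversion-of-control loop: pushes tokens to `sink` until `stop` says so.
// Returns true if it stopped on the criterion, false if input ran out.
bool run(alias sink, alias stop)(ref PushParser p)
{
    while (true)
    {
        size_t i = 0;
        while (i < p.input.length && p.input[i] == ' ') ++i;
        p.input = p.input[i .. $];
        if (p.input.length == 0) return false;

        size_t j = 0;
        while (j < p.input.length && p.input[j] != ' ') ++j;
        auto t = Token(p.input[0 .. j]);
        p.input = p.input[j .. $];  // save state before releasing control

        sink(t);
        if (stop(t)) return true;
    }
}

// Pull API: the same loop, with a criterion that stops after each token.
struct TokenRange
{
    PushParser parser;
    Token front;
    bool empty;

    this(string input) { parser = PushParser(input); popFront(); }

    void popFront()
    {
        bool got = false;
        void sink(Token t) { front = t; got = true; }
        static bool afterEach(Token) { return true; }
        parser.run!(sink, afterEach)();
        empty = !got;
    }
}

// Callback API used directly: never stop early.
string[] pushAll(string input)
{
    string[] seen;
    auto p = PushParser(input);
    void sink(Token t) { seen ~= t.text; }
    static bool never(Token) { return false; }
    p.run!(sink, never)();
    return seen;
}

// Range API built on the same loop.
string[] pullAll(string input)
{
    string[] seen;
    foreach (t; TokenRange(input)) seen ~= t.text;
    return seen;
}
```

Both pushAll and pullAll go through the same run loop, so there is only one parser to write. Whether the compiler actually inlines the sink and the stop criterion well enough to make the range as fast as the direct callback form is exactly the part that would need measuring.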

Perhaps you'll say that it's complicated, but if you have a better idea capable of extracting a maximum of performance for both parser APIs, then I'd like to know.

--
Michel Fortin
michel.for...@michelf.com
http://michelf.com/
