[issue17741] event-driven XML parser

Eli Bendersky Mon, 26 Aug 2013 09:05:39 -0700

Eli Bendersky added the comment:

On Sun, Aug 25, 2013 at 10:40 PM, Stefan Behnel <rep...@bugs.python.org>wrote:


>
> Stefan Behnel added the comment:
>
> Hmm, did you look at my last comment at all? It solves both the technical
> issues and the API issues very nicely and avoids any problems of potential
> future changes. Let me quickly explain why.
>
> The feature in question depends on two existing parts of the API: the
> event generation of the parser, and the return values of the parser target
> (e.g. a tree builder). So there are really only three places where this
> feature makes sense, both technically and API-wise.
>
> 1) in the target
> 2) in the parser
> 3) between parser and target
>
> Note how a separate class is ruled out right from the start by the fact
> that the feature lives somehwere between parser and target. It's an
> inherent part of the existing design already (and of the implementation,
> BTW), so I don't see how adding a separate thing to control it makes any
> sense.
>
> 1) is impossible because the target is user provided and we do not control
> it
> 2) works fine because the parser controls both the call to the target and
> its return value
> 3) would be nice (and was my original favourite) but is hard to do with
> the current implementation and requires further changes to the API of
> parser targets
>
> So 2) is the choice that remains.
>
>
I think folding it all into XMLParser is a bad idea. XMLParser is a fairly
simple API and I don't want to complicate it. But more importantly,
XMLParser knows nothing about Elements, at least in the direct API of
today. The one constructing Elements is the target. The "read_events"
method proposed for the new class (currently IncrementalParser.events)
already returns Elements, having used a TreeBuilder to build them.
XMLParser emits start/end/data calls into the target, but these only carry
tag names, attributes and chunks of data. The hierarchical element
construction is done by TreeBuilder.

What I actually think would be better for the long term is to add new
target invocations in XMLParser - start-ns and end-ns. So XMLParser would
just keep *parsing*, leaving the interpretation of the parsed data to the
target. Today's TreeBuilder is free to ignore these calls. A custom
"EventCollectingTreeBuilder" can collect an event list, having all the
information at its disposal. Thus, XMLParser would remain what it is today
(minus the _setevents hack) - a router for pyexpat events.

These discussions of the future API are interesting, but what's more
important today is to have an API for IncrementalParser (using this name
before a new one is agreed upon) that doesn't block future implementation
changes. And I believe the API proposed here fits the bill.

> > The class will be named EventParser.
>
> Obviously because it's parsing Events, as opposed to the XMLParser, which
> parses XML, or the HTMLParser, which parses HTML, right?
>

The name is not perfect, and proposals for a better one are welcome. FWIW,
since it already lives in the xml.etree namespace, "XML" does not
necessarily have to be part of the name. So, some alternatives:

* EventStreamer - proposed by Nick. I have to admit I don't feel good with
it, because I still want to be crystal clear it's a *parser* we're talking
about.
* EventBasedParser
* EventCollectingParser
* NonblockingParser
* ... other ideas?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue17741>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue17741] event-driven XML parser

Reply via email to