We're currently using an obscure feature of the expat parser library,
``CurrentByteIndex``, for at least two purposes: tricking expat into
giving us the raw text without resolving entities, and parsing
fragmented documents (e.g. with more than one top-level node).
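
For anyone unfamiliar with the trick, here is a minimal sketch of the
raw-text part. It is not our actual code; the function name
``raw_spans`` and the sample input are made up for illustration. It
records ``CurrentByteIndex`` when an element opens and again when it
closes, then slices the original bytes, so the entities come back
unresolved. It assumes CPython's ``xml.parsers.expat`` and a document
passed in as a single byte string.

```python
from xml.parsers import expat


def raw_spans(source, tag):
    """Return raw byte slices of `source` for each `tag` element.

    Each slice runs from the '<' of the start tag up to (not including)
    the '<' of the matching end tag, so entity references are untouched.
    `raw_spans` is a hypothetical helper, not part of any library.
    """
    parser = expat.ParserCreate()
    open_offsets = []   # stack of start-tag offsets (handles nesting)
    spans = []

    def start(name, attrs):
        if name == tag:
            # During a callback, CurrentByteIndex is the offset of the
            # first byte of the markup that triggered the event.
            open_offsets.append(parser.CurrentByteIndex)

    def end(name):
        if name == tag:
            spans.append((open_offsets.pop(), parser.CurrentByteIndex))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.Parse(source, True)
    return [source[a:b] for a, b in spans]


sample = b"<doc><p>a &amp; b &lt; c</p></doc>"
print(raw_spans(sample, "p"))
```

The slice still includes the opening tag; stripping it is left out to
keep the sketch short.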

This does not work on Jython.

An alternative is to incorporate the ``pxdom`` parser; it's pure
Python and licensed under "New BSD". I believe it can be adapted
fairly easily to our needs. However, the optimal solution would be
either:

1) A pure Python library that parses straight into
ElementTree-compatible elements;
2) A pure Python parser that yields SAX events.
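
The two options are not far apart: given SAX events, building
ElementTree-compatible elements is mostly plumbing. Below is a rough
sketch, assuming the standard ``xml.sax`` handler interface and
``xml.etree.ElementTree.TreeBuilder``; the class name
``TreeBuildingHandler`` is made up. The stock ``xml.sax`` parser is
used to drive it only for demonstration, since a pure Python parser
emitting the same events could be dropped in instead.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.etree.ElementTree import TreeBuilder


class TreeBuildingHandler(ContentHandler):
    """Hypothetical adapter: feeds SAX events into a TreeBuilder."""

    def __init__(self):
        super().__init__()
        self.builder = TreeBuilder()
        self.root = None

    def startElement(self, name, attrs):
        # SAX attributes are mapping-like; TreeBuilder wants a dict.
        self.builder.start(name, dict(attrs.items()))

    def endElement(self, name):
        self.builder.end(name)

    def characters(self, content):
        self.builder.data(content)

    def endDocument(self):
        # close() returns the root Element of the finished tree.
        self.root = self.builder.close()


handler = TreeBuildingHandler()
xml.sax.parseString(b'<doc><p class="x">hi</p></doc>', handler)
print(handler.root.tag, handler.root[0].text, handler.root[0].get("class"))
```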

Note that the SAX parser provided with CPython relies on ``expat``. On
Jython there's a different underlying implementation; it's probably
best to stay away from libraries which merely bind to native
code. After all, compatibility, not speed, is the goal here.

Feedback on how to go forward is appreciated, as are development grants.

\malthe
_______________________________________________
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev