We're currently using an obscure feature of the expat parser library, ``CurrentByteIndex``, for at least two purposes: tricking expat into giving us the raw text without resolving entities, and parsing fragmented documents (i.e. documents with more than one top-level node).
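To illustrate the trick described above, here is a minimal sketch using CPython's ``xml.parsers.expat``. It is an assumption about how such code might look, not our actual implementation: the helper ``raw_events`` and the ``<dummy>`` wrapper element are hypothetical. Wrapping the fragment in a dummy root lets expat accept several top-level nodes. Slicing the source between successive ``CurrentByteIndex`` values then recovers each event's raw text, so an entity such as ``&amp;`` stays unresolved.

```python
import xml.parsers.expat


def raw_events(fragment):
    """Split an XML fragment into (kind, raw_text) pairs.

    Sketch only: the fragment is wrapped in a hypothetical <dummy> root
    so expat accepts several top-level nodes. The raw text of each event
    is sliced from the source using CurrentByteIndex, so entity
    references such as &amp; are left unresolved.
    """
    source = ("<dummy>" + fragment + "</dummy>").encode("utf-8")
    parser = xml.parsers.expat.ParserCreate()
    marks = []  # (event kind, byte offset where the event starts)

    def record(kind):
        # Build a handler that ignores its arguments and only notes
        # where in the source the current event begins.
        def handler(*args):
            marks.append((kind, parser.CurrentByteIndex))
        return handler

    parser.StartElementHandler = record("start")
    parser.EndElementHandler = record("end")
    parser.CharacterDataHandler = record("text")
    parser.Parse(source, True)

    events = []
    for i, (kind, start) in enumerate(marks):
        # Each event's raw text runs up to the start of the next event.
        end = marks[i + 1][1] if i + 1 < len(marks) else len(source)
        events.append((kind, source[start:end].decode("utf-8")))
    # Drop the dummy root's start and end events.
    return events[1:-1]
```

Called as ``raw_events("<a x='1'>A &amp; B</a><b>C</b>")``, the start events carry ``<a x='1'>`` and ``<b>`` verbatim and the text events still contain ``&amp;``. One caveat we have not checked across expat versions: for an empty-element tag such as ``<b/>``, the start and end events may share an offset. Because this leans on ``pyexpat``, it is exactly the kind of code that does not carry over to Jython.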
This does not work on Jython. One alternative is to incorporate the ``pxdom`` parser; it's pure Python and licensed under the "New BSD" license. I believe it can be adapted fairly easily to our needs.

However, the optimal solution would be either:

1) a pure-Python library that parses straight into ElementTree-compatible elements; or
2) a pure-Python parser that yields SAX events.

Note that the SAX parser provided with CPython relies on ``expat``; on Jython there's a different underlying implementation. It's probably best to stay away from libraries that merely bind to native code; after all, compatibility, not speed, is the goal here.

Feedback on how to go forward is appreciated, as are development grants.

\malthe

_______________________________________________
Repoze-dev mailing list
Repoze-dev@lists.repoze.org
http://lists.repoze.org/listinfo/repoze-dev