Simone Gianni wrote:
> Hi Simone and Sylvain,
> aren't XSLT transformers already SAX/XPath optimized? I mean, isn't an
> XSLT containing an XPath expression, used in a SAX context, already
> able to resolve the XPath while keeping buffering to the minimum
> possible?
> I clearly remember that there has been a lot of work on this in
> Xalan and other XSLT engines, and also how complex XPath expressions
> could degrade the performance of a transformation because of increased
> buffering.
Xalan has an optimized implementation of the document tree [1], more
efficient than the standard DOM for read-only and selection operations.
Xalan has an incremental processing mode, but IIRC it's more about being
able to produce some output before the whole document has been read
than about avoiding building parts of the document tree. So it allows
for faster processing, but won't reduce memory consumption.
> In that case, maybe, instead of reinventing it, it should be possible
> to delegate the "transformation" (extraction of a fragment from the
> entire XML stream) to an XSLT processor. The simplest way could be to
> generate an XSLT stylesheet on the fly :). The correct way would be to
> use the [Xalan|Saxon|any other] internal APIs to perform the XPath
> resolution. In both cases, it will be faster than transforming to DOM.
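For illustration, the "generate an XSLT on the fly" option could look roughly like the minimal sketch below, which uses only the standard JAXP API (javax.xml.transform). The class and method names (XPathToXslt, stylesheetFor, extract) are hypothetical, and the sketch assumes the fragment is addressed by a plain XPath 1.0 expression rather than a full XPointer.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: turns an XPath expression into a tiny stylesheet
// and lets whatever XSLT processor JAXP finds do the extraction.
final class XPathToXslt {

    // Builds a stylesheet that copies every node selected by xpath and
    // drops everything else. The expression is escaped so it can sit
    // inside a double-quoted XML attribute.
    static String stylesheetFor(String xpath) {
        String escaped = xpath.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace("\"", "&quot;");
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
             + "<xsl:template match='/'>"
             + "<xsl:copy-of select=\"" + escaped + "\"/>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the generated stylesheet over the input and returns the
    // serialized fragment(s), without an XML declaration.
    static String extract(String xml, String xpath) throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer t = factory.newTransformer(
                new StreamSource(new StringReader(stylesheetFor(xpath))));
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<doc><head>skip</head><body><p>one</p><p>two</p></body></doc>";
        System.out.println(extract(xml, "/doc/body/p"));
    }
}
```

Note that this only avoids the explicit DOM conversion: XSLT processors typically still build their own in-memory representation of the input (the DTM in Xalan's case) before evaluating the expression, so memory usage is not necessarily reduced.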
Agreed. It may be easier to produce a small XSL transformation from the
XPointer expression than to use Axiom. But still, for simple expressions,
the pure streaming approach used by Tika would be much more efficient.
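As a rough illustration of the pure streaming approach, here is a SAX handler that covers only the simplest case: an absolute path made of plain child steps, such as /doc/body/p, collecting the text content of the matching elements. This is a hypothetical sketch, not Tika's implementation; StreamingPathMatcher and select are made-up names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical streaming matcher for paths like "/a/b/c" (child axis only).
final class StreamingPathMatcher extends DefaultHandler {
    private final String[] steps;        // e.g. {"doc", "body", "p"}
    private int depth = 0;               // depth of the current element
    private int matchedDepth = 0;        // length of the path prefix matched by the ancestors
    private int captureDepth = -1;       // depth of the element being captured, or -1
    private StringBuilder current;       // text of the element being captured
    final List<String> results = new ArrayList<>();

    StreamingPathMatcher(String path) {
        this.steps = path.substring(1).split("/");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        // The parser is not namespace-aware, so qName holds the raw element name.
        if (matchedDepth == depth - 1 && depth <= steps.length
                && steps[depth - 1].equals(qName)) {
            matchedDepth = depth;
            if (depth == steps.length) {   // full path matched: start capturing
                current = new StringBuilder();
                captureDepth = depth;
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (current != null) {
            current.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (depth == captureDepth) {       // leaving a matched element
            results.add(current.toString());
            current = null;
            captureDepth = -1;
        }
        if (matchedDepth == depth) {       // leaving a matched ancestor
            matchedDepth--;
        }
        depth--;
    }

    static List<String> select(String xml, String path) throws Exception {
        StreamingPathMatcher handler = new StreamingPathMatcher(path);
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.results;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><head><p>skip</p></head><body><p>one</p><p>two</p></body></doc>";
        System.out.println(select(xml, "/doc/body/p")); // [one, two]
    }
}
```

Apart from the collected results, the handler keeps only two counters and the text of the element currently being captured, so nothing else is buffered regardless of document size. Predicates, the descendant axis (//) or reverse axes would need more state or buffering, which is where delegating to a full XPath/XSLT engine starts to pay off.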
Sylvain
[1] http://xml.apache.org/xalan-j/dtm.html
--
Sylvain Wallez - http://bluxte.net