Simone Gianni wrote:
Hi Simone and Sylvain,
aren't XSLT transformers already SAX/XPath optimized? I mean, isn't an XSLT stylesheet that contains an XPath expression and is used in a SAX context already able to resolve the XPath while keeping buffering to the minimum possible?

I clearly remember that a lot of work went into this in Xalan and other XSLT engines, and also that complex XPath expressions could change the performance of a transformation because of increased buffering.

Xalan has an optimized implementation of the document tree [1], more efficient than the standard DOM for read-only and selection operations. Xalan also has an incremental processing mode, but IIRC it's more about being able to produce some output before the whole document has been read than about avoiding building parts of the document tree. So it will allow for faster processing, but won't change memory consumption.

In that case, maybe, instead of reinventing it, it should be possible to delegate the "transformation" (extraction of a fragment from the entire XML stream) to an XSLT processor. The simplest way could be to generate an XSLT on the fly :) ... The correct way would be to use the [Xalan|Saxon|any other] internal APIs to perform the XPath resolution. In both cases, it will be faster than converting to DOM.
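
To make the "generate an XSLT on the fly" idea concrete, here is a minimal sketch using only the standard JAXP/TrAX API. The class and method names (XPathFragmentExtractor, stylesheetFor, extract) are made up for illustration, and it assumes the fragment is addressed by a plain XPath 1.0 expression; translating a real XPointer into that XPath is not shown.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XPathFragmentExtractor {

    // Hypothetical helper: wraps an XPath expression in a one-template
    // stylesheet that copies the selected nodes to the output.
    static String stylesheetFor(String xpath) {
        // Escape the characters that would break the select="..." attribute.
        String escaped = xpath.replace("&", "&amp;")
                              .replace("<", "&lt;")
                              .replace("\"", "&quot;");
        return "<xsl:stylesheet version=\"1.0\""
             + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
             + "<xsl:template match=\"/\">"
             + "<xsl:copy-of select=\"" + escaped + "\"/>"
             + "</xsl:template>"
             + "</xsl:stylesheet>";
    }

    // Runs the generated stylesheet over the input and returns the
    // serialized fragment. How much of the input is buffered is left
    // entirely to the XSLT processor.
    static String extract(String xml, String xpath) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(stylesheetFor(xpath))));
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Fragment extraction failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><section id=\"a\"><p>one</p></section>"
                   + "<section id=\"b\"><p>two</p></section></doc>";
        System.out.println(extract(xml, "/doc/section[@id='b']"));
    }
}
```

In a SAX pipeline the StreamResult would presumably be replaced by a SAXResult wrapping the downstream ContentHandler, so the selected fragment is emitted as events rather than serialized.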

Agreed. It may be easier to produce a small XSL transformation from the XPointer expression than to use Axiom. But still, for simple expressions, the pure streaming approach used by Tika would be way more efficient.
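
For comparison, a rough sketch of what "pure streaming" means for a simple expression. This is not Tika's actual code, just a plain SAX handler written for illustration: the class name StreamingPathMatcher and its textAt method are made up, and it only supports absolute child paths such as /doc/section/p (no predicates, no namespaces).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamingPathMatcher extends DefaultHandler {
    private final String targetPath;                        // e.g. "/doc/section/p"
    private final StringBuilder path = new StringBuilder(); // path of the current element
    private final List<String> matches = new ArrayList<>();
    private StringBuilder current;                          // text of the match being read
    private int matchDepth = 0;                             // > 0 while inside a matching element

    StreamingPathMatcher(String targetPath) {
        this.targetPath = targetPath;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.append('/').append(qName);
        if (matchDepth > 0) {
            matchDepth++;                                   // descendant of a match
        } else if (path.toString().equals(targetPath)) {
            matchDepth = 1;                                 // entering a new match
            current = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (matchDepth > 0) {
            current.append(ch, start, length);              // only matched text is kept
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (matchDepth > 0 && --matchDepth == 0) {
            matches.add(current.toString());                // leaving the matched element
        }
        path.setLength(path.lastIndexOf("/"));              // pop the last path step
    }

    // Parses the input once and returns the text content of every element
    // found at the given path.
    static List<String> textAt(String xml, String targetPath) {
        StreamingPathMatcher handler = new StreamingPathMatcher(targetPath);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Parsing failed", e);
        }
        return handler.matches;
    }

    public static void main(String[] args) {
        String xml = "<doc><section><p>one</p></section><section><p>two</p></section></doc>";
        System.out.println(textAt(xml, "/doc/section/p"));
    }
}
```

The only state kept is the current path and the text of the matched elements, so memory use does not grow with the size of the document. The price is that anything requiring look-ahead or sibling access cannot be expressed this way.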

Sylvain

[1] http://xml.apache.org/xalan-j/dtm.html

--
Sylvain Wallez - http://bluxte.net
