Joseph Kessselman created XALANJ-2691:
-----------------------------------------
Summary: Refactor: Replace direct DTM reference with XCI
Key: XALANJ-2691
URL: https://issues.apache.org/jira/browse/XALANJ-2691
Project: XalanJ2
Issue Type: Wish
Security Level: No security risk; visible to anyone (Ordinary problems in
Xalan projects. Anybody can view the issue.)
Components: DOM, DTM, JAXP, Xalan, Xalan-interpretive, XSLTC
Reporter: Joseph Kessselman
Assignee: Gary D. Gregory
DTM was never intended to be a general interface to document models. It was a
specific solution for cramming as large a document as possible into the limited
resources available in PCs around the year 2000.
We have been able to create DTM proxy layers for other data sources –
databases, for example – but they are uniformly ugly and inefficient code.
A better solution, which I implemented in IBM's internal derivative of Xalan
and hoped to contribute back to Apache, was to make DTM _only_ a data model,
and access it through a set of cursor-object interfaces I named XCI (XML Cursor
Interface ... IBM does love its three-letter initialisms). These slotted in
essentially where the DTM Iterators sit now, but provided complete
encapsulation of the data implementation – all access to a node was via an XCI
cursor currently pointing to that node, with no DTM artifacts leaking through.
The advantage is the obvious one: XCI can be efficiently implemented _directly_
over data models (DTM, DOM, database, custom data trees in any representation,
potentially directly over Java reflection...), without needing to build a map
to DTM node numbers. Little to no efficiency is lost accessing DTM, as the DTM
implementation of the XCI is essentially a repackaging of DTM iterators.
Efficiency gains, and as importantly ease-of-implementation gains, for other
back-end models are significant.
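
As a rough illustration only: the sketch below is not the IBM XCI API, whose interfaces are not public, and every name in it (XmlCursor, DomCursor, countDescendants, parse) is hypothetical. It assumes a minimal cursor with a handful of navigation methods and shows how such a cursor could sit directly over a DOM tree, with callers never seeing a DOM node or a DTM node number.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;

final class CursorSketch {
    /** Hypothetical cursor interface (an assumption, not the real XCI). */
    interface XmlCursor {
        String name();            // name of the node the cursor points at
        String stringValue();     // text content of that node
        boolean toFirstChild();   // each move returns false and stays put on failure
        boolean toNextSibling();
        boolean toParent();
        XmlCursor copy();         // independent cursor at the same position
    }

    /** Cursor implemented directly over DOM; no node-number map is built. */
    static final class DomCursor implements XmlCursor {
        private Node current;

        DomCursor(Node start) { this.current = start; }

        @Override public String name() { return current.getNodeName(); }
        @Override public String stringValue() { return current.getTextContent(); }
        @Override public boolean toFirstChild() { return moveTo(current.getFirstChild()); }
        @Override public boolean toNextSibling() { return moveTo(current.getNextSibling()); }
        @Override public boolean toParent() { return moveTo(current.getParentNode()); }
        @Override public XmlCursor copy() { return new DomCursor(current); }

        private boolean moveTo(Node target) {
            if (target == null) {
                return false;
            }
            current = target;
            return true;
        }
    }

    /** Client code written only against the cursor interface. */
    static int countDescendants(XmlCursor start) {
        XmlCursor c = start.copy();   // leave the caller's cursor where it was
        int count = 0;
        if (c.toFirstChild()) {
            do {
                count += 1 + countDescendants(c);
            } while (c.toNextSibling());
        }
        return count;
    }

    /** Helper for the example: parse a string and return a cursor on the document node. */
    static XmlCursor parse(String xml) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new DomCursor(doc);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Descendants of the document node: a, b, the text node "x", and c.
        System.out.println(countDescendants(parse("<a><b>x</b><c/></a>"))); // prints 4
    }
}
```

Under these assumptions, a second back end (a database row set, say) would only need its own implementation of the same interface; countDescendants would not change.
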
For models which are general networks rather than trees, there is a risk of
getting into loops. Stylesheets can be written to avoid that, or the XCI
implementation can have some mechanism to break those potential loops. In any
case, Xalan already has protection against unreasonably deep recursion, so those
loops will be caught and reported.
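
One possible loop-breaking mechanism, sketched here purely as an assumption (the original XCI may have done something different or nothing at all): while descending, remember the nodes on the current path and refuse to follow a link back to any of them. GraphNode and walk are hypothetical names for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

final class LoopGuardSketch {
    /** Hypothetical back-end node whose links may point back to an ancestor. */
    static final class GraphNode {
        final String name;
        final List<GraphNode> links = new ArrayList<>();

        GraphNode(String name) { this.name = name; }
    }

    /** Depth-first walk that presents the graph as a tree, cutting any link that closes a cycle. */
    static List<String> walk(GraphNode root) {
        List<String> visited = new ArrayList<>();
        // Identity-based set: two distinct nodes are never confused, even if they look equal.
        Set<GraphNode> onPath = Collections.newSetFromMap(new IdentityHashMap<>());
        walk(root, onPath, visited);
        return visited;
    }

    private static void walk(GraphNode node, Set<GraphNode> onPath, List<String> out) {
        if (!onPath.add(node)) {
            return;                   // node is its own ancestor here: skip the link
        }
        out.add(node.name);
        for (GraphNode next : node.links) {
            walk(next, onPath, out);
        }
        onPath.remove(node);          // leaving this branch; the node may appear elsewhere
    }

    public static void main(String[] args) {
        GraphNode a = new GraphNode("a");
        GraphNode b = new GraphNode("b");
        GraphNode c = new GraphNode("c");
        a.links.add(b);
        b.links.add(a);               // cycle: a -> b -> a
        a.links.add(c);
        System.out.println(walk(a));  // prints [a, b, c]
    }
}
```

Because only the current path is tracked, a node reachable by two different routes is still visited once per route, which matches a tree view of the data; only true cycles are cut.
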
I had hoped to donate this improvement back to Apache. Unfortunately, at the
same time I was doing this, the IBM XSLT processor was undergoing other major
rewrites to build a true optimizing compiler as replacement for XSLTC, which we
were not ready (willing?) to contribute to Apache, and I didn't have a chance to
disentangle the two.
Having done it once: It *is* a big rewrite task. Not an insanely complicated
one, since the switch from DTM iterators to XCI cursors is nearly a 1:1
mapping. But even with experience I'd say 3 months of full-time work minimum;
possibly much more. And it's been long enough that I'd have to redevelop it de
novo – which is not a bad thing, as I'm not sure who, if anyone, in IBM could
be persuaded to donate the XCI I created for them.
I have set the initial priority as *minor*; Xalan can certainly continue
as it is, with DTM as the (ugly) glue layer, and we've got lots of
higher-severity items stacked up on the backlog. However, if we had the
resources we once did (with people actually paid to work on Xalan), I believe
this is a significant improvement in Xalan architecture and would be worth
investing in. Certainly if anyone attempts a true Xalan 3.0 (as opposed to a
Xalan 2.x that supports part or all of XSLT 3.0), this should be part of the
effort.