Good analysis, Mukul, though perhaps a bit kinder to us than we deserve. Probably worth adding to our official documentation.
The XPath and XSLT intermediate representation was originally designed for interpretive Xalan. One of the reasons it is so compact is that PCs were much smaller and slower at the time, so storage and access efficiency were crucial. (That is the same reason DTM is designed essentially at the byte level for compactness.) But compactness can also buy performance advantages; it's as close to C or ASM data models as Java let us get.

When Sun donated their XSLT compiler to Apache, the IBM / Apache team adjusted it to share this representation so it could be integrated with Xalan. I don't recall how much the Xalan front end had to change to support that; not a great deal, I think.

There could certainly be better choices. I've mentioned some of them in past discussion of IBM Research's Xylem processor.

--
Joe
(Typo'ed on my phone.)

Car Talk has been advised that the staff opticians, C. F. Eyecare, are now under new management. It's Ike and Zeke Leary, now; Lorraine is gone.

________________________________
From: Mukul Gandhi <[email protected]>
Sent: Sunday, May 17, 2026 12:47:48 PM
To: [email protected]
Subject: Xalan Java XPath parser design

Hello Xalan dev list,

I've always been curious about the detailed design of the Xalan Java XPath parser. I found the following information a while ago using the Google Gemini AI software, and thought someone might be interested in these details.

Question: What is the software design of the Xalan Java class org.apache.xpath.compiler.OpMap?

Answer (with minor grammatical improvements):

The org.apache.xpath.compiler.OpMap class in Apache Xalan-Java serves as the structural foundation for XPath expression representation. It is designed to act as an optimized, flat data structure (an array of integers) that represents an XPath abstract syntax tree (AST), prioritizing fast execution and low memory overhead.

1. The Core Design Concept: The "OpMap"

Instead of building a traditional tree of objects (which introduces heavy memory and allocation overhead), Xalan parses an XPath string into a flat integer array called the Operations Map (OpMap).

1.1) Flat Array Representation: Each node and its parameters (such as operations, step operators, and literals) are stored sequentially in an OpMapVector.

1.2) Opcode + Arguments: The array is traversed in chunks. A specific opcode (e.g., OpCodes.OP_LOCATIONPATH) sits at a given position, immediately followed by the length of the operation and the lengths/pointers to its child nodes.

1.3) Helper Methods: OpMap is equipped with protected/utility methods like getArgLength() and getFirstChildPos() to parse and navigate this array structure without writing explicit, repetitive pointer math.

2. Primary Components

The OpMap encapsulates the basic state and memory structures required to store a parsed XPath:

2.1) m_opMap: An OpMapVector (a custom array-based list) that holds the actual tree of integer opcodes and their arguments.

2.2) m_tokenQueue: An ObjectVector containing the literal tokens (e.g., string literals, node names, and numeric constants) referenced by the opcodes in m_opMap.

2.3) m_currentPattern: Stores the raw XPath string, which is mostly kept for diagnostic and debugging purposes.

3. Structural Hierarchy

3.1) Base Class (OpMap): Contains the data structures and basic navigation algorithms for traversing the opcode array.

3.2) Derived Class (Compiler): Extends OpMap to actually iterate over the flat integer map and build a proper executable expression tree (e.g., Expression objects) that evaluates the XPath dynamically against context nodes.

This design, in which the Xalan Java XPath parser creates a flat array (OpMap) and a compiler traverses it, was chosen to balance Xalan's need for both interpretive tooling and fast, compiled runtime performance.
--
Regards,
Mukul Gandhi
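For readers who want a concrete picture of the "opcode, then length, then children" layout described in the message above, here is a minimal toy sketch in Java. It is not Xalan's code. The class name FlatOpMapSketch, the opcode values, the exact slot layout, and getLength() and getNextOpPos() are all assumptions made for illustration; only the general idea and the name getFirstChildPos() are borrowed from the message. The sketch also evaluates the flat array directly instead of building Expression objects as the Compiler class is described as doing.

```java
class FlatOpMapSketch {
    // Hypothetical opcodes for this toy only; these are not Xalan's OpCodes values.
    static final int OP_NUMBER = 1;
    static final int OP_PLUS = 2;
    static final int OP_MULT = 3;

    // Flat operation array. Assumed layout per operation:
    // [opcode, total length in slots including this header, children or token index...]
    final int[] ops;
    // Literal values referenced by index from ops (loosely analogous to a token queue).
    final double[] tokens;

    FlatOpMapSketch(int[] ops, double[] tokens) {
        this.ops = ops;
        this.tokens = tokens;
    }

    int getOp(int pos) { return ops[pos]; }

    // Total number of slots the operation at pos occupies.
    int getLength(int pos) { return ops[pos + 1]; }

    // The first child starts right after the two header slots.
    int getFirstChildPos(int pos) { return pos + 2; }

    // Skip over the whole operation at pos to reach its next sibling.
    int getNextOpPos(int pos) { return pos + getLength(pos); }

    // Walk the flat array recursively and compute the value of the operation at pos.
    double eval(int pos) {
        switch (getOp(pos)) {
            case OP_NUMBER:
                return tokens[ops[pos + 2]];
            case OP_PLUS: {
                int left = getFirstChildPos(pos);
                int right = getNextOpPos(left);
                return eval(left) + eval(right);
            }
            case OP_MULT: {
                int left = getFirstChildPos(pos);
                int right = getNextOpPos(left);
                return eval(left) * eval(right);
            }
            default:
                throw new IllegalArgumentException("Unknown opcode " + getOp(pos));
        }
    }

    // Encodes and evaluates the arithmetic expression 1 + 2 * 3.
    static double demo() {
        int[] ops = {
            OP_PLUS, 13,          // whole expression occupies 13 slots
            OP_NUMBER, 3, 0,      // literal tokens[0]
            OP_MULT, 8,           // subexpression occupies 8 slots
            OP_NUMBER, 3, 1,      // literal tokens[1]
            OP_NUMBER, 3, 2       // literal tokens[2]
        };
        double[] tokens = {1.0, 2.0, 3.0};
        return new FlatOpMapSketch(ops, tokens).eval(0);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints 7.0
    }
}
```

The point of the sketch is that a sibling is reached by adding a stored length to the current position, so the whole tree lives in one int array with no per-node objects to allocate.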
