Axiom always creates its nodes from the events it receives from the underlying parser. If javax.xml.stream.isCoalescing is set to false on the parser, then by definition the parser may return large text nodes in multiple chunks. The problem is that if javax.xml.stream.isCoalescing is set to true, StAX no longer reports CDATA sections in the document as CDATA events, but as CHARACTERS events. It is, however, possible to configure Woodstox to report CDATA sections without splitting text nodes into chunks. Note that even with such a configuration, OMElement#getText should always be used to extract the text content of an element, because the element may still contain a mix of text nodes and CDATA sections.
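To illustrate the trade-off, here is a minimal sketch that uses only the plain StAX API from the JDK (no Axiom or Woodstox-specific calls). The class name CoalescingDemo and its helper methods are made up for this example. It lists the text-like events the parser reports for a small document in both modes. The exact chunking in non-coalescing mode depends on the StAX implementation in use, so the sketch makes no claim about how many events you will see there. Concatenating the chunks, which is roughly what the advice to use OMElement#getText amounts to, gives the full text either way.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; not part of Axiom or Woodstox.
public class CoalescingDemo {

    /** Returns one entry per text-like event, formatted as "TYPE:text". */
    public static List<String> textEvents(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Standard StAX property "javax.xml.stream.isCoalescing".
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int type = reader.next();
                if (type == XMLStreamConstants.CHARACTERS) {
                    events.add("CHARACTERS:" + reader.getText());
                } else if (type == XMLStreamConstants.CDATA) {
                    events.add("CDATA:" + reader.getText());
                } else if (type == XMLStreamConstants.SPACE) {
                    events.add("SPACE:" + reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse document", e);
        }
        return events;
    }

    /** Concatenates the text of all events, ignoring how it was chunked. */
    public static String joinText(List<String> events) {
        StringBuilder sb = new StringBuilder();
        for (String event : events) {
            sb.append(event.substring(event.indexOf(':') + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<a>foo<![CDATA[bar]]>baz</a>";
        System.out.println("coalescing:     " + textEvents(xml, true));
        System.out.println("non-coalescing: " + textEvents(xml, false));
    }
}
```

Code that looks only at the first text event of an element works in coalescing mode but can silently lose content in non-coalescing mode, which is why joining all chunks is the safer habit.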
Note that while coalescing is switched off by default at the StAX level, Axiom overrides this so that coalescing is turned on by default [1]. It is not surprising that there is code that implicitly relies on this. Therefore, working with Axiom in non-coalescing mode is always a risk.

Andreas

[1] http://people.apache.org/~veithen/axiom/userguide/ch04.html#d0e866

On Fri, Apr 30, 2010 at 11:51, Kasun Indrasiri <[email protected]> wrote:
> Hi,
>
> When parsing XML in non-coalescing mode ("javax.xml.stream.isCoalescing",
> false) Axiom breaks down large text entries to multiple chunks. Therefore
> CDATA elements with lengthy texts get translated into multiple CDATA
> elements.
>
> thanks,
> --
> Kasun Indrasiri
> Senior Software Engineer,
> WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> Blog : http://kasunpanorama.blogspot.com/
