Re: XML with large text entries are broken down to chunks when parsing with Axiom - non-coalescing mode.

Andreas Veithen Fri, 30 Apr 2010 08:42:58 -0700

Axiom always creates nodes based on the events received from the
underlying parser. If javax.xml.stream.isCoalescing is set to false on
the parser, then by definition the parser may return large text nodes
in multiple chunks. The drawback of setting
javax.xml.stream.isCoalescing to true is that StAX then no longer
reports CDATA sections in the document as CDATA events, but as
CHARACTERS events. It is however possible to configure Woodstox to
report CDATA sections without splitting text nodes into chunks. Note
that even with such a configuration, OMElement#getText should always be
used to extract the text content of an element (to cover the case where
the element contains a mix of text nodes and CDATA sections).
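
To illustrate the point at the plain StAX level, here is a minimal
sketch (not Axiom code). It keeps coalescing off, applies two Woodstox
settings only if the factory supports them, and collects the text of
the document across CHARACTERS and CDATA events, which is roughly what
relying on OMElement#getText buys you. The class and method names are
made up for this example, and the two Woodstox property names are
quoted from memory, so treat them as assumptions and check them
against the Woodstox version you use.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class CdataTextSketch {
    // Assumed Woodstox/StAX2 property names; verify against your version.
    static final String REPORT_CDATA = "org.codehaus.stax2.reportCDATA";
    static final String MIN_TEXT_SEGMENT = "com.ctc.wstx.minTextSegment";

    // Builds a non-coalescing factory. The Woodstox-specific settings are
    // applied only when the factory reports that it supports them, so the
    // sketch also runs on the parser bundled with the JDK.
    static XMLInputFactory newNonCoalescingFactory() {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.FALSE);
        if (factory.isPropertySupported(REPORT_CDATA)) {
            // Ask for CDATA sections to be reported as CDATA events.
            factory.setProperty(REPORT_CDATA, Boolean.TRUE);
        }
        if (factory.isPropertySupported(MIN_TEXT_SEGMENT)) {
            // Ask the parser not to split long text into smaller segments.
            factory.setProperty(MIN_TEXT_SEGMENT,
                    Integer.valueOf(Integer.MAX_VALUE));
        }
        return factory;
    }

    // Concatenates all character data in the document, whatever event
    // type or chunking the parser used to report it.
    static String textOf(String xml) {
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = newNonCoalescingFactory()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.CHARACTERS
                            || event == XMLStreamConstants.CDATA
                            || event == XMLStreamConstants.SPACE) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Failed to parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // An element mixing plain text and a CDATA section.
        System.out.println(textOf("<a>foo<![CDATA[bar]]>baz</a>"));
    }
}
```

Because the loop appends every text-bearing event, the result does not
depend on how many chunks the parser produces or on whether a given
piece arrives as CHARACTERS or CDATA.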

Note that while coalescing is switched off by default at the StAX
level, Axiom overrides this so that by default coalescing is turned on
[1]. It is therefore not surprising that some code implicitly relies
on text nodes being coalesced, so working with Axiom in non-coalescing
mode always carries a risk.

Andreas

[1] http://people.apache.org/~veithen/axiom/userguide/ch04.html#d0e866

On Fri, Apr 30, 2010 at 11:51, Kasun Indrasiri <[email protected]> wrote:
> Hi,
>
> When parsing XML in non-coalescing mode ("javax.xml.stream.isCoalescing",
> false) Axiom breaks down large text entries to multiple chunks. Therefore
> CDATA elements with lengthy texts get translated into multiple CDATA
> elements.
>
> thanks,
> --
> Kasun Indrasiri
> Senior Software Engineer,
> WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> Blog : http://kasunpanorama.blogspot.com/
>
