Re: XML with large text entries are broken down to chunks when parsing with Axiom - non-coalescing mode.

Hiranya Jayathilaka Sat, 01 May 2010 02:40:59 -0700

On Fri, Apr 30, 2010 at 9:12 PM, Andreas Veithen
<andreas.veit...@gmail.com> wrote:


> Axiom always creates the nodes based on the events received from the
> underlying parser. If javax.xml.stream.isCoalescing is set to false on
> the parser, then by definition the parser may return large text nodes
> in multiple chunks. The problem is that if
> javax.xml.stream.isCoalescing is set to true, StAX doesn't report
> CDATA sections in the document as CDATA events, but as CHARACTERS
> events. It is however possible to configure Woodstox to report CDATA
> sections without splitting text nodes into chunks. Note that even with
> such a configuration, OMElement#getText should always be used to
> extract the text content of an element (to cover the case where the
> element contains a mix of text nodes and CDATA sections).
>
> Note that while coalescing is switched off by default at the StAX
> level, Axiom overrides this so that by default coalescing is turned on
> [1]. It is not surprising that there is code that implicitly relies on
> this. Therefore, working with Axiom in non-coalescing mode is always a
> risk.
>

Thanks Andreas. This explains a lot.
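
For the archives, here is a minimal sketch of the behaviour described
above, using only the standard StAX API (javax.xml.stream) rather than
Axiom. The TextChunks class and its method names are made up for
illustration. It lists the text events the parser reports and joins
them, which is the same idea as reading an element's text as a whole
instead of looking at a single text node. How the text is split in
non-coalescing mode depends on the parser implementation, so treat the
chunk boundaries as unspecified.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper for illustration only; not part of Axiom or StAX.
final class TextChunks {

    // Reports the parser's own default for javax.xml.stream.isCoalescing.
    static boolean defaultCoalescing() {
        Object value = XMLInputFactory.newInstance()
                .getProperty(XMLInputFactory.IS_COALESCING);
        return Boolean.TRUE.equals(value);
    }

    // Collects the text of every CHARACTERS and CDATA event, in document order.
    static List<String> chunks(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> chunks = new ArrayList<>();
        try {
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    chunks.add(reader.getText());
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return chunks;
    }

    // Joins all chunks, so the result does not depend on how the
    // parser happened to split the text.
    static String fullText(String xml, boolean coalescing) {
        return String.join("", chunks(xml, coalescing));
    }
}
```

Whatever the parser does with the chunks,
TextChunks.fullText("<a>foo<![CDATA[bar]]>baz</a>", false) should give
the same string as with coalescing switched on. Only the number of
entries returned by chunks() may differ.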

Thanks,
Hiranya


>
> Andreas
>
> [1] http://people.apache.org/~veithen/axiom/userguide/ch04.html#d0e866
>
> On Fri, Apr 30, 2010 at 11:51, Kasun Indrasiri <kasun...@gmail.com> wrote:
> > Hi,
> >
> > When parsing XML in non-coalescing mode ("javax.xml.stream.isCoalescing"
> > set to false), Axiom breaks large text entries down into multiple chunks.
> > As a result, a CDATA section with lengthy text gets translated into
> > multiple CDATA sections.
> >
> > thanks,
> > --
> > Kasun Indrasiri
> > Senior Software Engineer,
> > WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> > Blog : http://kasunpanorama.blogspot.com/
> >
>



-- 
Hiranya Jayathilaka
Software Engineer;
WSO2 Inc.;  http://wso2.org
E-mail: hira...@wso2.com;  Mobile: +94 77 633 3491
Blog: http://techfeast-hiranya.blogspot.com
