Re: XML with large text entries are broken down to chunks when parsing with Axiom - non-coalescing mode.

Andreas Veithen Sat, 01 May 2010 03:42:46 -0700

On Sat, May 1, 2010 at 07:33, Kasun Indrasiri <[email protected]> wrote:
> Hi,
>
> I guess this becomes even riskier in a scenario like this.
>
> XML string :  "<a>a_lengthy_string</a>" -> omElem
>
> Once we parse this XML in non-coalescing mode and create an OM
> element (omElem) from it,
>
> - first Child : contains the first portion of 'a_lengthy_string' string
> - last Child : contains the rest
>
> However, as Hiranya mentioned, 'omElem.getText()' will give us the correct
> value of the text content.
>
> Is this the acceptable behavior?


It's not the default behavior, but if someone explicitly configures
Axiom to switch off coalescing, then he has to live with the
consequences ;-)
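
To make the difference concrete, here is a minimal sketch that uses only the
plain StAX API shipped with the JDK (no Axiom involved). The class and method
names (CoalescingDemo, textEvents, fullText) are made up for illustration, and
how the text is split in non-coalescing mode depends on the parser
implementation, so treat it as an illustration rather than a description of
what Axiom does internally.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical demo class; only standard javax.xml.stream calls are used.
final class CoalescingDemo {

    // Lists every CHARACTERS or CDATA event as "<TYPE>:<text>".
    public static List<String> textEvents(String xml, boolean coalescing) {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Same property as "javax.xml.stream.isCoalescing" discussed in this thread.
        factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
        List<String> events = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    int type = reader.next();
                    if (type == XMLStreamConstants.CHARACTERS) {
                        events.add("CHARACTERS:" + reader.getText());
                    } else if (type == XMLStreamConstants.CDATA) {
                        events.add("CDATA:" + reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Rethrown unchecked to keep the demo short.
            throw new IllegalStateException("Could not parse XML", e);
        }
        return events;
    }

    // Concatenates all text events, regardless of how the parser chunked them.
    public static String fullText(String xml, boolean coalescing) {
        StringBuilder sb = new StringBuilder();
        for (String event : textEvents(xml, coalescing)) {
            sb.append(event.substring(event.indexOf(':') + 1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<a>x<![CDATA[y]]>z</a>";
        System.out.println("coalescing:     " + textEvents(xml, true));
        System.out.println("non-coalescing: " + textEvents(xml, false));
    }
}
```

With coalescing switched on, the text and the CDATA section should come back
as a single CHARACTERS event. With coalescing switched off, the number and
type of events is up to the parser, which is why code should concatenate
everything (as fullText does here, and as OMElement#getText does in Axiom)
instead of looking at the first child only.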

> regards,
>
> Kasun
>
>
> On Fri, Apr 30, 2010 at 9:12 PM, Andreas Veithen
> <[email protected]> wrote:
>
>> Axiom always creates the nodes based on the events received from the
>> underlying parser. If javax.xml.stream.isCoalescing is set to false on
>> the parser, then by definition the parser may return large text nodes
>> in multiple chunks. The problem is that if
>> javax.xml.stream.isCoalescing is set to true, StAX doesn't report
>> CDATA sections in the document as CDATA events, but as CHARACTERS
>> events. It is however possible to configure Woodstox to report CDATA
>> sections without splitting text nodes into chunks. Note that even with
>> such a configuration, OMElement#getText should always be used to
>> extract the text content of an element (to cover the case where the
>> element contains a mix of text nodes and CDATA sections).
>>
>> Note that while coalescing is switched off by default at the StAX
>> level, Axiom overrides this so that by default coalescing is turned on
>> [1]. It is not surprising that there is code that implicitly relies on
>> this. Therefore, working with Axiom in non-coalescing mode is always a
>> risk.
>>
>> Andreas
>>
>> [1] http://people.apache.org/~veithen/axiom/userguide/ch04.html#d0e866
>>
>> On Fri, Apr 30, 2010 at 11:51, Kasun Indrasiri <[email protected]> wrote:
>> > Hi,
>> >
>> > When parsing XML in non-coalescing mode ("javax.xml.stream.isCoalescing",
>> > false), Axiom breaks down large text entries into multiple chunks.
>> > Therefore, CDATA sections with lengthy text get translated into multiple
>> > CDATA sections.
>> >
>> > thanks,
>> > --
>> > Kasun Indrasiri
>> > Senior Software Engineer,
>> > WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
>> > Blog : http://kasunpanorama.blogspot.com/
>> >
>>
>
>
>
> --
> Kasun Indrasiri
> Senior Software Engineer,
> WSO2 Inc. - "Lean . Enterprise . Middleware" - http://www.wso2.com/
> Blog : http://kasunpanorama.blogspot.com/
>
