Re: DO NOT REPLY [Bug 23299] - [PATCH] UTFDataFormatException: String cannot be longer than 32k.

Torsten Curdt Wed, 12 Nov 2003 03:39:35 -0800

We could also increase the max length of a stored
character event in general ... but that would waste
2 bytes per event. Hm...
What do you think?
--
Torsten
Hi,

Why should we handle the UTFDataFormatException at all? The last solution ignores this exception, doesn't it? What is the difference between

event string 32k string 4k

and event string 36k

in the bytestream?

The question is whether we need the UTFDataFormatException or not. If not, a patch can simply remove the statement if(string>32k){} and then we get the result:
event
string xxk (the limit is then the Java integer range)


Well, that's true ... but the current length is held
as a 15-bit integer. The highest bit decides whether
it's an index in a HashMap or not.

As I said, we could increase the length to 31 bits,
but that gives 2 additional bytes per character
event.
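
To make the 15-bit scheme concrete, here is a minimal Java sketch of a 2-byte header whose highest bit marks "index into the string table" and whose remaining 15 bits carry either the length or the index. The class and method names are hypothetical and the byte order is an assumption; this is not the actual Cocoon code, just an illustration of why the limit is 32767 and why a 31-bit length would need a 4-byte header (2 bytes more per character event).

```java
import java.io.ByteArrayOutputStream;

// Hypothetical sketch, not the actual Cocoon implementation.
final class LengthHeaderSketch {
    // Largest value that fits in 15 bits: 2^15 - 1 = 32767.
    static final int MAX_15_BIT = 0x7FFF;

    // Writes a 2-byte header with the high bit clear: the low 15 bits are a length.
    static void writeLength(ByteArrayOutputStream out, int length) {
        if (length < 0 || length > MAX_15_BIT) {
            throw new IllegalArgumentException("length does not fit in 15 bits: " + length);
        }
        out.write((length >>> 8) & 0x7F);
        out.write(length & 0xFF);
    }

    // Writes a 2-byte header with the high bit set: the low 15 bits are a table index.
    static void writeIndex(ByteArrayOutputStream out, int index) {
        if (index < 0 || index > MAX_15_BIT) {
            throw new IllegalArgumentException("index does not fit in 15 bits: " + index);
        }
        out.write(((index >>> 8) & 0x7F) | 0x80);
        out.write(index & 0xFF);
    }

    // True if the header refers to a previously stored string.
    static boolean isIndex(byte[] header) {
        return (header[0] & 0x80) != 0;
    }

    // Extracts the 15-bit payload (length or index) from the header.
    static int value(byte[] header) {
        return ((header[0] & 0x7F) << 8) | (header[1] & 0xFF);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeLength(out, 4096);
        byte[] header = out.toByteArray();
        System.out.println("isIndex=" + isIndex(header) + " value=" + value(header));
    }
}
```

With this layout a decoder written in any language only has to test one bit to tell a literal string from a back-reference, which is the trade-off against the larger header.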

Stefano, did I explain this right?!

Maybe I'm totally wrong, but I think the 32k string limitation comes from the
CXML format from Stefano Mazzocchi:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97194999124269&w=2

Yes, that's correct

I understand it this way: the CXML format is independent of Cocoon and Java, so if anyone writes a decoder in C, they can use that bytestream too.

Yes

The SAX events should not be the problem; every SAX handler has to process the following correctly:

<node> text here <!-- comment here --> text here </node>

This gives a Character-Event, Comment-Event, Character-Event for one node, or do I misunderstand the SAX processing totally?


That's correct if the text nodes are not affected by the patch.
If there is a character event larger than 32k the events come
out differently.

If that's correct, a Character-Event, Character-Event, ... sequence should not be a problem.


<node>
  text here
  <!-- comment here -->
  long text here
</node>

could come out like

<node>
  "text here"
  <!-- comment here -->
  "long text""here"
</node>

and relies on the transformer to normalize the text nodes!!
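
As a rough illustration of the splitting discussed above, the following Java sketch cuts one long character event into chunks that each fit a 15-bit length and then joins them again, which is the normalization a downstream consumer would have to do. The class, method names, and the 32767 chunk size are assumptions for illustration, not the patch itself. Note that this naive split works on char offsets and does not try to avoid cutting a surrogate pair in half.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the actual patch for Bug 23299.
final class CharacterChunkSketch {
    // Assumed maximum number of chars per stored character event (15 bits).
    static final int MAX_CHUNK = 0x7FFF;

    // Splits the text into consecutive chunks of at most maxChunk chars.
    static List<String> split(String text, int maxChunk) {
        if (maxChunk <= 0) {
            throw new IllegalArgumentException("maxChunk must be positive");
        }
        List<String> chunks = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChunk) {
            int end = Math.min(text.length(), start + maxChunk);
            chunks.add(text.substring(start, end));
        }
        return chunks;
    }

    // Joins adjacent character chunks back into one text node.
    static String normalize(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }
}
```

For a 36k string this yields one chunk of 32767 chars followed by one of 4097 chars, so the text content is unchanged but the number of character events is not.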

Let's not discuss this to death. I'll fix it :)

cheers
--
Torsten
