We could also increase the max length of a stored character event in general ... but that would waste 2 bytes per event. Hm...
What do you think? -- Torsten
Hi,
Why should we handle the UTFDataFormatException at all? The last solution ignores this exception, doesn't it?
What is the difference between
event string 32k
string 4k
and
event
string 36k
in the bytestream?
The question is whether we need the UTFDataFormatException or not. If not, a patch can simply remove the statement if(string>32k){} and then we get the result:
event string xxk (the limit is then the Java integer range)
Well, that's true ... but the current length is held as a 15-bit integer. The highest bit decides whether it's an index into a HashMap or not.
As I said, we could increase the length to 31 bits, but that gives 2 additional bytes per character event.
Stefano, did I explain this right?!
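To make the 15-bit point concrete, here is a rough sketch of how a 16-bit word can carry either a length or a map index. The class name, the method names and the polarity of the flag (bit set means "index") are my assumptions for illustration only, not the actual CXML layout.

```java
// Hypothetical sketch: one 16-bit word holds a flag bit plus a 15-bit value.
final class LengthWord {
    static final int INDEX_FLAG = 0x8000; // highest bit of the 16-bit word (assumed polarity)
    static final int VALUE_MASK = 0x7FFF; // remaining 15 bits, so the maximum value is 32767

    // Encode a string length; the flag bit stays clear.
    static int encodeLength(int length) {
        checkRange(length);
        return length;
    }

    // Encode an index into the map of already-seen strings; the flag bit is set.
    static int encodeIndex(int index) {
        checkRange(index);
        return INDEX_FLAG | index;
    }

    static boolean isIndex(int word) {
        return (word & INDEX_FLAG) != 0;
    }

    static int value(int word) {
        return word & VALUE_MASK;
    }

    private static void checkRange(int v) {
        if (v < 0 || v > VALUE_MASK) {
            throw new IllegalArgumentException("value does not fit in 15 bits: " + v);
        }
    }
}
```

Going to 31 bits with the same scheme means a 4-byte word instead of a 2-byte one, which is where the 2 extra bytes per character event come from.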
Maybe I'm totally wrong, but I think the 32k string limitation comes from the CXML format by Stefano Mazzocchi: http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97194999124269&w=2
Yes, that's correct
I understand it this way: the CXML format is independent of Cocoon and Java, so if anyone writes a decoder in C, they can use that bytestream too.
Yes
The SAX events should not be the problem; every SAX handler has to process the following correctly:
<node>
text here <!-- comment here -->
text here
</node>
This gives a Character-Event, Comment-Event, Character-Event for one node, or do I misunderstand the SAX processing totally?
That's correct if the text nodes are not affected by the patch. If there is a character event larger than 32k the events come out differently.
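For reference, the event sequence can be checked with a small trace like the one below. It is only a sketch using the standard SAX API; the class name EventTrace and the collapsing of consecutive identical events are my additions. The collapsing is there because a SAX parser is allowed to deliver contiguous character data in several chunks anyway.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical helper: records the order of character and comment events.
final class EventTrace {
    static List<String> trace(String xml) {
        final List<String> events = new ArrayList<>();
        DefaultHandler2 handler = new DefaultHandler2() {
            @Override
            public void characters(char[] ch, int start, int length) {
                record(events, "characters");
            }

            @Override
            public void comment(char[] ch, int start, int length) {
                record(events, "comment");
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Comments are only reported through the LexicalHandler property.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", handler);
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("parsing failed", e);
        }
        return events;
    }

    // Consecutive events of the same kind are recorded once, since the parser
    // may split one run of text into several characters() calls.
    private static void record(List<String> events, String name) {
        if (events.isEmpty() || !events.get(events.size() - 1).equals(name)) {
            events.add(name);
        }
    }
}
```

Calling EventTrace.trace("<node>text here <!-- comment here -->text here</node>") should yield characters, comment, characters.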
If it's correct, a Character-Event,Character-Event,... should not be a problem.
<node> text here <!-- comment here --> long text here </node>
could come out like
<node> "text here" <!-- comment here --> "long text""here" </node>
and relies on the transformer to normalize the text nodes!!
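A minimal sketch of the splitting idea, and of the normalization a consumer would then have to do, could look like this. CharacterChunks, split and normalize are hypothetical names, the chunk size is a parameter rather than the real limit, and this is not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split one long character event into several smaller ones.
final class CharacterChunks {
    // Cut text into pieces of at most maxChars chars each.
    static List<String> split(String text, int maxChars) {
        if (maxChars < 2) {
            throw new IllegalArgumentException("maxChars must be at least 2");
        }
        List<String> chunks = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(start + maxChars, text.length());
            // Do not cut a surrogate pair in half at the chunk boundary.
            if (end < text.length() && Character.isHighSurrogate(text.charAt(end - 1))) {
                end--;
            }
            chunks.add(text.substring(start, end));
            start = end;
        }
        return chunks;
    }

    // What a consumer has to do to get the original text node back.
    static String normalize(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }
}
```

With a tiny limit for demonstration, split("long text here", 9) gives the two pieces "long text" and " here", and normalize puts them back together. Whoever consumes the events has to do that second step, which is exactly the concern.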
Let's not discuss this to death. I'll fix it :)
cheers -- Torsten