We could also increase the maximum length of a stored
character event in general... but that would waste
2 bytes per event. Hm...

What do you think?
--
Torsten


Hi,


Why should we handle the UTFDataFormatException at all? The last solution ignores this exception, doesn't it?
What is the difference in the bytestream between

event
string 32k
string 4k

and

event
string 36k?
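To make the first layout concrete, here is a minimal sketch of splitting one large characters() call into several stored strings. It assumes the stored length counts Java chars and is capped at 32767 by a 15-bit field; the names CharacterChunker and split are hypothetical, not Cocoon's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharacterChunker {
    // Assumption: the 15-bit length field caps one stored string at 32767 chars.
    static final int MAX_CHUNK = 0x7FFF;

    // Split one characters(ch, start, length) call into pieces that each fit
    // the length field; every piece would be stored as its own character event.
    static List<String> split(char[] ch, int start, int length) {
        List<String> chunks = new ArrayList<>();
        int end = start + length;
        for (int pos = start; pos < end; pos += MAX_CHUNK) {
            int n = Math.min(MAX_CHUNK, end - pos);
            chunks.add(new String(ch, pos, n));
        }
        return chunks;
    }

    public static void main(String[] args) {
        char[] text = new char[36 * 1024]; // a "36k" character event
        Arrays.fill(text, 'x');
        for (String chunk : split(text, 0, text.length)) {
            System.out.println(chunk.length());
        }
    }
}
```

For a 36k event (36 × 1024 = 36864 chars) this yields chunks of 32767 and 4097 chars: the same text ends up in the bytestream, but it is replayed as two character events instead of one.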

The question is whether we need the UTFDataFormatException or not. If not, a patch can simply remove the statement if(string>32k){} and then we get the result:

event
string xxk (the limit is then the Java integer range)

Well, that's true... but the current length is held as a 15-bit integer. The highest bit decides whether it's an index into a HashMap or not.

As I said, we could increase the length to 31 bits,
but that costs 2 additional bytes per character
event.
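A minimal sketch of that trade-off, assuming the length is written big-endian with DataOutputStream and that the top bit flags an index into the string table instead of an inline length. The class and method names (LengthField, encode16, encode32, decode16) are hypothetical; only the 15-bit/31-bit split comes from the discussion above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class LengthField {
    // Assumed layout: top bit = "index into the string table", rest = length.
    static final int INDEX_FLAG_16 = 0x8000;
    static final int MAX_LEN_16 = 0x7FFF;     // 15 bits: 32767
    static final int INDEX_FLAG_32 = 0x80000000;

    // Current scheme: the flag plus a 15-bit length in 2 bytes.
    static byte[] encode16(int len, boolean isIndex) {
        if (len < 0 || len > MAX_LEN_16) {
            throw new IllegalArgumentException("does not fit in 15 bits: " + len);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeShort(isIndex ? (len | INDEX_FLAG_16) : len);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for an in-memory stream
        }
        return buf.toByteArray();
    }

    // Alternative: the flag plus a 31-bit length in 4 bytes (2 extra per event).
    static byte[] encode32(int len, boolean isIndex) {
        if (len < 0) {
            throw new IllegalArgumentException("negative length: " + len);
        }
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeInt(isIndex ? (len | INDEX_FLAG_32) : len);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Reads a 2-byte field back; returns {flag, value}, flag 1 meaning "index".
    static int[] decode16(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int raw = in.readUnsignedShort();
            return new int[] { (raw & INDEX_FLAG_16) != 0 ? 1 : 0, raw & MAX_LEN_16 };
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode16(32767, false).length);     // 2 bytes
        System.out.println(encode32(36 * 1024, false).length); // 4 bytes
    }
}
```

Since the field is just big-endian bytes with a flag bit, a decoder written in C or any other language can read it the same way, which is what keeps the format independent of Java.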

Stefano, did I explain this right?!

Maybe I'm totally wrong, but I think the 32k string limitation comes from the
CXML format by Stefano Mazzocchi:
http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=97194999124269&w=2

Yes, that's correct


As I understand it, the CXML format is independent of Cocoon and Java, so anyone who writes a decoder in C can use that bytestream, too.

Yes


The SAX events should not be the problem; every SAX handler has to process the following correctly:

<node>
text here <!-- comment here -->
text here
</node>


This gives Character-Event, Comment-Event, Character-Event for one node, or do I totally misunderstand SAX processing?

That's correct if the text nodes are not affected by the patch. If there is a character event larger than 32k, the events come out differently.

If that's correct, a sequence like Character-Event, Character-Event, ... should not be a problem either.

<node> text here <!-- comment here --> long text here </node>

could come out like

<node>
  "text here"
  <!-- comment here -->
  "long text""here"
</node>

and then it relies on the transformer to normalize the text nodes!!
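A minimal sketch of that normalization step, assuming the consumer is a SAX handler: it buffers adjacent characters() calls and emits them as one text node as soon as any other event (element, comment, end of document) arrives. TextCoalescer and its text() helper are hypothetical names; DefaultHandler2 is the JDK class in org.xml.sax.ext that also receives comment() callbacks (when registered as the parser's lexical handler).

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.ext.DefaultHandler2;

public class TextCoalescer extends DefaultHandler2 {
    private final StringBuilder pending = new StringBuilder();
    final List<String> events = new ArrayList<>();

    // Buffer text: one logical text node may arrive as several characters() calls.
    @Override
    public void characters(char[] ch, int start, int length) {
        pending.append(ch, start, length);
    }

    // Emit the buffered text as a single event before any non-text event.
    private void flush() {
        if (pending.length() > 0) {
            events.add("text:" + pending);
            pending.setLength(0);
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        flush();
        events.add("start:" + qName);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
        events.add("end:" + qName);
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        flush();
        events.add("comment:" + new String(ch, start, length));
    }

    @Override
    public void endDocument() {
        flush();
    }

    // Hypothetical demo helper: deliver a String as one characters() call.
    void text(String s) {
        characters(s.toCharArray(), 0, s.length());
    }

    public static void main(String[] args) {
        TextCoalescer h = new TextCoalescer();
        h.startElement("", "node", "node", null);
        h.text("text here");
        h.comment(" comment here ".toCharArray(), 0, 14);
        h.text("long text ");
        h.text("here"); // second character event, e.g. after a 32k split
        h.endElement("", "node", "node");
        h.endDocument();
        h.events.forEach(System.out::println);
    }
}
```

The comment still separates the two text nodes, while the split "long text " + "here" comes out as one text node again, so downstream code never sees where the 32k boundary fell.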

Let's not discuss this to death. I'll fix it :)

cheers
--
Torsten
