On 20 Jan 2004, at 18:40, Ken Gengler wrote:
On Jan 19, 2004, at 10:49 PM, Adam Megacz wrote:
I still take issue with
What characters are allowed in strings? Non-printable characters?
Null characters? Can a "string" be used to hold an arbitrary chunk of
binary data?
Any characters are allowed in a string except < and &, which are encoded as &lt; and &amp;. A string can be used to encode binary data.
Can I ask a likely stupid question here? Why aren't the strings sent in CDATA sections? I have to admit that I hacked this implementation to use CDATA for any string over 100 bytes (an arbitrary threshold). The results I return to the caller usually include an XML document that's often sizable. The cost of encoding the < and & characters was fairly high, and I achieved quite a performance improvement by just using CDATA. I didn't offer it to the community since I figured there was a very good reason why it wasn't used. But I'm curious as to that reason.
If you are sending XML documents then using a CDATA section is a good idea (I presume that you handle the case where the document you are sending has a CDATA section in it). It might make sense to count the <, > and & characters in a string and to switch to CDATA encoding if there are more than three or four of them.
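A minimal sketch of that heuristic in Python, for illustration only. The names encode_string, to_cdata and CDATA_THRESHOLD are hypothetical, and the cut-off of four is just the "three or four" guess above, not a measured value. The sketch also covers the nested-CDATA case: the sequence ]]> cannot appear inside a CDATA section, so it is split across two adjacent sections.

```python
from xml.sax.saxutils import escape

# Hypothetical cut-off: above this many markup characters, use CDATA.
CDATA_THRESHOLD = 4


def to_cdata(text):
    """Wrap text in CDATA, splitting any "]]>" across two sections."""
    # "]]>" would end the section early, so close after "]]" and
    # reopen a new section before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def encode_string(text, threshold=CDATA_THRESHOLD):
    """Pick entity escaping or CDATA based on how much markup is present."""
    specials = sum(text.count(c) for c in "<>&")
    if specials > threshold:
        return to_cdata(text)
    # escape() replaces &, < and > with &amp;, &lt; and &gt;.
    return escape(text)
```

With these assumptions, encode_string("a < b") is escaped with entities, while a string carrying an embedded XML document goes out as a single CDATA section (or several, if it contains ]]> itself).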
CDATA sections don't work well if you have characters which can't be directly represented in the encoding and so have to be replaced by numeric character references: you need to end the CDATA section, emit the numeric character reference, and start the CDATA section again.
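To make the problem concrete, here is a rough Python sketch that writes a string as CDATA for a given target encoding and breaks out of the section for every character the encoding cannot represent. The function name encode_with_cdata and the default of "ascii" are assumptions for illustration; it does not try to handle characters that XML forbids altogether.

```python
def encode_with_cdata(text, encoding="ascii"):
    """Emit text as CDATA, leaving the section for unencodable characters."""
    parts = []
    run = []  # pending characters that the target encoding can represent

    def flush():
        # Emit the pending run as one CDATA section (split around "]]>").
        if run:
            chunk = "".join(run).replace("]]>", "]]]]><![CDATA[>")
            parts.append("<![CDATA[" + chunk + "]]>")
            run.clear()

    for ch in text:
        try:
            ch.encode(encoding)
            run.append(ch)
        except UnicodeEncodeError:
            # Character references are not recognised inside CDATA, so
            # close the section, emit the reference, then start a new one.
            flush()
            parts.append("&#%d;" % ord(ch))
    flush()
    return "".join(parts)
```

For example, "<a>caf\u00e9</a>" in an ASCII document becomes <![CDATA[<a>caf]]>&#233;<![CDATA[</a>]]>, so every such character costs two extra sets of CDATA delimiters.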
It's quite an interesting challenge to choose the optimal mixture of CDATA sections and character entities to represent an arbitrary document.
John Wilson The Wilson Partnership http://www.wilson.co.uk