On 20 Jan 2004, at 03:32, Daniel L. Rall wrote:


I'd like to be able to send multibyte characters over XML-RPC. In CVS
revision 1.4, we basically removed this ability with the following log
message:

----------------------------
revision 1.4
date: 2002/08/20 16:48:49;  author: dlr;  state: Exp;  lines: +15 -6
writeObject(): Noted new exception thrown.

chardata(): Characters out of range for the XML spec won't be written
as XML to avoid parse errors on the client side.  Instead, an
exception will be thrown.  This was originally noted by John Wilson
<[EMAIL PROTECTED]>, with this follow up by Adam Megacz
<[EMAIL PROTECTED]>:

Ah, the joys of the self-contradictory XML-RPC spec ;)

"<string> [is an] ASCII string"

"Any characters are allowed in a string except < and &... A string can
be used to encode binary data."


Dave absolutely refuses to fix or clarify this.
----------------------------


Here's a patch to add that ability back in for Unicode encodings, encodings which I believe XML parsers are required to support by the XML specification.


Daniel,


In XML, all character encodings can represent all Unicode characters, because a character the encoding lacks can be written as a numeric character reference. The valid characters allowed by the XML spec (production 2, section 2.2) are #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]. The last range is academic, as Java only supports 16-bit Unicode characters.

I believe strongly that the XML-RPC implementation should refuse to emit XML that is not well formed. Therefore attempts to emit characters below 0X20 other than 0X09, 0X0A and 0X0D, characters within the range 0XD800 to 0XDFFF, and characters above 0XFFFD should cause an exception to be thrown.
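
As a rough illustration of the check described above, here is a minimal sketch in Java. The class name XmlCharCheck and its method names are hypothetical and are not taken from the XML-RPC code base. It assumes each 16-bit char is tested on its own, so surrogates are rejected as stated above.

```java
// Hypothetical helper, not part of the XML-RPC library.
final class XmlCharCheck {
    private XmlCharCheck() {}

    /** True if c falls in the 16-bit part of the XML 1.0 Char production. */
    static boolean isValidXmlChar(char c) {
        return c == 0x9 || c == 0xA || c == 0xD
            || (c >= 0x20 && c <= 0xD7FF)
            || (c >= 0xE000 && c <= 0xFFFD);
    }

    /** Throws if text contains a character that may not appear in XML. */
    static void check(String text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!isValidXmlChar(c)) {
                throw new IllegalArgumentException(
                    "Character 0x" + Integer.toHexString(c)
                    + " at index " + i + " is not allowed in XML");
            }
        }
    }
}
```

Calling XmlCharCheck.check(value) before writing a string would turn an illegal character into an exception on the sending side rather than a parse error on the receiving side.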

If the encoding is ISO 8859-1 then characters > 0XFF should be represented as numeric character references.

For UTF-8 and UTF-16 the checks for illegal characters should be performed but no numeric character reference encoding is needed.

I also think that it would be useful to support a conservative encoding where the XML declaration is omitted and all characters > 0X7F are encoded using numeric character references. This would mean that we would interoperate with all XML-RPC implementations as long as we are only sending US-ASCII characters. We can still send characters > 0X7F but not all implementations will cope with them.
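
The three cases above (ISO 8859-1, UTF-8/UTF-16, and the conservative US-ASCII mode) differ only in the highest character that is written literally. A minimal sketch in Java, assuming a hypothetical XmlCharEscaper class with a maxLiteral parameter; neither name comes from the XML-RPC code base, and the sketch escapes only < and & in addition to the numeric character references:

```java
// Hypothetical helper, not part of the XML-RPC library.
final class XmlCharEscaper {
    static final int MAX_LITERAL_ASCII = 0x7F;     // conservative mode
    static final int MAX_LITERAL_LATIN1 = 0xFF;    // ISO 8859-1
    static final int MAX_LITERAL_UNICODE = 0xFFFD; // UTF-8 / UTF-16

    private XmlCharEscaper() {}

    /**
     * Returns text with '<' and '&' escaped and every character above
     * maxLiteral written as a decimal numeric character reference.
     * Throws if a character is not allowed in XML at all.
     */
    static String escape(String text, int maxLiteral) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean valid = c == 0x9 || c == 0xA || c == 0xD
                || (c >= 0x20 && c <= 0xD7FF)
                || (c >= 0xE000 && c <= 0xFFFD);
            if (!valid) {
                throw new IllegalArgumentException(
                    "Character 0x" + Integer.toHexString(c)
                    + " at index " + i + " is not allowed in XML");
            }
            if (c == '<') {
                out.append("&lt;");
            } else if (c == '&') {
                out.append("&amp;");
            } else if (c > maxLiteral) {
                out.append("&#").append((int) c).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With MAX_LITERAL_LATIN1 only characters above 0XFF become references; with MAX_LITERAL_ASCII the output is pure US-ASCII; with MAX_LITERAL_UNICODE the illegal-character check still runs but no references are produced.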

I don't seem to be able to apply your patch to the CVS HEAD code (using Eclipse). So I can't provide you with more constructive comments on your modification. If you send me the actual file off list I'll take a detailed look at it.

Cheers



John Wilson
The Wilson Partnership
http://www.wilson.co.uk


