I'd like to be able to send multibyte characters over XML-RPC. In CVS revision 1.4, we basically removed this ability with the following log message:
----------------------------
revision 1.4
date: 2002/08/20 16:48:49;  author: dlr;  state: Exp;  lines: +15 -6
writeObject(): Noted new exception thrown.

chardata(): Characters out of range for the XML spec won't be written
as XML to avoid parse errors on the client side.  Instead, and
exception will be thrown.  This was originally noted by John Wilson
<[EMAIL PROTECTED]>, with this follow up by Adam Megacz
<[EMAIL PROTECTED]>:

> Ah, the joys of the self-contradictory XML-RPC spec ;)
>
> "<string> [is an] ASCII string"
>
> "Any characters are allowed in a string except < and &... A string can
> be used to encode binary data."
>
> Dave absolutely refuses to fix or clarify this.
----------------------------

Here's a patch to add that ability back in for Unicode encodings,
encodings which I believe XML parsers are required to support by the
XML specification.

--
Daniel Rall

Index: XmlWriter.java
===================================================================
RCS file: /home/cvs/ws-xmlrpc/src/java/org/apache/xmlrpc/XmlWriter.java,v
retrieving revision 1.6
diff -u -u -r1.6 XmlWriter.java
--- XmlWriter.java      21 Nov 2002 21:57:39 -0000      1.6
+++ XmlWriter.java      20 Jan 2004 03:24:07 -0000
@@ -312,6 +312,11 @@
         throws XmlRpcException, IOException
     {
         int l = text.length ();
+        String enc = super.getEncoding();
+        boolean isUnicode = UTF8.equals(enc) || "UTF-16".equals(enc);
+        // ### TODO: Use a buffer rather than going character by
+        // ### character to scale better for large text sizes.
+        //char[] buf = new char[32];
         for (int i = 0; i < l; i++)
         {
             char c = text.charAt (i);
@@ -332,16 +337,38 @@
                     write(AMPERSAND_ENTITY);
                     break;
                 default:
-                    if (c < 0x20 || c > 0xff)
+                    if (c < 0x20 || c > 0x7f)
                     {
                         // Though the XML-RPC spec allows any ASCII
                         // characters except '<' and '&', the XML spec
                         // does not allow this range of characters,
                         // resulting in a parse error from most XML
-                        // parsers.
-                        throw new XmlRpcException(0, "Invalid character data " +
-                                                  "corresponding to XML entity &#" +
-                                                  String.valueOf((int) c) + ';');
+                        // parsers.  However, the XML spec does require
+                        // XML parsers to support UTF-8 and UTF-16.
+                        if (isUnicode)
+                        {
+                            if (c < 0x20)
+                            {
+                                // Entity escape the character.
+                                write("&#");
+                                // ### Do we really need the String conversion?
+                                write(String.valueOf((int) c));
+                                write(';');
+                            }
+                            else // c > 0x7f
+                            {
+                                // Write the character in our encoding.
+                                write(new String(String.valueOf(c).getBytes(enc)));
+                            }
+                        }
+                        else
+                        {
+                            throw new XmlRpcException(0, "Invalid character data " +
+                                                      "corresponding to XML " +
+                                                      "entity &#" +
+                                                      String.valueOf((int) c) +
+                                                      ';');
+                        }
                     }
                     else
                     {
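
For anyone skimming the diff, the per-character decision the patch introduces can be sketched as a stand-alone method. This is only an illustration under stated assumptions, not the XmlWriter code itself: the class name CharDataSketch and the method escape() are hypothetical, it appends to a StringBuilder instead of calling write(), it throws IllegalArgumentException in place of XmlRpcException, and it handles only the '<' and '&' cases of the switch. It also appends non-ASCII characters as plain chars, on the assumption that the underlying Writer performs the UTF-8 or UTF-16 encoding.

```java
/** Hypothetical stand-alone sketch of the patched chardata() logic. */
final class CharDataSketch {
    private CharDataSketch() {}

    /**
     * Escapes text for use as XML character data.
     *
     * @param isUnicode true when the output encoding is UTF-8 or UTF-16
     */
    static String escape(String text, boolean isUnicode) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
            case '<':
                out.append("&lt;");
                break;
            case '&':
                out.append("&amp;");
                break;
            default:
                if (c < 0x20 || c > 0x7f) {
                    if (!isUnicode) {
                        // Same outcome as before the patch: refuse to write it.
                        throw new IllegalArgumentException(
                            "Invalid character data corresponding to XML entity &#"
                            + (int) c + ';');
                    }
                    if (c < 0x20) {
                        // Control character: numeric character reference.
                        out.append("&#").append((int) c).append(';');
                    } else {
                        // c > 0x7f: append the char itself (assumption: the
                        // Writer underneath does the byte-level encoding).
                        out.append(c);
                    }
                } else {
                    // Printable ASCII passes through unchanged.
                    out.append(c);
                }
            }
        }
        return out.toString();
    }
}
```

With a Unicode encoding, escape("caf\u00e9", true) returns the string unchanged and escape("x\u0001", true) returns "x&#1;"; with isUnicode false, both inputs are rejected. One reason the sketch appends the char directly: new String(byte[]) decodes with the platform default charset, so a round trip through getBytes(enc) only preserves the character when the default charset happens to match enc.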