I'd like to be able to send multibyte characters over XML-RPC. In CVS
revision 1.4, we basically removed this ability with the following log
message:

----------------------------
revision 1.4
date: 2002/08/20 16:48:49;  author: dlr;  state: Exp;  lines: +15 -6
writeObject(): Noted new exception thrown.

chardata(): Characters out of range for the XML spec won't be written
as XML to avoid parse errors on the client side.  Instead, an
exception will be thrown.  This was originally noted by John Wilson
<[EMAIL PROTECTED]>, with this follow up by Adam Megacz
<[EMAIL PROTECTED]>:

> Ah, the joys of the self-contradictory XML-RPC spec ;)
>
>   "<string> [is an] ASCII string"
>
>   "Any characters are allowed in a string except < and &... A string can
>    be used to encode binary data."
>
> Dave absolutely refuses to fix or clarify this.
----------------------------
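
For reference, the "characters out of range for the XML spec" in that
log message are the ones outside the Char production of XML 1.0. A
minimal sketch of that check follows; the class and method names are
hypothetical and not part of XmlWriter:

```java
public class XmlChars {
    /**
     * Returns true if the code point matches the XML 1.0 Char production:
     * #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
     */
    public static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }
}
```

XML 1.0 applies the same production to character references, so a
reference such as &#1; is rejected by a conforming parser just as a raw
0x01 would be.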


Here's a patch to add that ability back in for the Unicode encodings
(UTF-8 and UTF-16), which I believe the XML specification requires XML
parsers to support.
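
To make the intended behavior easier to see outside the diff, here is a
rough standalone sketch of the same escaping rule. It is not the
XmlWriter code: CharDataSketch, chardata() and encode() are hypothetical
names, only '<' and '&' are escaped, and an unchecked exception stands
in for XmlRpcException. It assumes that characters above 0x7f can be
handed straight to an OutputStreamWriter, which encodes them itself.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class CharDataSketch {
    /** Writes text as XML character data, one char at a time. */
    static void chardata(Writer out, String text, boolean isUnicode)
            throws IOException {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
            case '<':
                out.write("&lt;");
                break;
            case '&':
                out.write("&amp;");
                break;
            default:
                if (c >= 0x20 && c <= 0x7f) {
                    // Plain ASCII passes through unchanged.
                    out.write(c);
                } else if (!isUnicode) {
                    // Stand-in for the XmlRpcException in the patch.
                    throw new IllegalArgumentException(
                        "Invalid character data: &#" + (int) c + ';');
                } else if (c < 0x20) {
                    // Control character: emit a numeric reference.
                    out.write("&#" + (int) c + ';');
                } else {
                    // c > 0x7f: the writer encodes it in its own charset.
                    out.write(c);
                }
            }
        }
    }

    /** Encodes text to bytes in the named encoding via chardata(). */
    static byte[] encode(String text, String enc) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            Writer out = new OutputStreamWriter(bytes, enc);
            boolean isUnicode = "UTF-8".equals(enc) || "UTF-16".equals(enc);
            chardata(out, text, isUnicode);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling CharDataSketch.encode("\u00e9<", "UTF-8") yields the two-byte
UTF-8 form of U+00E9 followed by "&lt;", while the same input with
"ISO-8859-1" is rejected.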
-- 

Daniel Rall



Index: XmlWriter.java
===================================================================
RCS file: /home/cvs/ws-xmlrpc/src/java/org/apache/xmlrpc/XmlWriter.java,v
retrieving revision 1.6
diff -u -u -r1.6 XmlWriter.java
--- XmlWriter.java      21 Nov 2002 21:57:39 -0000      1.6
+++ XmlWriter.java      20 Jan 2004 03:24:07 -0000
@@ -312,6 +312,11 @@
         throws XmlRpcException, IOException
     {
         int l = text.length ();
+        String enc = super.getEncoding();
+        boolean isUnicode = UTF8.equals(enc) || "UTF-16".equals(enc);
+        // ### TODO: Use a buffer rather than going character by
+        // ### character to scale better for large text sizes.
+        //char[] buf = new char[32];
         for (int i = 0; i < l; i++)
         {
             char c = text.charAt (i);
@@ -332,16 +337,38 @@
                 write(AMPERSAND_ENTITY);
                 break;
             default:
-                if (c < 0x20 || c > 0xff)
+                if (c < 0x20 || c > 0x7f)
                 {
                     // Though the XML-RPC spec allows any ASCII
                     // characters except '<' and '&', the XML spec
                     // does not allow this range of characters,
                     // resulting in a parse error from most XML
-                    // parsers.
-                    throw new XmlRpcException(0, "Invalid character data " +
-                                              "corresponding to XML entity &#" +
-                                              String.valueOf((int) c) + ';');
+                    // parsers.  However, the XML spec does require
+                    // XML parsers to support UTF-8 and UTF-16.
+                    if (isUnicode)
+                    {
+                        if (c < 0x20)
+                        {
+                            // Entity escape the character.
+                            write("&#");
+                            // ### Do we really need the String conversion?
+                            write(String.valueOf((int) c));
+                            write(';');
+                        }
+                        else // c > 0x7f
+                        {
+                            // Let the underlying Writer encode the character.
+                            write(c);
+                        }
+                    }
+                    else
+                    {
+                        throw new XmlRpcException(0, "Invalid character data "
+                                                  + "corresponding to XML "
+                                                  + "entity &#"
+                                                  + String.valueOf((int) c)
+                                                  + ';');
+                    }
                 }
                 else
                 {
