REPOST: Character encoding problems sending UTF-8 back to client

Douglas Bitting Thu, 19 Feb 2004 12:00:35 -0800

I apologize if this has come across already, but I still haven't seen it on the 
mailing list after 18 hours.


All,

I can't really figure out if I'm doing something wrong here or if there is a defect
involved.  Basically, I have a Japanese string that I'm attempting to send back to
the client.  However, when the client receives the string, it is mangled beyond
repair.  I've put together a small test case and have included it (and its results)
here.

Here is the method that is invoked via Axis on the server:

   public String getString() {
      String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }
      return str;
   }

The output of this method is as follows:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 12364
char[10]: 35211
...

I generated client-side stubs via WSDL2Java and put together a quick client that
simply does this:

      String str = stub.getString();
      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

This emits the following:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 227
char[5]: 402
char[6]: 169
char[7]: 227
char[8]: 8218
...

The first 4 chars are returned properly, but everything after that is completely 
munged.

As near as I can tell, during serialization Axis is manually converting my string into 
a UTF-8 encoded byte array.  However, the inverse operation
does not appear to happen on the client side.  Am I doing something wrong here, or is 
this a defect?
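
One way to check this hypothesis outside of Axis: the mangled values above (227, 402, 169, 227, 8218) are what you get when the UTF-8 bytes of the string are decoded as windows-1252 rather than UTF-8. The sketch below reproduces that. The class and method names are made up for illustration, and it assumes the JRE provides the windows-1252 charset. It does not show where in Axis or the HTTP layer the wrong decoding actually happens.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes the string as UTF-8, then decodes those bytes with the wrong
    // charset (windows-1252), which is what the client output suggests.
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // First six characters of the test string from getString().
        String mangled = misdecode("SDK \u30e9\u30a4");
        for (int ii = 0; ii < mangled.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) mangled.charAt(ii)));
        }
    }
}
```

If the values printed for char[4] through char[8] match the client output above, the bytes on the wire are probably valid UTF-8 and the problem is in how they are decoded on the receiving side.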

Just for grins, I modified my client code to look like the following:

      String str = stub.getString();

      byte[] bytes = str.getBytes();
      str = new String(bytes, "UTF-8");

      for (int ii = 0; ii < str.length(); ii++) {
         System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
      }

The additional code attempts to reverse the manual encoding done within Axis; however, 
it is not entirely successful:

char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 65533
char[10]: 63

The first 9 chars (indices 0 through 8) are correct, but after that it goes downhill...
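
A likely reason the workaround breaks at char[9]: that character is U+304C, whose UTF-8 form is E3 81 8C, and byte 0x81 has no character assigned in windows-1252. Java's decoder turns it into U+FFFD, and getBytes() then writes '?' (63) for it, so the original byte is lost before the UTF-8 decode runs. The sketch below assumes the client's default charset is windows-1252 and uses hypothetical names. It compares one character that survives the round trip with one that does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumption: the client's platform default charset is windows-1252.
    static final Charset CLIENT_DEFAULT = Charset.forName("windows-1252");

    // Simulates the mangling: UTF-8 bytes decoded with the wrong charset.
    static String mangle(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), CLIENT_DEFAULT);
    }

    // Same idea as the getBytes() / new String(bytes, "UTF-8") workaround,
    // with the charset spelled out instead of relying on the platform default.
    static String repair(String mangled) {
        return new String(mangled.getBytes(CLIENT_DEFAULT), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // U+30E9 encodes to E3 83 A9; U+304C encodes to E3 81 8C.
        String[] samples = { "\u30e9", "\u304c" };
        for (String s : samples) {
            String repaired = repair(mangle(s));
            System.out.println("U+" + Integer.toHexString(s.charAt(0))
                    + " survives round trip: " + repaired.equals(s));
        }
    }
}
```

Because the loss happens when the response is first decoded, no amount of re-encoding on the client can reliably recover the text. The response has to be decoded as UTF-8 in the first place.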

It's worth pointing out that the version of Axis I'm using is a few months old:

WSDL created by Apache Axis version: 1.2dev
Built on Aug 26, 2003 (12:11:48 PDT)

I'm hesitant to update at this point due to project time constraints, but will if I 
have to.  Has this scenario been addressed in the newer builds?

Thanks,
--Doug
