All,
I can't really figure out if I'm doing something wrong here or if there is a defect
involved. Basically, I have a Japanese string that I'm attempting
to send back to the client. However, when the client receives the string, it is
mangled beyond repair. I've put together a small test case and
include it (and its results) here.
Here is the method that is invoked via Axis on the server:
public String getString() {
    String str = "SDK \u30e9\u30a4\u30bb\u30f3\u30b9\u304c\u898b\u3064\u304b\u308a\u307e\u305b\u3093\u3067\u3057\u305f\u3002";
    for (int ii = 0; ii < str.length(); ii++) {
        System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
    }
    return str;
}
The output of this method is as follows:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 12364
char[10]: 35211
...
I generated client side stubs via WSDL2Java, and put together a quick client that
simply does this:
String str = stub.getString();
for (int ii = 0; ii < str.length(); ii++) {
    System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
}
This emits the following:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 227
char[5]: 402
char[6]: 169
char[7]: 227
char[8]: 8218
...
The first 4 chars are returned properly, but everything after that is completely
munged.
As near as I can tell, during serialization Axis is manually converting my string into
a UTF-8 encoded byte array. However, the inverse operation
does not appear to happen on the client side. Am I doing something wrong here, or is
this a defect?
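For what it's worth, the client-side numbers are consistent with the UTF-8 bytes
of the response being decoded as windows-1252 somewhere on the way in. Here is a
minimal sketch, outside of Axis entirely, that reproduces them. The class and
method names are made up for illustration, and windows-1252 is my assumption about
the charset involved, not something I have confirmed in the Axis code:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Hypothetical helper: encode the string as UTF-8, then (wrongly) decode
    // those bytes as windows-1252. The charset is an assumption.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // First two Japanese characters of the test string.
        String garbled = misdecode("\u30e9\u30a4");
        for (int ii = 0; ii < garbled.length(); ii++) {
            System.out.println("char[" + ii + "]: " + ((int) garbled.charAt(ii)));
        }
    }
}
```

This prints 227, 402, 169, 227, 8218, 164. The first five values are exactly what
the client reports for char[4] through char[8].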
Just for grins, I modified my client code to look like the following:
String str = stub.getString();
byte[] bytes = str.getBytes();
str = new String(bytes, "UTF-8");
for (int ii = 0; ii < str.length(); ii++) {
    System.out.println("char[" + ii + "]: " + ((int) str.charAt(ii)));
}
The additional code attempts to reverse the manual encoding done within Axis; however,
it is not entirely successful:
char[0]: 83
char[1]: 68
char[2]: 75
char[3]: 32
char[4]: 12521
char[5]: 12452
char[6]: 12475
char[7]: 12531
char[8]: 12473
char[9]: 65533
char[10]: 63
The first 9 chars (indices 0 through 8) are correct, but after that it goes downhill...
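The failure at char[9] would fit a windows-1252 round trip. \u304c is E3 81 8C in
UTF-8, and byte 0x81 has no mapping in windows-1252, so the decoder substitutes
U+FFFD and getBytes() then writes it back out as '?'. The sketch below makes the
charset explicit rather than relying on the platform default. Both the class name
and the assumption that the client's default charset is windows-1252 are mine:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    // Assumed to be the client's platform default charset.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper: UTF-8 bytes misread as windows-1252, then the
    // "reverse" step from the modified client (getBytes + decode as UTF-8).
    static String roundTrip(String s) {
        String garbled = new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
        return new String(garbled.getBytes(CP1252), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E3 83 A9: every byte has a windows-1252 mapping.
        System.out.println(roundTrip("\u30e9").equals("\u30e9"));
        // E3 81 8C: 0x81 is unmapped, so information is lost on the way in.
        System.out.println(roundTrip("\u304c").equals("\u304c"));
    }
}
```

If that reading is right, the first string survives the round trip and the second
does not, which means no amount of re-encoding on the client can fully recover
the original text once the wrong decoding has happened.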
It's worth pointing out that the version of Axis I'm using is a few months old:
WSDL created by Apache Axis version: 1.2dev
Built on Aug 26, 2003 (12:11:48 PDT)
I'm hesitant to update at this point due to project time constraints, but will if I
have to. Has this scenario been addressed in the newer builds?
Thanks,
--Doug