I was reading up on converting characters to UTF-8, and I think I now
understand why the output is written as UTF-8 (to support most of the
world's languages in minimal space?). But after reading up on the conversion
algorithm, summarized in the table below, it looks to me as though the
writeChars method does not handle the U+10000→U+10FFFF range. Is that
right, or am I misreading the code?
Character Range      Bit Encoding
U+0000→U+007F        0xxxxxxx
U+0080→U+07FF        110xxxxx 10xxxxxx
U+0800→U+FFFF        1110xxxx 10xxxxxx 10xxxxxx
U+10000→U+10FFFF     11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
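To check my reading of the table, here is a minimal sketch of an encoder that takes a whole code point (an int, not a Java char) and applies the four rows directly. The class name Utf8TableSketch and the method encode are hypothetical names of mine, not part of the code I am asking about, and the sketch assumes the caller never passes a lone surrogate (U+D800→U+DFFF), which it does not reject.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8TableSketch {
    // Hypothetical helper: encodes one code point following the four table rows.
    // Assumption: lone surrogates (U+D800..U+DFFF) are not passed in; they are not rejected here.
    public static byte[] encode(int codePoint) {
        if (codePoint < 0 || codePoint > 0x10FFFF) {
            throw new IllegalArgumentException("Not a Unicode code point: " + codePoint);
        }
        if (codePoint <= 0x7F) {
            // 0xxxxxxx
            return new byte[] { (byte) codePoint };
        } else if (codePoint <= 0x7FF) {
            // 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (codePoint >> 6)),
                (byte) (0x80 | (codePoint & 0x3F))
            };
        } else if (codePoint <= 0xFFFF) {
            // 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (codePoint >> 12)),
                (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                (byte) (0x80 | (codePoint & 0x3F))
            };
        } else {
            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (codePoint >> 18)),
                (byte) (0x80 | ((codePoint >> 12) & 0x3F)),
                (byte) (0x80 | ((codePoint >> 6) & 0x3F)),
                (byte) (0x80 | (codePoint & 0x3F))
            };
        }
    }

    public static void main(String[] args) {
        int clef = 0x1D11E; // MUSICAL SYMBOL G CLEF, outside the 16-bit range
        byte[] fromTable = encode(clef);
        byte[] fromJdk = new String(Character.toChars(clef)).getBytes(StandardCharsets.UTF_8);
        // Compare the table-driven bytes with the JDK's own UTF-8 encoder.
        System.out.println(Arrays.equals(fromTable, fromJdk));
    }
}
```

The fourth branch is only reachable because the input is an int code point; a single Java char can never be larger than 0xFFFF.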
public void writeChars(String s, int start, int length)
    throws IOException {
  final int end = start + length;
  for (int i = start; i < end; i++) {
    final int code = (int)s.charAt(i);
    if (code >= 0x01 && code <= 0x7F)
      writeByte((byte)code);
    else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
      writeByte((byte)(0xC0 | (code >> 6)));
      writeByte((byte)(0x80 | (code & 0x3F)));
    }
    else {
      writeByte((byte)(0xE0 | (code >>> 12)));
      writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
      writeByte((byte)(0x80 | (code & 0x3F)));
    }
  }
}
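To make the question concrete, here is a small sketch that runs the same three branches over a string containing one character above U+FFFF and compares the result with the JDK's standard UTF-8 encoder. It is not the original class: WriteCharsSketch and hex are hypothetical names, and I have assumed an in-memory ByteArrayOutputStream in place of writeByte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class WriteCharsSketch {
    // Same branch logic as the writeChars method quoted above,
    // but collecting bytes in memory instead of calling writeByte (an assumption of this sketch).
    public static byte[] writeChars(String s, int start, int length) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        final int end = start + length;
        for (int i = start; i < end; i++) {
            final int code = s.charAt(i); // one 16-bit char, never above 0xFFFF
            if (code >= 0x01 && code <= 0x7F) {
                out.write(code);
            } else if ((code >= 0x80 && code <= 0x7FF) || code == 0) {
                out.write(0xC0 | (code >> 6));
                out.write(0x80 | (code & 0x3F));
            } else {
                out.write(0xE0 | (code >>> 12));
                out.write(0x80 | ((code >> 6) & 0x3F));
                out.write(0x80 | (code & 0x3F));
            }
        }
        return out.toByteArray();
    }

    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1D11E is stored in a Java String as the surrogate pair D834 DD1E.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println("writeChars logic: " + hex(writeChars(clef, 0, clef.length())));
        System.out.println("standard UTF-8:   " + hex(clef.getBytes(StandardCharsets.UTF_8)));
    }
}
```

Because charAt returns one 16-bit char at a time, each half of the surrogate pair falls into the final else branch, so the loop emits six bytes (ED A0 B4 ED B4 9E) where standard UTF-8 has the four bytes F0 9D 84 9E. Together with the two-byte form written for code == 0, this resembles the "modified UTF-8" described for java.io.DataInput, though whether that is the intent here is my guess.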