According to the Unicode Standard 4.0 (and since 3.0), interpretation of non-shortest forms is forbidden for UTF-8: if a byte sequence does not appear in the table of well-formed UTF-8 byte sequences, it is ill-formed and must be treated as an error. Harmony follows the Unicode specification, but the RI does not. I did not find an explanation in the spec, but I assume the RI's behavior is kept for backward compatibility.
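If it helps, here is a small standalone sketch (the class name NonShortestDemo is mine) that shows the shortest form for code point 1071 (U+042F) and uses java.nio's CharsetDecoder with CodingErrorAction.REPORT to make the malformed-input handling explicit. Whether the decode throws depends on which implementation's decoder you run it on, which is exactly the difference discussed here:

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.MalformedInputException;

public class NonShortestDemo {
    public static void main(String[] args) throws Exception {
        // U+042F (decimal 1071) encodes in shortest form as the
        // two-byte sequence D0 AF.
        byte[] shortest = new String(new char[]{1071}).getBytes("UTF-8");
        System.out.printf("shortest form: %02X %02X%n", shortest[0], shortest[1]);

        // A decoder that follows the Unicode rule must reject the
        // three-byte non-shortest form E0 90 AF as ill-formed.
        // REPORT makes the decoder throw instead of silently substituting.
        CharsetDecoder strict = Charset.forName("UTF-8").newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT);
        byte[] nonShortest = {(byte) 0xE0, (byte) 0x90, (byte) 0xAF};
        try {
            strict.decode(ByteBuffer.wrap(nonShortest));
            System.out.println("decoder accepted the non-shortest form");
        } catch (MalformedInputException e) {
            System.out.println("decoder rejected the non-shortest form");
        }
    }
}
```

The encode half always prints "shortest form: D0 AF"; the decode half reports acceptance or rejection depending on the implementation.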
The following example demonstrates the difference. Code point 1071 (U+042F) should be represented by the UTF-8 byte sequence <D0 AF>, but it can also be represented by the three-byte sequence <E0 90 AF>, which is its non-shortest form. So the following code prints "ERROR" on the Harmony implementation and "Ok with non-shortest forms" on the RI:

    String s1 = new String(new byte[]{(byte) 0xE0, (byte) 0x90, (byte) 0xAF}, "UTF-8");
    String s2 = new String(new char[]{1071});
    if (s1.equals(s2)) {
        System.out.println("Ok with non-shortest forms");
    } else {
        System.out.println("ERROR");
    }

We should decide whether we are going to be compatible with the RI or with the Unicode spec.

Thanks,
Stepan Mishura
Intel Middleware Products Division