+/* Optimized for char set UTF-8 */

"charset" is a jargon term (and a poor misnomer!), not two words.

---

+    for (b = str[len]; b != '\0'; len++, b = str[len]) {
+        if (isAscii && b & 0x80) {
+            isAscii = JNI_FALSE;
+        }
+    }

I would write this more like

    const signed char *p;
    int isAscii;

    for (isAscii = 1, p = (const signed char *) str; *p != '\0'; p++)
        isAscii &= (*p >= 0);

Then the length is (p - (const signed char *) str).

---

+    jbyteArray hab = NULL;

I'm having a hard time decoding the name "hab".

---

The code below is not UTF-8 specific. Can it be refactored?

+    hab = (*env)->NewByteArray(env, len);
+    if (hab != 0) {
+        jclass strClazz = JNU_ClassString(env);
+        CHECK_NULL_RETURN(strClazz, 0);
+        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte *)str);
+        result = (*env)->NewObject(env, strClazz,
+                                   String_init_ID, hab, jnuEncoding);
+        (*env)->DeleteLocalRef(env, hab);
+        return result;
+    }

---

We probably want to use Unicode escapes in our Java sources to keep all source files strictly ASCII.
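
For reference, the single-pass scan suggested earlier in this review can be sketched as a standalone function. This is only an illustration under stated assumptions: the name scan_ascii and the len_out parameter are hypothetical and not part of the patch, and the flag has to start at 1 so that the &= accumulation can only ever clear it.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of the patch under review).
 * Walks a NUL-terminated string once, stores its length in *len_out,
 * and returns 1 if every byte is 7-bit ASCII, 0 otherwise.
 */
static int scan_ascii(const char *str, size_t *len_out)
{
    const signed char *p = (const signed char *) str;
    int is_ascii = 1;   /* must start true: &= can only clear it */

    for (; *p != '\0'; p++)
        is_ascii &= (*p >= 0);  /* a set high bit reads as negative */

    /* Both pointers have the same type, so the subtraction is well defined. */
    *len_out = (size_t) (p - (const signed char *) str);
    return is_ascii;
}
```

A caller would pass the C string and a size_t to receive the length, then branch on the return value to choose between an ASCII fast path and the general UTF-8 path.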