Re: RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Claes Redestad Mon, 12 Jun 2017 16:37:43 -0700

Hi Martin,

thanks for reviewing!


On 2017-06-12 22:14, Martin Buchholz wrote:

+/* Optimized for char set UTF-8 */

"charset" is a (poor misnomer!) jargon term, not two words.

I got that from the existing use of "char set" in this file, but willfix it in all places.


---

+    for (b = str[len]; b != '\0'; len++, b = str[len]) {
+        if (isAscii && b & 0x80) {
+            isAscii = JNI_FALSE;
+        }
+    }

I would write this more like

const signed char *p;
int isAscii;

for (isAscii = 0, p = (const signed char *) str; *p != '\0'; p++)isAscii &= (*p >= 0);

Did you mean for isAscii to be initialized to 1 (true) and then becleared to 0 (false) when *p >= 0 is false?


Then length is (p - str)


How about something like this to hoist the second comparison from the loop:

    int len;
    char asciiCheck;
    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
        asciiCheck |= *p;
    }
    len = (p - str);

    if (asciiCheck & 0x80) {
        // ascii fast-path
        return newSizedString8859_1(env, str, len);
    }

    ...

---

+    jbyteArray hab = NULL;

I'm having a hard time decoding the name "hab"


Not sure, but my guess is "heap allocated byteArray".


---

The code below is not UTF-8 specific. Can it be refactored?

+    hab = (*env)->NewByteArray(env, len);
+    if (hab != 0) {
+        jclass strClazz = JNU_ClassString(env);
+        CHECK_NULL_RETURN(strClazz, 0);
+        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte *)str);
+        result = (*env)->NewObject(env, strClazz,
+                                       String_init_ID, hab, jnuEncoding);
+        (*env)->DeleteLocalRef(env, hab);
+        return result;
+    }

Yes, probably. The excessive copy-paste here is due to a crash issue Iran into whenCharset.isSupported was called without proper initialization. It hassince been

resolved, but seems I forgot to revisit this part.

I also realized I haven't added a test for this method. I'll look intodoing the

refactoring and adding some sanity testing tomorrow.

---
We probably want to use unicode escapes in out java sources to keepall source files strictly ASCII.


You mean in the test? Sure.

Refreshed the jdk webrev:http://cr.openjdk.java.net/~redestad/8181147/jdk.05/ - more to come.


Thanks!

/Claes

Re: RFR(10): 8181147: JNU_GetStringPlatformChars should have a fast path for UTF-8

Reply via email to