Hi Sherman,

On 06/15/2017 05:27 AM, Xueming Shen wrote:
Hi Claes,

The change looks fine. Yes, encoding the 2-byte latin1 on the native side looks reasonable.

Thanks for reviewing!


One nit-pick: it might be better to initialize the constant "LATIN1" from the String class when initializing String_coder_ID. I recently got a "naked constants" complaint (JDK-8156530), so I guess it might be better to do so in your change, though I don't expect these values to change in the
foreseeable future, even if we eventually add a "utf8" variant there.

478 static const jbyte LATIN1 = 0;
479 static const jbyte UTF16  = 1;

I'm not sure how the "naked constants" issue applies here, but maybe we can save ourselves some future surprises by adding a precondition check in the new test to ensure the constants defined in jni_util match the constants defined in java.lang.String:

http://cr.openjdk.java.net/~redestad/8181147/jdk.06/

Does this suffice?

Thanks!

/Claes


thanks,
Sherman

On 6/13/17, 1:27 PM, Claes Redestad wrote:


On 2017-06-13 21:09, Martin Buchholz wrote:
Looks good to me, although still hoping for more review from others.

I expect Sherman to weigh in. :-)


+    if (bytes != 0) {
Style: use NULL.

Done.


+static jstring newStringJava(JNIEnv *env, const char *str) {

I expected two versions, one that took a length and one that did not, as you did with newString8859_1.

Ah, yes. Refactored and updated in-place.

---
TIL that jbyte is signed char, not plain char, and is thus more consistent with Java's byte, hence less error prone, at least in theory.

---
Can we test different native environments by using -Dsun.jnu.encoding=... ?

I did try, but jtreg currently doesn't appear to support specifying flags for main/native @run actions (and it helpfully suggests using main/othervm instead...) - maybe someone knows a way around this?

/Claes




On Tue, Jun 13, 2017 at 7:03 AM, Claes Redestad <claes.redes...@oracle.com> wrote:

<adding build-dev; see
http://mail.openjdk.java.net/pipermail/core-libs-dev/2017-June/048233.html
    for missing context>

    Updated webrevs:
    http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
    http://cr.openjdk.java.net/~redestad/8181147/top.04/

    In this version there are various cleanups based on Martin's
    comments, improved test coverage and some small changes to
    ensure the updated test builds and runs on all our supported
    platforms.

    If someone with access to any of the platforms not supported
    by Oracle could take this for a spin and ensure linking the newly
    added test against libjava works as expected (or suggest how
    to make it work if it doesn't) then that'd be much appreciated.

    Thanks!

    /Claes

    On 06/13/2017 02:01 AM, Martin Buchholz wrote:

        I'm hoping Xueming will review as well.
        +    char asciiCheck;
        +    for (asciiCheck = 0, p = str; *p != '\0'; p++) {
        +        asciiCheck |= *p;
        +    }
        +    len = (p - str);
        Probably conversion from ptrdiff_t to int needs a cast.

        Someday we might need to worry about input string longer than
        2^31, but that's a pre-existing problem.

        The signed-ness of char being implementation-defined makes me
        nervous. Maybe do all the bit operations using explicit unsigned
        char; then the final test becomes simply

        asciiCheck < 0x80


        On Mon, Jun 12, 2017 at 4:36 PM, Claes Redestad
        <claes.redes...@oracle.com> wrote:

            Hi Martin,

            thanks for reviewing!

            On 2017-06-12 22:14, Martin Buchholz wrote:

                +/* Optimized for char set UTF-8 */

                "charset" is a (poor misnomer!) jargon term, not two words.


            I got that from the existing use of "char set" in this file,
            but will fix it in all places.


                ---

                +    for (b = str[len]; b != '\0'; len++, b = str[len]) {
                +        if (isAscii && b & 0x80) {
                +            isAscii = JNI_FALSE;
                +        }
                +    }

                I would write this more like

                const signed char *p;
                int isAscii;

                for (isAscii = 0, p = (const signed char *) str; *p != '\0'; p++)
                    isAscii &= (*p >= 0);


            Did you mean for isAscii to be initialized to 1 (true) and then
            be cleared to 0 (false) when *p >= 0 is false?


                Then length is (p - str)


            How about something like this to hoist the second comparison
            from the loop:

                int len;
                char asciiCheck;
                for (asciiCheck = 0, p = str; *p != '\0'; p++) {
                    asciiCheck |= *p;
                }
                len = (p - str);

                if ((asciiCheck & 0x80) == 0) {
                    // ascii fast-path
                    return newSizedString8859_1(env, str, len);
                }

                ...

                ---

                +    jbyteArray hab = NULL;

                I'm having a hard time decoding the name "hab"


            Not sure, but my guess is "heap allocated byteArray".


                ---

                The code below is not UTF-8 specific. Can it be refactored?

                +    hab = (*env)->NewByteArray(env, len);
                +    if (hab != 0) {
                +        jclass strClazz = JNU_ClassString(env);
                +        CHECK_NULL_RETURN(strClazz, 0);
                +        (*env)->SetByteArrayRegion(env, hab, 0, len, (jbyte *)str);
                +        result = (*env)->NewObject(env, strClazz,
                +                                   String_init_ID, hab, jnuEncoding);
                +        (*env)->DeleteLocalRef(env, hab);
                +        return result;
                +    }


            Yes, probably. The excessive copy-paste here is due to a crash
            issue I ran into when Charset.isSupported was called without
            proper initialization. It has since been resolved, but it seems
            I forgot to revisit this part.

            I also realized I haven't added a test for this method. I'll
            look into doing the refactoring and adding some sanity testing
            tomorrow.


                ---

                We probably want to use unicode escapes in our Java sources
                to keep all source files strictly ASCII.


            You mean in the test? Sure.

            Refreshed the jdk webrev:
            http://cr.openjdk.java.net/~redestad/8181147/jdk.05/
            - more to come.

            Thanks!

            /Claes






