Re: Request for review: 6896617: Optimize sun.nio.cs.ISO_8859_1$Encode.encodeArrayLoop() on x86

Ulf Zibis Mon, 21 Jan 2013 15:30:57 -0800

On 17.01.2013 05:27, Vladimir Kozlov wrote:

On 1/12/13 12:37 AM, Ulf Zibis wrote:

3) bugs.sun.com/bugdatabase/view_bug.do?bug_id=6896617 ==> This bug is
not available.


I opened it; it should show up in a few days.


Thanks!

4) What specific operation should be done by the intrinsic, i.e. is
there a fixed API for that method ???
When C2 (the server JIT compiler in the JVM) compiles the encode methods, it will replace the new method encodeArray() (matched by signature) with hand-optimized assembler code that uses the latest processor instructions. I will send the Hotspot changes soon. So it has nothing to do with the interpreter or the bytecode sequence.
5) Can an intrinsic write back more than 1 value (see my hack via int[]
p) ?
6) Vladimir's webrev shows an integer as return type for that method,
I've added a variant with boolean return type, and the code from my last
approach could be transformed to a method with Object return type.
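
To make questions 5) and 6) concrete, here is a minimal plain-Java sketch of an int-returning helper of the kind under discussion. The class name, the method name encodeISOArraySketch, its parameter list, and the meaning of the return value (number of chars copied) are all assumptions made for illustration; they are not taken from Vladimir's webrev. The point is only that a single returned count lets the caller derive both new positions, so no int[] is needed to write back two values.

```java
final class IsoEncodeSketch {

    /**
     * Hypothetical helper (name and signature are assumptions):
     * copies chars that fit into one byte from sa to da and stops
     * at the first char above U+00FF or after len chars.
     *
     * @return the number of chars actually copied
     */
    static int encodeISOArraySketch(char[] sa, int sp, byte[] da, int dp, int len) {
        int i = 0;
        for (; i < len; i++) {
            char c = sa[sp + i];
            if (c > '\u00FF')
                break;              // not encodable in ISO-8859-1
            da[dp + i] = (byte) c;
        }
        return i;
    }

    public static void main(String[] args) {
        char[] src = { 'a', 'b', '\u00E9', '\u20AC', 'z' };
        byte[] dst = new byte[src.length];
        int n = encodeISOArraySketch(src, 0, dst, 0, src.length);
        // The caller would advance both positions by n: sp += n; dp += n;
        System.out.println("encoded " + n + " of " + src.length + " chars");
    }
}
```

With this shape the caller compares the returned count with the requested length to decide between the normal result and the malformed/unmappable path.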

I wanted to say that the results for the surrounding code may differ depending on the API we choose for the intrinsic method call. If the surrounding code is *as short as possible*, there is a better chance that it will be JIT-compiled too, after fewer invocations of encodeArrayLoop(). I guess the fastest would be:


// while inlining, JIT will erase the surrounding int[] p
private static CoderResult copyISOs(CoderResult cr,
        char[] sa, byte[] da, final int[] p, int sl) {
    int sp = p[0], dp = p[1];
    for (; sp < sl; sp++) {
        char c = sa[sp];
        // if ((c & '\uFF00') != 0) // needs bug 6935994 to be fixed, would be fastest
        if ((byte)(c >> 8) != 0) { // temporary replacement, fast byte-length operation on x86
            cr = null; // tells the caller to classify the offending char
            break;
        }
        da[dp++] = (byte)c;
    }
    // write the final positions back so the caller can update both buffers
    p[0] = sp;
    p[1] = dp;
    return cr;
}

// No longer needs a try...finally block
private CoderResult encodeArrayLoop(
        CharBuffer src, ByteBuffer dst) {
    char[] sa = src.array();
    int soff = src.arrayOffset();
    int sp = soff + src.position();
    int sr = src.remaining();
    byte[] da = dst.array();
    int doff = dst.arrayOffset();
    int dp = doff + dst.position();
    int dr = dst.remaining();
    CoderResult cr;
    if (dr < sr) {
        sr = dr;
        cr = CoderResult.OVERFLOW;
    } else
        cr = CoderResult.UNDERFLOW;
    int sl = sp + sr;
    final int[] p = { sp, dp };
    cr = copyISOs(cr, sa, da, p, sl);
    src.position(p[0] - soff);
    dst.position(p[1] - doff);
    return result(cr, sa, p[0], sl);

}

// if adapted, maybe could also be reused in encodeBufferLoop()
// (not static, because sgp is an instance field of the encoder)
private CoderResult result(CoderResult cr, char[] sa, int sp, int sl) {
    return cr != null ? cr :
        sgp.parse(sa[sp], sa, sp, sl) < 0
            ? sgp.error()
            : sgp.unmappableResult();
}
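
As a sanity check on the temporary test `(byte)(c >> 8) != 0` used in copyISOs(), the following small sketch compares it against the mask form and against the plain comparison `c > 0xFF` for every possible char value. The class and method names are hypothetical and exist only for this check; nothing here says anything about which form is faster.

```java
final class HighByteCheck {

    // The temporary form from copyISOs(): look only at the high byte.
    static boolean viaByteCast(char c) {
        return (byte) (c >> 8) != 0;
    }

    // The mask form; note the parentheses, since != binds tighter than &.
    static boolean viaMask(char c) {
        return (c & 0xFF00) != 0;
    }

    // Counts the char values for which either form disagrees with c > 0xFF.
    static int countMismatches() {
        int mismatches = 0;
        for (int i = Character.MIN_VALUE; i <= Character.MAX_VALUE; i++) {
            char c = (char) i;
            boolean expected = c > 0xFF;
            if (viaByteCast(c) != expected || viaMask(c) != expected)
                mismatches++;
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches());
    }
}
```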


... so waiting for Vladimir's feedback :-[
(especially on performance/hsdis results)


Performance on x86 was tested with the following code (the whole test will be in the Hotspot changes):

        ba = CharBuffer.wrap(a);
        bb = ByteBuffer.wrap(b);
        long start = System.currentTimeMillis();
        for (int i = 0; i < 1000000; i++) {
            ba.clear(); bb.clear();
            enc_res = enc_res && enc.encode(ba, bb, true).isUnderflow();
        }
        long end = System.currentTimeMillis();


1.) Wouldn't System.nanoTime() give more accurate results?

2.) I want to point out that encoding the same data 1,000,000 times is not a real-world scenario. If the same data is used, it is likely that the data itself stays in the innermost CPU cache and so has very fast access times, which would not be the case with real-world data.

3.) It would also be interesting to see the results for fewer than 1,000,000 iterations, considering whether the surrounding code gets JIT-compiled or not. Performance on C1 should be tested as well.
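
Points 1.) and 3.) could be tried with something like the sketch below, which times the same loop with System.nanoTime() for several iteration counts. The class name, the method timeEncode, the 1024-char input, and the chosen iteration counts are assumptions for illustration and are not part of Vladimir's test. It still reuses the same data on every pass, so it does nothing about point 2.).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

final class EncodeTimingSketch {

    /**
     * Encodes the same char[] into the same byte[] the given number of
     * times and returns the elapsed time in nanoseconds.
     */
    static long timeEncode(char[] a, byte[] b, int iterations) {
        CharsetEncoder enc = StandardCharsets.ISO_8859_1.newEncoder();
        CharBuffer ba = CharBuffer.wrap(a);
        ByteBuffer bb = ByteBuffer.wrap(b);
        boolean ok = true;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            ba.clear();
            bb.clear();
            enc.reset();
            ok &= enc.encode(ba, bb, true).isUnderflow();
        }
        long elapsed = System.nanoTime() - start;
        if (!ok)
            throw new IllegalStateException("an encode pass did not end in UNDERFLOW");
        return elapsed;
    }

    public static void main(String[] args) {
        char[] a = new char[1024];          // assumed input size
        for (int i = 0; i < a.length; i++)
            a[i] = (char) (i & 0xFF);       // only ISO-8859-1 chars
        byte[] b = new byte[a.length];

        // Fewer iterations show the cost before the code is JIT-compiled.
        int[] counts = { 1_000, 10_000, 100_000, 1_000_000 };
        for (int n : counts) {
            long ns = timeEncode(a, b, n);
            System.out.println(n + " iterations: " + (ns / (double) n) + " ns per encode");
        }
    }
}
```

Because all counts run in one JVM, later rows benefit from compilation triggered by earlier ones; running each count in a fresh JVM would separate the effects more cleanly.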

I also worry about the naming of the method encodeArray(...). I think it should reflect the fact that it only encodes ISO-8859-1 charset characters.


Please add a comment stating that the method encodeArray(...) is intended to be intrinsified.

-Ulf
