On 03/12/2015 10:04 PM, Peter Levart wrote:
... putLongUnaligned in the style of the above getLongUnaligned is trickier with the current code structure. But there may be a middle ground (or a sweet spot):


    public final void putLongUnaligned(Object o, long offset, long x) {
        if (((int) offset & 1) == 1) {
            putLongParts(o, offset,
                (byte) (x >>> 0),
                (short) (x >>> 8),
                (short) (x >>> 24),
                (short) (x >>> 40),
                (byte) (x >>> 56));
        } else if (((int) offset & 2) == 2) {
            putLongParts(o, offset,
                (short)(x >>> 0),
                (int)(x >>> 16),
                (short)(x >>> 48));
        } else if (((int) offset & 4) == 4) {
            putLongParts(o, offset,
                (int)(x >>> 0),
                (int)(x >>> 32));
        } else {
            putLong(o, offset, x);
        }
    }


...this has the same number of branches, but fewer instructions. You also need the following two:
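
The helper overloads themselves are not reproduced here. Purely as an illustration of the splitting idea, and not the actual Unsafe code, the following is a minimal self-contained sketch that applies the same offset tests to a plain byte[]. The class UnalignedPutSketch, its helper methods, and the fixed little-endian layout are all assumptions made for the example. The point it tries to show is that each part lands on an index that is a multiple of its own size:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical illustration only: models memory as a byte[] and assumes
// a little-endian layout. None of these names come from the original patch.
final class UnalignedPutSketch {

    static void putByte(byte[] a, int off, byte v) {
        a[off] = v;
    }

    static void putShort(byte[] a, int off, short v) {
        a[off] = (byte) v;               // low byte first (little-endian)
        a[off + 1] = (byte) (v >>> 8);
    }

    static void putInt(byte[] a, int off, int v) {
        putShort(a, off, (short) v);
        putShort(a, off + 2, (short) (v >>> 16));
    }

    static void putLong(byte[] a, int off, long v) {
        // Stands in for a single aligned 8-byte store.
        putInt(a, off, (int) v);
        putInt(a, off + 4, (int) (v >>> 32));
    }

    // Same branch structure as the method quoted above: pick the widest
    // parts that are naturally aligned for the given offset.
    static void putLongUnaligned(byte[] a, int off, long x) {
        if ((off & 1) == 1) {            // odd: byte, short, short, short, byte
            putByte(a, off, (byte) x);
            putShort(a, off + 1, (short) (x >>> 8));
            putShort(a, off + 3, (short) (x >>> 24));
            putShort(a, off + 5, (short) (x >>> 40));
            putByte(a, off + 7, (byte) (x >>> 56));
        } else if ((off & 2) == 2) {     // 2 mod 4: short, int, short
            putShort(a, off, (short) x);
            putInt(a, off + 2, (int) (x >>> 16));
            putShort(a, off + 6, (short) (x >>> 48));
        } else if ((off & 4) == 4) {     // 4 mod 8: int, int
            putInt(a, off, (int) x);
            putInt(a, off + 4, (int) (x >>> 32));
        } else {                         // 0 mod 8: one aligned long
            putLong(a, off, x);
        }
    }

    // Reference result using ByteBuffer's absolute little-endian putLong.
    static byte[] reference(int off, long x, int len) {
        byte[] b = new byte[len];
        ByteBuffer.wrap(b).order(ByteOrder.LITTLE_ENDIAN).putLong(off, x);
        return b;
    }

    public static void main(String[] args) {
        long x = 0x0102030405060708L;
        for (int off = 0; off < 16; off++) {
            byte[] got = new byte[32];
            putLongUnaligned(got, off, x);
            boolean same = Arrays.equals(got, reference(off, x, 32));
            System.out.println("offset " + off + " matches reference: " + same);
        }
    }
}
```

Running main compares every offset from 0 to 15 against ByteBuffer's little-endian putLong, which exercises all four branches.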


At least on Intel (with -XX:-UseUnalignedAccesses), the above code (Unaligned2) is not any faster than your code (Unaligned) according to a JMH random-access test. Neither is the reversal of the if/else branches (Unaligned1). Unaligned3 is a switch-based variant (get only) and is the slowest. Your variant seems to be the fastest by a hair:

Benchmark                              Mode  Samples   Score  Score error  Units
j.t.UnalignedTest.getLongUnaligned     avgt        5  16.375        0.837  ns/op
j.t.UnalignedTest.getLongUnaligned1    avgt        5  18.340        0.617  ns/op
j.t.UnalignedTest.getLongUnaligned2    avgt        5  16.784        0.969  ns/op
j.t.UnalignedTest.getLongUnaligned3    avgt        5  19.634        0.871  ns/op
j.t.UnalignedTest.putLongUnaligned     avgt        5  15.521        0.589  ns/op
j.t.UnalignedTest.putLongUnaligned1    avgt        5  16.676        1.042  ns/op
j.t.UnalignedTest.putLongUnaligned2    avgt        5  16.394        3.028  ns/op


Regards, Peter
