On 03/12/2015 08:29 PM, Peter Levart wrote:


On 03/12/2015 07:37 PM, Andrew Haley wrote:
On 03/12/2015 05:15 PM, Peter Levart wrote:
...or are JIT+CPU smart enough and there would be no difference?
C2 always orders things based on profile counts, so there is no
difference.  Your suggestion would be better for interpreted code
and I guess C1 also, so I agree it is worthwhile.

Thanks,
Andrew.


What about the following variant (or a similar one using ifs, in case the switch is sub-optimal):

    public final long getLongUnaligned(Object o, long offset) {
        switch ((int) offset & 7) {
            case 1:
            case 5: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getShort(o, offset + 1)) << pickPos(48, 8)) |
                (toUnsignedLong(getInt(o, offset + 3)) << pickPos(32, 24)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 2:
            case 6: return
                (toUnsignedLong(getShort(o, offset)) << pickPos(48, 0)) |
                (toUnsignedLong(getInt(o, offset + 2)) << pickPos(32, 16)) |
                (toUnsignedLong(getShort(o, offset + 6)) << pickPos(48, 48));
            case 3:
            case 7: return
                (toUnsignedLong(getByte(o, offset)) << pickPos(56, 0)) |
                (toUnsignedLong(getInt(o, offset + 1)) << pickPos(32, 8)) |
                (toUnsignedLong(getShort(o, offset + 5)) << pickPos(48, 40)) |
                (toUnsignedLong(getByte(o, offset + 7)) << pickPos(56, 56));
            case 4: return
                (toUnsignedLong(getInt(o, offset)) << pickPos(32, 0)) |
                (toUnsignedLong(getInt(o, offset + 4)) << pickPos(32, 32));
            case 0:
            default: return
                getLong(o, offset);
        }
    }


...it may have more branches, but fewer instructions on average per call.
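To make the idea easier to try outside the JDK, here is a minimal sketch of the same alignment switch over a plain byte[]. It is not the Unsafe code: the class name UnalignedReadSketch is hypothetical, a little-endian ByteBuffer stands in for getByte/getShort/getInt/getLong, byte order is fixed to little-endian (so pickPos(top, pos) reduces to pos), and "alignment" is taken relative to the array index rather than a real memory address.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the proposed patch.
public class UnalignedReadSketch {

    // Reads 8 bytes at 'offset' using the widest accesses that are naturally
    // aligned relative to the array index. Little-endian only (assumption).
    public static long getLongUnaligned(byte[] bytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        switch (offset & 7) {
            case 1:
            case 5: // byte, short, int, byte
                return (buf.get(offset) & 0xFFL)
                     | ((buf.getShort(offset + 1) & 0xFFFFL) << 8)
                     | ((buf.getInt(offset + 3) & 0xFFFFFFFFL) << 24)
                     | ((buf.get(offset + 7) & 0xFFL) << 56);
            case 2:
            case 6: // short, int, short
                return (buf.getShort(offset) & 0xFFFFL)
                     | ((buf.getInt(offset + 2) & 0xFFFFFFFFL) << 16)
                     | ((buf.getShort(offset + 6) & 0xFFFFL) << 48);
            case 3:
            case 7: // byte, int, short, byte
                return (buf.get(offset) & 0xFFL)
                     | ((buf.getInt(offset + 1) & 0xFFFFFFFFL) << 8)
                     | ((buf.getShort(offset + 5) & 0xFFFFL) << 40)
                     | ((buf.get(offset + 7) & 0xFFL) << 56);
            case 4: // int, int
                return (buf.getInt(offset) & 0xFFFFFFFFL)
                     | ((buf.getInt(offset + 4) & 0xFFFFFFFFL) << 32);
            case 0:
            default: // fully aligned: one 8-byte read
                return buf.getLong(offset);
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[16];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (0x81 + 17 * i);
        }
        // Compare the pieced-together value with a plain 8-byte read.
        ByteBuffer ref = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        for (int off = 0; off <= 8; off++) {
            System.out.println(off + ": " + (getLongUnaligned(data, off) == ref.getLong(off)));
        }
    }
}
```

The masks play the role of toUnsignedLong: without them the sign extension of a negative byte, short or int would smear ones over the higher parts of the result.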



Peter


... putLongUnaligned in the style of the getLongUnaligned above is trickier with the current code structure. But there may be a middle ground (or a sweet spot):


    public final void putLongUnaligned(Object o, long offset, long x) {
        if (((int) offset & 1) == 1) {
            putLongParts(o, offset,
                (byte) (x >>> 0),
                (short) (x >>> 8),
                (short) (x >>> 24),
                (short) (x >>> 40),
                (byte) (x >>> 56));
        } else if (((int) offset & 2) == 2) {
            putLongParts(o, offset,
                (short) (x >>> 0),
                (int) (x >>> 16),
                (short) (x >>> 48));
        } else if (((int) offset & 4) == 4) {
            putLongParts(o, offset,
                (int) (x >>> 0),
                (int) (x >>> 32));
        } else {
            putLong(o, offset, x);
        }
    }


...this has the same number of branches, but fewer instructions. You also need the following two putLongParts overloads:


    private void putLongParts(Object o, long offset, byte i0, short i12, short i34, short i56, byte i7) {
        putByte(o, offset + 0, pick(i0, i7));
        putShort(o, offset + 1, pick(i12, i56));
        putShort(o, offset + 3, i34);
        putShort(o, offset + 5, pick(i56, i12));
        putByte(o, offset + 7, pick(i7, i0));
    }

    private void putLongParts(Object o, long offset, short i0, int i12, short i3) {
        putShort(o, offset + 0, pick(i0, i3));
        putInt(o, offset + 2, i12);
        putShort(o, offset + 6, pick(i3, i0));
    }
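For anyone who wants to experiment with the write side without touching Unsafe, the following is a rough sketch of the same if-chain over a byte[]. As assumptions: the class name UnalignedWriteSketch is hypothetical, a little-endian ByteBuffer replaces putByte/putShort/putInt/putLong, the byte order is fixed to little-endian (so pick(le, be) always yields the first argument and the two putLongParts overloads are inlined), and alignment is relative to the array index.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical illustration class; not part of the JDK or of the proposed patch.
public class UnalignedWriteSketch {

    // Writes 'x' at 'offset' in pieces chosen from the low bits of the offset.
    // Little-endian only (assumption), so the low part of x goes first.
    public static void putLongUnaligned(byte[] bytes, int offset, long x) {
        ByteBuffer buf = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        if ((offset & 1) == 1) {
            // Odd offset: byte, short, short, short, byte
            buf.put(offset, (byte) x);
            buf.putShort(offset + 1, (short) (x >>> 8));
            buf.putShort(offset + 3, (short) (x >>> 24));
            buf.putShort(offset + 5, (short) (x >>> 40));
            buf.put(offset + 7, (byte) (x >>> 56));
        } else if ((offset & 2) == 2) {
            // 2-aligned but not 4-aligned: short, int, short
            buf.putShort(offset, (short) x);
            buf.putInt(offset + 2, (int) (x >>> 16));
            buf.putShort(offset + 6, (short) (x >>> 48));
        } else if ((offset & 4) == 4) {
            // 4-aligned but not 8-aligned: int, int
            buf.putInt(offset, (int) x);
            buf.putInt(offset + 4, (int) (x >>> 32));
        } else {
            // 8-aligned: a single 8-byte write
            buf.putLong(offset, x);
        }
    }

    public static void main(String[] args) {
        long value = 0x8877665544332211L;
        for (int off = 0; off <= 8; off++) {
            byte[] data = new byte[16];
            putLongUnaligned(data, off, value);
            long back = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN).getLong(off);
            System.out.println(off + ": " + (back == value));
        }
    }
}
```

Compared with the switch used for reads, the odd-offset branch here settles for three short writes instead of a short plus an int, which is the trade-off behind the "middle ground": fewer distinct cases at the cost of one extra store on some offsets.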



Regards, Peter
