Re: Unsafe.{get,put}-X-Unaligned; Efficient array comparison intrinsics

Andrew Haley Tue, 24 Feb 2015 05:50:39 -0800

Hi,

On 02/24/2015 11:16 AM, Paul Sandoz wrote:


> This looks like a good start.

Good, thanks.

> On Feb 23, 2015, at 7:13 PM, Andrew Haley <a...@redhat.com> wrote:
> 
>> I've been kicking around a few ideas for Unsafe access methods for unaligned 
>> access to byte arrays and buffers in order to provide "whatever second-best 
>> mechanism the platform offers".  These would provide the base for fast 
>> lexicographic array comparisons, etc.
>> 
>> https://bugs.openjdk.java.net/browse/JDK-8044082
>> 
>> If the platform supports unaligned memory accesses, the implementation of 
>> {get,put}-X-Unaligned is obvious and trivial for both C1 and C2. It gets 
>> interesting when we want to provide efficient unaligned methods on machines 
>> with no hardware support.
>> 
>> We could provide compiler intrinsics which do what we need on such machines. 
>>  However, I think this wouldn't deliver the best results. From the 
>> experiments I've done, the best implementation is to write the access 
>> methods in Java and allow HotSpot to optimize them.  While this seemed a bit 
>> counter-intuitive to me, it's best because C2 has profile data that it can 
>> work on.  In many cases I suspect that data read and written from a byte 
>> array will be aligned for their type and C2 will take advantage of this, 
>> relegating the misaligned access to an out-of-line code path as appropriate.
> 
> I am all for keeping more code in Java if we can. I don't know enough about 
> assembler-based optimizations to determine if it might be possible to do 
> better on certain CPU architectures.

Me either, but I have tested this on the architectures I have, and I
suspect that C2 optimization is good enough.  And we'd have to write
assembly code for machines we haven't got; something for the future, I
think.
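
A minimal sketch of the "write it in Java and let C2 specialize it" shape
discussed above. This is not the code from the patch: the class and method
names are hypothetical, a byte[] index stands in for a raw offset, and a
ByteBuffer read stands in for the aligned Unsafe access.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class UnalignedReadSketch {
    // Hypothetical helper: read a big-endian int at any index of a byte array.
    static int getIntBigEndian(byte[] a, int index) {
        if ((index & 3) == 0) {
            // Aligned case: a single wide read. With profile data, a JIT
            // compiler can treat this as the hot path.
            // (ByteBuffer is only a stand-in for an aligned Unsafe.getInt.)
            return ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN).getInt(index);
        }
        // Misaligned case: assemble the value one byte at a time.
        return ((a[index]     & 0xff) << 24)
             | ((a[index + 1] & 0xff) << 16)
             | ((a[index + 2] & 0xff) << 8)
             |  (a[index + 3] & 0xff);
    }
}
```

Both branches return the same value for any index; the split only matters
for how the access is performed, which is what lets the compiler push the
bytewise path out of line when the profile says it is rare.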

> One advantage, AFAIU, to intrinsics is they are not subject to the vagaries 
> of inlining thresholds. It's important that the loops operating over the 
> arrays be compiled efficiently, otherwise performance can drop off a 
> cliff if thresholds are reached within the loop. Perhaps these methods are 
> small enough that it is not an issue? And perhaps that is not a sufficient 
> argument to justify the cost of an intrinsic (and we should really be 
> tweaking the inlining mechanism)?

Maybe so.  There are essentially two ways to do this: new compiler
node types which punt everything to the back end (and therefore
require back-end authors to write them) or generic expanders, which is
how many of the existing intrinsics are done.  Generic C2 code would,
I suspect, be worse than writing this in Java because it would be
lacking profile data.

> With that in mind is there any need to intrinsify the new methods at all 
> given those new Java methods can defer to the older ones based on a constant 
> check? Also should that anyway be done for the interpreter?
> 
> 
> private static final boolean IS_UNALIGNED = theUnsafe.unalignedAccess();
> 
> public void putIntUnaligned(Object o, long offset, int x) {
>     if (IS_UNALIGNED || (offset & 3) == 0) {
>         putInt(o, offset, x);
>     } else if (byteOrder == BIG_ENDIAN) {
>         putIntB(o, offset, x);
>     } else {
>         putIntL(o, offset, x);
>     }
> }

Yes.  It certainly could be done like this but I think C1 doesn't do
the optimization to remove the IS_UNALIGNED test, so we'd still want
the C1 builtins.  Perhaps we could do without the C2 builtins but they
cost very little, they save C2 a fair amount of work, and they remove
the vagaries of inlining.  I take your point about the interpreter,
though.
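
For illustration, one way the bytewise fallbacks named in the quoted code
might look. This is a sketch under assumptions: only the names putIntB and
putIntL come from the quoted snippet; the class, the byte[] target, and the
BIG_ENDIAN constant are hypothetical, and this is not the code in the patch.

```java
import java.nio.ByteOrder;

final class UnalignedWriteSketch {
    // Fixed for the lifetime of the JVM, so a JIT compiler that folds
    // static final constants can drop the untaken branch below.
    private static final boolean BIG_ENDIAN =
            ByteOrder.nativeOrder() == ByteOrder.BIG_ENDIAN;

    // Store x most-significant byte first.
    static void putIntB(byte[] a, int i, int x) {
        a[i]     = (byte) (x >>> 24);
        a[i + 1] = (byte) (x >>> 16);
        a[i + 2] = (byte) (x >>> 8);
        a[i + 3] = (byte) x;
    }

    // Store x least-significant byte first.
    static void putIntL(byte[] a, int i, int x) {
        a[i]     = (byte) x;
        a[i + 1] = (byte) (x >>> 8);
        a[i + 2] = (byte) (x >>> 16);
        a[i + 3] = (byte) (x >>> 24);
    }

    // Store x in the platform's native byte order at any index.
    static void putIntUnaligned(byte[] a, int i, int x) {
        if (BIG_ENDIAN) {
            putIntB(a, i, x);
        } else {
            putIntL(a, i, x);
        }
    }
}
```

A compiler that does not fold the constant still has to execute the test
on every call, which is the concern raised about C1 above.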

> I see you optimized the unaligned getLong by reading two aligned longs and 
> then bit twiddled. It seems harder to optimize the putLong by straddling an 
> aligned putInt with one to three required putByte.

Sure, that's always a possibility.  I have code to do it but it was
all getting rather complicated for my taste.
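
The "two aligned longs plus bit twiddling" read mentioned in the quoted
text can be sketched roughly as follows. The assumptions here are mine:
memory is modelled as a long[] of aligned 64-bit words, bytes within a word
are numbered most-significant first (big-endian), and the names are
hypothetical. It is not the code from the patch.

```java
final class TwoWordReadSketch {
    // Read 64 bits starting at an arbitrary byte offset, using only
    // whole-word (aligned) reads from the backing array.
    static long getLongUnaligned(long[] words, int byteOffset) {
        int w = byteOffset >>> 3;            // index of the word holding the first byte
        int shift = (byteOffset & 7) << 3;   // misalignment, in bits
        if (shift == 0) {
            // Aligned: one read. Also avoids a shift by 64 below, which
            // Java would treat as a shift by 0.
            return words[w];
        }
        // Low-order bytes of the first word become the high part of the
        // result; high-order bytes of the next word fill in the low part.
        return (words[w] << shift) | (words[w + 1] >>> (64 - shift));
    }
}
```

The store direction is harder because the bytes outside the written range
in both words must be preserved, which fits the remark above that the put
side gets complicated.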

>> Also, these methods have the additional benefit that they are always atomic 
>> as long as the data are naturally aligned.
> 
> We should probably document that, in general, access is not guaranteed to be 
> atomic, and that it is only an implementation detail that it currently is 
> atomic when the data are naturally aligned.

I think that's a good idea.  The jcstress tests already come up with a
warning that the implementation is not atomic; atomicity is not
required, but a high-quality implementation will provide it.

>> This does result in rather a lot of code for the methods for all sizes and 
>> endiannesses, but none of it will be used on machines with unaligned 
>> hardware support except in the interpreter.  (Perhaps the interpreter too 
>> could have intrinsics?)
>> 
>> I have changed HeapByteBuffer to use these methods, with a major performance 
>> improvement.  I've also provided Unsafe methods to query endianness and 
>> alignment support.
> 
> If we expose the endianness query via a new method in unsafe we should reuse 
> that in java.nio.Bits and get rid of the associated static code block.

Sure, I already did that.

Thanks,
Andrew.
