Hi,

On 02/24/2015 11:16 AM, Paul Sandoz wrote:
> This looks like a good start.

Good, thanks.

> On Feb 23, 2015, at 7:13 PM, Andrew Haley <a...@redhat.com> wrote:
>
>> I've been kicking around a few ideas for Unsafe access methods for unaligned
>> access to byte arrays and buffers in order to provide "whatever second-best
>> mechanism the platform offers". These would provide the base for fast
>> lexicographic array comparisons, etc.
>>
>> https://bugs.openjdk.java.net/browse/JDK-8044082
>>
>> If the platform supports unaligned memory accesses, the implementation of
>> {get,put}-X-Unaligned is obvious and trivial for both C1 and C2. It gets
>> interesting when we want to provide efficient unaligned methods on machines
>> with no hardware support.
>>
>> We could provide compiler intrinsics which do what we need on such machines.
>> However, I think this wouldn't deliver the best results. From the
>> experiments I've done, the best implementation is to write the access
>> methods in Java and allow HotSpot to optimize them. While this seemed a bit
>> counter-intuitive to me, it's best because C2 has profile data that it can
>> work on. In many cases I suspect that data read and written from a byte
>> array will be aligned for their type and C2 will take advantage of this,
>> relegating the misaligned access to an out-of-line code path as appropriate.
>
> I am all for keeping more code in Java if we can. I don't know enough about
> assembler-based optimizations to determine if it might be possible to do
> better on certain CPU architectures.

Me either, but I have tested this on the architectures I have, and I
suspect that C2 optimization is good enough. And we'd have to write
assembly code for machines we haven't got; something for the future, I
think.

> One advantage, AFAIU, to intrinsics is they are not subject to the vagaries
> of inlining thresholds.
> It's important that the loops operating over the
> arrays be compiled efficiently, otherwise performance can drop off the
> cliff if thresholds are reached within the loop. Perhaps these methods are
> small enough it is not an issue? and also perhaps that is not a sufficient
> argument to justify the cost of an intrinsic (and we should be really
> tweaking the inlining mechanism)?

Maybe so. There are essentially two ways to do this: new compiler node
types which punt everything to the back end (and therefore require
back-end authors to write them) or generic expanders, which is how many
of the existing intrinsics are done. Generic C2 code would, I suspect,
be worse than writing this in Java because it would be lacking profile
data.

> With that in mind is there any need to intrinsify the new methods at all
> given those new Java methods can defer to the older ones based on a constant
> check? Also should that anyway be done for the interpreter?
>
>
>   private static final boolean IS_UNALIGNED = theUnsafe.unalignedAccess();
>
>   public void putIntUnaligned(Object o, long offset, int x) {
>       if (IS_UNALIGNED || (offset & 3) == 0) {
>           putInt(o, offset, x);
>       } else if (byteOrder == BIG_ENDIAN) {
>           putIntB(o, offset, x);
>       } else {
>           putIntL(o, offset, x);
>       }
>   }

Yes. It certainly could be done like this but I think C1 doesn't do the
optimization to remove the IS_UNALIGNED test, so we'd still want the C1
builtins. Perhaps we could do without the C2 builtins but they cost very
little, they save C2 a fair amount of work, and they remove the vagaries
of inlining. I take your point about the interpreter, though.

> I see you optimized the unaligned getLong by reading two aligned longs and
> then bit twiddled. It seems harder to optimize the putLong by straddling an
> aligned putInt with one to three required putByte.

Sure, that's always a possibility. I have code to do it but it was all
getting rather complicated for my taste.
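
For anyone following along, the "two aligned longs and then bit twiddle"
read mentioned above can be sketched roughly as follows. This is only an
illustration under stated assumptions, not the code in the patch: the class
name UnalignedReadSketch is made up, a long[] stands in for memory that
only supports aligned 8-byte loads, and byte order is assumed to be
little-endian.

```java
final class UnalignedReadSketch {
    // Hypothetical model (not the patch): mem[i] holds bytes 8*i .. 8*i+7,
    // least significant byte first (little-endian).
    static long getLongUnaligned(long[] mem, int byteOffset) {
        int index = byteOffset >>> 3;       // aligned word holding the first byte
        int shift = (byteOffset & 7) << 3;  // misalignment, in bits
        if (shift == 0) {
            // Naturally aligned: a single load. This branch is also required
            // for correctness, because Java masks shift counts to 6 bits, so
            // (hi << 64) would behave as (hi << 0).
            return mem[index];
        }
        long lo = mem[index];       // supplies the low-order bytes of the result
        long hi = mem[index + 1];   // supplies the high-order bytes of the result
        return (lo >>> shift) | (hi << (64 - shift));
    }

    public static void main(String[] args) {
        // Memory bytes 0x01 .. 0x10 at byte addresses 0 .. 15.
        long[] mem = { 0x0807060504030201L, 0x100F0E0D0C0B0A09L };
        System.out.println(Long.toHexString(getLongUnaligned(mem, 3)));
    }
}
```

The aligned case stays on the single-load fast path, which is the case the
profile data is expected to favour; a big-endian variant would swap the
direction of the two shifts.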

>> Also, these methods have the additional benefit that they are always atomic
>> as long as the data are naturally aligned.
>
> We should probably document that in general access is not guaranteed to be
> atomic and an implementation detail that it currently is when naturally
> aligned.

I think that's a good idea. The jcstress tests already come up with a
warning that the implementation is not atomic; this is not required, but
a high-quality implementation will be.

>> This does result in rather a lot of code for the methods for all sizes and
>> endiannesses, but none of it will be used on machines with unaligned
>> hardware support except in the interpreter. (Perhaps the interpreter too
>> could have intrinsics?)
>>
>> I have changed HeapByteBuffer to use these methods, with a major performance
>> improvement. I've also provided Unsafe methods to query endianness and
>> alignment support.
>
> If we expose the endianness query via a new method in unsafe we should reuse
> that in java.nio.Bits and get rid of the associated static code block.

Sure, I already did that.

Thanks,
Andrew.