On 6/02/2016 8:21 AM, Mikael Vidstedt wrote:

I fully agree that moving the arguments checking up to Java makes more
sense, and I've prepared new webrevs which do exactly that, including
changes to address the other feedback from David, John and others:

Shouldn't the lowest-level do_conjoint_swap routines at least check preconditions with asserts to catch the cases where the calling Java code has failed to do the right thing? The other Copy methods seem to do this.

David
-----

hotspot:
http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04/hotspot/webrev/

jdk:
http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04/jdk/webrev/

Incremental webrevs for your convenience:

hotspot:
http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04.incr/hotspot/webrev/

jdk:
http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.04.incr/jdk/webrev/



I have done some benchmarking of this code and for large copies (16MB+)
this outperforms the old Bits.c implementation by *30-100%* depending on
platform and exact element sizes! For smaller copies the additional
checks which are now performed hurt performance on client VMs (80-90% of
old impl), but with the server VMs I see performance on par with, or in
most cases 5-10% better than the old implementation. There's a
potentially statistically significant regression of ~3-4% for
elemSize=2, but for now I'm going to declare success. There's certainly
room for further improvements here, but this should at least do for
addressing the original problem.


I filed https://bugs.openjdk.java.net/browse/JDK-8149159 for moving the
checks for Unsafe.copyMemory to Java, and will work on that next. I also
filed https://bugs.openjdk.java.net/browse/JDK-8149162 to cover the
potential renaming of the Bits methods to have more informative names.
Finally, I filed https://bugs.openjdk.java.net/browse/JDK-8149163 to
look at improving the behavior of Unsafe.addressSize(), after having
spent too much time trying to understand why the performance of the new
U.copySwapMemory Java checks wasn't quite living up to my expectations
(spoiler alert: Unsafe.addressSize() is not intrinsified, so will always
result in a call into the VM/unsafe.cpp).


Finally, I - too - would like to see the copy-swap logic moved into
Java, and as I mentioned I played around with that first before I
decided to do the native implementation to address the immediate
problem. Looking forward to what you find Paul!

Cheers,
Mikael

On 2016-02-05 05:00, Paul Sandoz wrote:
Hi,

Nice use of C++ templates :-)

Overall looks good.

I too would prefer if we could move the argument checking out, perhaps
even to the point of requiring callers do that rather than providing
another method, for example for Buffer i think the arguments are known
to be valid? I think in either case it is important to improve the
documentation on the method stating the constraints on arguments,
atomicity guarantees etc.

I have a hunch that for the particular case of copying-with-swap for
buffers i could get this to work work efficiently using Unsafe (three
separate methods for each unit type of 2, 4 and 8 bytes), since IIUC
the range is bounded to be less than Integer.MAX_VALUE so an int loop
rather than a long loop can be used and therefore safe points checks
will not be placed within the loop.

However, i think what you have done is more generally applicable and
could be made intrinsic. It would be a nice at some future point if it
could be made a pure Java implementation and intrinsified where
appropriate.

—

John, regarding array mismatch there were issues with the efficiency
of the unrolled loops with Unsafe access. (Since the loops were int
bases there were no issues with safe point checks.) Roland recently
fixed that so now code is generated that is competitive with direct
array accesses. We drop into the stub intrinsic and leverage 128bits
or 256bits where supported. Interestingly it seems the unrolled loop
using Unsafe is now slightly faster than the stub using 128bit
registers. I don’t know if that is due to unluckly alignment, and/or
the stub needs to do some manual unrolling. In terms of code-cache
efficiency the intrinsic is better.

Paul.





On 4 Feb 2016, at 06:27, John Rose <john.r.r...@oracle.com> wrote:

On Feb 2, 2016, at 11:25 AM, Mikael Vidstedt
<mikael.vidst...@oracle.com> wrote:
Please review this change which introduces a Copy::conjoint_swap and
an Unsafe.copySwapMemory method to call it from Java, along with the
necessary changes to have java.nio.Bits call it instead of the
Bits.c code.

http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.03/hotspot/webrev/

http://cr.openjdk.java.net/~mikael/webrevs/8141491/webrev.03/jdk/webrev/

This is very good.

I have some nit-picks:

These days, when we introduce a new intrinsic (@HSIntrCand),
we write the argument checking code separately in a non-intrinsic
bytecode method.  In this case, we don't (yet) have an intrinsic
binding for U.copy*, but we might in the future.  (C intrinsifies
memcpy, which is a precedent.)  In any case, I would prefer
if we could structure the argument checking code in a similar
way, with Unsafe.java containing both copySwapMemory
and a private copySwapMemory0.  Then we can JIT-optimize
the safety checks.

You might as well extend the same treatment to the pre-existing
copyMemory call.  The most important check (and the only one
in U.copyMemory) is to ensure that the size_t operand has not
wrapped around from a Java negative value to a crazy-large
size_t value.  That's the low-hanging fruit.  Checking the pointers
(for null or oob) is more problematic, of course.  Checking consistency
around elemSize is cheap and easy, so I agree that the U.copySM
should do that work also.  Basically, Unsafe can do very basic
checks if there is a tricky user model to enforce, but it mustn't
"sign up" to guard the user against all errors.

Rule of thumb:  Unsafe calls don't throw NPEs, they just SEGV.
And the rare bit that *does* throw (IAE usually) should be placed
into Unsafe.java, not unsafe.cpp.  (The best-practice rule for putting
argument checking code outside of the intrinsic is a newer one,
so Unsafe code might not always do this.)

The comment "Generalizing it would be reasonable, but requires
card marking" is bogus, since we never byte-swap managed pointers.

The test logic will flow a little smoother if your GenericPointer guy,
the onHeap version, stores the appropriate array base offset in his
offset field.
You won't have to mention p.isOnHeap nearly so much, and the code will
set a slightly better example.

The VM_ENTRY_BASE_FROM_LEAF macro is really cool.

The C++ template code is cool also.  It reminds me of the kind
of work Gosling's "Ace" processor could do, but now it's mainstreamed
for all to use in C++.  We're going to get some of that goodness
in Project Valhalla with specialization logic.

I find it amazing that the right way to code this in C is to
use memcpy for unaligned accesses and byte peek/poke
into registers for byte-swapping operators.  I'm glad we
can write this code *once* for the JVM and JDK.

Possible future work:  If we can get a better handle on
writing vectorizable loops from Java, including Unsafe-based
ones, we can move some of the C code back up to Java.
Perhaps U.copy* calls for very short lengths deserved to
be broken out into small loops of U.get/put* (with alignment).
I think you experimented with this, and there were problems
with the JIT putting fail-safe memory barriers between
U.get/put* calls.  Paul's work on Array.mismatch ran into
similar issues, with the right answer being to write manual
vector code in assembly.

Anyway, you can count me as a reviewer.

Thanks,

— John

Reply via email to