> On 30 Mar 2016, at 12:10, Andrew Haley <a...@redhat.com> wrote:
>
> On 03/30/2016 10:36 AM, Paul Sandoz wrote:
>
>> Unsafe.getLongUnaligned needs to perform three alignment checks
>> before accessing bytes and then optionally perform a byte
>> swap. Bits.getLong always access bytes and composing using the
>> defined endianness and requires no additional byte swap.
>>
>> Since SPARC is Big Endian and buffers are by default Big Endian i
>> can rule any byte swapping, but it potentially could increase the
>> regression if Little Endian is chosen [2].
>>
>> When access is performed in loops this can cost, as the alignment
>> checks are not hoisted out. Theoretically could for regular 2, 4, 8
>> strides through the buffer contents. For such cases alignment of the
>> base address can be checked. Not sure how complicated that would be
>> to support.
>
> Going back to the "Unsafe.{get,put}-X-Unaligned performance" and
> "Unsafe.{get,put}-X-Unaligned; Efficient array comparison intrinsics"
> discussion a year ago, the rationale behind the way things are was to
> enable auto-vectorization. We certainly can do things with more
> complex checks at the Java level, but IMVHO vectorization is the right
> way to fix it.
>
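
For context, the two access strategies quoted above can be sketched roughly as follows. This is only an illustration, not the JDK source: the class and method names are hypothetical, a byte[] index stands in for a real memory address, and the load*BE helpers are stand-ins for single wide loads that the hardware would perform on aligned addresses.

```java
// Hypothetical sketch; names and structure are assumptions, not JDK code.
final class UnalignedReadSketch {

    // Strategy 1: always compose the value byte by byte in big-endian order.
    // No alignment checks and no separate byte swap are needed.
    static long getLongByBytes(byte[] a, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (a[off + i] & 0xffL);
        }
        return v;
    }

    // Stand-ins for one aligned 2-, 4- or 8-byte load (big-endian).
    static int loadShortBE(byte[] a, int off) {
        return ((a[off] & 0xff) << 8) | (a[off + 1] & 0xff);
    }

    static long loadIntBE(byte[] a, int off) {
        return ((long) loadShortBE(a, off) << 16) | loadShortBE(a, off + 2);
    }

    static long loadLongBE(byte[] a, int off) {
        return (loadIntBE(a, off) << 32) | loadIntBE(a, off + 4);
    }

    // Strategy 2: up to three alignment checks, then the widest access
    // that the offset permits. In a loop these checks run on every call
    // unless the compiler can hoist them out.
    static long getLongAlignmentDispatched(byte[] a, int off) {
        if ((off & 7) == 0) {
            return loadLongBE(a, off);                      // one 8-byte access
        } else if ((off & 3) == 0) {
            return (loadIntBE(a, off) << 32)                // two 4-byte accesses
                 | loadIntBE(a, off + 4);
        } else if ((off & 1) == 0) {
            return ((long) loadShortBE(a, off) << 48)       // four 2-byte accesses
                 | ((long) loadShortBE(a, off + 2) << 32)
                 | ((long) loadShortBE(a, off + 4) << 16)
                 | loadShortBE(a, off + 6);
        } else {
            return getLongByBytes(a, off);                  // eight 1-byte accesses
        }
    }
}
```

Both methods return the same value for any offset; the difference is only in how many branches and how many memory accesses each call costs.
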
I would be reluctant to add more complex checks to the Java implementation; that is likely to just push the regression around on SPARC. Unsafe.get*Unaligned is a "poor man's" form of vectorization :-)

But you are right that higher-level vectorization of loops is the better long-term strategy overall. However, that will still be tricky for architectures that can only vectorize accesses on aligned boundaries.

I am trying to think of opportunistic shorter-term fixes that might work on current SPARC architectures.

Paul.