Copilot commented on code in PR #3476:
URL: https://github.com/apache/parquet-java/pull/3476#discussion_r3106965051
##########
parquet-plugins/parquet-encoding-vector/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPacking512VectorLE.java:
##########
@@ -3175,13 +3171,56 @@ public final void unpackValuesUsingVector(final byte[]
in, final int inPos, fina
public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int
outPos) {
- ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos,
in.order(), inp_mask);
+ ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 =
byteVector.rearrange(perm_mask0).reinterpretAsInts();
tempRes1.intoArray(out, outPos, out_mask);
}
}
+ // TODO Replace these helpers with ByteVector.fromMemorySegment(...) once
the project's minimum
+ // supported JDK is >= 22 (where java.lang.foreign.MemorySegment became a
permanent API per
+ // JEP 454). fromMemorySegment is the direct successor to
ByteVector.fromByteBuffer (which was
+ // removed after JDK 21) and is intrinsifiable by HotSpot for both heap and
direct buffers via
+ // jdk.internal.misc.ScopedMemoryAccess, eliminating the byte[] copy that
this fallback uses
+ // for direct ByteBuffers. Until then, the implementations below are
constrained to APIs
+ // available under --release 17. A multi-release JAR overlay
(src/main/java22) would let JDK
+ // 22+ runtimes pick up the fromMemorySegment path automatically; see
GH-3475 discussion.
+ private static ByteVector fromByteBuffer(VectorSpecies<Byte> species,
ByteBuffer input, int inPos) {
+ // Heap buffers are loaded directly from their backing array: no
ByteBuffer.duplicate(),
+ // no intermediate byte[] allocation, no extra copy. ByteVector.fromArray
is intrinsified by
+ // HotSpot and lowered to a single AVX-512 load.
+ if (input.hasArray()) {
+ return ByteVector.fromArray(species, input.array(), input.arrayOffset()
+ inPos);
+ }
+ // Off-heap (direct) buffers: fall back to copying species.length() bytes
into a scratch
+ // array. The original JDK 17 fast path used
ScopedMemoryAccess.loadFromByteBuffer to avoid
+ // the copy, but that intrinsic is not exposed; the alternative
ByteVector.fromMemorySegment
+ // is only available on JDK 19+ and cannot be called from --release 17
sources.
+ return ByteVector.fromArray(species, readInputBytes(input, inPos,
species.length()), 0);
+ }
+
+ private static ByteVector fromByteBuffer(
+ VectorSpecies<Byte> species, ByteBuffer input, int inPos,
VectorMask<Byte> mask) {
+ // Mirror the fast path of OpenJDK 17's ByteVector.fromByteBuffer(species,
bb, offset, bo, m):
+ // when the full vector window fits in the buffer, read the entire window
unconditionally and
+ // let fromArray(species, array, 0, mask) apply the mask. This matches the
JDK semantics for
+ // arbitrary mask shapes (not just contiguous-prefix masks) and keeps
array.length equal to
+ // species.length(), which satisfies the bounds-check precondition of
fromArray.
+ if (input.hasArray()) {
+ return ByteVector.fromArray(species, input.array(), input.arrayOffset()
+ inPos, mask);
+ }
+ return ByteVector.fromArray(species, readInputBytes(input, inPos,
species.length()), 0, mask);
+ }
Review Comment:
The new `fromByteBuffer(...)` helpers introduce a distinct fallback path for
non-array-backed buffers (`!input.hasArray()`), including
`ByteBuffer.allocateDirect()` and read-only heap buffers. The existing vector
unpack tests only cover heap `ByteBuffer.wrap(byte[])`, so this
direct/read-only path is currently untested. Add a unit test that runs
`unpackValuesUsingVector(ByteBuffer, ...)` against a direct (and optionally
read-only) buffer to validate correctness and protect against regressions in
`readInputBytes`/masked loads on newer JDKs.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]