wgtmac commented on code in PR #3476:
URL: https://github.com/apache/parquet-java/pull/3476#discussion_r3105435778
##########
.gitignore:
##########
@@ -20,4 +21,3 @@ target/
mvn_install.log
.vscode/*
.DS_Store
-
Review Comment:
Let's keep this blank line
##########
.github/workflows/vector-plugins.yml:
##########
@@ -46,12 +46,19 @@ jobs:
run: |
EXTRA_JAVA_TEST_ARGS=$(./mvnw help:evaluate
-Dexpression=extraJavaTestArgs -q -DforceStdout)
export MAVEN_OPTS="$MAVEN_OPTS $EXTRA_JAVA_TEST_ARGS"
- ./mvnw install --batch-mode -Pvector-plugins -DskipTests=true
-Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true
-Djava.version=${{ matrix.java }} -pl
parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks
-am
+ ./mvnw install --batch-mode -Pvector-plugins -DskipTests=true
-Dmaven.javadoc.skip=true -Dsource.skip=true -Dmaven.buildNumber.skip=true
-Dspotless.check.skip=true -Djava.version=${{ matrix.java }} -pl
parquet-plugins/parquet-encoding-vector,parquet-plugins/parquet-plugins-benchmarks
-am
Review Comment:
Is it possible to use similar pattern like `SPOTLESS_ARGS` below?
##########
parquet-plugins/parquet-encoding-vector/src/main/java/org/apache/parquet/column/values/bitpacking/ByteBitPacking512VectorLE.java:
##########
@@ -3175,13 +3171,56 @@ public final void unpackValuesUsingVector(final byte[]
in, final int inPos, fina
public final void unpackValuesUsingVector(
final ByteBuffer in, final int inPos, final int[] out, final int
outPos) {
- ByteVector byteVector = ByteVector.fromByteBuffer(B512, in, inPos,
in.order(), inp_mask);
+ ByteVector byteVector = fromByteBuffer(B512, in, inPos, inp_mask);
IntVector tempRes1 =
byteVector.rearrange(perm_mask0).reinterpretAsInts();
tempRes1.intoArray(out, outPos, out_mask);
}
}
+ // TODO Replace these helpers with ByteVector.fromMemorySegment(...) once
the project's minimum
+ // supported JDK is >= 22 (where java.lang.foreign.MemorySegment became a
permanent API per
+ // JEP 454). fromMemorySegment is the direct successor to
ByteVector.fromByteBuffer (which was
+ // removed after JDK 21) and is intrinsifiable by HotSpot for both heap and
direct buffers via
+ // jdk.internal.misc.ScopedMemoryAccess, eliminating the byte[] copy that
this fallback uses
+ // for direct ByteBuffers. Until then, the implementations below are
constrained to APIs
+ // available under --release 17. A multi-release JAR overlay
(src/main/java22) would let JDK
+ // 22+ runtimes pick up the fromMemorySegment path automatically; see
GH-3475 discussion.
+ private static ByteVector fromByteBuffer(VectorSpecies<Byte> species,
ByteBuffer input, int inPos) {
+ // Heap buffers are loaded directly from their backing array: no
ByteBuffer.duplicate(),
+ // no intermediate byte[] allocation, no extra copy. ByteVector.fromArray
is intrinsified by
+ // HotSpot and lowered to a single AVX-512 load.
+ if (input.hasArray()) {
+ return ByteVector.fromArray(species, input.array(), input.arrayOffset()
+ inPos);
+ }
+ // Off-heap (direct) buffers: fall back to copying species.length() bytes
into a scratch
+ // array. The original JDK 17 fast path used
ScopedMemoryAccess.loadFromByteBuffer to avoid
+ // the copy, but that intrinsic is not exposed; the alternative
ByteVector.fromMemorySegment
+ // is only available on JDK 19+ and cannot be called from --release 17
sources.
+ return ByteVector.fromArray(species, readInputBytes(input, inPos,
species.length()), 0);
+ }
+
+ private static ByteVector fromByteBuffer(
+ VectorSpecies<Byte> species, ByteBuffer input, int inPos,
VectorMask<Byte> mask) {
Review Comment:
I'm not an expert on this, but Codex complains as below:
The new fromByteBuffer(species, input, inPos, mask) helper changes the
contract of the old JDK 17 masked load. Per the JDK 17
ByteVector.fromByteBuffer(..., m) docs, only masked-on lanes must be in bounds,
but this fallback always copies species.length() bytes for direct buffers
before applying the mask. That breaks packers whose mask intentionally covers
fewer bytes than the vector width, such as Packer3 (B128 with a 12-byte mask):
a direct ByteBuffer with exactly the required packed bytes now throws
BufferUnderflowException, even though the old implementation and the
array-backed path both accept it. The fix needs to preserve masked-load bounds
semantics instead of unconditionally reading the whole vector window.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]