L. C. Hsieh created SPARK-57111:
-----------------------------------
Summary: Use intrinsic bulk-fill APIs for putNotNulls
Key: SPARK-57111
URL: https://issues.apache.org/jira/browse/SPARK-57111
Project: Spark
Issue Type: Improvement
Components: SQL
Affects Versions: 4.3.0
Reporter: L. C. Hsieh
Assignee: L. C. Hsieh
Follow-up to SPARK-57024 / SPARK-57036.
WritableColumnVector exposes a putNotNulls(rowId, count) method that
clears a run of entries in the nulls bitmap. It is called once per batch
from WritableColumnVector.reset() (when numNulls > 0) and from the
appendNotNulls() path. Both the OnHeap and OffHeap implementations are
per-element loops:
// OnHeap
for (int i = 0; i < count; ++i) {
  nulls[rowId + i] = (byte) 0;
}
// OffHeap
long offset = nulls + rowId;
for (int i = 0; i < count; ++i, ++offset) {
  Platform.putByte(null, offset, (byte) 0);
}
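The two call sites can be made concrete with the sketch below. It is a heavily simplified, hypothetical stand-in for the null-tracking part of WritableColumnVector. The class name, the one-byte-per-row encoding (1 = null), and the growth policy in reserve() are assumptions made for illustration; they are not Spark's actual code. The sketch shows why putNotNulls runs on every reset() after a batch that contained nulls, and on every appendNotNulls() call.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for WritableColumnVector's null tracking.
// Names mirror the issue text; this is NOT Spark's real implementation.
public class NullsBitmapSketch {
    private byte[] nulls;          // one byte per row: 1 = null, 0 = not null
    private int numNulls;          // rows currently marked null
    private int elementsAppended;  // append cursor

    public NullsBitmapSketch(int capacity) {
        nulls = new byte[capacity];
    }

    public void putNull(int rowId) {
        nulls[rowId] = (byte) 1;
        numNulls++;
    }

    // Per-element loop, matching the current OnHeap implementation.
    public void putNotNulls(int rowId, int count) {
        for (int i = 0; i < count; ++i) {
            nulls[rowId + i] = (byte) 0;
        }
    }

    // Batch reset: the bitmap is cleared only if some null was written.
    public void reset() {
        if (numNulls > 0) {
            putNotNulls(0, nulls.length);
            numNulls = 0;
        }
        elementsAppended = 0;
    }

    // Appends `count` non-null slots and returns the first new row id.
    public int appendNotNulls(int count) {
        reserve(elementsAppended + count);
        int result = elementsAppended;
        putNotNulls(elementsAppended, count);
        elementsAppended += count;
        return result;
    }

    // Hypothetical growth policy: at least double the capacity.
    private void reserve(int required) {
        if (required > nulls.length) {
            nulls = Arrays.copyOf(nulls, Math.max(required, nulls.length * 2));
        }
    }

    public boolean isNullAt(int rowId) {
        return nulls[rowId] == 1;
    }

    public int numNulls() {
        return numNulls;
    }

    public static void main(String[] args) {
        NullsBitmapSketch v = new NullsBitmapSketch(8);
        v.putNull(3);
        v.reset();  // triggers putNotNulls(0, capacity)
        System.out.println(v.isNullAt(3) + " " + v.numNulls());
        int first = v.appendNotNulls(10);  // triggers putNotNulls(0, 10)
        System.out.println(first + " " + v.isNullAt(9));
    }
}
```

Because reset() clears the whole capacity, the cost of the per-element loop scales with the batch size. This makes it a natural target for a bulk fill.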
This is the same pattern fixed for putNulls in SPARK-57024 and for
putBytes / putBooleans in SPARK-57036. The same intrinsic substitutions
apply:
- OnHeap.putNotNulls -> Arrays.fill(byte[], ..., (byte) 0)
- OffHeap.putNotNulls -> Platform.setMemory(addr, (byte) 0, count)
with the existing SET_MEMORY_THRESHOLD = 128 fallback to an inline
byte loop for small counts.
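The proposed substitutions could look roughly like the sketch below. It is an illustration under stated assumptions, not the actual patch. Spark's Platform.setMemory(address, value, size) delegates to sun.misc.Unsafe.setMemory(address, bytes, value). To keep the example self-contained outside Spark, it calls Unsafe directly; note that the argument order differs from the Platform wrapper. The class and method names are hypothetical. The boundary comparison against SET_MEMORY_THRESHOLD (count < 128 takes the loop) is also an assumption.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import sun.misc.Unsafe;

// Hypothetical sketch of bulk-fill putNotNulls; not Spark's actual code.
public class BulkFillSketch {
    // Below this count an inline byte loop is used instead of setMemory
    // (value taken from the issue; the exact comparison is assumed).
    static final int SET_MEMORY_THRESHOLD = 128;

    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // OnHeap: clear [rowId, rowId + count) with the Arrays.fill intrinsic.
    static void putNotNullsOnHeap(byte[] nulls, int rowId, int count) {
        Arrays.fill(nulls, rowId, rowId + count, (byte) 0);
    }

    // OffHeap: `nulls` is the base address of a native bitmap.
    static void putNotNullsOffHeap(long nulls, int rowId, int count) {
        long offset = nulls + rowId;
        if (count < SET_MEMORY_THRESHOLD) {
            // Small runs: inline loop avoids the setMemory call overhead.
            for (int i = 0; i < count; ++i, ++offset) {
                UNSAFE.putByte(offset, (byte) 0);
            }
        } else {
            // Unsafe.setMemory(address, bytes, value); Spark's
            // Platform.setMemory(address, value, size) wraps this call.
            UNSAFE.setMemory(offset, count, (byte) 0);
        }
    }

    public static void main(String[] args) {
        byte[] heap = new byte[300];
        Arrays.fill(heap, (byte) 1);
        putNotNullsOnHeap(heap, 10, 200);
        System.out.println(heap[9] + " " + heap[10] + " " + heap[209] + " " + heap[210]);

        long addr = UNSAFE.allocateMemory(300);
        try {
            UNSAFE.setMemory(addr, 300, (byte) 1);
            putNotNullsOffHeap(addr, 5, 16);    // small count: inline loop
            putNotNullsOffHeap(addr, 50, 200);  // large count: setMemory
            System.out.println(UNSAFE.getByte(addr + 4) + " " + UNSAFE.getByte(addr + 5)
                + " " + UNSAFE.getByte(addr + 249) + " " + UNSAFE.getByte(addr + 250));
        } finally {
            UNSAFE.freeMemory(addr);
        }
    }
}
```

The threshold split reflects a trade-off. For short runs, a fixed call into setMemory can cost more than a few byte stores. For long runs, the bulk fill lets the JVM or native memset take over.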
Measured locally on Apple M4 Max + OpenJDK 21 via
WritableColumnVectorBulkFillBenchmark (a new putNotNulls case added by
[…]) […] parity (the C2 compiler already auto-vectorizes the original
byte loop).