[
https://issues.apache.org/jira/browse/SPARK-57036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
L. C. Hsieh resolved SPARK-57036.
---------------------------------
Fix Version/s: 4.3.0
Resolution: Fixed
Issue resolved by pull request 56082
[https://github.com/apache/spark/pull/56082]
> Use intrinsic bulk-fill APIs for constant-value WritableColumnVector methods
> ----------------------------------------------------------------------------
>
> Key: SPARK-57036
> URL: https://issues.apache.org/jira/browse/SPARK-57036
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.3.0
> Reporter: L. C. Hsieh
> Assignee: L. C. Hsieh
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.3.0
>
>
> SPARK-57024 fixed the degenerate per-element loops in three bulk-fill
> methods (OnHeap.putNulls, OnHeap.putInts(rowId, count, value),
> OffHeap.putNulls). The same pattern still exists in six sibling methods:
> OnHeapColumnVector:
>   putBooleans(int rowId, int count, boolean value)
>   putBytes(int rowId, int count, byte value)
>   putShorts(int rowId, int count, short value)
>   putLongs(int rowId, int count, long value)
> OffHeapColumnVector:
>   putBooleans(int rowId, int count, boolean value)
>   putBytes(int rowId, int count, byte value)
> These should use the same Arrays.fill / Platform.setMemory intrinsic
> substitutions as SPARK-57024, with the same SET_MEMORY_THRESHOLD=128
> fallback in OffHeap for small-count fills (where the setMemory JNI cost
> dominates).
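The substitution described above can be sketched roughly as follows. This is a minimal illustration, not the code from the pull request: the class and method names are hypothetical, sun.misc.Unsafe is called directly as a stand-in for Spark's Platform.setMemory, and the direction of the threshold comparison (loop below 128, setMemory at 128 and above) is an assumption. Newer JDKs may print a warning when sun.misc.Unsafe memory-access methods are used.

```java
import java.lang.reflect.Field;
import java.util.Arrays;
import sun.misc.Unsafe;

// Hypothetical sketch; names are illustrative and not taken from Spark.
final class BulkFillSketch {
    // Cutoff quoted in the issue text; the comparison direction below is assumed.
    static final int SET_MEMORY_THRESHOLD = 128;

    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is unavailable", e);
        }
    }

    // On-heap: a single Arrays.fill call replaces the per-element loop.
    static void putBytesOnHeap(byte[] data, int rowId, int count, byte value) {
        Arrays.fill(data, rowId, rowId + count, value);
    }

    // Off-heap: small fills keep the plain loop, larger fills use setMemory.
    static void putBytesOffHeap(long base, int rowId, int count, byte value) {
        long start = base + rowId;
        if (count < SET_MEMORY_THRESHOLD) {
            for (int i = 0; i < count; i++) {
                UNSAFE.putByte(start + i, value);
            }
        } else {
            UNSAFE.setMemory(start, count, value);
        }
    }

    // Fills count bytes inside a zeroed buffer with one guard byte on each
    // side, then checks that only the requested range was written.
    static boolean offHeapFillMatches(int count, byte value) {
        long base = UNSAFE.allocateMemory(count + 2L);
        try {
            UNSAFE.setMemory(base, count + 2L, (byte) 0);
            putBytesOffHeap(base, 1, count, value);
            boolean ok = UNSAFE.getByte(base) == 0
                && UNSAFE.getByte(base + count + 1) == 0;
            for (int i = 1; i <= count; i++) {
                ok &= UNSAFE.getByte(base + i) == value;
            }
            return ok;
        } finally {
            UNSAFE.freeMemory(base);
        }
    }

    public static void main(String[] args) {
        byte[] heap = new byte[16];
        putBytesOnHeap(heap, 4, 8, (byte) 7);
        System.out.println(Arrays.toString(heap));
        System.out.println(offHeapFillMatches(8, (byte) 5));    // loop path
        System.out.println(offHeapFillMatches(4096, (byte) 5)); // setMemory path
    }
}
```

Both paths write the same bytes; the threshold only selects which one is cheaper for a given count.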
> Measured on Apple M4 Max + OpenJDK 21.0.8, using a new
> WritableColumnVectorBulkFillBenchmark:
> OffHeap byte fills (putBytes / putBooleans) at count=4096+:
>   baseline: ~4,300 M/s (per-byte Unsafe.putByte loop)
>   patched:  ~42,000 M/s (Platform.setMemory, threshold path)
>   -> ~10x improvement
> OnHeap byte fills at small/medium count (8 to 4096):
>   baseline: loop, partially auto-vectorized by C2
>   patched:  Arrays.fill intrinsic
>   -> +5-25% improvement at counts <= 4096; saturated at memory bandwidth
>      for larger counts (where Arrays.fill and the C2-vectorized loop
>      converge)
> OnHeap shorts/longs: small (+1-14%) improvement at small counts,
> saturated at memory bandwidth for large counts. Included for consistency
> with the byte methods rather than for headline performance gains.
> The OffHeap multi-byte fills (putShorts / putInts / putLongs /
> putFloats / putDoubles) are NOT in scope: Platform.setMemory writes a
> single repeated byte, so it can only stand in for a multi-byte fill when
> every byte of the value is identical, and there is no clean intrinsic
> substitute for the general case (the value=0 special case was prototyped
> under SPARK-57024 and rejected as offering no measurable gain).
> OnHeap putInts is excluded (already fixed in SPARK-57024).
> OnHeap putFloats / putDoubles are excluded (already use Arrays.fill).
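To make the scope restriction on multi-byte fills concrete: a byte-granular fill can reproduce a multi-byte constant only when all bytes of that constant are equal (for example 0, -1, or 0x0101010101010101L). The helper below is a hypothetical sketch of that check, not something taken from Spark or the pull request.

```java
// Hypothetical helper; the class and method names are illustrative only.
final class UniformByteSketch {
    // True when all eight bytes of v are identical, so a byte-wise fill
    // of (byte) v would produce the same memory contents.
    static boolean isUniformByte(long v) {
        long low = v & 0xFFL;
        return v == low * 0x0101010101010101L;
    }

    // Same check for a four-byte value.
    static boolean isUniformByte(int v) {
        int low = v & 0xFF;
        return v == low * 0x01010101;
    }

    // Same check for a two-byte value: compare low and high byte.
    static boolean isUniformByte(short v) {
        return (v & 0xFF) == ((v >>> 8) & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(isUniformByte(0L));
        System.out.println(isUniformByte(1L));
        System.out.println(isUniformByte(Double.doubleToRawLongBits(1.0)));
    }
}
```

Typical fill values such as 1 or 1.0 fail the check, which is consistent with the issue treating the multi-byte off-heap case as having no clean byte-fill substitute.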