[ https://issues.apache.org/jira/browse/SPARK-57036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

L. C. Hsieh resolved SPARK-57036.
---------------------------------
    Fix Version/s: 4.3.0
       Resolution: Fixed

Issue resolved by pull request 56082
[https://github.com/apache/spark/pull/56082]

> Use intrinsic bulk-fill APIs for constant-value WritableColumnVector methods
> ----------------------------------------------------------------------------
>
>                 Key: SPARK-57036
>                 URL: https://issues.apache.org/jira/browse/SPARK-57036
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 4.3.0
>            Reporter: L. C. Hsieh
>            Assignee: L. C. Hsieh
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.3.0
>
>
> SPARK-57024 fixed the degenerate per-element loops in three bulk-fill
> methods (OnHeap.putNulls, OnHeap.putInts(rowId, count, value),
> OffHeap.putNulls). The same pattern still exists in six sibling methods:
>   OnHeapColumnVector:
>     putBooleans(int rowId, int count, boolean value)
>     putBytes(int rowId, int count, byte value)
>     putShorts(int rowId, int count, short value)
>     putLongs(int rowId, int count, long value)
>   OffHeapColumnVector:
>     putBooleans(int rowId, int count, boolean value)
>     putBytes(int rowId, int count, byte value)
> These should use the same Arrays.fill / Platform.setMemory intrinsic
> substitutions as SPARK-57024, with the same SET_MEMORY_THRESHOLD=128
> cutoff in OffHeap: below it, fills keep the per-element loop, because
> the fixed cost of the setMemory JNI call dominates for small counts.
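
A minimal sketch of the OffHeap byte-fill dispatch described above, assuming a flat one-byte-per-row layout where row i lives at address data + i. It calls sun.misc.Unsafe directly as a stand-in for Spark's Platform wrapper. The class and method names (OffHeapFillSketch, putBytesLoop) are hypothetical, and whether Spark's cutoff compares with < or <= is also an assumption.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical stand-in for OffHeapColumnVector's constant-value byte fills.
// Uses sun.misc.Unsafe directly in place of Spark's Platform wrapper.
public final class OffHeapFillSketch {
    // Cutoff named in the issue; the "<" comparison below is an assumption.
    static final int SET_MEMORY_THRESHOLD = 128;
    static final Unsafe UNSAFE = loadUnsafe();

    private static Unsafe loadUnsafe() {
        try {
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Baseline: one putByte call per element.
    static void putBytesLoop(long data, int rowId, int count, byte value) {
        for (int i = 0; i < count; i++) {
            UNSAFE.putByte(data + rowId + i, value);
        }
    }

    // Patched shape: small fills keep the loop, larger fills use one setMemory call.
    static void putBytes(long data, int rowId, int count, byte value) {
        if (count < SET_MEMORY_THRESHOLD) {
            putBytesLoop(data, rowId, count, value);
        } else {
            UNSAFE.setMemory(data + rowId, count, value);
        }
    }

    // Assumes booleans are stored one byte per value, so they reuse the byte path.
    static void putBooleans(long data, int rowId, int count, boolean value) {
        putBytes(data, rowId, count, value ? (byte) 1 : (byte) 0);
    }

    public static void main(String[] args) {
        int capacity = 4096;
        long data = UNSAFE.allocateMemory(capacity);
        try {
            putBytes(data, 0, capacity, (byte) 7); // count >= 128: setMemory path
            putBooleans(data, 10, 5, true);        // count < 128: loop path
            System.out.println(UNSAFE.getByte(data + 9));  // prints 7
            System.out.println(UNSAFE.getByte(data + 12)); // prints 1
            System.out.println(UNSAFE.getByte(data + 15)); // prints 7
        } finally {
            UNSAFE.freeMemory(data);
        }
    }
}
```

Keeping the loop below the cutoff means small fills avoid the per-call setMemory overhead the issue mentions, while large fills get a single bulk call.
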
> Measured on an Apple M4 Max with OpenJDK 21.0.8, using a new
> WritableColumnVectorBulkFillBenchmark:
> OffHeap byte fills (putBytes / putBooleans) at count=4096+:
>   baseline:  ~4,300 M/s  (per-byte Unsafe.putByte loop)
>   patched:   ~42,000 M/s (Platform.setMemory, threshold path)
>   -> ~10x improvement
> OnHeap byte fills at small/medium count (8 to 4096):
>   baseline:  loop, partially auto-vectorized by C2
>   patched:   Arrays.fill intrinsic
>   -> +5-25% improvement at counts <= 4096; saturated at memory bandwidth
>      for larger counts (where Arrays.fill and the C2-vectorized loop
>      converge)
> OnHeap shorts/longs: a small improvement (+1-14%) at small counts;
> saturated at memory bandwidth for large counts. These are included for
> consistency with the byte methods rather than for headline performance gains.
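
The OnHeap side can be sketched the same way. OnHeapFillSketch is a hypothetical stand-in, not a Spark class, and it assumes (as in OnHeapColumnVector, to our understanding) that booleans share the byte[] backing array at one byte per value. Each constant-value method becomes a single Arrays.fill over the range [rowId, rowId + count).

```java
import java.util.Arrays;

// Hypothetical stand-in for OnHeapColumnVector's constant-value fills.
public final class OnHeapFillSketch {
    final byte[] byteData;   // assumed to also back booleans, one byte per value
    final short[] shortData;
    final long[] longData;

    OnHeapFillSketch(int capacity) {
        byteData = new byte[capacity];
        shortData = new short[capacity];
        longData = new long[capacity];
    }

    // Baseline: an explicit per-element loop, left to C2 to vectorize.
    void putBytesLoop(int rowId, int count, byte value) {
        for (int i = 0; i < count; ++i) {
            byteData[rowId + i] = value;
        }
    }

    // Patched shape: one Arrays.fill over the half-open range [rowId, rowId + count).
    void putBooleans(int rowId, int count, boolean value) {
        Arrays.fill(byteData, rowId, rowId + count, value ? (byte) 1 : (byte) 0);
    }

    void putBytes(int rowId, int count, byte value) {
        Arrays.fill(byteData, rowId, rowId + count, value);
    }

    void putShorts(int rowId, int count, short value) {
        Arrays.fill(shortData, rowId, rowId + count, value);
    }

    void putLongs(int rowId, int count, long value) {
        Arrays.fill(longData, rowId, rowId + count, value);
    }

    public static void main(String[] args) {
        OnHeapFillSketch v = new OnHeapFillSketch(16);
        v.putLongs(2, 3, 42L);
        // prints [0, 0, 42, 42, 42, 0]
        System.out.println(Arrays.toString(Arrays.copyOfRange(v.longData, 0, 6)));
    }
}
```

Because Arrays.fill takes a half-open range, the upper bound is rowId + count, not rowId + count - 1.
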
> The OffHeap multi-byte fills (putShorts / putInts / putLongs /
> putFloats / putDoubles) are NOT in scope: Platform.setMemory is byte-only,
> and there is no clean intrinsic substitute for multi-byte values whose
> bytes are not all identical (a value=0 special case was prototyped under
> SPARK-57024 and rejected because it offered no measurable gain).
> OnHeap putInts is excluded (already fixed in SPARK-57024).
> OnHeap putFloats / putDoubles are excluded (already use Arrays.fill).
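
To illustrate the byte-only limitation: a byte-wise fill writes the same byte into every position, so it can only produce a multi-byte value whose bytes are all identical (such as 0 or -1). The helpers below (replicate, isUniformBytes) are hypothetical and check that property for a long; since every byte is the same, byte order does not affect the result.

```java
// Shows why a byte-wise fill such as setMemory can only fill a long column
// when every byte of the target value is identical.
public final class UniformByteCheck {
    // The long produced by writing byte b into all 8 bytes of a slot.
    static long replicate(byte b) {
        long unsigned = b & 0xFFL;
        return unsigned * 0x0101010101010101L;
    }

    // True if value could be produced by a byte-wise fill.
    static boolean isUniformBytes(long value) {
        return replicate((byte) value) == value;
    }

    public static void main(String[] args) {
        System.out.println(isUniformBytes(0L));  // prints true
        System.out.println(isUniformBytes(-1L)); // prints true (0xFF in every byte)
        System.out.println(isUniformBytes(1L));  // prints false
        // Filling with byte 0x01 yields 0x0101010101010101, not 1.
        System.out.println(Long.toHexString(replicate((byte) 1))); // prints 101010101010101
    }
}
```
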



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
