Re: [External] : Proposal for SIMD/vectorized implementations for primitive array operations

David Alayachew Sat, 06 Jun 2026 11:08:19 -0700

Good to hear from you again Arnav,

This is a much more viable proposal. A performance analysis with metrics to
back it up is very likely to be a welcome change. And even if it's not the
exact path taken, the information gathered is useful on its own.


The only downside is that the Vector API in the JDK is still an incubating
module (jdk.incubator.vector) rather than a finalized feature, and thus may
or may not be usable for this implementation *yet*.
Nonetheless, I assure you that your thinking is in the right place.

So, I'd start with that question -- is the Vector API mature enough that it
can be used for this **now**, or not?


On Sat, Jun 6, 2026, 1:48 PM Arnav Somaghatta <[email protected]>
wrote:

>
> Hello,
>
> My name is Arnav Somaghatta. I am a 14-year-old developer interested in
> contributing performance improvements to existing primitive array
> operations in the JDK, especially by introducing vectorized fast paths
> where applicable.
>
> Based on benchmarking work with small primitive arrays, I have observed
> that certain operations, such as scans and comparisons, still rely on
> scalar loops that may leave SIMD/vectorization potential unused in some
> cases. As a small concrete example, I ran a JMH benchmark comparing
> Arrays.mismatch(byte[], byte[]) against an equivalent naive scalar
> implementation across 64-byte, 256-byte, 1024-byte, and 8192-byte arrays,
> each with a single mismatch at the end. On my PC, using JDK 21 and JMH
> 1.37, Arrays.mismatch runs at about 4.5 ns/op, 9.9 ns/op, 28.5 ns/op, and
> 207.9 ns/op for those sizes, respectively, while the naive loop takes about
> 15.9 ns/op, 50.9 ns/op, 216.0 ns/op, and 1656.7 ns/op, respectively. That
> is approximately a 3.5x speedup at 64 bytes, 5.1x at 256 bytes, 7.6x at
> 1024 bytes, and 8.0x at 8192 bytes. This suggests that optimized
> implementations can provide substantial wins even for relatively small
> primitive arrays, and I would like to explore whether similar fast paths
> could be applied more broadly in core library array operations.
>
> I am not proposing any new public APIs. Instead, my goal is to work on
> improving the internal implementations of existing methods such as:
> - Arrays.mismatch(...)
> - primitive array equality checks
> - byte/char scan-heavy operations (such as String-related internals)
>
> My intent is to investigate whether SIMD or vector-based implementations,
> via the Vector API or intrinsics where appropriate, could provide
> meaningful performance improvements for small to medium arrays without
> negatively affecting maintainability.
>
> I would like to do the implementation work myself; however, before I begin
> prototyping, I wanted to ask whether this direction is considered viable
> for core libs work, and if there are specific implementation areas that
> would be most appropriate to target first.
>
> Thank you so much for your time.
>
> Best regards,
> Arnav
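
For readers following the thread, the comparison described in the quoted
message can be sketched roughly as below. This is only an illustration under
stated assumptions: the class name MismatchSketch and the method
naiveMismatch are hypothetical, the original benchmark source was not posted,
and this is a plain correctness check rather than a JMH harness, so it says
nothing about timing.

```java
import java.util.Arrays;

// Hypothetical sketch; not the benchmark code from the quoted message.
class MismatchSketch {

    // Assumed "naive scalar" baseline: compare one byte at a time.
    // Follows the Arrays.mismatch contract: returns the first differing
    // index, the shorter length if one array is a proper prefix of the
    // other, or -1 if the arrays are equal.
    static int naiveMismatch(byte[] a, byte[] b) {
        int len = Math.min(a.length, b.length);
        for (int i = 0; i < len; i++) {
            if (a[i] != b[i]) {
                return i;
            }
        }
        return a.length == b.length ? -1 : len;
    }

    public static void main(String[] args) {
        // Same sizes as in the quoted message, with a single mismatch
        // placed at the last index.
        for (int size : new int[] {64, 256, 1024, 8192}) {
            byte[] a = new byte[size];
            byte[] b = new byte[size];
            b[size - 1] = 1;

            int library = Arrays.mismatch(a, b);
            int naive = naiveMismatch(a, b);
            System.out.println(size + " bytes: Arrays.mismatch=" + library
                    + ", naive=" + naive);
        }
    }
}
```

Both methods should agree on every input; a real measurement would wrap each
call in a separate JMH @Benchmark method instead of a main loop.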
