On Thu, 5 Feb 2026 13:24:30 GMT, Mikhail Ablakatov <[email protected]> 
wrote:

>> Also: why not allow a vector with only 2 elements? Is there some restriction 
>> here?
>
> Hi @eme64 . That's probably not the only contributing factor but there's a 
> significant difference in latency if we compare a sequence of scalar `fadd` 
> instructions to the SVE F16 `fadda` instruction. According to the [Neoverse 
> V1](https://developer.arm.com/documentation/109897/latest/) and [Neoverse 
> V2](https://developer.arm.com/documentation/109898/latest/) SWOGs, `fadda` 
> has an execution latency of 19 and 10 cycles for 16- and 8-element 
> vector registers respectively. Scalar `fadd` has an execution latency of 2 
> cycles, which sums up to 32 and 16 cycles for 16 and 8 values respectively. I 
> hope this explanation makes sense and helps.
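
For reference, the arithmetic in the quoted explanation can be sketched as below. This is only an illustration: the cycle counts are the ones quoted above (not re-checked against the SWOGs), the helper names are made up for this sketch, and it assumes one dependent scalar `fadd` per value, as the quoted sums do.

```python
# Cycle counts as quoted in the explanation above; taken as given here.
FADDA_LATENCY = {16: 19, 8: 10}  # SVE F16 fadda latency, keyed by element count
SCALAR_FADD_LATENCY = 2          # latency of a single scalar fadd


def scalar_chain_latency(num_elements: int) -> int:
    """Total latency of a dependent chain with one scalar fadd per element."""
    return num_elements * SCALAR_FADD_LATENCY


def latency_comparison() -> dict:
    """Map element count -> (scalar chain cycles, fadda cycles)."""
    return {n: (scalar_chain_latency(n), FADDA_LATENCY[n]) for n in FADDA_LATENCY}


if __name__ == "__main__":
    for n, (scalar, fadda) in latency_comparison().items():
        print(f"{n} elements: scalar chain {scalar} cycles, fadda {fadda} cycles")
```

With the quoted figures, `fadda` comes out ahead in both cases (19 vs. 32 cycles and 10 vs. 16 cycles).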

Ok, sounds good. Thanks for the explanations! I'll be on leave next week as 
well, so no hurry :)

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/27526#discussion_r2769460390
