>> This patch introduces a vector cost model for the Spacemit-X60 core,
>> using dynamic LMUL scaling with the -madjust-lmul-cost flag.
>>
>> Compared to the previous patch, I dropped the local 'vector_lmul'
>> attribute and the corresponding LMUL-aware cost logic in spacemit-x60.md.
>> Instead, Spacemit-X60 tuning now enables -madjust-lmul-cost implicitly,
>> and riscv_sched_adjust_cost is updated so that the adjustment applies to
>> spacemit_x60 in addition to the generic out-of-order model.
>>
>> The stress tests I previously used to tune individual instruction costs
>> (with the LMUL-aware logic implemented directly in spacemit-x60.md)
>> now show a regression in performance. The most likely cause is the
>> implicit -madjust-lmul-cost scaling: some instructions performed better
>> with non-power-of-two scaling (or with no LMUL scaling at all), so the
>> uniform ×(1,2,4,8) adjustment hurts them.
>>
>> Updated performance results:
>>
>> | Benchmark        | Metric | Trunk           | Vector Cost Model | Δ (%)   |
>> |------------------|--------|-----------------|-------------------|---------|
>> | SciMark2-C       | cycles | 311,450,555,453 | 313,278,899,107   | +0.56%  |
>> |------------------|--------|-----------------|-------------------|---------|
>> | tramp3d-v4       | cycles | 23,788,980,247  | 21,073,526,428    | -12.89% |
>> |------------------|--------|-----------------|-------------------|---------|
>> | Freebench/neural | cycles | 471,707,641     | 435,842,612       | -8.23%  |
>> |------------------|--------|-----------------|-------------------|---------|
>>
>> Benchmarks were run from the LLVM test-suite
>> (MultiSource/Benchmarks) using:
>>
>> taskset -c 0 perf stat -r 10 ./...

> How sure are we about these results? It has been notoriously difficult to
> obtain reliable benchmark numbers on the BPI. Do the results hold after a
> reboot or on the next day? What about an even higher number of iterations?

I repeated the measurements using perf stat on multiple isolated cores,
including runs after a reboot and on different days. Increasing the number
of iterations from -r 10 to -r 100 did not change the outcome.

> I find it difficult to understand why two benchmarks improve a lot more
> and one regresses. If the LMUL scaling is incorrect, wouldn't we expect
> similar behavior for all three? Or does SciMark have a different
> footprint WRT instructions and e.g. uses some insns more for which the
> uniform scaling doesn't hold?

In the generated code for SciMark2, the compiler selects almost exclusively
LMUL=M1 (only two MF2 occurrences in the whole assembly), so LMUL scaling
itself is effectively a no-op here. My assumption is therefore that the
difference in performance is caused by the base M1 latencies.

In the previous MD model, the measured load latency did not follow a
power-of-two relationship across LMULs (M1=3, M2=4, M4=8, M8=16). To make
this compatible with the dynamic -madjust-lmul-cost scaling, I normalized
M1 to 2 so that the higher LMULs can be approximated as ×2, ×4, ×8.
Keeping the measured base of M1=3 would instead give 3/6/12/24 for
M1/M2/M4/M8, which deviates far more from the measured 4/8/16 at the
higher LMULs than lowering M1 from 3 to 2 does. This improves the fit for
M2/M4/M8, but likely reduces accuracy for the dominant M1 case.

Nikola
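
For illustration only (this is not code from the patch), here is a small
standalone C program that compares the measured load latencies quoted
above against the two uniform power-of-two scalings discussed, i.e. the
measured base M1=3 versus the normalized base M1=2:

/* Illustration only -- not code from the patch.  Compare the measured
   X60 vector load latencies with two uniform power-of-two scalings:
   the measured base M1=3 (giving 3/6/12/24) and the normalized base
   M1=2 (giving 2/4/8/16).  */
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  static const int lmul[4] = { 1, 2, 4, 8 };       /* M1, M2, M4, M8 */
  static const int measured[4] = { 3, 4, 8, 16 };  /* measured latencies */

  for (int base = 2; base <= 3; base++)
    {
      int total_error = 0;
      printf ("base M1=%d:", base);
      for (int i = 0; i < 4; i++)
        {
          /* Uniform x1/x2/x4/x8 LMUL scaling of the base latency.  */
          int approx = base * lmul[i];
          total_error += abs (approx - measured[i]);
          printf (" M%d=%d (measured %d)", lmul[i], approx, measured[i]);
        }
      printf ("  total abs. error = %d\n", total_error);
    }
  return 0;
}

With these numbers, the base of 2 is off by one cycle only for M1, while
the base of 3 is off by 2/4/8 cycles for M2/M4/M8, which is the trade-off
described above.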
