Hello,

On Saturday, 15 November 2025, at 4:50:11 Eastern European Standard Time, 
yunfei_zhou--- via ffmpeg-devel wrote:
> Segmented load/store performance: We’ve encountered similar bottlenecks in
> our video decoding optimizations. To address this, we’re actively proposing
> new vector instructions tailored for media workloads to the RISC-V
> International standards body. At the same time, we’re working closely with
> RISC-V CPU microarchitecture teams to improve the hardware efficiency of
> these memory operations.

Nathan Egge (Google / RISE) gathered a list of useful instructions for 
multimedia at last year's VDD in Seoul. I do not know what came out of it 
though. However, as far as segmented loads and stores are concerned, I don't 
think that the instruction set has much of a problem. This looks like an 
implementation limitation.
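
To make the operation concrete, here is a minimal scalar C model of what a 
two-field segmented load (such as RVV's vlseg2e8.v) computes: interleaved 
memory is split into one destination per field. The function name and the 
fixed field count are assumptions made for illustration, not FFmpeg code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of a 2-field segmented load (cf. vlseg2e8.v).
 * Memory holds a0 b0 a1 b1 ...; the result is two separate arrays, which
 * stand in for the two destination vector registers. */
static void seg2_load_model(const uint8_t *src, size_t n,
                            uint8_t *field0, uint8_t *field1)
{
    for (size_t i = 0; i < n; i++) {
        field0[i] = src[2 * i];     /* even bytes go to the first field */
        field1[i] = src[2 * i + 1]; /* odd bytes go to the second field */
    }
}
```

The semantics are simple; how fast the hardware performs this 
deinterleaving is a separate, implementation-specific matter.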

Of course, FFmpeg could use an in-register transpose instruction. There are 
cases of transposition not immediately following a load or preceding a store - 
particularly with video codec two-dimensional transforms. But at the same 
time, FFmpeg probably has to retain support for RVV 1.0 and RVA23 processors 
for a long time. Any new instruction set extension will require additional 
specialised optimisations, adding to the maintenance burden. So from the open-
source project's standpoint, that really should be the last resort.
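
As a rough illustration of a transposition that is tied to neither a load 
nor a store, a separable two-dimensional transform typically runs a 1D pass 
over the rows, transposes, then runs the same pass again. The scalar C sketch 
below uses a placeholder 4-point butterfly rather than any real codec 
transform, and all names in it are hypothetical.

```c
#include <stdint.h>

#define N 4

/* Placeholder 1D pass applied to every row (a Hadamard-like butterfly,
 * chosen only for illustration; not an actual codec transform). */
static void pass_rows(int16_t m[N][N])
{
    for (int i = 0; i < N; i++) {
        int a = m[i][0] + m[i][3], b = m[i][1] + m[i][2];
        int c = m[i][1] - m[i][2], d = m[i][0] - m[i][3];

        m[i][0] = (int16_t)(a + b);
        m[i][1] = (int16_t)(d + c);
        m[i][2] = (int16_t)(a - b);
        m[i][3] = (int16_t)(d - c);
    }
}

/* In-place transpose of the block. In a vectorised version, this step
 * would operate on data already held in registers. */
static void transpose(int16_t m[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            int16_t t = m[i][j];
            m[i][j] = m[j][i];
            m[j][i] = t;
        }
}

static void transform_2d(int16_t m[N][N])
{
    pass_rows(m); /* horizontal pass */
    transpose(m); /* between the passes, away from any load or store */
    pass_rows(m); /* vertical pass, expressed as another row pass */
    transpose(m); /* restore the original orientation */
}
```

The first transpose sits in the middle of the computation, so a segmented 
load or store cannot absorb it.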

For comparison, FFmpeg is still in the process of removing MMX in favour of 
SSE2 and co... Maybe we will be done before MMX turns 30.

-- 
德尼-库尔蒙‧雷米
https://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list -- [email protected]
To unsubscribe send an email to [email protected]
