On Monday, 4 December 2023 at 10:48:56 EET, flow gg wrote:
> > Probably missing VLENB checks.
> 
> Changed.
> 
> > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > then shift, and by 17 with shift then add. You don't need multiplications.
> 
> Changed.
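
For illustration only, here is a scalar C sketch of the shift-and-add
decompositions listed in the quoted comment. The helper names (mul3, mul5,
mul9, mul12, mul17) are hypothetical and are not taken from the patch. The
sketch uses unsigned arithmetic so that the shifts are well defined and wrap
modulo 2^32 exactly as a multiplication would.

```c
#include <stdint.h>

/* x * 3 = (x << 1) + x : shift-and-add */
static inline uint32_t mul3(uint32_t x)  { return (x << 1) + x; }

/* x * 5 = (x << 2) + x : shift-and-add */
static inline uint32_t mul5(uint32_t x)  { return (x << 2) + x; }

/* x * 9 = (x << 3) + x : shift-and-add */
static inline uint32_t mul9(uint32_t x)  { return (x << 3) + x; }

/* x * 12 = ((x << 1) + x) << 2 : shift-and-add (x * 3), then shift by 2 */
static inline uint32_t mul12(uint32_t x) { return mul3(x) << 2; }

/* x * 17 = (x << 4) + x : shift, then add */
static inline uint32_t mul17(uint32_t x) { return (x << 4) + x; }
```

Each helper maps to at most two shift or add operations, which is the point
of the review comment. How this is best expressed in the vector code depends
on the surrounding instructions and is not shown here.
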
> 
> > Do you really need to splat? Can't .vx or .wx be used instead?
> 
> Okay, for example in ff_vc1_inv_trans_8x8_dc_rvv
> 
> + vsetvli      zero, t0, e8, m2, ta, ma
> + vwaddu.vx    v4, v0, zero
> + vsetvli      zero, t0, e16, m4, ta, ma
> + vadd.vx      v4, v4, t2
> - vsetvli      zero, t0, e16, m4, ta, ma
> - vmv.v.x      v4, t2
> - vsetvli      zero, t0, e8, m2, ta, ma
> - vwaddu.wv    v4, v4, v0
> 
> But the speed has slowed down slightly on the c910,
> I'm not sure if I should modify it.

OK, unfortunately, there is no widening addition with a wide scalar operand. But 
you can do zero-extension then addition here. In the end, I doubt that you can 
reasonably optimise whilst working with a C910-based board. This function 
deviates too much on non-conformant hardware.
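
As a rough per-element model in C, the two orderings discussed above compute
the same 16-bit result. This is only a sketch: the function names and the
parameter dc (standing in for the scalar held in t2) are assumptions made for
illustration, and the instruction names in the comments show only which step
each line is meant to mirror (vzext.vf2 being one RVV 1.0 way to zero-extend).

```c
#include <stddef.h>
#include <stdint.h>

/* Models the removed sequence: splat the scalar, then widening add. */
static void splat_then_widen_add(uint16_t *dst, const uint8_t *src,
                                 uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t acc = dc;               /* like vmv.v.x: broadcast scalar */
        acc = (uint16_t)(acc + src[i]);  /* like vwaddu.wv: wide + narrow  */
        dst[i] = acc;
    }
}

/* Models the suggested sequence: zero-extend, then add the scalar. */
static void zext_then_add(uint16_t *dst, const uint8_t *src,
                          uint16_t dc, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint16_t wide = src[i];          /* zero-extension, e.g. vzext.vf2 */
        dst[i] = (uint16_t)(wide + dc);  /* like vadd.vx: 16-bit wrap      */
    }
}
```

Both functions wrap modulo 2^16, so they agree for every input. The model
says nothing about speed, which is the part that varies between
implementations.
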

-- 
レミ・デニ-クールモン
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
