On Monday, 4 December 2023, 10:48:56 EET, flow gg wrote:
> > Probably missing VLENB checks.
>
> Changed.
>
> > You can multiply by 3, 5 or 9 with shift-and-add. By 12 with shift-and-add
> > then shift, and by 17 with shift then add. You don't need multiplications.
>
> Changed.
>
> > Do you really need to splat? Can't .vx or .wx be used instead?
>
> Okay, for example in ff_vc1_inv_trans_8x8_dc_rvv
>
> +        vsetvli       zero, t0, e8, m2, ta, ma
> +        vwaddu.vx     v4, v0, zero
> +        vsetvli       zero, t0, e16, m4, ta, ma
> +        vadd.vx       v4, v4, t2
> -        vsetvli       zero, t0, e16, m4, ta, ma
> -        vmv.v.x       v4, t2
> -        vsetvli       zero, t0, e8, m2, ta, ma
> -        vwaddu.wv     v4, v4, v0
>
> But the speed has slowed down slightly on the c910,
> I'm not sure if I should modify it.
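
For reference, the shift-and-add decompositions mentioned in the quoted review can be checked with a small scalar model in C. This is only a sketch under stated assumptions: the helper names are hypothetical and do not come from the patch, and uint16_t merely stands in for 16-bit vector lanes, so results wrap modulo 2^16. In vector code each step would presumably map to a vector shift and a vector add, but that mapping is an assumption and is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not taken from the patch under review.
 * uint16_t models one 16-bit lane; the casts make results wrap modulo 2^16. */
static inline uint16_t mul3(uint16_t x)  { return (uint16_t)((x << 1) + x); } /* 2x + x  */
static inline uint16_t mul5(uint16_t x)  { return (uint16_t)((x << 2) + x); } /* 4x + x  */
static inline uint16_t mul9(uint16_t x)  { return (uint16_t)((x << 3) + x); } /* 8x + x  */
static inline uint16_t mul17(uint16_t x) { return (uint16_t)((x << 4) + x); } /* 16x + x */

/* 12x: shift-and-add to get 3x, then one more shift by 2. */
static inline uint16_t mul12(uint16_t x) { return (uint16_t)(mul3(x) << 2); }

/* Compare every 16-bit input against a plain multiplication.
 * Returns 1 when all decompositions agree (asserts otherwise). */
static inline int shift_add_selfcheck(void)
{
    for (uint32_t i = 0; i <= UINT16_MAX; i++) {
        uint16_t x = (uint16_t)i;
        assert(mul3(x)  == (uint16_t)(x * 3u));
        assert(mul5(x)  == (uint16_t)(x * 5u));
        assert(mul9(x)  == (uint16_t)(x * 9u));
        assert(mul12(x) == (uint16_t)(x * 12u));
        assert(mul17(x) == (uint16_t)(x * 17u));
    }
    return 1;
}
```

Calling shift_add_selfcheck() once from a test program is enough to confirm that the five decompositions match ordinary multiplication for every 16-bit input.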

OK, unfortunately, there is no widening addition with a wide scalar operand.
But you can do a zero-extension followed by an addition here.

In the end, I doubt that you can reasonably optimise whilst working with a
C910-based board. This function deviates too much on non-conformant hardware.

-- 
レミ・デニ-クールモン
http://www.remlab.net/