On Monday, 18 December 2023, 17:16:27 EET, flow gg wrote:
> C908:
> decorrelate_sm_c: 130.0
> decorrelate_sm_rvv_i32: 43.7
> +
> +func ff_decorrelate_sm_rvv, zve32x
> +1:
> +        vsetvli  t0, a2, e32, m8, ta, ma
> +        vle32.v  v0, (a0)
> +        sub      a2, a2, t0
> +        vle32.v  v8, (a1)
> +        vsra.vi  v16, v8, 1

You should load v8 first, since it is used as input before v0.

> +        vsub.vv  v0, v0, v16
> +        vse32.v  v0, (a0)
> +        sh2add   a0, t0, a0
> +        vadd.vv  v0, v0, v8

You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

> +        vse32.v  v0, (a1)
> +        sh2add   a1, t0, a1
> +        bnez     a2, 1b
> +        ret
> +endfunc

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/
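
To spell out the VSSRA remark: the quoted loop stores a - (b >> 1) to the first buffer and then (a - (b >> 1)) + b to the second, so the add waits on the subtract. Since b - (b >> 1) equals (b >> 1) + (b & 1), the second output can also be written as a plus a rounding shift of b, which is what VSSRA by one should yield when vxrm is set to round-to-nearest-up. Below is a minimal scalar C sketch of that identity, not the actual FFmpeg code. Both function names are hypothetical, the reference loop is modelled on the quoted assembly, and it assumes that >> on a negative int32_t is an arithmetic shift.

```c
#include <stdint.h>

/* Dependent form, as in the quoted assembly: the second output
 * is computed from the result of the subtraction. */
static void decorrelate_sm_ref(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        uint32_t a = (uint32_t)p1[i];
        int32_t  b = p2[i];

        a    -= (uint32_t)(b >> 1);        /* like vsra.vi + vsub.vv */
        p1[i] = (int32_t)a;
        p2[i] = (int32_t)(a + (uint32_t)b); /* like vadd.vv on the vsub result */
    }
}

/* Independent form: both outputs are computed from the original
 * a and b only, so neither has to wait for the other. */
static void decorrelate_sm_rounding(int32_t *p1, int32_t *p2, int length)
{
    for (int i = 0; i < length; i++) {
        uint32_t a = (uint32_t)p1[i];
        int32_t  b = p2[i];

        /* Scalar model of a rounding shift right by 1 (round-to-nearest-up):
         * the shifted value plus the bit that was shifted out. It cannot
         * overflow, unlike a naive (b + 1) >> 1. */
        int32_t half_up = (b >> 1) + (b & 1);

        p1[i] = (int32_t)(a - (uint32_t)(b >> 1));
        p2[i] = (int32_t)(a + (uint32_t)half_up);
    }
}
```

In the vector loop this would mean one VSRA result feeding the VSUB and one VSSRA result feeding the VADD, both taking the freshly loaded v0 and v8 as inputs.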