On Friday, 22 December 2023, 3:34:39 EET, flow gg wrote:
> func ff_decorrelate_sm_rvv, zve32x
> 1:
>         vsetvli  t0, a2, e32, m8, ta, ma
>         vle32.v  v8, (a1)
>         sub      a2, a2, t0
>         vle32.v  v0, (a0)
>         vssra.vi v8, v8, 1
>         vsub.vv  v16, v0, v8
>         vse32.v  v16, (a0)
>         sh2add   a0, t0, a0
>         vadd.vv  v16, v0, v8
>         vse32.v  v16, (a1)
>         sh2add   a1, t0, a1
>         bnez     a2, 1b
>         ret
> endfunc
>
> Is this way? In this situation, or when using vsra, there will be some
> tests that fail, and the result value differs by 1. I'm not sure where the
> problem..

No, I meant something like this, but it turns out slightly slower anyway.
Saving the data dependency is not worth adding an instruction.

func ff_decorrelate_sm_rvv, zve32x
        csrwi    vxrm, 0
1:
        vsetvli  t0, a2, e32, m8, ta, ma
        vle32.v  v8, (a1)
        sub      a2, a2, t0
        vle32.v  v0, (a0)
        vsra.vi  v16, v8, 1
        vssra.vi v8, v8, 1
        vsub.vv  v16, v0, v16
        vadd.vv  v8, v0, v8
        vse32.v  v16, (a0)
        sh2add   a0, t0, a0
        vse32.v  v8, (a1)
        sh2add   a1, t0, a1
        bnez     a2, 1b
        ret
endfunc

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/
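
For reference, the off-by-one in the quoted version can probably be explained with a scalar model of the two shifts. The sketch below is illustrative C, not code from the FFmpeg tree: the helper names sra1, ssra1_rnu and decorrelate_sm_model are made up here, and it assumes that >> on a negative int32_t is an arithmetic shift (as with GCC and Clang). With vxrm = 0 (round-to-nearest-up), vssra.vi by 1 computes (b + 1) >> 1, which equals b - (b >> 1). So the subtraction needs the truncated half from vsra and the addition needs the rounded half from vssra; feeding the same shifted value to both leaves one of the two outputs off by one whenever b is odd.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Truncating arithmetic shift, modelling vsra.vi by 1 on one element.
 * Assumption: the compiler shifts negative values arithmetically. */
static int32_t sra1(int32_t b)
{
    return b >> 1;
}

/* Rounding shift, modelling vssra.vi by 1 with vxrm = 0 (round-to-nearest-up):
 * add half the divisor, then shift. Widened to 64 bits so that
 * b = INT32_MAX does not overflow. */
static int32_t ssra1_rnu(int32_t b)
{
    return (int32_t)(((int64_t)b + 1) >> 1);
}

/* Scalar model of the loop body in the reply above:
 * p1 receives a - floor(b / 2), p2 receives a + round(b / 2).
 * Unsigned arithmetic is used so that wraparound matches the vector unit. */
static void decorrelate_sm_model(int32_t *p1, int32_t *p2, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t a = (uint32_t)p1[i];
        int32_t  b = p2[i];

        p1[i] = (int32_t)(a - (uint32_t)sra1(b));
        p2[i] = (int32_t)(a + (uint32_t)ssra1_rnu(b));
    }
}
```

For b = 7 the two halves are 3 and 4, and for b = -7 they are -4 and -3, so the second output always equals the first output plus the original b.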