Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-22 Thread Rémi Denis-Courmont
On Friday, 22 December 2023, 3.34.39 EET, flow gg wrote:
> func ff_decorrelate_sm_rvv, zve32x
> 1:
> vsetvli  t0, a2, e32, m8, ta, ma
> vle32.v  v8, (a1)
> sub a2,  a2, t0
> vle32.v  v0, (a0)
> vssra.vi  v8, v8, 1
> vsub.vv  v16, v0, v8
> vse32.v  v16, (a0)
> sh2add   a0, t0, a0
> vadd.vv  v16, v0, v8
> vse32.v  v16, (a1)
> sh2add   a1, t0, a1
> bnez a2, 1b
> ret
> endfunc
> 
> Is this what you meant? With this version, or when using vsra instead,
> some tests fail and the result values differ by 1. I'm not sure where the
> problem is.

No, I meant something like this, but it turns out slightly slower anyway. 
Saving the data dependency is not worth adding an instruction.

func ff_decorrelate_sm_rvv, zve32x
csrwi   vxrm, 0
1:
vsetvli t0, a2, e32, m8, ta, ma
vle32.v v8, (a1)
sub a2, a2, t0
vle32.v v0, (a0)
vsra.vi v16, v8, 1
vssra.vi v8, v8, 1
vsub.vv v16, v0, v16
vadd.vv v8, v0, v8
vse32.v v16, (a0)
sh2add  a0, t0, a0
vse32.v v8, (a1)
sh2add  a1, t0, a1
bnez    a2, 1b

ret
endfunc
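
The split into vsra and vssra can be checked in scalar code. With vxrm set to 0 (round-to-nearest-up), vssra.vi by 1 gives (b >> 1) plus the bit that was shifted out, which equals b - (b >> 1), so the addition no longer needs the result of the subtraction. A minimal C sketch follows. It assumes arithmetic right shift of negative values, and the function names are hypothetical, not FFmpeg's:

```c
#include <assert.h>
#include <stdint.h>

/* Dependent form, following the original patch: the addition reuses the
 * result of the subtraction. */
static void sm_dependent(int32_t *p1, int32_t *p2, int length)
{
    assert(length >= 0);
    for (int i = 0; i < length; i++) {
        int32_t a = p1[i], b = p2[i];
        a -= b >> 1;        /* vsra.vi + vsub.vv */
        p1[i] = a;
        p2[i] = a + b;      /* vadd.vv on the vsub.vv result */
    }
}

/* Model of vssra.vi by 1 with vxrm = 0 (rnu): the truncated shift plus
 * the bit that was shifted out. */
static int32_t ssra1_rnu(int32_t b)
{
    return (b >> 1) + (b & 1);
}

/* Independent form: both outputs are computed from the loaded inputs. */
static void sm_independent(int32_t *p1, int32_t *p2, int length)
{
    assert(length >= 0);
    for (int i = 0; i < length; i++) {
        int32_t a = p1[i], b = p2[i];
        p1[i] = a - (b >> 1);       /* vsra.vi + vsub.vv */
        p2[i] = a + ssra1_rnu(b);   /* vssra.vi + vadd.vv */
    }
}

/* Count outputs that differ between the two forms for every b in
 * [-range, range] and two sample values of a. */
static int sm_count_mismatches(int32_t range)
{
    static const int32_t samples[2] = { 12345, -7 };
    int mismatches = 0;

    for (int s = 0; s < 2; s++) {
        for (int32_t b = -range; b <= range; b++) {
            int32_t x1 = samples[s], x2 = b;
            int32_t y1 = samples[s], y2 = b;
            sm_dependent(&x1, &x2, 1);
            sm_independent(&y1, &y2, 1);
            mismatches += (x1 != y1) + (x2 != y2);
        }
    }
    return mismatches;
}
```

Under these assumptions sm_count_mismatches() finds no difference between the two forms, because b == 2 * (b >> 1) + (b & 1) holds for every b.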

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/



___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel

To unsubscribe, visit link above, or email
ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".


Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread flow gg
func ff_decorrelate_sm_rvv, zve32x
1:
vsetvli  t0, a2, e32, m8, ta, ma
vle32.v  v8, (a1)
sub a2,  a2, t0
vle32.v  v0, (a0)
vssra.vi  v8, v8, 1
vsub.vv  v16, v0, v8
vse32.v  v16, (a0)
sh2add   a0, t0, a0
vadd.vv  v16, v0, v8
vse32.v  v16, (a1)
sh2add   a1, t0, a1
bnez a2, 1b
ret
endfunc

Is this what you meant? With this version, or when using vsra instead,
some tests fail and the result values differ by 1. I'm not sure where the
problem is.
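
One way to see where a difference of 1 can come from is to model both shifts in scalar C. The sketch below uses hypothetical helper names. It assumes arithmetic right shift of negative values, assumes vxrm = 0 (round-to-nearest-up) for vssra, and takes the original patch's sequence, a - (b >> 1) followed by adding b, as the reference. Under those assumptions, feeding one shifted value to both vsub and vadd makes one of the two outputs off by 1 whenever b is odd:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference taken from the original patch's sequence:
 * out1 = a - (b >> 1), out2 = out1 + b. */
static void shared_ref(int32_t a, int32_t b, int32_t *out1, int32_t *out2)
{
    *out1 = a - (b >> 1);
    *out2 = *out1 + b;
}

/* One shifted value feeds both vsub and vadd.
 * rounded != 0 models vssra.vi by 1 with vxrm = 0 (rnu);
 * rounded == 0 models vsra.vi by 1. */
static void shared_shift(int32_t a, int32_t b, int rounded,
                         int32_t *out1, int32_t *out2)
{
    int32_t half = (b >> 1) + (rounded ? (b & 1) : 0);
    *out1 = a - half;
    *out2 = a + half;
}

/* Largest absolute error of output `which` (1 or 2) over b in
 * [-range, range], for a fixed a. */
static int shared_max_error(int rounded, int which, int32_t range)
{
    int max_err = 0;

    assert(which == 1 || which == 2);
    for (int32_t b = -range; b <= range; b++) {
        int32_t r1, r2, s1, s2;
        shared_ref(100, b, &r1, &r2);
        shared_shift(100, b, rounded, &s1, &s2);
        int err = abs(which == 1 ? s1 - r1 : s2 - r2);
        if (err > max_err)
            max_err = err;
    }
    return max_err;
}
```

With the rounding shift shared, the first output (the subtraction) is off by 1 for odd b and the second is exact. With the truncating shift shared, it is the other way around.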

Rémi Denis-Courmont wrote on Fri, 22 Dec 2023 at 00:08:

> On Monday, 18 December 2023, 17.16.27 EET, flow gg wrote:
> > C908:
> > decorrelate_sm_c: 130.0
> > decorrelate_sm_rvv_i32: 43.7
>
> +
> +func ff_decorrelate_sm_rvv, zve32x
> +1:
> +vsetvli  t0, a2, e32, m8, ta, ma
> +vle32.v  v0, (a0)
> +sub a2,  a2, t0
> +vle32.v  v8, (a1)
> +vsra.vi  v16, v8, 1
>
> You should load v8 first, since it is used as input before v0.
>
> +vsub.vv  v0, v0, v16
> +vse32.v  v0, (a0)
> +sh2add   a0, t0, a0
> +vadd.vv  v0, v0, v8
>
> You can use VSSRA, and then VADD won't need to depend on the output of
> VSUB.
>
> +vse32.v  v0, (a1)
> +sh2add   a1, t0, a1
> +bnez a2, 1b
> +ret
> +endfunc
>
> --
> 雷米‧德尼-库尔蒙
> http://www.remlab.net/


Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread Rémi Denis-Courmont
On Thursday, 21 December 2023, 18.07.55 EET, Rémi Denis-Courmont wrote:
> You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

P.S.: I have NOT checked which approach is actually faster.

-- 
Rémi Denis-Courmont
http://www.remlab.net/





Re: [FFmpeg-devel] [PATCH 6/6] lavc/takdsp: R-V V decorrelate_sm

2023-12-21 Thread Rémi Denis-Courmont
On Monday, 18 December 2023, 17.16.27 EET, flow gg wrote:
> C908:
> decorrelate_sm_c: 130.0
> decorrelate_sm_rvv_i32: 43.7

+
+func ff_decorrelate_sm_rvv, zve32x
+1:
+vsetvli  t0, a2, e32, m8, ta, ma
+vle32.v  v0, (a0)
+sub a2,  a2, t0
+vle32.v  v8, (a1)
+vsra.vi  v16, v8, 1

You should load v8 first, since it is used as input before v0.

+vsub.vv  v0, v0, v16
+vse32.v  v0, (a0)
+sh2add   a0, t0, a0
+vadd.vv  v0, v0, v8

You can use VSSRA, and then VADD won't need to depend on the output of VSUB.

+vse32.v  v0, (a1)
+sh2add   a1, t0, a1
+bnez a2, 1b
+ret
+endfunc

-- 
雷米‧德尼-库尔蒙
http://www.remlab.net/


