The issue here is that any load with an element width greater than e8 fails the test with a Bus error, so vlse64 and similar instructions cannot be used.
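To illustrate the alignment concern, here is a minimal C sketch. It assumes the Bus error comes from the test pointer img2 + y * WIDTH + x not being 8-byte aligned for most values of x. The WIDTH value and the helper names is_aligned and count_aligned_offsets are hypothetical, chosen only for illustration, and are not taken from the test code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define WIDTH 64 /* assumption: illustrative row stride, not from the thread */

/* Returns 1 if p is aligned to `align` bytes; align must be a power of two. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical helper: counts the x offsets in row y for which
 * img2 + y * WIDTH + x is aligned to `align` bytes. */
int count_aligned_offsets(const uint8_t *img2, int y, size_t align)
{
    int n = 0;

    for (int x = 0; x < WIDTH; x++)
        n += is_aligned(img2 + y * WIDTH + x, align);
    return n;
}
```

With an 8-byte-aligned buffer, count_aligned_offsets(img2, y, 1) counts every offset in the row, while count_aligned_offsets(img2, y, 8) counts only one offset in eight. On a target that does not support misaligned vector accesses, only those offsets would be safe for 64-bit element loads.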

Rémi Denis-Courmont <r...@remlab.net> wrote on Fri, Feb 9, 2024 at 18:32:

> On 9 February 2024 00:39:38 GMT+02:00, flow gg <hlefthl...@gmail.com> wrote:
> >From my understanding, to use larger group multipliers, one needs to
> >utilize vlse64 (8x8) vlse128 (16x16).
> >
> >However, due to the use in tests of
> >
> >ptr = img2 + y * WIDTH + x;
> >d2 = call_ref(NULL, img1, ptr, WIDTH, h);
> >d1 = call_new(NULL, img1, ptr, WIDTH, h);
> >
> >will get: pix_abs_1_0_rvv_i32 (fatal signal 7: Bus error)
> >
> >Because it can only load according to e8, it seems there's no way to use
> >larger group multipliers.
>
> vlse128.v requires 128-bit elements, which no hardware supports. vlse64.v
> works just fine; we're already using it. There's also the possibility of
> segmented strided loads, or simply multiple unit loads.
>
> In any case, unrolling one way or other should improve performance.
>
> >
> >Rémi Denis-Courmont <r...@remlab.net> wrote on Fri, Feb 9, 2024 at 03:41:
> >
> >> On Wednesday, 7 February 2024, 2.01.23 EET, flow gg wrote:
> >> > I think in most cases it is like this, but specifically for this function,
> >> > using Reduction only once would be slower.
> >> >
> >> > The currently submitted version roughly takes:
> >> > pix_abs_0_0_rvv_i32: 136.2
> >> >
> >> > The version that uses Reduction only once takes:
> >> > pix_abs_0_0_rvv_i32: 169.2
> >>
> >> You're only using one vector and half a vector respectively, so the
> >> logarithmic time of the sum is relatively small.
> >>
> >> But are you sure that it wouldn't be faster to process multiple rows and
> >> larger group multipliers?
> >>
> >> > Here is the implementation of the version that uses it only once:
> >> >
> >> > func ff_pix_abs16_temp_rvv, zve32x
> >> >         vsetivli        zero, 16, e32, m4, ta, ma
> >> >         vmv.v.i         v24, 0
> >> >         vmv.s.x         v0, zero
> >> > 1:
> >> >         vsetvli         zero, zero, e8, m1, tu, ma
> >> >         vle8.v          v4, (a1)
> >> >         vle8.v          v12, (a2)
> >> >         addi            a4, a4, -1
> >> >         vwsubu.vv       v16, v4, v12
> >> >         add             a1, a1, a3
> >> >         vwsubu.vv       v20, v12, v4
> >> >         vsetvli         zero, zero, e16, m2, tu, ma
> >> >         vmax.vv         v16, v16, v20
> >> >         add             a2, a2, a3
> >> >         vwadd.wv        v24, v24, v16
> >> >         bnez            a4, 1b
> >> >
> >> >         vsetvli         zero, zero, e32, m4, ta, ma
> >> >         vwredsumu.vs    v0, v24, v0
> >> >         vmv.x.s         a0, v0
> >> >         ret
> >> > endfunc
> >> >
> >> > Rémi Denis-Courmont <r...@remlab.net> wrote on Wed, Feb 7, 2024 at 00:58:
> >> >
> >> > > Hi,
> >> > >
> >> > > To sum a vector, you should only reduce once at the end of the function,
> >> > > c.f. how it's done in existing scalar products. Reduction instructions are
> >> > > (intrinsically) slow.
> >> > >
> >> > > --
> >> > > Rémi Denis-Courmont
> >> > > http://www.remlab.net/
> >> >
> >> > _______________________________________________
> >> > ffmpeg-devel mailing list
> >> > ffmpeg-devel@ffmpeg.org
> >> > https://ffmpeg.org/mailman/listinfo/ffmpeg-devel
> >> >
> >> > To unsubscribe, visit link above, or email
> >> > ffmpeg-devel-requ...@ffmpeg.org with subject "unsubscribe".
> >>
> >>
> >> --
> >> 雷米‧德尼-库尔蒙
> >> http://www.remlab.net/