On Sunday, 4 September 2022, 09:39:36 EEST, Lynne wrote:
> In particular, doing the tail, which consists of 2 equal length transforms.
> On AVX we interleave the coefficients from 2x4pt transforms during
> lookups since we can do them simultaneously and save on
> shuffles. Doing them individually wouldn't be as efficient.

I'm not going to boldly state that one size fits all, because I am pretty sure
that it would come back to bite me in soft and sensitive tissue. But unlike
fixed-width SIMD extensions, RISC-V V and ARM SVE favour the use of offsets and
masks to deal with misaligned edges, so I'm not sure how useful the insights
from AVX are.

> > And besides, how do you want to get the value if not with assembler? This
> > is currently not found in ELF HWCAP and probably never will be.

> Sucks, knowing how wide the units are is as important as
> knowing how much L1 cache you have for me.

I understand that for some multidimensional calculations, you need to make 
special cases. The obvious case would be if the vector is too short to fit a 
column or row of elements whilst performing a transposition.

But even then, and even if we end up later on with, say, an arch_prctl() call
to find the vector size, I don't think exposing it in CPU flags would be a good
idea. VSETVL(I) and VSETIVLI also account for the element width (SEW) and the
vector register group multiplier (LMUL), so it seems better to use them than to
reimplement the same logic in C based on the raw vector bit length.

-- 
レミ・デニ-クールモン
http://www.remlab.net/



_______________________________________________
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-devel