On 2016-11-03 23:36:08 +0200, Martin Storsjö wrote: > On Thu, 3 Nov 2016, Diego Biurrun wrote: > > >On Thu, Nov 03, 2016 at 11:02:17PM +0200, Martin Storsjö wrote: > >>On Thu, 3 Nov 2016, Diego Biurrun wrote: > >>>On Thu, Nov 03, 2016 at 09:42:44PM +0100, Janne Grunau wrote: > >>>>On 2016-11-03 20:38:11 +0100, Diego Biurrun wrote: > >>>>> On Wed, Nov 02, 2016 at 07:24:03PM +0100, Janne Grunau wrote: > >>>>> > On 2016-11-02 15:29:34 +0100, Diego Biurrun wrote: > >>>>> > > On Wed, Nov 02, 2016 at 04:00:38PM +0200, Martin Storsjö wrote: > >>>>> > > > On Wed, 2 Nov 2016, Diego Biurrun wrote: > >>>>> > > > >On Wed, Nov 02, 2016 at 03:23:14PM +0200, Martin Storsjö wrote: > >>>>> > > > >>On Wed, 2 Nov 2016, Martin Storsjö wrote: > >>>>> > > > >>Technically, having a _neon prefix for them is wrong, but > >>>>> > > > >>anything else > >>>>> > > > >>(omitting these two while hooking up avg32/avg64 separately) is > >>>>> > > > >>more > >>>>> > > > >>complication - although I'm open for suggestions on how to > >>>>> > > > >>handle it best. > >>>>> > > > > > >>>>> > > > >Where exactly is the complication? In the way you assign the > >>>>> > > > >function > >>>>> > > > >pointers in the init file? > >>>>> > > > > > > > Yes, it'd require splitting up those macros a bit; > >>>>either for assigning the > >>>>> > > > same function pointers, but with a different simd instruction set > >>>>> > > > suffix, or > >>>>> > > > for only assigning the avg function pointer for those sizes. > >>>>> > > > > > Try something like > >>>>> > > > > > #define ff_vp9_copy32_neon ff_vp9_copy32_aarch64 > >>>>> > > #define ff_vp9_copy64_neon ff_vp9_copy64_aarch64 > >>>>> > > > > > before the assignment macros. You don't have to somehow drop > >>>>some of the > >>>>> > > assignments in the macros then. There's precedent for this in some > >>>>> > > of the > >>>>> > > x86 init files. > >>>>> > > > That only fixes the function names but not the cpu_flag based > >>>>assignment > > if I understand it correctly. And it is a little bit > >>>>silly since neon is > > not really optional on aarch64. At least in the > >>>>Application profile. > >>>>> > I forgot to say that ff_vp9_copy32_aarch64 and ff_vp9_copy64_aarch64 > >>>>> should be assigned to the function pointer unconditionally before > >>>>> checking for neon support. > >>>> > >>>>unconditionally is not nice since we want to compare them to C in > >>>>checkasm. That's the reason why we have the otherwise pointless > >>>>AV_CPU_FLAG_ARMV8 > >>> > >>>Unconditional in the aarch64 init function is not unconditional: > >>> > >>>libavcodec/foodsp.c: > >>> if (ARCH_AARCH64) > >>> init_aarch64(..); > >>> > >>>libavcodec/aarch64/foodsp_init.c: > >>> init_aarch64(..) > >>> { > >>> function.pointer = ff_vp9_copy32_aarch64; > >>> > >>> if (have_neon()) > >>> something.else = ff_whatever; > >>> > >>>Anyway, you get the idea. Quite possibly I'm overlooking something > >>>and it does not work as I envision... > >> > >>Yes, because as Janne said, in that case, you can't via cpuflags choose not > >>to use this function, and you can't benchmark against the C version in > >>checkasm, since the C version always gets overridden by this function, > >>regardless of cpuflags. > >> > >>That's why we've introduced the flag AV_CPU_FLAG_ARMV8, and one should wrap > >>such functions within have_armv8(cpuflags) instead of have_neon(cpuflags). > > > >OK, last try then :) > > > > #define ff_vp9_copy32_neon ff_vp9_copy32_aarch64 > > > > init_aarch64(..) > > { > > if (have_armv8()) > > function.pointer = ff_vp9_copy32_aarch64; > > > > if (have_neon()) > > [neon macro trickery > > function.pointer = ff_vp9_copy32_neon; > > neon macro trickery] > > Sure, something like that would work. Except if we go this way, I wouldn't > assign it at all within the neon section, since it already is hooked up > within have_armv8.
No the basic idea was to assign the same functions twice. Once under if (have_armv8()) with their real name and once under if (have_neon()) with the current macros. The missing _neon functions are provided by '#define ff_vp9_copy32_neon ff_vp9_copy32_aarch64'. That's not too messy and the functions have the correct debug symbols. I can live with the current misleading function names but this solution is not too messy. Janne _______________________________________________ libav-devel mailing list libav-devel@libav.org https://lists.libav.org/mailman/listinfo/libav-devel