On 26 October 2015 at 04:46, James Almer <jamr...@gmail.com> wrote: > On 10/25/2015 1:39 PM, Matt Oliver wrote: > > On 26 October 2015 at 00:43, Hendrik Leppkes <h.lepp...@gmail.com> > wrote: > > > >> On Sun, Oct 25, 2015 at 12:30 PM, Matt Oliver <protogo...@gmail.com> > >> wrote: > >>> On 25 October 2015 at 22:16, Hendrik Leppkes <h.lepp...@gmail.com> > >> wrote: > >>> > >>>> On Sun, Oct 25, 2015 at 11:26 AM, Matt Oliver <protogo...@gmail.com> > >>>> wrote: > >>>>> On 25 October 2015 at 12:22, Ganesh Ajjanagadde <gajja...@mit.edu> > >>>> wrote: > >>>>> > >>>>>> On Sat, Oct 24, 2015 at 7:03 PM, James Almer <jamr...@gmail.com> > >> wrote: > >>>>>>> On 10/24/2015 7:48 PM, Ganesh Ajjanagadde wrote: > >>>>>>>> On Sat, Oct 24, 2015 at 6:08 PM, James Almer <jamr...@gmail.com> > >>>> wrote: > >>>>>>>>> This gives the compiler some flexibility > >>>>>>>>> > >>>>>>>>> Signed-off-by: James Almer <jamr...@gmail.com> > >>>>>>>>> --- > >>>>>>>>> libavutil/x86/intmath.h | 2 +- > >>>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-) > >>>>>>>>> > >>>>>>>>> diff --git a/libavutil/x86/intmath.h b/libavutil/x86/intmath.h > >>>>>>>>> index 7881e3c..10d3e7f 100644 > >>>>>>>>> --- a/libavutil/x86/intmath.h > >>>>>>>>> +++ b/libavutil/x86/intmath.h > >>>>>>>>> @@ -36,7 +36,7 @@ static av_always_inline av_const int > >>>>>> ff_ctzll_x86(long long v) > >>>>>>>>> { > >>>>>>>>> # if ARCH_X86_64 > >>>>>>>>> uint64_t c; > >>>>>>>>> - __asm__("bsfq %1,%0" : "=r" (c) : "r" (v)); > >>>>>>>>> + __asm__ ("bsfq %1,%0" : "=r" (c) : "rm" (v)); > >>>>>>>>> return c; > >>>>>>>>> # else > >>>>>>>>> return ((uint32_t)v == 0) ? _bit_scan_forward((uint32_t)(v > >>>> > >>>>>> 32)) + 32 : _bit_scan_forward((uint32_t)v); > >>>>>>>>> -- > >>>>>>>>> 2.6.2 > >>>>>>>> > >>>>>>>> This question may be silly, but can you clarify when this asm > >> code is > >>>>>>>> used instead of __builtin_ctzll? For instance, I think on > >> GNU/Linux, > >>>>>>>> x86-64, sufficiently modern GCC (even with asm enabled), we should > >>>>>>>> always use the builtin: the compiler knows best what to do with > >> its > >>>>>>>> builtin. > >>>>>>> > >>>>>>> This is ICC/ICL, not GCC. > >>>>>> > >>>>>> Ah, now I recall that _bit_scan_forward64 was not always available > on > >>>>>> ICC. Anyway, this is just a heads up to whoever uses ICC/ICL and the > >>>>>> like: you may want to find out to which versions this built in > >>>>>> applies. > >>>>> > >>>>> > >>>>> bit_scan_forward64 isnt available on ICL at all, hence the use of > asm. > >>>>> > >>>> > >>>> Intels intrinsic guide lists _BitScanForward64 as the intrinsic to > use, > >>>> however. > >>>> > >>>> > >> > https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=Other&expand=373 > >>> > >>> > >>> _BitScanForward64 is a msvc intrinsic only available on windows > >> platforms > >>> so not usable with ICC. The native asm implementation also performs > >> better > >>> with ICL hence its use. > >> > >> The "Intel(R) C++ Compiler User and Reference Guide" would care to > >> disagree with you. > >> > > > > Well ill be... I dont have ICC on hand to check it but ill take Intels > word > > for it. Its interesting that Intel decided to support that intrinsic as > > opposed to just extending there own version to 64b (i.e. > > _bit_scan_forward64). _BitScanForward64 as far as I know was introduced > by > > msvc and made available in "winnt.h" hence the different naming > convention > > to other intrinsics. That said the inline asm version is still preferable > > with Intel as the intrinsic passes the result via pointer which when > tested > > with msvc11 and 12 crt's resulted in horrible performance hits as the > > compiler didnt optimise out the memory access in order to keep the result > > in register (works fine with newer ICL 16 and msvcrt14 though). > > Tried ICC 13 x86_64 from https://gcc.godbolt.org/ and it doesn't recognize > _BitScanForward64, or _BitScanForward for that matter. >
I didnt know about godbolt, definitely a nice tool, thanks for checking. I was a little surprised as I swore I checked those (and others) intrinsics several years ago and they were windows only. Atleast now I know my memory is not completely shot. > > > > Using tzcnt would be the more interesting option. Changing the asm to > tzcnt > > works with ICL. But also the use of the tzcnt intrinsic (_tzcnt_u64) > would > > be possible with both intel and msvc. I dont have anything older than > > Haswell or Piledriver to double check but I have seen its use in several > > other projects without issue so in theory it shouldnt be a problem. > > > > If tzcnt is preferable I can make up a patch to convert the intel and > msvc > > versions to use the required intrinsic. > > IIRC tzcnt (rep bsf) is only slower than plain bsf on really ancient > machines, e.g. 80486 and such, so there's not really any downside of > using it. Well in that case the optimizations can be heavily simplified to those in the attached patch.
0001-avutil-x86-intmath-Use-tzcnt-in-place-of-bsf.patch
Description: Binary data
_______________________________________________ ffmpeg-devel mailing list ffmpeg-devel@ffmpeg.org http://ffmpeg.org/mailman/listinfo/ffmpeg-devel