andreadb added inline comments.
================ Comment at: lib/Headers/x86intrin.h:49 +static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__)) +__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; } +#ifdef __x86_64__ ---------------- hans wrote: > probinson wrote: > > hans wrote: > > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a > > > faster encoding for BSF, but now we're putting a conditional in the way, > > > so this will be slower actually. On the other hand, the alternative is > > > weird too :-/ > > I thought there was a peephole to notice a guard like this and do the right > > thing? In which case having the guard is fine. > But for non-BMI targets it won't be able to remove the guard (unless it can > prove the argument is zero). > > MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg > wants. I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) is well defined on zero input. Changing that semantic for non-BMI targets sounds a bit odd/confusing to me. I guess ffmpeg is reliant on the fact that other compilers would either preserve the intrinsic call until instruction selection, or fold it to a meaningful value (i.e. 32 in this case) when the input is known to be zero. If we remove the zero check from tzcnt_u32 (for non BMI targets), we let the optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when the input is zero. As a side note: I noticed that gcc provides intrinsic `__bsfd` in ia32intrin.h. Maybe we should do something similar? https://reviews.llvm.org/D26335 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits