andreadb added inline comments.

================
Comment at: lib/Headers/x86intrin.h:49
+static __inline__ unsigned int __attribute__((__always_inline__, __nodebug__))
+__tzcnt_u32(unsigned int __X) { return __X ? __builtin_ctz(__X) : 32; }
+#ifdef __x86_64__
----------------
hans wrote:
> probinson wrote:
> > hans wrote:
> > > I'm worried about the conditional here. IIRC, ffmpeg uses TZCNT just as a 
> > > faster encoding for BSF, but now we're putting a conditional in the way, 
> > > so this will be slower actually. On the other hand, the alternative is 
> > > weird too :-/
> > I thought there was a peephole to notice a guard like this and do the right 
> > thing? In which case having the guard is fine.
> But for non-BMI targets it won't be able to remove the guard (unless it can 
> prove the argument is zero).
> 
> MSVC will just emit the raw 'tzcnt' instruction here, and that's what ffmpeg 
> wants.
I understand your point. However, the semantic of tzcnt_u32 (at least for BMI) 
is well defined on zero input. Changing that semantic for non-BMI targets 
sounds a bit odd/confusing to me.

I guess ffmpeg is reliant on the fact that other compilers would either 
preserve the intrinsic call until instruction selection, or fold it to a 
meaningful value (i.e. 32 in this case) when the input is known to be zero. If 
we remove the zero check from tzcnt_u32 (for non BMI targets), we let the 
optimizer enable rules to aggressively fold calls to tzcnt_u32 to undef when 
the input is zero.

As a side note: I noticed that gcc provides intrinsic `__bsfd` in ia32intrin.h. 
Maybe we should do something similar?


https://reviews.llvm.org/D26335



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to