On May 10, 2016 10:23:13 AM PDT, Peter Zijlstra <pet...@infradead.org> wrote:
>On Tue, May 10, 2016 at 06:53:18PM +0200, Borislav Petkov wrote:
>>  static __always_inline unsigned int __arch_hweight32(unsigned int w)
>>  {
>> -       unsigned int res = 0;
>> +       unsigned int res;
>>
>> -       asm (ALTERNATIVE("call __sw_hweight32", POPCNT32, X86_FEATURE_POPCNT)
>> -                    : "="REG_OUT (res)
>> -                    : REG_IN (w));
>> +       if (likely(static_cpu_has(X86_FEATURE_POPCNT))) {
>> +               /* popcnt %eax, %eax */
>> +               asm volatile(POPCNT32
>> +                            : "="REG_OUT (res)
>> +                            : REG_IN (w));
>>
>> -       return res;
>> +               return res;
>> +       }
>> +       return __sw_hweight32(w);
>>  }
>
>So what was wrong with using the normal thunk_*.S wrappers for the
>calls? That would allow you to use the alternative() stuff which does
>generate smaller code.

Also, to be fair... if the problem is with these being in C, then we
could just do it in assembly easily enough.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.
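
For anyone following along outside the kernel tree, here is a rough userspace sketch of the shape of the C variant in the quoted diff: test for POPCNT, use the instruction if it is there, otherwise fall back to a software bit count. It is only an illustration under stated assumptions. The names sw_hweight32(), hw_popcnt32(), hweight32_dispatch() and hweight32_selfcheck() are made up for this example, the feature test uses the GCC/clang builtin __builtin_cpu_supports() rather than static_cpu_has(), and nothing here is patched at boot the way ALTERNATIVE() is, so the check runs on every call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Portable bit count; a hypothetical stand-in for the kernel's
 * __sw_hweight32(), not a copy of it.
 */
static unsigned int sw_hweight32(uint32_t w)
{
    w = w - ((w >> 1) & 0x55555555u);                 /* 2-bit sums */
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u); /* 4-bit sums */
    w = (w + (w >> 4)) & 0x0f0f0f0fu;                 /* 8-bit sums */
    return (w * 0x01010101u) >> 24;                   /* add the four bytes */
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_POPCNT_PATH 1

/* Hardware path: emit popcnt directly, as the quoted asm() does. */
static unsigned int hw_popcnt32(uint32_t w)
{
    unsigned int res;

    __asm__("popcnt %1, %0" : "=r" (res) : "r" (w) : "cc");
    return res;
}
#endif

/*
 * Runtime dispatch. Assumption: a per-call feature test replaces the
 * kernel's boot-time patching, so this is slower than either kernel variant.
 */
static unsigned int hweight32_dispatch(uint32_t w)
{
#ifdef HAVE_X86_POPCNT_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("popcnt"))
        return hw_popcnt32(w);
#endif
    return sw_hweight32(w);
}

/* Check that the dispatched result agrees with the software fallback. */
static void hweight32_selfcheck(void)
{
    static const uint32_t samples[] = { 0u, 1u, 0xF0F0u, 0x80000001u, 0xFFFFFFFFu };
    unsigned int i;

    for (i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(hweight32_dispatch(samples[i]) == sw_hweight32(samples[i]));
}
```

Calling hweight32_selfcheck() once at startup is enough to exercise both paths on a POPCNT-capable machine. The size argument in the thread is about what the compiler emits at each call site for the branch-plus-call form versus a single patched instruction, which this sketch does not try to reproduce.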