On Wed, Nov 16, 2016 at 03:16:59PM -0500, Chris Metcalf wrote:
> PeterZ (cc'ed) then improved it to use __int128 math via
> mul_u64_u32_shr(), but that doesn't help tile; we only do one multiply
> instead of two, but the multiply is handled by an out-of-line call to
> __multi3, and the sched_clock() function ends up about 2.5x slower as
> a result.

Well, only if you set CONFIG_ARCH_SUPPORTS_INT128; otherwise it reduces to two 32x32->64 multiplications, one of which is conditional on there actually being bits set in the high word of the u64 argument.
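
For reference, the non-__int128 path can be sketched roughly as below. This is a userspace approximation, not a verbatim copy of the kernel source: the name mul_u64_u32_shr_sketch() is hypothetical, the kernel's u64/u32 types are replaced with <stdint.h> ones, and it assumes shift <= 32.

```c
#include <stdint.h>

/*
 * Sketch of the fallback used when __int128 is not available.
 * Computes (a * mul) >> shift, truncated to 64 bits, using at most
 * two 32x32->64 multiplies. Assumes shift <= 32.
 */
static inline uint64_t mul_u64_u32_shr_sketch(uint64_t a, uint32_t mul,
					       unsigned int shift)
{
	uint32_t al = (uint32_t)a;		/* low word of a */
	uint32_t ah = (uint32_t)(a >> 32);	/* high word of a */
	uint64_t ret;

	/* First multiply: always needed. */
	ret = ((uint64_t)al * mul) >> shift;

	/* Second multiply: only if the high word has bits set. */
	if (ah)
		ret += ((uint64_t)ah * mul) << (32 - shift);

	return ret;
}
```

So for arguments that fit in 32 bits, which should be the common case soon after boot, only the first multiply is executed.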