On Fri, Oct 26, 2012 at 1:49 AM, Peter Zijlstra <pet...@infradead.org> wrote: > > No, it does a compare on two u128
Actually, it apparently compares two multiplications. That might be optimizable in itself. > The point is (as mentioned in the comments below) overflowing an actual > u64 is rare, however since some of this (specifically the > dl_{runtime,deadline} parameters) is user specified, we have to assume > we will overflow. Any chance we could just limit them? > + u128 left, right; > + > + /* > + * left and right are the two sides of the equation above, > + * after a bit of shuffling to use multiplications instead > + * of divisions. > + * > + * Note that none of the time values involved in the two > + * multiplications are absolute: dl_deadline and dl_runtime > + * are the relative deadline and the maximum runtime of each > + * instance, runtime is the runtime left for the last instance > + * and (deadline - t), since t is rq->clock, is the time left > + * to the (absolute) deadline. Therefore, overflowing the u64 > + * type is very unlikely to occur in both cases. > + */ > + left = mul_u64_u64(dl_se->dl_deadline, dl_se->runtime); > + right = mul_u64_u64((dl_se->deadline - t), dl_se->dl_runtime); > + > + if (cmp_u128(left, right) > 0) > + return true; > + > + return false; So how often could we do this without doing the multiplication at all? It's trivial to see that 'right > left' if the individual multiplicands are both bigger, for example. Maybe that is common? And even if it overflows in 64-bit does it overflow in 92? For 32-bit machines, the difference there is quite noticeable. So the above might actually be better written as a "compare_64bit_multiply(a,b,c,d)". At the same time, are we *seriously* ever talking about multi-second runtimes or deadlines? Because even in nanoseconds, I assume that the common case *by*far* in scheduling would be about values smaller than four seconds, in which case all of the above values are 32-bit, making the compares *much* cheaper. So on a 32-bit machine (say, x86-32), you might just have: - or all the high words together, jump to slow case if the result is non-zero - otherwise, do just two 32x32 multiplies and check which of the two is bigger. That's a *huge* reduction in expensive multiplications. And *THAT* is why generic 128-bit math is stupid. Don't do it. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/