Re: [PATCH 01/16] math128: Introduce various 128bit primitives

Linus Torvalds Fri, 26 Oct 2012 08:18:10 -0700

On Fri, Oct 26, 2012 at 1:49 AM, Peter Zijlstra <[email protected]> wrote:
>
> No, it does a compare on two u128


Actually, it apparently compares two multiplications.

That might be optimizable in itself.

> The point is (as mentioned in the comments below) overflowing an actual
> u64 is rare, however since some of this (specifically the
> dl_{runtime,deadline} parameters) is user specified, we have to assume
> we will overflow.

Any chance we could just limit them?

> +       u128 left, right;
> +
> +       /*
> +        * left and right are the two sides of the equation above,
> +        * after a bit of shuffling to use multiplications instead
> +        * of divisions.
> +        *
> +        * Note that none of the time values involved in the two
> +        * multiplications are absolute: dl_deadline and dl_runtime
> +        * are the relative deadline and the maximum runtime of each
> +        * instance, runtime is the runtime left for the last instance
> +        * and (deadline - t), since t is rq->clock, is the time left
> +        * to the (absolute) deadline. Therefore, overflowing the u64
> +        * type is very unlikely to occur in both cases.
> +        */
> +       left = mul_u64_u64(dl_se->dl_deadline, dl_se->runtime);
> +       right = mul_u64_u64((dl_se->deadline - t), dl_se->dl_runtime);
> +
> +       if (cmp_u128(left, right) > 0)
> +               return true;
> +
> +       return false;

So how often could we do this without doing the multiplication at all?

It's trivial to see that 'right > left' if the individual
multiplicands are both bigger, for example. Maybe that is common?

And even if it overflows in 64-bit does it overflow in 92? For 32-bit
machines, the difference there is quite noticeable.

So the above might actually be better written as a
"compare_64bit_multiply(a,b,c,d)". At the same time, are we
*seriously* ever talking about multi-second runtimes or deadlines?
Because even in nanoseconds, I assume that the common case *by*far* in
scheduling would be about values smaller than four seconds, in which
case all of the above values are 32-bit, making the compares *much*
cheaper.

So on a 32-bit machine (say, x86-32), you might just have:

 - or all the high words together, jump to slow case if the result is non-zero
 - otherwise, do just two 32x32 multiplies and check which of the two is bigger.

That's a *huge* reduction in expensive multiplications.

And *THAT* is why generic 128-bit math is stupid. Don't do it.

           Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [PATCH 01/16] math128: Introduce various 128bit primitives

Reply via email to