On 09/01/2015 12:04, Frediano Ziglio wrote:
> 2015-01-09 10:35 GMT+00:00 Paolo Bonzini <pbonz...@redhat.com>:
>>
>>
>> On 09/01/2015 11:27, Frediano Ziglio wrote:
>>>
>>> Signed-off-by: Frediano Ziglio <frediano.zig...@huawei.com>
>>> ---
>>>  include/qemu-common.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/qemu-common.h b/include/qemu-common.h
>>> index f862214..5366220 100644
>>> --- a/include/qemu-common.h
>>> +++ b/include/qemu-common.h
>>> @@ -370,6 +370,7 @@ static inline uint8_t from_bcd(uint8_t val)
>>>  }
>>>
>>>  /* compute with 96 bit intermediate result: (a*b)/c */
>>> +#ifndef __x86_64__
>>>  static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>>  {
>>>      union {
>>> @@ -392,6 +393,18 @@ static inline uint64_t muldiv64(uint64_t a, uint32_t 
>>> b, uint32_t c)
>>>      res.l.low = (((rh % c) << 32) + (rl & 0xffffffff)) / c;
>>>      return res.ll;
>>>  }
>>> +#else
>>> +static inline uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
>>> +{
>>> +    uint64_t res;
>>> +
>>> +    asm ("mulq %2\n\tdivq %3"
>>> +         : "=a"(res)
>>> +         : "a"(a), "qm"((uint64_t) b), "qm"((uint64_t)c)
>>> +         : "rdx", "cc");
>>> +    return res;
>>> +}
>>> +#endif
>>>
>>
>> Good idea.  However, if you have __int128, you can just do
>>
>>    return (__int128)a * b / c
>>
>> and the compiler should generate the right code.  Conveniently, there is
>> already CONFIG_INT128 that you can use.
> 
> Well, it works but in our case b <= c, that is a * b / c is always <
> 2^64.

This is not necessarily the case.  Quick grep:

hw/timer/hpet.c:    return (muldiv64(value, HPET_CLK_PERIOD, FS_PER_NS));
hw/timer/hpet.c:    return (muldiv64(value, FS_PER_NS, HPET_CLK_PERIOD));

One of the two must disprove your assertion. :)

But it's true that we expect no overflow.

> This lead to no integer overflow in the last division. However
> the compiler does not know this so it does the entire (a*b) / c
> division which is mainly consists in two integer division instead of
> one (not taking into account that is implemented using a helper
> function).
> 
> I think that I'll write two patches. One implementing using the int128
> as you suggested (which is much easier to read that current one and
> assembly ones) that another for x86_64 optimization.

Right, that's even better.

Out of curiosity, have you seen it in some profiles?

Paolo

Reply via email to