Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data

Sergey Fedorov Fri, 03 Jun 2016 10:48:30 -0700

On 03/06/16 20:29, Sergey Fedorov wrote:
> On 03/06/16 20:22, Emilio G. Cota wrote:
>> On Sat, May 28, 2016 at 21:15:06 +0300, Sergey Fedorov wrote:
>>> On 25/05/16 04:13, Emilio G. Cota wrote:
>>> (snip)
>>>> +double qdist_avg(const struct qdist *dist)
>>>> +{
>>>> +    unsigned long count;
>>>> +    size_t i;
>>>> +    double ret = 0;
>>>> +
>>>> +    count = qdist_sample_count(dist);
>>>> +    if (!count) {
>>>> +        return NAN;
>>>> +    }
>>>> +    for (i = 0; i < dist->n; i++) {
>>>> +        struct qdist_entry *e = &dist->entries[i];
>>>> +
>>>> +        ret += e->x * e->count / count;
>>> Please use Welford’s method or something like that, see
>>> http://stackoverflow.com/a/1346890.
>> Yes, the way the mean is computed right now, we might suffer
>> from underflow if count is huge. But I'd rather take that, than the
>> perf penalty of an iterative method (such as the one used
>> in Welford's). Note that we might have huge amounts of
>> items, e.g. one item per head bucket in qht's occupancy qdist
>> (and 0.5M head buckets is easy to achieve).
>>
>> If we were to use an iterative method, we'd need to do something
>> like:
>>
>> double qdist_avg(const struct qdist *dist)
>> {
>>     size_t i, j;
>>     double ret = 0;
>>
>>     if (!qdist_sample_count(dist)) {
>>         return NAN;
>>     }
>>     /* compute moving average to prevent under/overflow */
>>     for (i = 0; i < dist->n; i++) {
>>         struct qdist_entry *e = &dist->entries[i];
>>
>>         for (j = 0; j < e->count; j++) {
>>
>>             ret += (e->x - ret) / (i + j + 1);
>>         }
>>     }
>>     return ret;
>> }
>>
>> Note that skipping the inner loop would be incorrect.
> Ah, it's a shame. I'm wondering if there is some other algorithm that
> could work for us?


Maybe something like
https://en.wikipedia.org/wiki/Kahan_summation_algorithm could help?

Regards,
Sergey

>
>> I measured the time it takes to execute qdist_avg(&hst.occupancy) at the
>> end of booting debian jessie for ARM. The difference is
>> significant:
>>
>> Original:  0.000002 s
>> Iterative: 0.002846 s
> Have you compared the results of computing the average as well?
>
>> So really I think we should be OK with a potential underflow. If you want
>> I can add a comment to remind our future selves of these findings.
> Kind regards,
> Sergey

Re: [Qemu-devel] [PATCH v6 08/15] qdist: add module to represent frequency distributions of data

Reply via email to