Keith Owens wrote:
> 
> All counters are 64 bit; 4 Gb is not enough for everybody.  This raises
> its own problem: some architectures do not have atomic reads, writes or
> increments for 64 bit fields but must treat them as two 32 bit words.
> This race exists.
> 
>   user space:        kernel
>     read word 0
>                      read word 0
>                      read word 1
>                      update
>                      store word 0
>                      store word 1
>     read word 1
> 
> That gets really interesting when the counter is about to overflow from
> 2**32-1 to 2**32.  User space will read two words of zero.
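To make the window concrete, here is a hypothetical sketch of the torn
read on a 32-bit machine (the names, and treating word 0 as the high
word, are my choices to match the interleaving above):

  #include <stdint.h>

  volatile uint32_t word[2];          /* word[0] = high, word[1] = low */

  uint64_t torn_read(void)
  {
      uint32_t hi = word[0];          /* old high word: 0 */
      /* the kernel's carry from 2**32-1 to 2**32 lands here:
       * it stores word[0] = 1, word[1] = 0 */
      uint32_t lo = word[1];          /* new low word: 0 */
      return ((uint64_t)hi << 32) | lo;   /* result: 0 */
  }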
> 
> To close this race without a kernel call to get a semaphore (avoiding
> context switches), there is an int at the start of each cpu's data;
> this field counts the number of data updates in progress.  User space
> can see the int as part of the mmapped data, like this.
> 
>   Global header page.
>     int flag                 )
>     performance counter 1    ) data for cpu 0, page aligned
>     performance counter 2... )
>     int flag                 )
>     performance counter 1    ) data for cpu 1, page aligned
>     performance counter 2... )
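As a C struct, one cpu's page-aligned block might be rendered like this
(the struct name and NR_COUNTERS are mine, not from the patch):

  struct percpu_perf {
      int flag;                       /* updates in progress on this cpu */
      __s64 counter[NR_COUNTERS];     /* performance counter 1, 2, ... */
  };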
> 
> The application calculates the address of the flag for the cpu it is
> interested in and the address of the counter for the instance it wants
> to read.  Then
> 
>   user space:
> 
>   copy_desired_counter_values()
>   {
>     volatile int *p_flag = /* address of flag for desired cpu */;
>     volatile __s64 *p_counter = /* address of counter for desired cpu and instance */;
>     __s64 counter; /* local copy */
> 
>     do {
>       counter = *p_counter;
>     } while (!*p_flag && counter == *p_counter);

I suppose you've made a mistake writing the continue condition: as
written, the loop keeps spinning exactly when the flag is clear and the
value is stable, i.e. when it should stop.
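Presumably the intent was to retry while an update is in progress or
the value moved under us, i.e. something like:

    do {
      counter = *p_counter;
    } while (*p_flag || counter != *p_counter);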

>   }
> 
>   kernel:
> 
>     ++perf_flag_for_current_cpu;
>     smp_wmb();
>     counter += delta;
>     smp_wmb();
>     --perf_flag_for_current_cpu;
>     smp_wmb();
> 
> Looping until counter == *p_counter ensures that user space always sees
> a stable value, even if the kernel is updating the counter at the same
> time.  A non-zero flag tells user space that an update is in progress
> so the data it got, while stable, might be invalid.
> 
> The flag check is required because user space on one cpu can read data
> from another cpu while the second cpu is updating the counter.  The
> flag is per cpu and is only updated by the kernel on that cpu, so an
> int is both smp and interrupt safe; it does not need to be atomic.
> smp_wmb() is used after each update to guarantee strong write ordering.

I've faced this problem in the past months for ALSA and I think I can
propose a better solution.

I see three problems in your solution:
1) the continue condition is wrong, and I'm not sure you can write a
correct one without races given your current fields
2) you have no way to read multiple related counters that need to be
seen coherently (i.e. atomically) as a group
3) your solution may be wrong if the sequence of counter values is not
monotonic: if a counter can pass through the same value twice, the
counter == *p_counter recheck can succeed on a torn or stale read

Here is a sketch of my solution:

struct perfcounter {
    int begin, end;             /* both initialized to 0 */     
    u64 counter1;
    u64 counter2;
};

kernel:
s->begin++;                     /* an update is now in progress */
wmb();                          /* order begin before the data writes */
update performance counter(s)
wmb();                          /* order the data writes before end */
s->end++;                       /* the update is complete */


user:
volatile struct perfcounter *s;
u64 c1, c2;
int _end;

do {
        _end = s->end;          /* snapshot end before reading the data */
        c1 = s->counter1;
        c2 = s->counter2;
} while (_end != s->begin);     /* retry if an update began or was in flight */
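As a stand-alone illustration, here is a compilable sketch of the same
scheme.  The pthread writer stands in for the kernel, __sync_synchronize()
stands in for wmb(), and I've added the matching read barriers that the
fragment above leaves implicit; all names and barrier choices here are
mine, not part of the proposal:

  #include <pthread.h>
  #include <stdint.h>
  #include <stdio.h>

  struct perfcounter {
      volatile int begin, end;          /* both initialized to 0 */
      volatile uint64_t counter1;
      volatile uint64_t counter2;
  };

  static struct perfcounter pc;

  static void *writer(void *arg)        /* the "kernel" side */
  {
      for (int i = 0; i < 1000000; i++) {
          pc.begin++;
          __sync_synchronize();         /* wmb(): begin before data */
          pc.counter1 += 1;
          pc.counter2 += 2;             /* invariant: counter2 == 2*counter1 */
          __sync_synchronize();         /* wmb(): data before end */
          pc.end++;
      }
      return NULL;
  }

  int main(void)
  {
      pthread_t t;
      pthread_create(&t, NULL, writer, NULL);

      for (int i = 0; i < 1000000; i++) {
          uint64_t c1, c2;
          int end;
          do {
              end = pc.end;             /* snapshot end first */
              __sync_synchronize();     /* rmb() */
              c1 = pc.counter1;
              c2 = pc.counter2;
              __sync_synchronize();     /* rmb() */
          } while (end != pc.begin);    /* retry on overlap */
          if (c2 != 2 * c1)
              fprintf(stderr, "torn read: %llu %llu\n",
                      (unsigned long long)c1,
                      (unsigned long long)c2);
      }
      pthread_join(t, NULL);
      printf("reader done\n");
      return 0;
  }

Each writer pass keeps counter2 == 2 * counter1, so any torn pair the
reader accepts would show up as a consistency failure.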

-- 
Abramo Bagnara                       mailto:[EMAIL PROTECTED]

Opera Unica                          Phone: +39.546.656023
Via Emilia Interna, 140
48014 Castel Bolognese (RA) - Italy

ALSA project is            http://www.alsa-project.org
sponsored by SuSE Linux    http://www.suse.com
