> > > Other approaches under consideration include making CONFIG_PREEMPT_COUNT
> > > unconditional and thus allowing call_rcu() and kvfree_rcu() to determine
> > > whether direct calls to the allocator are safe (some guy named Linus
> > > doesn't like this one),
> > 
> > I assume that the primary argument is the overhead, right? Do you happen
> > to have any reference?
> 
> Jon Corbet wrote a very nice article summarizing the current situation:
> https://lwn.net/Articles/831678/.  Thomas's measurements show no visible
> system-level performance impact.  I will let Uladzislau present his more
> microbenchmarky performance work.
> 
I have compared a !PREEMPT kernel built with and without the PREEMPT_COUNT
configuration option. The aim is to show the performance impact of enabling
PREEMPT_COUNT unconditionally.

For the test I used the refscale kernel module, which runs the following loop:

<snip>
static void ref_rcu_read_section(const int nloops)
{
	int i;

	for (i = nloops; i >= 0; i--) {
		rcu_read_lock();
		rcu_read_unlock();
	}
}
<snip>

How to run the microbenchmark:

<snip>
urezki@pc638:~$ sudo modprobe refscale
<snip>

Below is the average duration per loop (nanoseconds) for each run:

  !PREEMPT_COUNT            PREEMPT_COUNT
 Runs     Time(ns)         Runs     Time(ns)
 1        109.640          1        99.915
 2        102.303          2        111.106
 3        90.520           3        98.713
 4        106.347          4        111.239
 5        108.374          5        111.797
 6        108.012          6        111.558
 7        103.989          7        113.122
 8        106.194          8        111.515
 9        107.330          9        107.559
 10       105.877          10       105.965
 11       104.860          11       104.835
 12       104.299          12       106.342
 13       104.794          13       106.664
 14       104.916          14       104.914
 15       105.485          15       104.280
 16       104.610          16       105.642
 17       104.981          17       105.646
 18       103.089          18       106.370
 19       105.251          19       105.284
 20       104.133          20       105.973
 21       105.589          21       105.271
 22       104.154          22       106.063
 23       104.963          23       106.248
 24       102.431          24       105.568
 25       102.610          25       105.556
 26       103.474          26       105.655
 27       100.194          27       102.887
 28       102.340          28       104.347
 29       102.075          29       102.389
 30       102.808          30       103.123

The difference is ~1.8% on average. The maximum value is 109.640 vs 113.122 ns
and the minimum value is 90.520 vs 98.713 ns (!PREEMPT_COUNT vs PREEMPT_COUNT).

Tested on:
    processor       : 63
    vendor_id       : AuthenticAMD
    cpu family      : 6
    model           : 6
    model name      : QEMU Virtual CPU version 2.5+
    cpu MHz         : 3700.204

I can also do more detailed testing using the "perf" tool.

--
Vlad Rezki
