* jef...@chromium.org <jef...@chromium.org> [240415 12:35]:
> From: Jeff Xu <jef...@chromium.org>
> 
> This is the V10 version; it rebases the v9 patch onto 6.9-rc3.
> We also applied and tested mseal() in Chrome and on Chromebooks.
> 
> ------------------------------------------------------------------
...

> MM perf benchmarks
> ==================
> This patch adds a loop in mprotect/munmap/madvise(DONTNEED) to
> check the VMAs’ sealing flag, so that no partial update can be made
> when any segment within the given memory range is sealed.
> 
> To measure the performance impact of this loop, two tests were
> developed [8].
> 
> The first measures the time taken for a particular system call using
> clock_gettime(CLOCK_MONOTONIC). The second uses
> PERF_COUNT_HW_REF_CPU_CYCLES (excluding user space). Both tests give
> similar results.
> 
> The tests roughly follow the sequence below:
> for (i = 0; i < 1000; i++)
>     create 1000 mappings (1 page per VMA)
>     start the sampling
>     for (j = 0; j < 1000; j++)
>         mprotect one mapping
>     stop and save the sample
>     delete 1000 mappings
> aggregate all samples.


Thank you for doing this performance testing.
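
For readers following along, one pass of the timing test described above
might look roughly like the C sketch below. This is not the actual test
from [8]; the helper names now_ns() and time_mprotect_pass(), the
alternating initial protections, and the choice of PROT_NONE as the new
protection are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/*
 * Hypothetical helper: create nr_maps single-page mappings, time one
 * mprotect() call per mapping, then delete the mappings.
 * Returns the time spent in the mprotect() loop in ns, or -1 on failure.
 */
int64_t time_mprotect_pass(size_t nr_maps)
{
    long page = sysconf(_SC_PAGESIZE);
    void **maps = calloc(nr_maps, sizeof(*maps));
    int64_t start, elapsed = -1;
    size_t i, mapped = 0;

    assert(page > 0);
    if (!maps)
        return -1;

    for (i = 0; i < nr_maps; i++) {
        /* Assumption: alternate the initial protection so that adjacent
         * pages do not start out merged into a single VMA. */
        int prot = (i & 1) ? PROT_READ : (PROT_READ | PROT_WRITE);

        maps[i] = mmap(NULL, (size_t)page, prot,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (maps[i] == MAP_FAILED)
            goto out;
        mapped++;
    }

    start = now_ns();                       /* start the sampling */
    for (i = 0; i < nr_maps; i++) {
        if (mprotect(maps[i], (size_t)page, PROT_NONE))
            goto out;
    }
    elapsed = now_ns() - start;             /* stop and save the sample */

out:
    for (i = 0; i < mapped; i++)            /* delete the mappings */
        munmap(maps[i], (size_t)page);
    free(maps);
    return elapsed;
}
```

Calling time_mprotect_pass(1000) in an outer loop of 1000 iterations and
aggregating the returned values would approximate the quoted sequence.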

> 
> The tests below were performed on an Intel(R) Pentium(R) Gold 7505
> @ 2.00GHz, 4G memory, Chromebook.
> 
> Based on the latest upstream code:
> The first test (measuring time)
> syscall__  vmas  t      t_mseal  delta_ns  per_vma  %
> munmap__   1     909    944      35        35       104%
> munmap__   2     1398   1502     104       52       107%
> munmap__   4     2444   2594     149       37       106%
> munmap__   8     4029   4323     293       37       107%
> munmap__   16    6647   6935     288       18       104%
> munmap__   32    11811  12398    587       18       105%
> mprotect   1     439    465      26        26       106%
> mprotect   2     1659   1745     86        43       105%
> mprotect   4     3747   3889     142       36       104%
> mprotect   8     6755   6969     215       27       103%
> mprotect   16    13748  14144    396       25       103%
> mprotect   32    27827  28969    1142      36       104%
> madvise_   1     240    262      22        22       109%
> madvise_   2     366    442      76        38       121%
> madvise_   4     623    751      128       32       121%
> madvise_   8     1110   1324     215       27       119%
> madvise_   16    2127   2451     324       20       115%
> madvise_   32    4109   4642     534       17       113%
> 
> The second test (measuring cpu cycle)
> syscall__  vmas  cpu    cmseal  delta_cpu  per_vma  %
> munmap__   1     1790   1890    100        100      106%
> munmap__   2     2819   3033    214        107      108%
> munmap__   4     4959   5271    312        78       106%
> munmap__   8     8262   8745    483        60       106%
> munmap__   16    13099  14116   1017       64       108%
> munmap__   32    23221  24785   1565       49       107%
> mprotect   1     906    967     62         62       107%
> mprotect   2     3019   3203    184        92       106%
> mprotect   4     6149   6569    420        105      107%
> mprotect   8     9978   10524   545        68       105%
> mprotect   16    20448  21427   979        61       105%
> mprotect   32    40972  42935   1963       61       105%
> madvise_   1     434    497     63         63       115%
> madvise_   2     752    899     147        74       120%
> madvise_   4     1313   1513    200        50       115%
> madvise_   8     2271   2627    356        44       116%
> madvise_   16    4312   4883    571        36       113%
> madvise_   32    8376   9319    943        29       111%
> 

If I am reading this right, madvise() is affected more than the other
calls?  Is that expected or do we need to have a closer look?
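
As a way of cross-checking the derived columns, the sketch below
recomputes per_vma and % from the raw t and t_mseal values. It assumes
that per_vma is the delta divided by the VMA count and that % is t_mseal
relative to t, both rounded to the nearest integer; the quoted mail does
not define these columns, and the helper names are hypothetical. Under
those assumptions the 32-VMA rows of the first table give a similar
absolute per-VMA cost for munmap and madvise (about 18 and 17 ns), while
the percentage is larger for madvise because its baseline is smaller.

```c
/*
 * Hypothetical helpers for re-deriving the table columns.
 * Assumed definitions (not stated in the quoted mail):
 *   per_vma = (t_mseal - t) / vmas, rounded to nearest
 *   %       = 100 * t_mseal / t,    rounded to nearest
 */
long per_vma_delta(long t, long t_mseal, long vmas)
{
    return (t_mseal - t + vmas / 2) / vmas;
}

long overhead_pct(long t, long t_mseal)
{
    return (100 * t_mseal + t / 2) / t;
}
```

For example, per_vma_delta(4109, 4642, 32) and overhead_pct(4109, 4642)
reproduce the madvise row at 32 VMAs. A delta recomputed this way can
differ by 1 from the table's delta column, presumably because the table
holds rounded averages.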

...

> When I discussed the mm performance with Brian Makin, an engineer who
> works on performance, it was brought to my attention that such
> performance benchmarks, which measure millions of mm syscalls in a tight
> loop, may not accurately reflect real-world scenarios, such as that of a
> database service. Also, this was tested on a single piece of hardware
> running ChromeOS; the data from other hardware or distributions might be
> different. It might be best to take this data with a grain of salt.
> 

Absolutely, these types of benchmarks are useless for simulating what
will really happen with any sane program.

However, they are valuable in that they can highlight areas where
something may have been made more inefficient.  These inefficiencies
would otherwise be lost in the noise of regular system use.  They can be
used as a relatively high-level sanity check on what you believe is going on.

I appreciate you doing the work on testing the performance here.

...
