arm: Reduce overhead of cpu_get_tb_cpu_state

Emilio G. Cota Thu, 14 Feb 2019 09:22:22 -0800

On Wed, Feb 13, 2019 at 20:06:48 -0800, Richard Henderson wrote:
> We've talked about this before, caching state to reduce the
> amount of computation that happens looking up each TB.
> 
> I know that Peter has been concerned that we would not be able to
> reliably maintain all of the places that need to be updated to
> keep this up-to-date.
> 
> Well, modulo dirty tricks within linux-user, it appears as if
> exception delivery and return, plus after every TB-ending write
> to a system register is sufficient.
> 
> There seems to be a noticeable improvement, although wall-time
> is harder to come by -- all of my system-level measurements
> include user input, and my user-level measurements seem to be
> too small to matter.
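
For anyone skimming the thread, here is a rough sketch of the caching
idea described above. It is not code from the series: ToyCPU and every
toy_* name are hypothetical, and the flag layout is invented purely for
illustration. The point is only that the derived flags are recomputed
at the few events named in the cover letter (exception delivery and
return, TB-ending system register writes), so the per-TB lookup reduces
to a plain load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified CPU state; not QEMU's CPUARMState. */
typedef struct {
    uint32_t el;           /* current exception level */
    bool     mmu_on;       /* stand-in for one system-register bit */
    uint32_t cached_flags; /* derived flags, refreshed only on change */
    unsigned rebuilds;     /* number of recomputations, for illustration */
} ToyCPU;

/* The "expensive" derivation that would otherwise run on every lookup. */
static uint32_t toy_compute_flags(const ToyCPU *cpu)
{
    return (cpu->el & 3u) | ((uint32_t)cpu->mmu_on << 2);
}

/* Refresh the cache; called only where the inputs can change. */
static void toy_rebuild_flags(ToyCPU *cpu)
{
    assert(cpu != NULL);
    cpu->cached_flags = toy_compute_flags(cpu);
    cpu->rebuilds++;
}

/* Hot path: the per-TB lookup just reads the cached value. */
static uint32_t toy_get_tb_flags(const ToyCPU *cpu)
{
    return cpu->cached_flags;
}

/* Exception delivery changes the exception level, so rebuild. */
static void toy_exception_entry(ToyCPU *cpu, uint32_t target_el)
{
    cpu->el = target_el;
    toy_rebuild_flags(cpu);
}

/* Exception return changes the exception level too. */
static void toy_exception_return(ToyCPU *cpu, uint32_t return_el)
{
    cpu->el = return_el;
    toy_rebuild_flags(cpu);
}

/* A TB-ending system register write that affects the flags. */
static void toy_sysreg_write_mmu(ToyCPU *cpu, bool on)
{
    cpu->mmu_on = on;
    toy_rebuild_flags(cpu);
}
```

The maintenance risk mentioned above shows up directly in a sketch like
this: any path that changes el or mmu_on without calling
toy_rebuild_flags() leaves the lookup returning stale flags.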


Thanks for this!

Here are some SPEC06int user-mode numbers (before vs. after):

                   aarch64-linux-user speedup for SPEC06int (test set)
                      Host: Intel(R) Xeon(R) Gold 6142 CPU @ 2.60GHz

                      2 +-----------------------------------------+
                        |                                         |
                    1.9 |-+.........................a+-+r.......+-|
                        |                            +-+          |
                        |                            * *          |
                    1.8 |-+..........................*.*........+-|
                        |       +-+                  * *          |
                    1.7 |-+.....+-+...............+-+*.*...+-+..+-|
                        |       * *         +-+   * ** *   +-+    |
                    1.6 |-+.....*.*..........|....*.**.*+-+*.*..+-|
                        |       * *         *|*   * ** *+-+* *    |
                    1.5 |-+.....*.*.........*|*...*.**.**.**.*..+-|
                        |       * *         +-+   * ** ** ** *    |
                        |       * *         * *   * ** ** ** *    |
                    1.4 |-+.....*.*.........*.*...*.**.**.**.*+-+-|
                        |       * *   +-+   * *   * ** ** ** ** * |
                    1.3 |-+.....*.*...+-+...*.*...*.**.**.**.**.*-|
                        | +-+   * *   * *   * *   * ** ** ** ** * |
                    1.2 |-+-+...*.*...*.*...*.*...*.**.**.**.**.*-|
                        | * *   * *   * *   * *   * ** ** ** ** * |
                        | * *   * *   * *+-+* *   * ** ** ** ** * |
                    1.1 |-*.*...*.*...*.**.**.*...*.**.**.**.**.*-|
                        | * *+-+* *+-+* ** ** *+-+* ** ** ** ** * |
                      1 +-----------------------------------------+
              400.per401.b40344454462.li464471.483.xalangeomean
 png: https://imgur.com/RjkYYJ5

That is, a 1.4x average speedup.

                Emilio

Re: [Qemu-devel] [PATCH 0/4] target/arm: Reduce overhead of cpu_get_tb_cpu_state
