On Tue, Sep 12, 2017 at 12:55 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> On 12/09/2017 18:48, Peter Feiner wrote:
>>>>
>>>> Because update_permission_bitmask is actually the top item in the profile
>>>> for nested vmexits, this speeds up an L2->L1 vmexit by about ten thousand
>>>> clock cycles, or up to 30%:
>>
>> This is a great improvement! Why not take it a step further and
>> compute the whole table once at module init time and be done with it?
>> There are only 5 extra input bits (nx, ept, smep, smap, wp),
>
> 4 actually, nx could be ignored (because unlike WP, the bit is reserved
> when nx is disabled).  It is only handled for clarity.
>
>> so the
>> whole table would only take up (1 << 5) * 16 = 512 bytes. Moreover, if
>> you had 32 VMs on the host, you'd actually save memory!
>
> Indeed; my thought was to write a script or something to generate the
> tables at compile time, but doing it at module init time would be clever
> and easier.
>
> That said, the generated code for the function, right now, is pretty
> good.  If it saved 1000 clock cycles per nested vmexit it would be very
> convincing, but if it were 50 or even 100 a bit less so.
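Here is a minimal C sketch of the "compute the whole table once at module init" idea. It uses one 16-byte row for each combination of the five configuration bits (nx, ept, smep, smap, wp), which gives (1 << 5) * 16 = 512 bytes. All bit layouts and names (perm_table, init_perm_table, permission_fault, CFG_*, PF_*, ACC_*) are hypothetical. The fault rule is a deliberately simplified stand-in, not KVM's actual update_permission_bitmask logic. For example, SMAP here ignores EFLAGS.AC, and EPT is reduced to plain write/execute checks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layouts, for this sketch only. */
#define ACC_EXEC  (1u << 0)   /* pte_access: executable */
#define ACC_WRITE (1u << 1)   /* pte_access: writable */
#define ACC_USER  (1u << 2)   /* pte_access: user-accessible */

#define PF_WRITE  (1u << 0)   /* row index bit: write access */
#define PF_USER   (1u << 1)   /* row index bit: user-mode access */
#define PF_RSVD   (1u << 2)   /* row index bit: unused in this sketch */
#define PF_FETCH  (1u << 3)   /* row index bit: instruction fetch */

#define CFG_NX    (1u << 0)
#define CFG_EPT   (1u << 1)
#define CFG_SMEP  (1u << 2)
#define CFG_SMAP  (1u << 3)
#define CFG_WP    (1u << 4)

/* 32 configurations x 16 fault-code indices, one u8 mask each: 512 bytes. */
static uint8_t perm_table[1 << 5][16];

/* Simplified rule: does access 'pf' to a page with rights 'acc' fault? */
static int faults(unsigned cfg, unsigned pf, unsigned acc)
{
	int write = pf & PF_WRITE, user = pf & PF_USER, fetch = pf & PF_FETCH;
	int x = acc & ACC_EXEC, w = acc & ACC_WRITE, u = acc & ACC_USER;

	if (cfg & CFG_EPT)	/* sketch: no U/S split and no WP under EPT */
		return (write && !w) || (fetch && !x);

	if (write && !w && (user || (cfg & CFG_WP)))
		return 1;	/* supervisor writes to RO pages fault only with WP */
	if (user && !u)
		return 1;	/* user access to a supervisor page */
	if (fetch && (cfg & CFG_NX) && !x)
		return 1;	/* fetch from a non-executable page */
	if (fetch && !user && u && (cfg & CFG_SMEP))
		return 1;	/* supervisor fetch from a user page */
	if (!fetch && !user && u && (cfg & CFG_SMAP))
		return 1;	/* supervisor data access to a user page */
	return 0;
}

/* Run once, e.g. at module init, instead of per vCPU/MMU reset. */
static void init_perm_table(void)
{
	for (unsigned cfg = 0; cfg < 32; cfg++)
		for (unsigned pf = 0; pf < 16; pf++) {
			uint8_t mask = 0;
			for (unsigned acc = 0; acc < 8; acc++)
				if (faults(cfg, pf, acc))
					mask |= (uint8_t)(1u << acc);
			perm_table[cfg][pf] = mask;
		}
}

/* Hot-path lookup: one load plus a shift. */
static int permission_fault(unsigned cfg, unsigned pf, unsigned acc)
{
	assert(cfg < 32 && pf < 16 && acc < 8);
	return (perm_table[cfg][pf] >> acc) & 1;
}
```

With this layout, each MMU would only need to remember its 5-bit configuration index, not its own 16-byte array. Whether that is worth it over the current per-MMU computation is the cycle-count question raised above.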

ACK. I'm good with either approach :-) Please consider this one

Reviewed-By: Peter Feiner <pfei...@google.com>
