Paolo Bonzini <pbonz...@redhat.com> writes:

> Il 06/05/2014 02:40, Bandan Das ha scritto:
>> On every instruction fetch, kvm_read_guest_virt_helper
>> does the gva to gpa translation followed by searching for the
>> memslot. Store the gva hva mapping so that if there's a match
>> we can directly call __copy_from_user()
>>
>> Suggested-by: Paolo Bonzini <pbonz...@redhat.com>
>> Signed-off-by: Bandan Das <b...@redhat.com>
>> ---
>>  arch/x86/include/asm/kvm_emulate.h |  7 ++++++-
>>  arch/x86/kvm/x86.c                 | 33 +++++++++++++++++++++++----------
>>  2 files changed, 29 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h
>> index 085d688..20ccde4 100644
>> --- a/arch/x86/include/asm/kvm_emulate.h
>> +++ b/arch/x86/include/asm/kvm_emulate.h
>> @@ -323,10 +323,11 @@ struct x86_emulate_ctxt {
>>  	int (*execute)(struct x86_emulate_ctxt *ctxt);
>>  	int (*check_perm)(struct x86_emulate_ctxt *ctxt);
>>  	/*
>> -	 * The following five fields are cleared together,
>> +	 * The following six fields are cleared together,
>>  	 * the rest are initialized unconditionally in x86_decode_insn
>>  	 * or elsewhere
>>  	 */
>> +	bool addr_cache_valid;
>
> You can just set gfn to -1 to invalidate the fetch.
>
>>  	u8 rex_prefix;
>>  	u8 lock_prefix;
>>  	u8 rep_prefix;
>> @@ -348,6 +349,10 @@ struct x86_emulate_ctxt {
>>  	struct fetch_cache fetch;
>>  	struct read_cache io_read;
>>  	struct read_cache mem_read;
>> +	struct {
>> +		gfn_t gfn;
>> +		unsigned long uaddr;
>> +	} addr_cache;
>
> This is not used by emulate.c, so it should be directly in struct
> kvm_vcpu.  You could invalidate it in init_emulate_ctxt, together with
> emulate_regs_need_sync_from_vcpu.
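Just to make sure I follow: the valid flag goes away, gfn == -1 marks the
cache as empty, and the cache itself moves out of x86_emulate_ctxt onto the
vcpu side. Something along these lines, I think (untested sketch; the
member and helper names below are just placeholders):

/* Candidate cache member hanging off the vcpu (exact placement and name
 * are placeholders); gfn here is the guest virtual page number, named to
 * match the patch.
 */
struct {
	gfn_t gfn;		/* (gfn_t)-1 means nothing cached */
	unsigned long uaddr;	/* matching hva, stored page-shifted */
} va_cache;

/* Invalidation, called from init_emulate_ctxt(), next to the
 * emulate_regs_need_sync_from_vcpu handling.
 */
static void va_cache_invalidate(struct kvm_vcpu *vcpu)
{
	vcpu->arch.va_cache.gfn = (gfn_t)-1;
}

/* Lookup that would replace the addr_cache_valid check in
 * kvm_read_guest_virt_helper(): hit only if the guest virtual page
 * matches, in which case the hva is rebuilt from the page offset.
 */
static bool va_cache_lookup(struct kvm_vcpu *vcpu, gva_t addr,
			    unsigned long *uaddr)
{
	if (vcpu->arch.va_cache.gfn != (gfn_t)(addr >> PAGE_SHIFT))
		return false;

	*uaddr = (vcpu->arch.va_cache.uaddr << PAGE_SHIFT) +
		 offset_in_page(addr);
	return true;
}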
Ok, that is simpler, I will do it that way.

> In the big picture, however, what we really want is to persist the
> cache across multiple instructions, and also cache the linearization
> of the address (where you add RIP and CS.base).  This would be a
> bigger source of improvement.  If you do that, the cache has to be
> indeed in x86_emulate_ctxt, but on the other hand the implementation
> needs to be in emulate.c.
>
>>  };
>>
>>  /* Repeat String Operation Prefix */
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index cf69e3b..7afcfc7 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
>>  		unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset);
>>  		int ret;
>>  		unsigned long uaddr;
>> +		gfn_t gfn = addr >> PAGE_SHIFT;
>>
>> -		ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> -						exception, false,
>> -						NULL, &uaddr);
>> -		if (ret != X86EMUL_CONTINUE)
>> -			return ret;
>> +		if (ctxt->addr_cache_valid &&
>> +		    (ctxt->addr_cache.gfn == gfn))
>> +			uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) +
>> +				offset_in_page(addr);
>> +		else {
>> +			ret = ctxt->ops->memory_prepare(ctxt, addr, toread,
>> +							exception, false,
>> +							NULL, &uaddr);
>
> Did you measure the hit rate, and the speedup after every patch?  My
> reading of the code is that the fetch is done only once per page, with

Yes, I did. IIRC, patch 3 actually improved the speedup compared to
patch 2. A rough estimate for jmp: patch 2 reduced it to the low 600s,
I guess around 610, but I could get a fresh set of numbers. So I am not
sure where the speedup is coming from; I will try to find out.

> the speedup coming from the microoptimization that patch 2 provides
> for single-page reads.  Single-page reads are indeed a very common
> case for the emulator.
>
> I suggest to start with making patch 2 ready for inclusion.
>
> Paolo
>
>> +			if (ret != X86EMUL_CONTINUE)
>> +				return ret;
>> +
>> +			if (unlikely(kvm_is_error_hva(uaddr))) {
>> +				r = X86EMUL_PROPAGATE_FAULT;
>> +				return r;
>> +			}
>>
>> -		if (unlikely(kvm_is_error_hva(uaddr))) {
>> -			r = X86EMUL_PROPAGATE_FAULT;
>> -			return r;
>> +			/* Cache gfn and hva */
>> +			ctxt->addr_cache.gfn = addr >> PAGE_SHIFT;
>> +			ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT;
>> +			ctxt->addr_cache_valid = true;
>>  		}
>>
>>  		ret = __copy_from_user(data, (void __user *)uaddr, toread);
>>  		if (ret < 0) {
>>  			r = X86EMUL_IO_NEEDED;
>> +			/* Where else should we invalidate cache ? */
>> +			ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>>  			return r;
>>  		}
>>
>> -		ctxt->ops->memory_finish(ctxt, NULL, uaddr);
>> -
>>  		bytes -= toread;
>>  		data += toread;
>>  		addr += toread;
>> @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt,
>>  	struct kvm_memory_slot *memslot;
>>  	gfn_t gfn;
>>
>> +	ctxt->addr_cache_valid = false;
>>  	if (!opaque)
>>  		return;
>>
>> --
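On the hit rate: while I collect fresh numbers I was planning to just drop
a couple of throwaway counters into the lookup path, along these lines
(debug-only hack, not meant for the series):

/* Crude hit/miss accounting for the gva->hva cache; needs <linux/atomic.h>
 * and would be ripped out again before reposting.
 */
static atomic64_t va_cache_hits;
static atomic64_t va_cache_misses;

static inline void va_cache_account(bool hit)
{
	if (hit)
		atomic64_inc(&va_cache_hits);
	else
		atomic64_inc(&va_cache_misses);
}

/* Called right after the cache check in kvm_read_guest_virt_helper(),
 * with the totals dumped occasionally, e.g.:
 *
 *	pr_info("gva->hva cache: %lld hits, %lld misses\n",
 *		atomic64_read(&va_cache_hits),
 *		atomic64_read(&va_cache_misses));
 */

That should make it clear whether the patch 3 numbers come from actual
cache hits or just from the single-page fast path in patch 2.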