Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-29 Thread Radim Krčmář
2016-04-28 17:08-0500, Suravee Suthikulanit:
> On 4/12/2016 11:22 AM, Radim Krčmář wrote:
>> 2016-04-07 03:20-0500, Suravee Suthikulpanit:
>> > From: Suravee Suthikulpanit 
>> > 
>> > This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
>> > and avic_unaccelerated_access_interception() along with two trace points
>> > (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).
>> > 
>> > Signed-off-by: Suravee Suthikulpanit 
>> > ---
>> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> > @@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
>> > [...]
>> > +  lid = ffs(dlid) - 1;
>> > +  ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
>> > +  if (ret)
>> > +  return 0;
>> 
>> OS can actually change LDR, so the old one should be invalidated.
>> 
>> (No OS does, but that is not an important factor for the hypervisor.)
>> 
> 
> By validating the old one, are you suggesting that we should disable the
> logical APIC table entry previously used by this vcpu? If so, that means we
> would need to cached the previous LDR value since the one in vAPIC backing
> page would already be updated.

Yes, the cache could be used to speed up recalculate_apic_map() too, so
it might not be a total waste.

Which reminds me that physical APIC_ID doesn't use correct cache.
svm->vcpu.vcpu_id is only the initial ID, but APIC_ID could be changed
more than once.
It would be great to make APIC_ID read-only in all cases, because x86
spec allows us to do so, but KVM_SET_LAPIC can set APIC ID too and I
think we don't retroactively modify userspace API ... Paolo?

>> > [...]
>> 
>> > +  if (vm_data->ldr_mode != mod) {
>> > +  
>> > clear_page(page_address(vm_data->avic_logical_id_table_page));
>> > +  vm_data->ldr_mode = mod;
>> > +  }
>> > +  break;
>> > +  }
>> 
>> All these cases need to be called on KVM_SET_LAPIC -- the userspace can
>> provide completely new set of APIC registers and AVIC should build its
>> maps with them.  Test with save/restore or migration.
> 
> Hm.. This means we might need to introduce a new hook which is called from
> the arch/x86/kvm/lapic.c: kvm_apic_post_state_restore(). Probably something
> like kvm_x86_ops->apic_post_state_restore(), which sets up the new physical
> and logical APIC id tables for AVIC.

Sounds good.  I imagine the callback would just call
avic_unaccel_trap_write() for relevant registers.

>> > +  return ret;
>> 
>> because we should not return, but continue to emulate the access.
> 
> Then, this would continue as if we are handling the normal fault access.

Exactly, it is a normal access to an undefined register.

>> > +  }
>> > +
>> > +  if (trap) {
>> > +  /* Handling Trap */
>> > +  if (!write) /* Trap read should never happens */
>> > +  BUG();
>> 
>> (BUG_ON(!write) is shorter, though I would avoid BUG -- only guests are
>>   going to fail, so we don't need to kill the host.)
> 
> Ok. What about just WARN_ONCE(!write, "svm: Handling trap read.\n");

Sure, it's a hardware bug and calling avic_unaccel_trap_write() on a
read access shouldn't result in a bug.  I am slightly inclined towards
'if (trap && write)' and optional 'WARN_ONCE(trap,' in the else branch.

Thanks.


Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-28 Thread Suravee Suthikulanit

Hi Radim / Paolo,

Sorry for delay in response.

On 4/12/2016 11:22 AM, Radim Krčmář wrote:

2016-04-07 03:20-0500, Suravee Suthikulpanit:

From: Suravee Suthikulpanit 

This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
and avic_unaccelerated_access_interception() along with two trace points
(trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).

Signed-off-by: Suravee Suthikulpanit 
---
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
@@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
+static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool flat)
+{
+   struct kvm_arch *vm_data = &vcpu->kvm->arch;
+   int index;
+   u32 *logical_apic_id_table;
+
+   if (flat) { /* flat */
+   if (mda > 7)


Don't you want to check that just one bit it set?


+   return NULL;
+   index = mda;
+   } else { /* cluster */
+   int apic_id = mda & 0xf;
+   int cluster_id = (mda & 0xf0) >> 8;


">> 4".


+
+   if (apic_id > 4 || cluster_id >= 0xf)
+   return NULL;
+   index = (cluster_id << 2) + apic_id;


ffs(apic_id), because 'apic_id' must be compacted into 2 bits.


+   }
+   logical_apic_id_table = (u32 *) 
page_address(vm_data->avic_logical_id_table_page);
+
+   return &logical_apic_id_table[index];
+}


I have quite messed up in the logic here for handling the logical 
cluster ID. Sorry for not catching this earlier. I'm rewriting this 
function altogether to simplify it in the V5.



[...]
+   lid = ffs(dlid) - 1;
+   ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
+   if (ret)
+   return 0;


OS can actually change LDR, so the old one should be invalidated.

(No OS does, but that is not an important factor for the hypervisor.)



By validating the old one, are you suggesting that we should disable the 
logical APIC table entry previously used by this vcpu? If so, that means 
we would need to cached the previous LDR value since the one in vAPIC 
backing page would already be updated.



[...]



+   if (vm_data->ldr_mode != mod) {
+   
clear_page(page_address(vm_data->avic_logical_id_table_page));
+   vm_data->ldr_mode = mod;
+   }
+   break;
+   }


All these cases need to be called on KVM_SET_LAPIC -- the userspace can
provide completely new set of APIC registers and AVIC should build its
maps with them.  Test with save/restore or migration.


Hm.. This means we might need to introduce a new hook which is called 
from the arch/x86/kvm/lapic.c: kvm_apic_post_state_restore(). Probably 
something like kvm_x86_ops->apic_post_state_restore(), which sets up the 
new physical and logical APIC id tables for AVIC.


If this works, I'll add support to handle this and test with the 
migration stuff in the V5.



+   if (offset >= 0x400) {
+   WARN(1, "Unsupported APIC offset %#x\n", offset);


"printk_ratelimited(KERN_INFO " is the most severe message you could
give.  I think that not printing anything is best,


+   return ret;


because we should not return, but continue to emulate the access.


Then, this would continue as if we are handling the normal fault access.




+   }
+
+   if (trap) {
+   /* Handling Trap */
+   if (!write) /* Trap read should never happens */
+   BUG();


(BUG_ON(!write) is shorter, though I would avoid BUG -- only guests are
  going to fail, so we don't need to kill the host.)


Ok. What about just WARN_ONCE(!write, "svm: Handling trap read.\n");

Thanks,
Suravee


Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-13 Thread Radim Krčmář
2016-04-13 00:29+0200, Paolo Bonzini:
> On 12/04/2016 18:22, Radim Krčmář wrote:
>>> > + if (apic_id > 4 || cluster_id >= 0xf)
>>> > + return NULL;
>>> > + index = (cluster_id << 2) + apic_id;
>> ffs(apic_id), because 'apic_id' must be compacted into 2 bits.
> 
> ffs(apic_id)-1 actually.

Yes, thanks.

(And I missed that the confusion begins by passing "ffs(dlid) - 1" as
 mda to avic_handle_ldr_write, because the rest cannot work after that.)


Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-12 Thread Paolo Bonzini


On 12/04/2016 18:22, Radim Krčmář wrote:
>> > +
>> > +  if (apic_id > 4 || cluster_id >= 0xf)
>> > +  return NULL;
>> > +  index = (cluster_id << 2) + apic_id;
> ffs(apic_id), because 'apic_id' must be compacted into 2 bits.
> 

ffs(apic_id)-1 actually.

Thanks for the review, Radim.

Paolo


Re: [PART1 RFC v4 08/11] svm: Add VMEXIT handlers for AVIC

2016-04-12 Thread Radim Krčmář
2016-04-07 03:20-0500, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit 
> 
> This patch introduces VMEXIT handlers, avic_incomplete_ipi_interception()
> and avic_unaccelerated_access_interception() along with two trace points
> (trace_kvm_avic_incomplete_ipi and trace_kvm_avic_unaccelerated_access).
> 
> Signed-off-by: Suravee Suthikulpanit 
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -3515,6 +3515,250 @@ static int mwait_interception(struct vcpu_svm *svm)
> +static u32 *avic_get_logical_id_entry(struct kvm_vcpu *vcpu, u8 mda, bool 
> flat)
> +{
> + struct kvm_arch *vm_data = &vcpu->kvm->arch;
> + int index;
> + u32 *logical_apic_id_table;
> +
> + if (flat) { /* flat */
> + if (mda > 7)

Don't you want to check that just one bit it set?

> + return NULL;
> + index = mda;
> + } else { /* cluster */
> + int apic_id = mda & 0xf;
> + int cluster_id = (mda & 0xf0) >> 8;

">> 4".

> +
> + if (apic_id > 4 || cluster_id >= 0xf)
> + return NULL;
> + index = (cluster_id << 2) + apic_id;

ffs(apic_id), because 'apic_id' must be compacted into 2 bits.

> + }
> + logical_apic_id_table = (u32 *) 
> page_address(vm_data->avic_logical_id_table_page);
> +
> + return &logical_apic_id_table[index];
> +}
> +
> +static int avic_handle_ldr_write(struct kvm_vcpu *vcpu, u8 g_physical_id,
> +  u8 logical_id)
> +{
> + u32 mod;
> + u32 *entry, new_entry;
> + struct vcpu_svm *svm = to_svm(vcpu);
> +
> + if (!svm)
> + return -EINVAL;

(No need to check, svm is taken for granted.)

> +
> + mod = (kvm_apic_get_reg(svm->vcpu.arch.apic, APIC_DFR) >> 28) & 0xf;
> + entry = avic_get_logical_id_entry(vcpu, logical_id, (mod == 0xf));

(Use "kvm_apic_get_reg(apic, APIC_DFR) == APIC_DFR_FLAT".)

> + if (!entry)
> + return -EINVAL;
> +
> + new_entry = READ_ONCE(*entry);
> + new_entry &= ~AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK;
> + new_entry |= (g_physical_id & 
> AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK);
> + new_entry |= AVIC_LOGICAL_ID_ENTRY_VALID_MASK;
> + WRITE_ONCE(*entry, new_entry);
> +
> + return 0;
> +}
> +
> +static int avic_unaccel_trap_write(struct vcpu_svm *svm)
> +{
> + u32 offset = svm->vmcb->control.exit_info_1 & 0xFF0;

The previous function knew the offset, just pass it as an argument.
(Or use AVIC_UNACCEL_ACCESS_OFFSET_MASK, because defining and not using
 it everywhere is too sad.)

> + struct kvm_lapic *apic = svm->vcpu.arch.apic;
> + u32 reg = kvm_apic_get_reg(apic, offset);
> +
> + switch (offset) {
> + case APIC_ID: {
> + u32 aid = (reg >> 24) & 0xff;
> + u64 *o_ent = avic_get_physical_id_entry(&svm->vcpu,
> + svm->vcpu.vcpu_id);
> + u64 *n_ent = avic_get_physical_id_entry(&svm->vcpu, aid);

Write old and new.  (Skip "_ent" if you want to save characters.)

> + if (!n_ent || !o_ent)
> + return 0;
> +
> + /* We need to move physical_id_entry to new offset */
> + *n_ent = *o_ent;
> + *o_ent = 0ULL;
> + svm->avic_physical_id_cache = n_ent;
> + break;
> + }
> + case APIC_LDR: {
> + int ret, lid;
> + int dlid = (reg >> 24) & 0xff;
> +
> + if (!dlid)
> + return 0;

ldr == 0 should be handled as well.

> +
> + lid = ffs(dlid) - 1;
> + ret = avic_handle_ldr_write(&svm->vcpu, svm->vcpu.vcpu_id, lid);
> + if (ret)
> + return 0;

OS can actually change LDR, so the old one should be invalidated.

(No OS does, but that is not an important factor for the hypervisor.)

> +
> + break;
> + }
> + case APIC_DFR: {
> + struct kvm_arch *vm_data = &svm->vcpu.kvm->arch;
> + u32 mod = (reg >> 28) & 0xf;
> +
> + /*
> +  * We assume that all local APICs are using the same type.
> +  * If this changes, we need to rebuild the AVIC logical
> +  * APID id table with subsequent write to APIC_LDR.
> +  */

The code doesn't rebuild avic_logical_id_table_page, it just flushes the
old one.

> + if (vm_data->ldr_mode != mod) {
> + 
> clear_page(page_address(vm_data->avic_logical_id_table_page));
> + vm_data->ldr_mode = mod;
> + }
> + break;
> + }

All these cases need to be called on KVM_SET_LAPIC -- the userspace can
provide completely new set of APIC registers and AVIC should build its
maps with them.  Test with save/restore or migration.

> +static int avic_unaccelerated_access_interception(struct vcpu_svm *svm)
> +{
> + int ret = 0;
> + u32 offset = svm->vmcb->control.exi