On 06/19/2014 02:12 AM, mtosa...@redhat.com wrote:
Allow vcpus to pin spte translations by:
1) Creating a per-vcpu list of pinned ranges.
2) On mmu reload request:
- Fault ranges.
- Mark sptes with a pinned bit.
- Mark shadow pages as pinned.
3) Then modify the following actions:
- Page age => skip spte flush.
- MMU notifiers => force mmu reload request (which kicks cpu out of
guest mode).
- GET_DIRTY_LOG => force mmu reload request.
- SLAB shrinker => skip shadow page deletion.
TDP-only.
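Just to restate item 3) in code form, since that's what the rest of the series
hangs off: the sketch below is my own illustration with made-up names
(SPTE_PINNED_BIT, spte_is_pinned, age_spte), not code from the patch.

#define SPTE_PINNED_BIT	(1ULL << 60)	/* assumed free software-available bit */

static inline bool spte_is_pinned(u64 spte)
{
	return spte & SPTE_PINNED_BIT;
}

/* e.g. in the rmap walker used for page aging: */
static int age_spte(u64 *sptep)
{
	if (spte_is_pinned(*sptep))
		return 0;	/* pinned translation: leave the spte alone */
	/* ... usual accessed-bit clearing and TLB flush ... */
	return 1;
}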
+int kvm_mmu_register_pinned_range(struct kvm_vcpu *vcpu,
+				  gfn_t base_gfn, unsigned long npages)
+{
+	struct kvm_pinned_page_range *p;
+
+	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
+	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
+		if (p->base_gfn == base_gfn && p->npages == npages) {
+			mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+			return -EEXIST;
+		}
+	}
+	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+
+	if (vcpu->arch.nr_pinned_ranges >=
+	    KVM_MAX_PER_VCPU_PINNED_RANGE)
+		return -ENOSPC;
+
+	p = kzalloc(sizeof(struct kvm_pinned_page_range), GFP_KERNEL);
+	if (!p)
+		return -ENOMEM;
+
+	vcpu->arch.nr_pinned_ranges++;
+
+	trace_kvm_mmu_register_pinned_range(vcpu->vcpu_id, base_gfn, npages);
+
+	INIT_LIST_HEAD(&p->link);
+	p->base_gfn = base_gfn;
+	p->npages = npages;
+	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
+	list_add(&p->link, &vcpu->arch.pinned_mmu_pages);
+	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+	kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+
+	return 0;
+}
+
What happens if ranges overlap (within a vcpu, cross-vcpu)? Or if a
range overflows and wraps around 0? Or if it does not refer to RAM?
Looks like you're limiting the number of ranges, but not the number of
pages, so a guest can lock all of its memory.
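The kind of check I have in mind, with hypothetical names
(pinned_range_valid, nr_pinned_pages and KVM_MAX_PER_VCPU_PINNED_PAGES are
mine; only kvm_is_visible_gfn is an existing helper):

static int pinned_range_valid(struct kvm_vcpu *vcpu,
			      gfn_t base_gfn, unsigned long npages)
{
	/* reject an empty range or one that wraps around 0 */
	if (npages == 0 || base_gfn + npages < base_gfn)
		return -EINVAL;

	/*
	 * Only the endpoints are checked here; a real implementation
	 * would walk the memslots to reject holes in the middle too.
	 */
	if (!kvm_is_visible_gfn(vcpu->kvm, base_gfn) ||
	    !kvm_is_visible_gfn(vcpu->kvm, base_gfn + npages - 1))
		return -EINVAL;

	/* cap the number of pinned pages, not just the number of ranges */
	if (vcpu->arch.nr_pinned_pages + npages > KVM_MAX_PER_VCPU_PINNED_PAGES)
		return -ENOSPC;

	return 0;
}

Overlap with ranges that are already registered (on this vcpu and on others)
would still need a separate walk of the lists.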
+
+/*
+ * Pin KVM MMU page translations. This guarantees that, for valid
+ * addresses registered by kvm_mmu_register_pinned_range (a valid address
+ * being one which possesses sufficient information for the fault to
+ * be resolved), valid translations exist while in guest mode and
+ * therefore no VM-exits due to faults will occur.
+ *
+ * Failure to instantiate pages will abort guest entry.
+ *
+ * Page frames should be pinned with get_page in advance.
+ *
+ * Pinning is not guaranteed while executing as L2 guest.
Does this undermine security?
+ *
+ */
+
+static void kvm_mmu_pin_pages(struct kvm_vcpu *vcpu)
+{
+	struct kvm_pinned_page_range *p;
+
+	if (is_guest_mode(vcpu))
+		return;
+
+	if (!vcpu->arch.mmu.direct_map)
+		return;
+
+	ASSERT(VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
Is the mutex actually needed? It seems it's only taken in vcpu context,
so the vcpu mutex should be sufficient.
+	list_for_each_entry(p, &vcpu->arch.pinned_mmu_pages, link) {
+		gfn_t gfn_offset;
+
+		for (gfn_offset = 0; gfn_offset < p->npages; gfn_offset++) {
+			gfn_t gfn = p->base_gfn + gfn_offset;
+			int r;
+			bool pinned = false;
+
+			r = vcpu->arch.mmu.page_fault(vcpu, gfn << PAGE_SHIFT,
+						      PFERR_WRITE_MASK, false,
+						      true, &pinned);
+			/* MMU notifier sequence window: retry */
+			if (!r && !pinned)
+				kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
+			if (r) {
+				kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu);
+				break;
+			}
+
+		}
+	}
+	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
+}
+
int kvm_mmu_load(struct kvm_vcpu *vcpu)
{
	int r;
@@ -3916,6 +4101,7 @@
		goto out;
	/* set_cr3() should ensure TLB has been flushed */
	vcpu->arch.mmu.set_cr3(vcpu, vcpu->arch.mmu.root_hpa);
+	kvm_mmu_pin_pages(vcpu);
out:
	return r;
}
I don't see where you unpin pages, so even if you limit the number of pages
pinned per call, a guest can still pin all of its memory by iterating over it
and pinning a chunk at a time.
You might try something similar to guest MTRR handling.
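If you keep an explicit registration API rather than the MTRR-style approach,
it at least needs an unregister side. Something along these lines, mirroring
the register function (kvm_mmu_unregister_pinned_range is my name, and I
haven't checked how the rest of the series clears the pinned bits):

static int kvm_mmu_unregister_pinned_range(struct kvm_vcpu *vcpu,
					   gfn_t base_gfn, unsigned long npages)
{
	struct kvm_pinned_page_range *p, *tmp;

	mutex_lock(&vcpu->arch.pinned_mmu_mutex);
	list_for_each_entry_safe(p, tmp, &vcpu->arch.pinned_mmu_pages, link) {
		if (p->base_gfn == base_gfn && p->npages == npages) {
			list_del(&p->link);
			vcpu->arch.nr_pinned_ranges--;
			mutex_unlock(&vcpu->arch.pinned_mmu_mutex);
			kfree(p);
			/* force a reload so the sptes lose their pinned status */
			kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu);
			return 0;
		}
	}
	mutex_unlock(&vcpu->arch.pinned_mmu_mutex);

	return -ENOENT;
}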