Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption
2016-11-15 11:02-0600, Tom Lendacky:
> On 11/15/2016 8:39 AM, Radim Krčmář wrote:
>> 2016-11-09 18:37-0600, Tom Lendacky:
>>> Since DMA addresses will effectively look like 48-bit addresses when the
>>> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
>>> device performing the DMA does not support 48-bits. SWIOTLB will be
>>> initialized to create un-encrypted bounce buffers for use by these devices.
>>>
>>> Signed-off-by: Tom Lendacky <thomas.lenda...@amd.com>
>>> ---
>>> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
>>> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>>>  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>>>  *
>>>  * This returns non-zero if we are forced to use swiotlb (by the boot
>>> - * option).
>>> + * option). If memory encryption is enabled then swiotlb will be set
>>> + * to 1 so that bounce buffers are allocated and used for devices that
>>> + * do not support the addressing range required for the encryption mask.
>>>  */
>>> int __init pci_swiotlb_detect_override(void)
>>> {
>>> 	int use_swiotlb = swiotlb | swiotlb_force;
>>>
>>> -	if (swiotlb_force)
>>> +	if (swiotlb_force || sme_me_mask)
>>> 		swiotlb = 1;
>>>
>>> 	return use_swiotlb;
>>
>> We want to return 1 even if only sme_me_mask is 1, because the return
>> value is used for detection.  The following would be less obscure, IMO:
>>
>>   if (swiotlb_force || sme_me_mask)
>>           swiotlb = 1;
>>
>>   return swiotlb;
>
> If we do that then all DMA would go through the swiotlb bounce buffers.

No, that is decided for example in swiotlb_map_page() and we need to
call pci_swiotlb_init() to register that function.

> By setting swiotlb to 1 we indicate that the bounce buffers will be
> needed for those devices that can't support the addressing range when
> the encryption bit is set (48 bit DMA).  But if the device can support
> the addressing range we won't use the bounce buffers.

If we return 0 here, then pci_swiotlb_init() will not be called =>
dma_ops won't be set to swiotlb_dma_ops => we won't use bounce buffers.

>> We setup encrypted swiotlb and then decrypt it, but sometimes set it up
>> decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
>> directly?
>
> When swiotlb is allocated in swiotlb_init(), it is too early to make
> use of the API to change the page attributes.  Because of this,
> the callback to make those changes is needed.

Thanks.  (I don't know page table setup enough to see a lesser evil. :])

>>> @@ -541,7 +583,7 @@ static phys_addr_t
>>> map_single(struct device *hwdev, phys_addr_t phys, size_t size,
>>> 	   enum dma_data_direction dir)
>>> {
>>> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
>>> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);
>>
>> We have decrypted io_tlb_start before, so shouldn't its physical address
>> be saved without the sme bit?  (Which changes a lot ...)
>
> I'm not sure what you mean here, can you elaborate a bit more?

The C-bit (sme bit) is a part of the physical address.  If we know that
a certain physical page should be accessed as unencrypted (the bounce
buffer) then the C-bit is 0.  I'm wondering why we save the physical
address with the C-bit set when we know that it can't be accessed that
way (because we remove it every time).

The naming is a bit confusing, because physical addresses are actually
virtualized by SME -- maybe we should be calling them SME addresses?

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [RFC PATCH v3 13/20] x86: DMA support for memory encryption
2016-11-09 18:37-0600, Tom Lendacky:
> Since DMA addresses will effectively look like 48-bit addresses when the
> memory encryption mask is set, SWIOTLB is needed if the DMA mask of the
> device performing the DMA does not support 48-bits. SWIOTLB will be
> initialized to create un-encrypted bounce buffers for use by these devices.
>
> Signed-off-by: Tom Lendacky
> ---
> diff --git a/arch/x86/kernel/pci-nommu.c b/arch/x86/kernel/pci-nommu.c
> @@ -30,7 +30,7 @@ static dma_addr_t nommu_map_page(struct device *dev, struct page *page,
> 			enum dma_data_direction dir,
> 			unsigned long attrs)
> {
> -	dma_addr_t bus = page_to_phys(page) + offset;
> +	dma_addr_t bus = phys_to_dma(dev, page_to_phys(page)) + offset;
> 	WARN_ON(size == 0);
> 	if (!check_addr("map_single", dev, bus, size))
> 		return DMA_ERROR_CODE;
> diff --git a/arch/x86/kernel/pci-swiotlb.c b/arch/x86/kernel/pci-swiotlb.c
> @@ -12,6 +12,8 @@
> int swiotlb __read_mostly;
>
> void *x86_swiotlb_alloc_coherent(struct device *hwdev, size_t size,
> @@ -64,13 +66,15 @@ static struct dma_map_ops swiotlb_dma_ops = {
>  * pci_swiotlb_detect_override - set swiotlb to 1 if necessary
>  *
>  * This returns non-zero if we are forced to use swiotlb (by the boot
> - * option).
> + * option). If memory encryption is enabled then swiotlb will be set
> + * to 1 so that bounce buffers are allocated and used for devices that
> + * do not support the addressing range required for the encryption mask.
>  */
> int __init pci_swiotlb_detect_override(void)
> {
> 	int use_swiotlb = swiotlb | swiotlb_force;
>
> -	if (swiotlb_force)
> +	if (swiotlb_force || sme_me_mask)
> 		swiotlb = 1;
>
> 	return use_swiotlb;

We want to return 1 even if only sme_me_mask is 1, because the return
value is used for detection.  The following would be less obscure, IMO:

  if (swiotlb_force || sme_me_mask)
          swiotlb = 1;

  return swiotlb;

> diff --git a/init/main.c b/init/main.c
> @@ -598,6 +602,15 @@ asmlinkage __visible void __init start_kernel(void)
> 	 */
> 	locking_selftest();
>
> +	/*
> +	 * This needs to be called before any devices perform DMA
> +	 * operations that might use the swiotlb bounce buffers.
> +	 * This call will mark the bounce buffers as un-encrypted so
> +	 * that their usage will not cause "plain-text" data to be
> +	 * decrypted when accessed.
> +	 */
> +	mem_encrypt_init();

(Comments below are connected to the reason why we call this.)

> diff --git a/lib/swiotlb.c b/lib/swiotlb.c
> @@ -159,6 +171,31 @@ void swiotlb_print_info(void)
> +/*
> + * If memory encryption is active, the DMA address for an encrypted page may
> + * be beyond the range of the device. If bounce buffers are required be sure
> + * that they are not on an encrypted page. This should be called before the
> + * iotlb area is used.
> + */
> +void __init swiotlb_clear_encryption(void)
> +{
> +	void *vaddr;
> +	unsigned long bytes;
> +
> +	if (no_iotlb_memory || !io_tlb_start || late_alloc)

io_tlb_start seems redundant -- when can
!no_iotlb_memory && !io_tlb_start happen?

Is the order of calls 1) swiotlb init, 2) SME init, 3) swiotlb late init?

We setup encrypted swiotlb and then decrypt it, but sometimes set it up
decrypted (late_alloc) ... why isn't the swiotlb set up decrypted
directly?

> +		return;
> +
> +	vaddr = phys_to_virt(io_tlb_start);
> +	bytes = PAGE_ALIGN(io_tlb_nslabs << IO_TLB_SHIFT);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +
> +	vaddr = phys_to_virt(io_tlb_overflow_buffer);
> +	bytes = PAGE_ALIGN(io_tlb_overflow);
> +	swiotlb_set_mem_unenc(vaddr, bytes);
> +	memset(vaddr, 0, bytes);
> +}
> +
> @@ -541,7 +583,7 @@ static phys_addr_t
> map_single(struct device *hwdev, phys_addr_t phys, size_t size,
> 	   enum dma_data_direction dir)
> {
> -	dma_addr_t start_dma_addr = phys_to_dma(hwdev, io_tlb_start);
> +	dma_addr_t start_dma_addr = swiotlb_phys_to_dma(hwdev, io_tlb_start);

We have decrypted io_tlb_start before, so shouldn't its physical address
be saved without the sme bit?  (Which changes a lot ...)

Thanks.
Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
2015-09-17 23:18+, Wu, Feng:
>> From: Paolo Bonzini [mailto:pbonz...@redhat.com]
>> On 17/09/2015 17:58, Radim Krčmář wrote:
>>> xAPIC addresses are only 8 bits long so they always get delivered to
>>> x2APIC cluster 0, where the first 16 bits work like xAPIC flat
>>> logical mode.
>>
>> Ok, I was wondering whether this was the correct interpretation.  Thanks!
>
> Paolo, I don't think Radim clarified your concern, right?  Since mda is
> 8-bit, it is wrong with mda >> 16, this is your concern, right?

In case it was: mda is u32, so the bitshift is defined by C.
(xAPIC destinations in KVM's x2APIC mode are stored in the lowest 8 bits
of mda, hence the cluster is always 0.)

Or am I still missing the point?
Re: [PATCH v8 03/13] KVM: Define a new interface kvm_intr_is_single_vcpu()
2015-09-17 16:24+0200, Paolo Bonzini:
> I think kvm_apic_match_logical_addr for MSI and IOAPIC interrupts is
> buggy in x2apic mode.
>
> It does:
>
> 	if (apic_x2apic_mode(apic))
> 		return ((logical_id >> 16) == (mda >> 16))
> 		       && (logical_id & mda & 0xffff) != 0;
>
> But mda is only 8-bits for MSI and IOAPIC interrupts.
>
> Radim, should kvm_apic_mda also handle the !ipi && x2apic_mda && dest_id
> != APIC_BROADCAST case?  It never triggers with Linux because it uses
> only the physical mode (that's not super-easy to see;
> ioapic_set_affinity looks for the RTEs in irq_data->chip_data and that
> is allocated with kzalloc).

KVM handles that case, it's just convoluted.  (I wish we scrapped the
IR-less x2APIC mode.)

For interrupts from MSI and IOxAPIC:
- Flat logical interrupts are delivered as if we had natural
  (CPU0<->bit0, CPU1<->bit1, ...) flat logical xAPIC for the first 8
  VCPUs.
- Cluster logical doesn't work much, it's interpreted like flat logical.

I didn't care about xAPIC cluster because Linux, the sole user of our
paravirtualized x2APIC, doesn't configure it.

I'll paste kvm_apic_mda() source for better explanation:

  static u32 kvm_apic_mda(unsigned int dest_id, struct kvm_lapic *source,
                          struct kvm_lapic *target)
  {
          bool ipi = source != NULL;
          bool x2apic_mda = apic_x2apic_mode(ipi ? source : target);

          if (!ipi && dest_id == APIC_BROADCAST && x2apic_mda)
                  return X2APIC_BROADCAST;

          return x2apic_mda ? dest_id : SET_APIC_DEST_FIELD(dest_id);
  }

An MSI/IOxAPIC interrupt means that source is NULL, and if the target is
in x2APIC mode, the original 'dest_id' is returned as mda => a flat
logical xAPIC to 0x0f will get interpreted as (cluster) logical x2APIC
0xf in kvm_apic_match_logical_addr().

xAPIC addresses are only 8 bits long so they always get delivered to
x2APIC cluster 0, where the first 16 bits work like xAPIC flat logical
mode.
Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
2015-01-09 16:18+0100, Paolo Bonzini:
> On 09/01/2015 16:12, Radim Krčmář wrote:
>> The chipset doesn't support it. :(
>>
>> I meant that we need to recompute PI entries for lowest priority
>> interrupts every time the guest's TPR changes.  Luckily, Linux doesn't
>> use TPR, but other OSes might be a reason to drop lowest priority from
>> PI optimizations.  (Or make it more complicated.)
>
> Doing vector hashing is a possibility as well.  I would like to know
> what existing chipsets do in practice, then we can mimic it.

When looking at /proc/interrupts from time to time, I have only seen
interrupts landing on the first CPU of the set.

> We could also distinguish between AMD and Intel ...

AMD should deliver to the highest APIC ID.  (If we still need to decide
after focus processor and APR checks.)

I'll try to check using a more trustworthy approach.
Re: [v3 13/26] KVM: Define a new interface kvm_find_dest_vcpu() for VT-d PI
2014-12-12 23:14+0800, Feng Wu:
> This patch defines a new interface kvm_find_dest_vcpu for VT-d PI,
> which returns the destination vCPU of the interrupt for guests.
>
> Since VT-d PI cannot handle broadcast/multicast interrupts, here we
> only handle Fixed and Lowest priority interrupts.
>
> The current method of handling guest lowest priority interrupts is to
> use a counter 'apic_arb_prio' for each vCPU: we choose the vCPU with
> the smallest 'apic_arb_prio' and then increase it by 1.  However, for
> VT-d PI, we cannot re-use this, since we no longer have control over
> 'apic_arb_prio' with posted interrupt direct delivery by hardware.
>
> Here, we introduce a similar way with 'apic_arb_prio' to handle guest
> lowest priority interrupts when VT-d PI is used.  Here is the idea:
> - Each vCPU has a counter 'round_robin_counter'.
> - When a guest sets an interrupt to lowest priority, we choose the vCPU
>   with the smallest 'round_robin_counter' as the destination, then
>   increase it.

There are two points relevant to this patch in new KVM's implementation
(KVM: x86: amend APIC lowest priority arbitration,
https://lkml.org/lkml/2015/1/9/362):

 1) lowest priority depends on TPR
 2) there is no need for balancing

(1) has to be considered with PI as well.  I kept (2) to avoid whining
from people building on that behaviour, but lowest priority backed by PI
could be transparent without it.

Patch below removes the balancing, but I am not sure this is a price we
allowed ourselves to pay ... what are your opinions?

---8<---
KVM: x86: don't balance lowest priority interrupts

Balancing is not mandated by the specification and real hardware most
likely doesn't do it.  We break backward compatibility to allow
optimizations.  (Posted interrupts can deliver to only one fixed
destination.)

Signed-off-by: Radim Krčmář <rkrc...@redhat.com>
---
 arch/x86/include/asm/kvm_host.h | 1 -
 arch/x86/kvm/lapic.c            | 8 ++------
 2 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 97a5dd0222c8..aa4bd8286232 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -370,7 +370,6 @@ struct kvm_vcpu_arch {
 	u64 apic_base;
 	struct kvm_lapic *apic;    /* kernel irqchip context */
 	unsigned long apic_attention;
-	int32_t apic_arb_prio;
 	int mp_state;
 	u64 ia32_misc_enable_msr;
 	bool tpr_access_reporting;
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 5b9d8c589bba..eb85af8e8fc0 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -749,7 +749,6 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 				  trig_mode, vector);
 	switch (delivery_mode) {
 	case APIC_DM_LOWEST:
-		vcpu->arch.apic_arb_prio++;
 	case APIC_DM_FIXED:
 		/* FIXME add logic for vcpu on reset */
 		if (unlikely(!apic_enabled(apic)))
@@ -837,11 +836,9 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
 	 * - uses the APR register (which also considers ISR and IRR),
 	 * - chooses the highest APIC ID when APRs are identical,
 	 * - and allows a focus processor.
-	 * XXX: pseudo-balancing with apic_arb_prio is a KVM-specific feature
 	 */
-	int tpr = kvm_apic_get_reg(vcpu1->arch.apic, APIC_TASKPRI) -
-		  kvm_apic_get_reg(vcpu2->arch.apic, APIC_TASKPRI);
-	return tpr ? : vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
+	return kvm_apic_get_reg(vcpu1->arch.apic, APIC_TASKPRI) -
+	       kvm_apic_get_reg(vcpu2->arch.apic, APIC_TASKPRI);
 }
 
 static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
@@ -1595,7 +1592,6 @@ void kvm_lapic_reset(struct kvm_vcpu *vcpu)
 	vcpu->arch.pv_eoi.msr_val = 0;
 	apic_update_ppr(apic);
 
-	vcpu->arch.apic_arb_prio = 0;
 	vcpu->arch.apic_attention = 0;
 	apic_debug("%s: vcpu=%p, id=%d, base_msr=
-- 
2.2.0