Quoting Ram Pai (2019-12-12 00:45:02)
> On Tue, Dec 10, 2019 at 07:43:24PM -0600, Michael Roth wrote:
> > Quoting Ram Pai (2019-12-06 19:12:39)
> > > Commit edea902c1c1e ("powerpc/pseries/iommu: Don't use dma_iommu_ops on
> > > secure guests") disabled the dma_iommu_ops path for secure VMs.
> > > Disabling the dma_iommu_ops path for secure VMs enabled the dma_direct
> > > path, and with it support for bounce-buffering through SWIOTLB.
> > > However, it fails to operate when the IOMMU is enabled, since I/O
> > > pages are not TCE mapped.
> > >
> > > Re-enable the dma_iommu_ops path for pseries secure VMs. It handles
> > > all cases, including TCE-mapping I/O pages in the presence of an
> > > IOMMU.
> >
> > Wasn't clear to me at first, but I guess the main gist of this series
> > is that we want to continue to use SWIOTLB, but also need to create
> > mappings of its bounce buffers in the IOMMU, so we revert to using
> > dma_iommu_ops and rely on the various dma_iommu_{map,alloc}_bypass()
> > hooks throughout to call into dma_direct_* ops rather than relying on
> > the dma_is_direct(ops) checks in DMA API functions to do the same.
> >
> > That makes sense, but one issue I see with that is that
> > dma_iommu_map_bypass() only tests true if all the following are true:
> >
> >   1) the device requests a 64-bit DMA mask via
> >      dma_set_mask/dma_set_coherent_mask
> >   2) DDW is enabled (i.e. we don't pass disable_ddw on the
> >      command-line)
> >
> > dma_is_direct() checks don't have this limitation, so I think for
> > other cases, such as devices that use a smaller DMA mask, we'll end up
> > falling back to the non-bypass functions in dma_iommu_ops, which will
> > likely break for things like dma_alloc_coherent/dma_map_single, since
> > they won't use SWIOTLB pages and won't make the necessary calls to
> > set_memory_unencrypted() to share those non-SWIOTLB buffers with the
> > hypervisor.
> >
> > Maybe that's ok, but I think we should be clearer about how to
> > fail/handle these cases.
>
> Yes, makes sense. Devices that cannot handle a 64-bit DMA mask will not
> work.
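For reference, here's a rough sketch of the failure mode I'm describing.
The helper names and bodies are hypothetical/condensed, not the actual
dma-iommu.c code:

  /*
   * Sketch only: why a device with a <64-bit DMA mask breaks on a
   * secure guest. The bypass (dma_direct_*/SWIOTLB) path is only taken
   * when both conditions below hold; otherwise we fall through to the
   * TCE-mapping path, which never calls set_memory_unencrypted() to
   * share the buffer with the hypervisor.
   */
  static bool sketch_iommu_bypass(struct device *dev)
  {
          return dev->archdata.iommu_bypass &&          /* DDW window set up */
                 dma_get_mask(dev) == DMA_BIT_MASK(64); /* 64-bit mask set */
  }

  static void *sketch_alloc_coherent(struct device *dev, size_t size,
                                     dma_addr_t *handle, gfp_t gfp)
  {
          if (sketch_iommu_bypass(dev))
                  /* direct path: SWIOTLB bounce buffers, already shared */
                  return dma_direct_alloc(dev, size, handle, gfp, 0);

          /*
           * TCE path: the buffer allocated here stays encrypted and
           * unshared, so device DMA to it cannot work in a secure guest.
           */
          return NULL; /* stands in for the iommu_alloc_coherent() path */
  }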
> >
> > Though I also agree with some concerns Alexey stated earlier: it
> > seems wasteful to map the entire DDW window just so these bounce
> > buffers can be mapped. Especially if you consider the lack of a
> > mapping to be an additional safeguard against things like buggy
> > device implementations on the QEMU side. E.g. if we leaked pages to
> > the hypervisor by accident, those pages wouldn't be immediately
> > accessible to a device, and would still require additional work to
> > get past the IOMMU.
>
> Well, an accidental, unintended page leak to the hypervisor is a very
> bad thing, regardless of any DMA mapping. The device may not be able to
> access it, but the hypervisor still can access it.

Agreed, but if the IOMMU can provide additional isolation we should make
use of it when reasonable.

> >
> > What would it look like if we try to make all this work with
> > disable_ddw passed to the kernel command-line (or forced for
> > is_secure_guest())?
> >
> >   1) dma_iommu_{alloc,map}_bypass() would no longer get us to
> >      dma_direct_* ops, but an additional case or hook that considers
> >      is_secure_guest() might do it.
> >
> >   2) We'd also need to set up an IOMMU mapping for the bounce buffers
> >      via io_tlb_start/io_tlb_end. We could do it once, on-demand via
> >      dma_iommu_bypass_supported() like we do for the 64-bit DDW
> >      window, or maybe in some init function.
>
> Hmm... I'm not sure how to accomplish (2). We need to use some DDW
> window to set up the mappings, right? If disable_ddw is set, there
> won't be any DDW. What am I missing?

We have 2 windows: the default 32-bit window that normally covers DMA
addresses 0..1GB, and an additional DDW window that's created on demand
for 64-bit devices (since they can address a full linear mapping of all
guest memory at DMA address 1<<59). Your current patch uses the latter,
but we could potentially use the 32-bit window, since we only need to
map the SWIOTLB pages.

Not saying that's necessarily better, but one upside is that it only
requires devices to support 32-bit DMA addressing, which is a larger
class of devices than those that support 64-bit (since 64-bit device
drivers generally have a 32-bit fallback). The required changes are a
bit more pervasive though.

It might make sense to get both scenarios working at some point, but
maybe it's okay to handle 64-bit first. 64-bit does give us more legroom
if we anticipate changes in where the SWIOTLB memory is allocated from,
or expect it to become larger than 1GB.

> >
> > That also has the benefit of not requiring devices to support 64-bit
> > DMA.
> >
> > Alternatively, we could continue to rely on the 64-bit DDW window,
> > but modify the call to enable_ddw() to only map the io_tlb_start/end
> > range in the case of is_secure_guest(). This is a little cleaner
> > implementation-wise, since we can rely on the existing
> > dma_iommu_{alloc,map}_bypass() hooks.
>
> I have been experimenting with this, trying to map only the memory
> range from io_tlb_start/io_tlb_end through the 64-bit DDW window. But
> for some reason it wants io_tlb_start to be aligned to some boundary.
> It looks like a 2^28 boundary. Not sure what dictates that boundary.

Not sure, but that might be related to the 256MB LMB size. Could also be
the page size of the DDW window, but that seems large. In any case, I
think it would be okay if we needed to round io_tlb_start down to a
256MB boundary and map the resulting range. We'd have a few more
mappings than strictly necessary, but it's still better than mapping all
guest memory.
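E.g. a rough sketch of the rounding I have in mind (SWIOTLB_MAP_ALIGN is
a made-up name, and 256MB is just the guessed boundary from above, not
something I've verified):

  /* 256MB: assumed alignment requirement (guessing it's the LMB size) */
  #define SWIOTLB_MAP_ALIGN     (1UL << 28)

  /*
   * Sketch only: widen the SWIOTLB pool [io_tlb_start, io_tlb_end) to
   * 256MB boundaries before TCE-mapping it through the DDW window. We'd
   * map a bit more than strictly necessary, but far less than all of
   * guest memory.
   */
  static void sketch_swiotlb_map_range(phys_addr_t *start, phys_addr_t *end)
  {
          *start = ALIGN_DOWN(io_tlb_start, SWIOTLB_MAP_ALIGN);
          *end   = ALIGN(io_tlb_end, SWIOTLB_MAP_ALIGN);
  }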
> > , but devices that don't support 64-bit will fall back to not using
> > dma_direct_* ops and fail miserably. We'd probably want to handle
> > that more gracefully.
>
> Yes, I will add a warning message to indicate the failure.
>
> >
> > Or we handle both cases gracefully. To me it makes more sense to
> > enable the non-DDW case first, then consider adding the DDW case
> > later if there's some reason why 64-bit DMA is needed. But it would
> > be good to hear if there are other opinions.
>
> Educate me a bit here. What is a non-DDW case? Is it possible for a
> device to access memory, in the presence of an IOMMU, without a
> window mapping?

It's not, but we have the default 32-bit window for the non-DDW case.
See above.

> > >
> > > Signed-off-by: Ram Pai <linux...@us.ibm.com>
> > > ---
> > >  arch/powerpc/platforms/pseries/iommu.c | 11 +----------
> > >  1 file changed, 1 insertion(+), 10 deletions(-)
> > >
> > > diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
> > > index 67b5009..4e27d66 100644
> > > --- a/arch/powerpc/platforms/pseries/iommu.c
> > > +++ b/arch/powerpc/platforms/pseries/iommu.c
> > > @@ -36,7 +36,6 @@
> > >  #include <asm/udbg.h>
> > >  #include <asm/mmzone.h>
> > >  #include <asm/plpar_wrappers.h>
> > > -#include <asm/svm.h>
> > >  #include <asm/ultravisor.h>
> > >
> > >  #include "pseries.h"
> > > @@ -1346,15 +1345,7 @@ void iommu_init_early_pSeries(void)
> > >  	of_reconfig_notifier_register(&iommu_reconfig_nb);
> > >  	register_memory_notifier(&iommu_mem_nb);
> > >
> > > -	/*
> > > -	 * Secure guest memory is inacessible to devices so regular DMA isn't
> > > -	 * possible.
> > > -	 *
> > > -	 * In that case keep devices' dma_map_ops as NULL so that the generic
> > > -	 * DMA code path will use SWIOTLB to bounce buffers for DMA.
> > > -	 */
> > > -	if (!is_secure_guest())
> > > -		set_pci_dma_ops(&dma_iommu_ops);
> > > +	set_pci_dma_ops(&dma_iommu_ops);
> > >  }
> > >
> > >  static int __init disable_multitce(char *str)
> > > --
> > > 1.8.3.1
> > >
>
> --
> Ram Pai