From: Christoph Hellwig <h...@infradead.org> Sent: Monday, May 16, 2022 12:35 AM
> 
> I don't really understand how 'childs' fit in here.  The code also
> doesn't seem to be usable without patch 2 and a caller of the
> new functions added in patch 2, so it is rather impossible to review.
> 
> Also:
> 
>  1) why is SEV/TDX so different from other cases that need bounce
>     buffering to treat it different and we can't work on a general
>     scalability improvement
>  2) per previous discussions at how swiotlb itself works, it is
>     clear that another option is to just make pages we DMA to
>     shared with the hypervisor.  Why don't we try that at least
>     for larger I/O?

Tianyu already responded, but let me offer an expanded view.
I have better knowledge of AMD's SEV-SNP than of Intel's TDX,
so my details might be off for TDX.

Taking your question (2) first, two things must be done when guest
memory pages transition between the "shared with the hypervisor"
and the "private to the guest" states:

A) Some bookkeeping between the guest and host, which requires
a hypercall.  Doing a hypercall isn't super-fast, but for large I/Os,
it could be a reasonable tradeoff if we could avoid bounce buffer
copying.

B) The contents of the memory buffer must transition between
encrypted and not encrypted.  The hardware doesn't provide
any mechanism to do such a transition "in place".  The only
way to transition is for the CPU to copy the contents between
an encrypted area and an unencrypted area of memory.

Because of (B), we're stuck needing bounce buffers.  There's no
way to avoid them with the current hardware.  Tianyu also pointed
out not wanting to expose uninitialized guest memory to the host,
so, for example, sharing a read buffer with the host requires that
it first be initialized to zero.
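
To make (A) and (B) concrete, here's a minimal sketch of what the
pool setup and per-I/O copies look like conceptually in a
confidential VM guest.  The helper names alloc_shared_pool() and
bounce_sync() are hypothetical and this isn't code from any of the
patches under discussion; set_memory_decrypted() is the existing
kernel interface that does the hypercall bookkeeping and the page
attribute change:

/*
 * Conceptual sketch only -- alloc_shared_pool() and bounce_sync()
 * are hypothetical helpers, not code from Tianyu's or Andi's
 * patches.
 */
#include <linux/gfp.h>
#include <linux/mm.h>
#include <linux/set_memory.h>
#include <linux/string.h>

/* One-time setup: carve out a bounce pool and share it with the host. */
static void *alloc_shared_pool(size_t size)
{
        unsigned long pool = __get_free_pages(GFP_KERNEL | __GFP_ZERO,
                                              get_order(size));

        /*
         * (A): hypercall bookkeeping plus clearing the encryption bit
         * in the guest mappings.  Per (B), this does not transform
         * the contents in place; the pages were zeroed above so
         * nothing stale or uninitialized is exposed to the host.
         */
        set_memory_decrypted(pool, size >> PAGE_SHIFT);
        return (void *)pool;
}

/* Per-I/O: the CPU copy described in (B) that can't be avoided. */
static void bounce_sync(void *shared_slot, void *priv_buf, size_t len,
                        bool to_device)
{
        if (to_device)
                memcpy(shared_slot, priv_buf, len);     /* write path */
        else
                memcpy(priv_buf, shared_slot, len);     /* read completion */
}

The memcpy() in bounce_sync() is the part we can't eliminate; what we
can improve is how the shared pool and its slots are managed, which
is where the scalability discussion below comes in.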

For your question (1), I think we all would agree that SEV-SNP and
TDX usage of bounce buffers isn't fundamentally different from other
uses -- they just put a lot more load on the bounce buffering
mechanism. If done well, general swiotlb scalability improvements
should be sufficient and are much preferred.

You made a recent comment about almost being done removing
all knowledge of swiotlb from drivers [1].  I agree with that goal.
However, Tianyu's recent patches for improving swiotlb scalability
don't align with that goal.  A while back, you suggested using
per-device swiotlb regions [2], and I think Tianyu's patch sets have
taken that approach.  But that approach requires going beyond the
existing per-device swiotlb regions to get scalability with multi-channel
devices, and that's leading us in the wrong direction.

We should reset and make sure we agree on the top-level approach.
1) We want general scalability improvements to swiotlb.  These
    improvements should scale to high CPU counts (> 100) and across
    multiple NUMA nodes.
2) Drivers should not require any special knowledge of swiotlb to
    benefit from the improvements.  No new swiotlb APIs should
    need to be used by drivers -- the swiotlb scalability improvements
    should be transparent.
3) The scalability improvements should not be based on device
    boundaries since a single device may have multiple channels
    doing bounce buffering on multiple CPUs in parallel.

Anything else?

The patch from Andi Kleen [3] (not submitted upstream, but referenced
by Tianyu as the basis for his patches) seems like a good starting point
for meeting these top-level goals.  Andi and Robin had some
back-and-forth about Andi's patch that I haven't delved into yet, but
working through those issues seems like the better overall direction.  I had
an offline chat with Tianyu, and he would agree as well.
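
For concreteness, the general shape I have in mind is something like
the sketch below.  To be clear, the structures and names are mine
and hypothetical, not taken from Andi's patch, and all of the real
slot management is elided:

/*
 * Illustrative sketch only -- the structures and names here are
 * hypothetical, and the actual slot-search bookkeeping is omitted.
 */
#include <linux/smp.h>
#include <linux/spinlock.h>

struct swiotlb_area {
        spinlock_t lock;        /* protects only this area's slots */
        unsigned int index;     /* next slot to try within this area */
        unsigned long used;
};

struct swiotlb_pool {
        struct swiotlb_area *areas;     /* e.g. one per CPU or NUMA node */
        unsigned int nareas;
};

/*
 * Pick an area based on the calling CPU so that concurrent mappings
 * from many CPUs -- including multiple channels of a single device --
 * contend on different locks, with no driver involvement at all.
 */
static struct swiotlb_area *swiotlb_pick_area(struct swiotlb_pool *pool)
{
        return &pool->areas[raw_smp_processor_id() % pool->nareas];
}

The key property is that the partitioning is by CPU (and potentially
NUMA node) rather than by device, which is what makes points (2) and
(3) above achievable.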

Agree?  Disagree?

Michael

[1] https://lore.kernel.org/lkml/ymqonhkbt8fty...@infradead.org/
[2] https://lore.kernel.org/lkml/20220222080543.ga5...@lst.de/
[3] https://github.com/intel/tdx/commit/4529b578