On Thu, Oct 21, 2021 at 3:03 PM Eugenio Perez Martin <epere...@redhat.com> wrote: > > On Thu, Oct 21, 2021 at 4:34 AM Jason Wang <jasow...@redhat.com> wrote: > > > > On Wed, Oct 20, 2021 at 8:07 PM Eugenio Perez Martin > > <epere...@redhat.com> wrote: > > > > > > On Wed, Oct 20, 2021 at 11:01 AM Jason Wang <jasow...@redhat.com> wrote: > > > > > > > > On Wed, Oct 20, 2021 at 3:54 PM Eugenio Perez Martin > > > > <epere...@redhat.com> wrote: > > > > > > > > > > On Tue, Oct 19, 2021 at 11:23 AM Jason Wang <jasow...@redhat.com> > > > > > wrote: > > > > > > > > > > > > On Tue, Oct 19, 2021 at 4:32 PM Jason Wang <jasow...@redhat.com> > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > 在 2021/10/1 下午3:06, Eugenio Pérez 写道: > > > > > > > > This tree is able to look for a translated address from an IOVA > > > > > > > > address. > > > > > > > > > > > > > > > > At first glance is similar to util/iova-tree. However, SVQ > > > > > > > > working on > > > > > > > > devices with limited IOVA space need more capabilities, like > > > > > > > > allocating > > > > > > > > IOVA chunks or perform reverse translations (qemu addresses to > > > > > > > > iova). > > > > > > > > > > > > > > > > > > > > > I don't see any reverse translation is used in the shadow code. Or > > > > > > > anything I missed? > > > > > > > > > > > > Ok, it looks to me that it is used in the iova allocator. But I > > > > > > think > > > > > > it's better to decouple it to an independent allocator instead of > > > > > > vhost iova tree. > > > > > > > > > > > > > > > > Reverse translation is used every time a buffer is made available, > > > > > since buffers content are not copied, only the descriptors to SVQ > > > > > vring. > > > > > > > > I may miss something but I didn't see the code? Qemu knows the VA of > > > > virtqueue, and the VA of the VQ is stored in the VirtQueueElem? > > > > > > > > > > It's used in the patch 20/20, could that be the misunderstanding? The > > > function calling it is vhost_svq_translate_addr. > > > > > > Qemu knows the VA address of the buffer, but it must offer a valid SVQ > > > iova to the device. That is the translation I mean. > > > > Ok, I get you. So if I understand correctly, what you did is: > > > > 1) allocate IOVA during region_add > > 2) preform VA->IOVA reverse lookup in handle_kick > > > > This should be fine, but here're some suggestions: > > > > 1) remove the assert(map) in vhost_svq_translate_addr() since guest > > can add e.g BAR address > > Wouldn't VirtQueue block them in virtqueue_pop / address_space_read_* > functions? I'm fine to remove it but I would say it adds value against > coding error.
I think not. Though these addresses were excluded in vhost_vdpa_listener_skipped_section(). For Qemu memory core, they are valid addresses. Qemu emulate how hardware work (e.g pci p2p), so dma to bar is allowed. > > > 2) we probably need a better name vhost_iova_tree_alloc(), maybe > > "vhost_iova_tree_map_alloc()" > > > > Ok I will change for the next version. > > > There's actually another method. > > > > 1) don't do IOVA/map allocation in region_add() > > 2) do the allocation in handle_kick(), then we know the IOVA so no > > reverse lookup > > > > The advantage is that this can work for the case of vIOMMU. And they > > should perform the same: > > > > 1) you method avoid the iova allocation per sg > > 2) my method avoid the reverse lookup per sg > > > > It's somehow doable, but we are replacing a tree search with a linear > insertion at this moment. > > I would say that guest's IOVA -> qemu vaddr part works with no change > for vIOMMU, since VirtQueue's virtqueue_pop already gives us the vaddr > even in the case of vIOMMU. So in this case: 1) listener gives us GPA->host IOVA (host IOVA is allocated per GPA) 2) virtqueue_pop gives us guest IOVA -> VA We still need extra logic to lookup the vIOMMU to get the guest IOVA -> GPA then we can know the host IOVA. If we allocate after virtqueue_pop(), we can follow the same logic as without vIOMMU. Just allocate an host IOVA then all is done. > The only change I would add for that case > is the SVQ -> device map/unmapping part, so the device cannot access > random addresses but only the exposed ones. I'm assuming that part is > O(1). > > This way, we already have a tree with all the possible guest's > addresses, and we only need to look for it's SVQ iova -> vaddr > translation. This is a O(log(N)) operation, Yes, but it's requires traverse the vIOMMU page table which should be slower than our own iova tree? > and read only, so it's > easily parallelizable when we make each SVQ in it's own thread (if > needed). Yes, this is because the host IOVA was allocated before by the memory listener. > The only thing left is to expose that with an iommu miss > (O(1)) and unmap it on used buffers processing (also O(1)). The > domination operation keeps being VirtQueue's own code lookup for > guest's IOVA -> GPA, which I'm assuming is already well optimized and > will benefit from future optimizations since qemu's memory system is > frequently used. > > To optimize your use case we would need to add a custom (and smarter > than the currently used) allocator to SVQ. I've been looking for ways > to reuse glibc or similar in our own arenas but with no luck. It will > be code that SVQ needs to maintain by and for itself anyway. The benefit is to have separate iova allocation from the tree. > > In either case it should not be hard to switch to your method, just a > few call changes in the future, if we achieve a faster allocator. > > Would that make sense? Yes, feel free to choose any method you wish or feel simpler in the next series. > > > > > > > > > > > > > > At this point all the limits are copied to vhost iova tree in the next > > > > > revision I will send, defined at its creation at > > > > > vhost_iova_tree_new(). They are outside of util/iova-tree, only sent > > > > > to the latter at allocation time. > > > > > > > > > > Since vhost_iova_tree has its own vhost_iova_tree_alloc(), that wraps > > > > > the iova_tree_alloc() [1], limits could be kept in vhost-vdpa and make > > > > > them an argument of vhost_iova_tree_alloc. But I'm not sure if it's > > > > > what you are proposing or I'm missing something. > > > > > > > > If the reverse translation is only used in iova allocation, I meant to > > > > split the logic of IOVA allocation itself. > > > > > > > > > > Still don't understand it, sorry :). In SVQ setup we allocate an iova > > > address for every guest's GPA address its driver can use. After that > > > there should be no allocation unless memory is hotplugged. > > > > > > So the limits are only needed precisely at allocation time. Not sure > > > if that is what you mean here, but to first allocate and then check if > > > it is within the range could lead to false negatives, since there > > > could be a valid range *in* the address but the iova allocator > > > returned us another range that fell outside the range. How could we > > > know the cause if it is not using the range itself? > > > > See my above reply. And we can teach the iova allocator to return the > > IOVA in the range that vhost-vDPA supports. > > > > Ok, > > For the next series it will be that way. I'm pretty sure we are > aligned in this part, but the lack of code in this series makes it > very hard to discuss it :). Fine. Let's see. Thanks > > Thanks! > > > Thanks > > > > > > > > > > > > > > > Either way, I think it is harder to talk about this specific case > > > > > without code, since this one still does not address the limits. Would > > > > > you prefer me to send another RFC in WIP quality, with *not* all > > > > > comments addressed? I would say that there is not a lot of pending > > > > > work to send the next one, but it might be easier for all of us. > > > > > > > > I'd prefer to try to address them all, otherwise it's not easy to see > > > > what is missing. > > > > > > > > > > Got it, I will do it that way then! > > > > > > Thanks! > > > > > > > Thanks > > > > > > > > > > > > > > Thanks! > > > > > > > > > > [1] This util/iova-tree method will be proposed in the next series, > > > > > and vhost_iova_tree wraps it since it needs to keep in sync both > > > > > trees: iova->qemu vaddr for iova allocation and the reverse one to > > > > > translate available buffers. > > > > > > > > > > > Thanks > > > > > > > > > > > > > > > > > > > > >