On Thu, May 9, 2024 at 8:27 AM Jason Wang <jasow...@redhat.com> wrote:
>
> On Thu, May 9, 2024 at 1:16 AM Eugenio Perez Martin <epere...@redhat.com> wrote:
> >
> > On Wed, May 8, 2024 at 4:29 AM Jason Wang <jasow...@redhat.com> wrote:
> > >
> > > On Tue, May 7, 2024 at 6:57 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
> > > >
> > > > On Tue, May 7, 2024 at 9:29 AM Jason Wang <jasow...@redhat.com> wrote:
> > > > >
> > > > > On Fri, Apr 12, 2024 at 3:56 PM Eugenio Perez Martin <epere...@redhat.com> wrote:
> > > > > >
> > > > > > On Fri, Apr 12, 2024 at 8:47 AM Jason Wang <jasow...@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Apr 10, 2024 at 6:03 PM Eugenio Pérez <epere...@redhat.com> wrote:
> > > > > > > >
> > > > > > > > The guest may have overlapped memory regions, where different GPAs lead
> > > > > > > > to the same HVA. This causes a problem when overlapped regions
> > > > > > > > (different GPA but same translated HVA) exist in the tree, as looking
> > > > > > > > them up by HVA will return both of them.
> > > > > > >
> > > > > > > I think I don't understand: is there any side effect for the shadow
> > > > > > > virtqueue?
> > > > > > >
> > > > > >
> > > > > > My bad, I totally forgot to put a reference to where this comes from.
> > > > > >
> > > > > > Si-Wei found that during initialization this sequence of maps / unmaps
> > > > > > happens [1]:
> > > > > >
> > > > > > HVA                               GPA                          IOVA
> > > > > > ---------------------------------------------------------------------------------------------
> > > > > > Map
> > > > > > [0x7f7903e00000, 0x7f7983e00000)  [0x0, 0x80000000)            [0x1000, 0x80000000)
> > > > > > [0x7f7983e00000, 0x7f9903e00000)  [0x100000000, 0x2080000000)  [0x80001000, 0x2000001000)
> > > > > > [0x7f7903ea0000, 0x7f7903ec0000)  [0xfeda0000, 0xfedc0000)     [0x2000001000, 0x2000021000)
> > > > > >
> > > > > > Unmap
> > > > > > [0x7f7903ea0000, 0x7f7903ec0000)  [0xfeda0000, 0xfedc0000)     [0x1000, 0x20000) ???
> > > > > >
> > > > > > The third HVA range is contained in the first one, but exposed under a
> > > > > > different GPA (aliased). This is not "flattened" by QEMU, as the GPAs do
> > > > > > not overlap, only the HVAs.
> > > > > >
> > > > > > At the third chunk unmap, the current algorithm finds the first chunk,
> > > > > > not the third one. This series is the way to tell the difference at
> > > > > > unmap time.
> > > > > >
> > > > > > [1] https://lists.nongnu.org/archive/html/qemu-devel/2024-04/msg00079.html
> > > > > >
> > > > > > Thanks!
> > > > >
> > > > > Ok, I was wondering if we need to store GPA (GIOVA) to HVA mappings in
> > > > > the iova tree to solve this issue completely. Then there won't be
> > > > > aliasing issues.
> > > > >
> > > >
> > > > I'm OK with exploring that route, but this has another problem. Both SVQ
> > > > vrings and CVQ buffers also need to be addressable by VhostIOVATree,
> > > > and they do not have a GPA.
> > > >
> > > > At this moment vhost_svq_translate_addr is able to handle this
> > > > transparently, as we translate vaddr to SVQ IOVA. How can we store
> > > > these new entries? Maybe a (hwaddr)-1 GPA to signal it has no GPA, and
> > > > then a list to go through the other entries (SVQ vaddr and CVQ buffers)?
> > >
> > > This seems to be tricky.
> > >
> > > As discussed, it could be another iova tree.
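To make the trace above easier to follow, here is a minimal, self-contained sketch (plain C, not QEMU code; struct and function names are invented for illustration) of why an unmap keyed only on HVA resolves to the first chunk instead of the third, and how carrying the GPA disambiguates:

#include <inttypes.h>
#include <stdio.h>

typedef struct {
    uint64_t hva;   /* host virtual address where the region starts */
    uint64_t gpa;   /* guest physical address where the region starts */
    uint64_t size;  /* region size in bytes */
    uint64_t iova;  /* start of the SVQ IOVA range it got */
} MapEntry;

/* The three "Map" operations from the trace above. */
static const MapEntry maps[] = {
    { 0x7f7903e00000ULL, 0x0ULL,         0x80000000ULL,   0x1000ULL },
    { 0x7f7983e00000ULL, 0x100000000ULL, 0x1f80000000ULL, 0x80001000ULL },
    { 0x7f7903ea0000ULL, 0xfeda0000ULL,  0x20000ULL,      0x2000001000ULL },
};

/* Lookup keyed only on HVA: returns the first entry containing the address. */
static const MapEntry *find_by_hva(uint64_t hva)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (hva >= maps[i].hva && hva < maps[i].hva + maps[i].size) {
            return &maps[i];
        }
    }
    return NULL;
}

/* Lookup keyed on HVA plus GPA: the GPA disambiguates the aliased ranges. */
static const MapEntry *find_by_hva_gpa(uint64_t hva, uint64_t gpa)
{
    for (size_t i = 0; i < sizeof(maps) / sizeof(maps[0]); i++) {
        if (hva >= maps[i].hva && hva < maps[i].hva + maps[i].size &&
            gpa >= maps[i].gpa && gpa < maps[i].gpa + maps[i].size) {
            return &maps[i];
        }
    }
    return NULL;
}

int main(void)
{
    /* The "Unmap" from the trace: HVA 0x7f7903ea0000, GPA 0xfeda0000. */
    const MapEntry *by_hva = find_by_hva(0x7f7903ea0000ULL);
    const MapEntry *by_both = find_by_hva_gpa(0x7f7903ea0000ULL, 0xfeda0000ULL);

    /* HVA alone hits the first chunk (IOVA 0x1000), not the third one. */
    printf("by HVA only:    IOVA 0x%" PRIx64 "\n", by_hva->iova);
    printf("by HVA and GPA: IOVA 0x%" PRIx64 "\n", by_both->iova);
    return 0;
}

Compiled standalone, it prints IOVA 0x1000 for the HVA-only lookup and 0x2000001000 once the GPA is taken into account, which is exactly the wrong/right pair seen in the Unmap row.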
> > >
> >
> > Yes, but there are many ways to add another IOVATree. Let me expand & recap.
> >
> > Option 1 is to simply add another iova tree to VhostShadowVirtqueue.
> > Let's call it gpa_iova_tree, as opposed to the current iova_tree that
> > translates from vaddr to SVQ IOVA. Knowing which one to use is easy when
> > adding or removing, like in the memory listener, but how do we know at
> > vhost_svq_translate_addr?
>
> Then we won't use virtqueue_pop() at all; we'd need an SVQ version of
> virtqueue_pop() to translate GPA to SVQ IOVA directly?
>
The problem is not virtqueue_pop, which is outside of
vhost_svq_translate_addr. The problem is the need to add conditionals /
complexity in all the callers of

> >
> > The easiest way for me is to rely on memory_region_from_host(). When
> > vaddr is from the guest, it returns a valid MemoryRegion. When it is
> > not, it returns NULL. I'm not sure if this is a valid use case, but it
> > has just worked in my tests so far.
> >
> > Now we have the second problem: the GPA values of the regions of the
> > two IOVA trees must be unique. We need to be able to find unallocated
> > regions in SVQ IOVA. At this moment there is only one IOVATree, so
> > this is done easily by vhost_iova_tree_map_alloc. But it is very
> > complicated with two trees.
>
> Would it be simpler if we decouple the IOVA allocator? For example, we
> can have a dedicated gtree to track the allocated IOVA ranges. It is
> shared by both
>
> 1) Guest memory (GPA)
> 2) SVQ virtqueue and buffers
>
> And another gtree to track the GPA to IOVA.
>
> The SVQ code could use either
>
> 1) one linear mapping that contains both SVQ virtqueues and buffers
>
> or
>
> 2) dynamic IOVA allocation/deallocation helpers
>
> So we don't actually need the third gtree for SVQ HVA -> SVQ IOVA?
>

That's possible, but it scatters the IOVA handling code instead of
keeping it self-contained in VhostIOVATree.

> >
> > Option 2a is to add another IOVATree in VhostIOVATree. I think the
> > easiest way is to keep the GPA -> SVQ IOVA map in one tree, let's call
> > it iova_gpa_map, and the current vaddr -> SVQ IOVA tree in
> > iova_taddr_map. This second tree should contain both vaddr memory that
> > belongs to the guest and host-only vaddr like vrings and CVQ buffers.
> >
> > How do we pass the GPA to the VhostIOVATree API? Adding it to DMAMap is
> > complicated, as it is shared with intel_iommu. We can add new
> > functions to VhostIOVATree that accept vaddr plus GPA, or maybe GPA
> > only is enough. There should be functions to add, remove, and
> > allocate new entries. But the vaddr ones must be kept, because buffers
> > might be host-only.
> >
> > Then the caller can choose which version to call: for adding and
> > removing guest memory from the memory listener, the GPA variant.
> > Adding SVQ vrings and CVQ buffers should use the current vaddr
> > versions. vhost_svq_translate_addr still needs to use
> > memory_region_from_host() to know which one to call.
>
> So the idea is: for region_del/region_add, use iova_gpa_map? For the
> SVQ translation, use iova_taddr_map?
>
> I suspect we still need to synchronize those two trees, so it might
> still be problematic, as iova_taddr_map contains the overlapped
> regions.
>

Right. All adding / removing functions with GPA must also update the
current iova_taddr_map too.

> >
> > Although I didn't like this approach because it complicates
> > VhostIOVATree, I think it is the easiest way except for option 4,
> > which I'll explain later.
> >
> > This has an extra advantage: currently, the lookup in
> > vhost_svq_translate_addr is linear, O(1). This would allow us to use
> > the tree properly.
>
> It uses g_tree_foreach(), which I guess is not O(1).
>

I'm sorry, I meant O(N).

> >
> > Option 2b could be to keep them totally separated. So the current
> > VhostIOVATree->iova_taddr_map only contains host-only entries, and the
> > new iova_gpa_map contains the guest entries. I don't think this case
> > adds anything except less memory usage, as the gpa map (which should
> > be the fastest one) will be the same size. Also, it makes it difficult
> > to implement vhost_iova_tree_map_alloc.
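For what it's worth, here is a rough standalone sketch of the option 2a shape I have in mind. All of the names (SketchTree, sketch_map_guest, etc.) are invented for illustration, and plain arrays with a toy bump allocator stand in for the GTrees and the real IOVA allocator; the point is only that guest memory goes through GPA variants that keep both maps in sync, while SVQ vrings and CVQ buffers keep using the vaddr-only path:

#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_MAPS 64

typedef struct {
    uint64_t key;    /* vaddr or GPA, depending on which map it lives in */
    uint64_t iova;   /* SVQ IOVA assigned to the range */
    uint64_t size;
    bool used;
} SketchEntry;

typedef struct {
    SketchEntry iova_taddr_map[SKETCH_MAX_MAPS]; /* vaddr -> SVQ IOVA */
    SketchEntry iova_gpa_map[SKETCH_MAX_MAPS];   /* GPA   -> SVQ IOVA */
    uint64_t next_iova;                          /* toy bump allocator */
} SketchTree;

static void sketch_insert(SketchEntry *map, uint64_t key, uint64_t iova,
                          uint64_t size)
{
    for (int i = 0; i < SKETCH_MAX_MAPS; i++) {
        if (!map[i].used) {
            map[i] = (SketchEntry){ key, iova, size, true };
            return;
        }
    }
}

/* Memory listener path: guest memory has a GPA, so update both maps. */
static uint64_t sketch_map_guest(SketchTree *t, uint64_t vaddr,
                                 uint64_t gpa, uint64_t size)
{
    uint64_t iova = t->next_iova;

    t->next_iova += size;
    sketch_insert(t->iova_taddr_map, vaddr, iova, size);
    sketch_insert(t->iova_gpa_map, gpa, iova, size);
    return iova;
}

/* SVQ vrings / CVQ buffers: host-only vaddr, no GPA, vaddr map only. */
static uint64_t sketch_map_host(SketchTree *t, uint64_t vaddr, uint64_t size)
{
    uint64_t iova = t->next_iova;

    t->next_iova += size;
    sketch_insert(t->iova_taddr_map, vaddr, iova, size);
    return iova;
}

/* Removal by GPA, the unambiguous key, mirrored into the vaddr map. */
static void sketch_unmap_guest(SketchTree *t, uint64_t gpa)
{
    for (int i = 0; i < SKETCH_MAX_MAPS; i++) {
        if (!t->iova_gpa_map[i].used || t->iova_gpa_map[i].key != gpa) {
            continue;
        }
        for (int j = 0; j < SKETCH_MAX_MAPS; j++) {
            if (t->iova_taddr_map[j].used &&
                t->iova_taddr_map[j].iova == t->iova_gpa_map[i].iova) {
                t->iova_taddr_map[j].used = false;
                break;
            }
        }
        t->iova_gpa_map[i].used = false;
        return;
    }
}

The real thing would of course use the existing allocator and GTree lookups rather than linear scans; the sketch is only meant to show where the GPA and vaddr variants differ.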
> >
> > Option 3 is to not add new functions but to extend the current ones.
> > That would require special GPA values to indicate "no GPA", like for
> > SVQ vrings. I think option 2a is better, but this may help to keep the
> > interface simpler.
> >
> > Option 4 is what I'm proposing in this RFC: to store the GPA as the map
> > id, so we can tell if the vaddr corresponds to one SVQ IOVA or another.
> > Now I'm having trouble retrieving the memory section I see in the
> > memory listener. It should not be so difficult, but... The main
> > advantage is not duplicating data structs that are already in QEMU, but
> > maybe it is not worth the effort.
> >
> > Going further with this option, we could add a flag to ignore the .id
> > member added. But that adds more and more complexity to the API, so I
> > would prefer option 2a for this.
>
> Thanks
>
> > > Thanks
> > >
> > > > Thanks!
> > > >
> > > > > Thanks
> > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > > > > To solve this, track the GPA in the DMA entry, which acts as a unique
> > > > > > > > identifier for the maps. When the map needs to be removed, the iova
> > > > > > > > tree is able to find the right one.
> > > > > > > >
> > > > > > > > Users that do not go through this extra layer of indirection can use
> > > > > > > > the iova tree as usual, with id = 0.
> > > > > > > >
> > > > > > > > This was found by Si-Wei Liu <si-wei....@oracle.com>, but I'm having a
> > > > > > > > hard time reproducing the issue. This has been tested only without
> > > > > > > > overlapping maps. If it works with overlapping maps, it will be
> > > > > > > > integrated in the main series.
> > > > > > > >
> > > > > > > > Comments are welcome. Thanks!
> > > > > > > >
> > > > > > > > Eugenio Pérez (2):
> > > > > > > >   iova_tree: add an id member to DMAMap
> > > > > > > >   vdpa: identify aliased maps in iova_tree
> > > > > > > >
> > > > > > > >  hw/virtio/vhost-vdpa.c   | 2 ++
> > > > > > > >  include/qemu/iova-tree.h | 5 +++--
> > > > > > > >  util/iova-tree.c         | 3 ++-
> > > > > > > >  3 files changed, 7 insertions(+), 3 deletions(-)
> > > > > > > >
> > > > > > > > --
> > > > > > > > 2.44.0
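And for completeness, a condensed standalone sketch of the option 4 / RFC idea from the cover letter quoted above. The names are simplified stand-ins, not the actual iova-tree code: the map entry carries an id (the GPA for vhost-vdpa, 0 for users that do not care), and the HVA lookup used at unmap time also matches on that id, so aliased HVA ranges with different GPAs can be told apart:

#include <stdbool.h>
#include <stdint.h>

#define SKETCH_TREE_SIZE 64

typedef struct {
    uint64_t iova;
    uint64_t translated_addr;   /* HVA */
    uint64_t size;
    uint64_t id;                /* GPA for vhost-vdpa; 0 for other users */
    bool used;
} SketchDMAMap;

/* Find the entry whose HVA range contains 'key', with a matching id. */
static SketchDMAMap *sketch_find_hva(SketchDMAMap *tree,
                                     const SketchDMAMap *key)
{
    for (int i = 0; i < SKETCH_TREE_SIZE; i++) {
        SketchDMAMap *e = &tree[i];

        if (e->used && e->id == key->id &&
            key->translated_addr >= e->translated_addr &&
            key->translated_addr + key->size <=
                e->translated_addr + e->size) {
            return e;   /* the id tells the aliased HVA ranges apart */
        }
    }
    return NULL;
}

/* vhost-vdpa unmap path: pass the GPA as id to hit the right aliased
 * entry; users that do not care keep passing id == 0 as before. */
static void sketch_unmap(SketchDMAMap *tree, uint64_t hva, uint64_t gpa,
                         uint64_t size)
{
    SketchDMAMap key = { .translated_addr = hva, .size = size, .id = gpa };
    SketchDMAMap *e = sketch_find_hva(tree, &key);

    if (e) {
        e->used = false;
    }
}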