On Thu, May 2, 2024 at 12:09 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
>
>
> On 4/30/2024 11:11 AM, Eugenio Perez Martin wrote:
> > On Mon, Apr 29, 2024 at 1:19 PM Jonah Palmer <jonah.pal...@oracle.com> 
> > wrote:
> >>
> >>
> >> On 4/29/24 4:14 AM, Eugenio Perez Martin wrote:
> >>> On Thu, Apr 25, 2024 at 7:44 PM Si-Wei Liu <si-wei....@oracle.com> wrote:
> >>>>
> >>>>
> >>>> On 4/24/2024 12:33 AM, Eugenio Perez Martin wrote:
> >>>>> On Wed, Apr 24, 2024 at 12:21 AM Si-Wei Liu <si-wei....@oracle.com> 
> >>>>> wrote:
> >>>>>>
> >>>>>> On 4/22/2024 1:49 AM, Eugenio Perez Martin wrote:
> >>>>>>> On Sat, Apr 20, 2024 at 1:50 AM Si-Wei Liu <si-wei....@oracle.com> 
> >>>>>>> wrote:
> >>>>>>>> On 4/19/2024 1:29 AM, Eugenio Perez Martin wrote:
> >>>>>>>>> On Thu, Apr 18, 2024 at 10:46 PM Si-Wei Liu <si-wei....@oracle.com> 
> >>>>>>>>> wrote:
> >>>>>>>>>> On 4/10/2024 3:03 AM, Eugenio Pérez wrote:
> >>>>>>>>>>> IOVA tree is also used to track the mappings of virtio-net shadow
> >>>>>>>>>>> virtqueue.  This mappings may not match with the GPA->HVA ones.
> >>>>>>>>>>>
> >>>>>>>>>>> This causes a problem when overlapped regions (different GPA but 
> >>>>>>>>>>> same
> >>>>>>>>>>> translated HVA) exists in the tree, as looking them by HVA will 
> >>>>>>>>>>> return
> >>>>>>>>>>> them twice.  To solve this, create an id member so we can assign 
> >>>>>>>>>>> unique
> >>>>>>>>>>> identifiers (GPA) to the maps.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Eugenio Pérez <epere...@redhat.com>
> >>>>>>>>>>> ---
> >>>>>>>>>>>        include/qemu/iova-tree.h | 5 +++--
> >>>>>>>>>>>        util/iova-tree.c         | 3 ++-
> >>>>>>>>>>>        2 files changed, 5 insertions(+), 3 deletions(-)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/include/qemu/iova-tree.h b/include/qemu/iova-tree.h
> >>>>>>>>>>> index 2a10a7052e..34ee230e7d 100644
> >>>>>>>>>>> --- a/include/qemu/iova-tree.h
> >>>>>>>>>>> +++ b/include/qemu/iova-tree.h
> >>>>>>>>>>> @@ -36,6 +36,7 @@ typedef struct DMAMap {
> >>>>>>>>>>>            hwaddr iova;
> >>>>>>>>>>>            hwaddr translated_addr;
> >>>>>>>>>>>            hwaddr size;                /* Inclusive */
> >>>>>>>>>>> +    uint64_t id;
> >>>>>>>>>>>            IOMMUAccessFlags perm;
> >>>>>>>>>>>        } QEMU_PACKED DMAMap;
> >>>>>>>>>>>        typedef gboolean (*iova_tree_iterator)(DMAMap *map);
> >>>>>>>>>>> @@ -100,8 +101,8 @@ const DMAMap *iova_tree_find(const IOVATree 
> >>>>>>>>>>> *tree, const DMAMap *map);
> >>>>>>>>>>>         * @map: the mapping to search
> >>>>>>>>>>>         *
> >>>>>>>>>>>         * Search for a mapping in the iova tree that 
> >>>>>>>>>>> translated_addr overlaps with the
> >>>>>>>>>>> - * mapping range specified.  Only the first found mapping will be
> >>>>>>>>>>> - * returned.
> >>>>>>>>>>> + * mapping range specified and map->id is equal.  Only the first 
> >>>>>>>>>>> found
> >>>>>>>>>>> + * mapping will be returned.
> >>>>>>>>>>>         *
> >>>>>>>>>>>         * Return: DMAMap pointer if found, or NULL if not found.  
> >>>>>>>>>>> Note that
> >>>>>>>>>>>         * the returned DMAMap pointer is maintained internally.  
> >>>>>>>>>>> User should
> >>>>>>>>>>> diff --git a/util/iova-tree.c b/util/iova-tree.c
> >>>>>>>>>>> index 536789797e..0863e0a3b8 100644
> >>>>>>>>>>> --- a/util/iova-tree.c
> >>>>>>>>>>> +++ b/util/iova-tree.c
> >>>>>>>>>>> @@ -97,7 +97,8 @@ static gboolean 
> >>>>>>>>>>> iova_tree_find_address_iterator(gpointer key, gpointer value,
> >>>>>>>>>>>
> >>>>>>>>>>>            needle = args->needle;
> >>>>>>>>>>>            if (map->translated_addr + map->size < 
> >>>>>>>>>>> needle->translated_addr ||
> >>>>>>>>>>> -        needle->translated_addr + needle->size < 
> >>>>>>>>>>> map->translated_addr) {
> >>>>>>>>>>> +        needle->translated_addr + needle->size < 
> >>>>>>>>>>> map->translated_addr ||
> >>>>>>>>>>> +        needle->id != map->id) {
> >>>>>>>>>> It looks this iterator can also be invoked by SVQ from
> >>>>>>>>>> vhost_svq_translate_addr() -> iova_tree_find_iova(), where guest 
> >>>>>>>>>> GPA
> >>>>>>>>>> space will be searched on without passing in the ID (GPA), and 
> >>>>>>>>>> exact
> >>>>>>>>>> match for the same GPA range is not actually needed unlike the 
> >>>>>>>>>> mapping
> >>>>>>>>>> removal case. Could we create an API variant, for the SVQ lookup 
> >>>>>>>>>> case
> >>>>>>>>>> specifically? Or alternatively, add a special flag, say 
> >>>>>>>>>> skip_id_match to
> >>>>>>>>>> DMAMap, and the id match check may look like below:
> >>>>>>>>>>
> >>>>>>>>>> (!needle->skip_id_match && needle->id != map->id)
> >>>>>>>>>>
> >>>>>>>>>> I think vhost_svq_translate_addr() could just call the API variant 
> >>>>>>>>>> or
> >>>>>>>>>> pass DMAmap with skip_id_match set to true to 
> >>>>>>>>>> svq_iova_tree_find_iova().
> >>>>>>>>>>
> >>>>>>>>> I think you're totally right. But I'd really like to not complicate
> >>>>>>>>> the API of the iova_tree more.
> >>>>>>>>>
> >>>>>>>>> I think we can look for the hwaddr using memory_region_from_host and
> >>>>>>>>> then get the hwaddr. It is another lookup though...
> >>>>>>>> Yeah, that will be another means of doing translation without having 
> >>>>>>>> to
> >>>>>>>> complicate the API around iova_tree. I wonder how the lookup through
> >>>>>>>> memory_region_from_host() may perform compared to the iova tree one, 
> >>>>>>>> the
> >>>>>>>> former looks to be an O(N) linear search on a linked list while the
> >>>>>>>> latter would be roughly O(log N) on an AVL tree?
> >>>>>>> Even worse, as the reverse lookup (from QEMU vaddr to SVQ IOVA) is
> >>>>>>> linear too. It is not even ordered.
> >>>>>> Oh Sorry, I misread the code and I should look for g_tree_foreach ()
> >>>>>> instead of g_tree_search_node(). So the former is indeed linear
> >>>>>> iteration, but it looks to be ordered?
> >>>>>>
> >>>>>> https://urldefense.com/v3/__https://github.com/GNOME/glib/blob/main/glib/gtree.c*L1115__;Iw!!ACWV5N9M2RV99hQ!Ng2rLfRd9tLyNTNocW50Mf5AcxSt0uF0wOdv120djff-z_iAdbujYK-jMi5UC1DZLxb1yLUv2vV0j3wJo8o$
> >>>>> The GPA / IOVA are ordered but we're looking by QEMU's vaddr.
> >>>>>
> >>>>> If we have these translations:
> >>>>> [0x1000, 0x2000] -> [0x10000, 0x11000]
> >>>>> [0x2000, 0x3000] -> [0x6000, 0x7000]
> >>>>>
> >>>>> We will see them in this order, so we cannot stop the search at the 
> >>>>> first node.
> >>>> Yeah, reverse lookup is unordered indeed, anyway.
> >>>>
> >>>>>>> But apart from this detail you're right, I have the same concerns with
> >>>>>>> this solution too. If we see a hard performance regression we could go
> >>>>>>> to more complicated solutions, like maintaining a reverse IOVATree in
> >>>>>>> vhost-iova-tree too. First RFCs of SVQ did that actually.
> >>>>>> Agreed, yeap we can use memory_region_from_host for now.  Any reason 
> >>>>>> why
> >>>>>> reverse IOVATree was dropped, lack of users? But now we have one!
> >>>>>>
> >>>>> No, it is just simplicity. We already have an user in the hot patch in
> >>>>> the master branch, vhost_svq_vring_write_descs. But I never profiled
> >>>>> enough to find if it is a bottleneck or not to be honest.
> >>>> Right, without vIOMMU or a lot of virtqueues / mappings, it's hard to
> >>>> profile and see the difference.
> >>>>> I'll send the new series by today, thank you for finding these issues!
> >>>> Thanks! In case you don't have bandwidth to add back reverse IOVA tree,
> >>>> Jonah (cc'ed) may have interest in looking into it.
> >>>>
> >>> Actually, yes. I've tried to solve it using:
> >>> memory_region_get_ram_ptr -> It's hard to get this pointer to work
> >>> without messing a lot with IOVATree.
> >>> memory_region_find -> I'm totally unable to make it return sections
> >>> that make sense
> >>> flatview_for_each_range -> It does not return the same
> >>> MemoryRegionsection as the listener, not sure why.
> >>>
> >>> The only advance I have is that memory_region_from_host is able to
> >>> tell if the vaddr is from the guest or not.
> >>>
> >>> So I'm convinced there must be a way to do it with the memory
> >>> subsystem, but I think the best way to do it ATM is to store a
> >>> parallel tree with GPA-> SVQ IOVA translations. At removal time, if we
> >>> find the entry in this new tree, we can directly remove it by GPA. If
> >>> not, assume it is a host-only address like SVQ vrings, and remove by
> >>> iterating on vaddr as we do now. It is guaranteed the guest does not
> >>> translate to that vaddr and that that vaddr is unique in the tree
> >>> anyway.
> >>>
> >>> Does it sound reasonable? Jonah, would you be interested in moving this 
> >>> forward?
> >>>
> >>> Thanks!
> >>>
> >> Sure, I'd be more than happy to work on this stuff! I can probably get
> >> started on this either today or tomorrow.
> >>
> >> Si-Wei mentioned something about these "reverse IOVATree" patches that
> >> were dropped;
> > The patches implementing the reverse IOVA tree were never created /
> > posted, just in case you try to look for them.
> >
> >
> >> is this relevant to what you're asking here? Is it
> >> something I should base my work off of?
> >>
> > So these patches work ok for adding and removing maps. We assign ids,
> > which is the gpa of the memory region that the listener receives. The
> > bad news is that SVQ also needs this id to look for the right
> > translation at vhost_svq_translate_addr, so this series is not
> > complete.
> I have a fundamental question to ask here. Are we sure SVQ really needs
> this id (GPA) to identify the right translation? Or we're just
> concerning much about the aliased map where there could be one single
> HVA mapped to multiple IOVAs / GPAs, which (the overlapped) is almost
> transient mapping that usually goes away very soon after guest memory
> layout is stabilized?

Are we sure all of the overlaps go away after the memory layout is
stabilized in all conditions? I think it is worth not making two
different ways to ask the tree depending on what part of QEMU we are.

> For what I can tell, the caller in SVQ datapath
> code (vhost_svq_vring_write_descs) just calls into
> vhost_iova_tree_find_iova to look for IOVA translation rather than
> identify a specific section on the memory region, the latter of which
> would need the id (GPA) to perform an exact match. The removal case
> would definitely need perfect match on GPA with the additional id, but I
> don't find it necessary for the vhost_svq_vring_write_descs code to pass
> in the id / GPA? Do I miss something?
>

Expanding on the other thread, as there are more concrete points
there. Please let me know if I missed something.

> Thanks,
> -Siwei
>
> > You can find the
> > vhost_iova_tree_find_iova()->iova_tree_find_iova() call there.
> >
> > The easiest solution is the reverse IOVA tree of HVA -> SVQ IOVA. It
> > is also the less elegant and (potentially) the less performant, as it
> > includes duplicating information that QEMU already has, and a
> > potentially linear search.
> >
> > David Hildenbrand (CCed) proposed to try iterating through RAMBlocks.
> > I guess qemu_ram_block_from_host() could return a block where
> > block->offset is the id of the map?
> >
> > It would be great to try this approach. If you don't have the bandwith
> > for this, going directly for the reverse iova tree is also a valid
> > solution.
> >
> > Thanks!
> >
> >> If there's any other relevant information about this issue that you
> >> think I should know, let me know. I'll start digging into this ASAP and
> >> will reach out if I need any guidance. :)
> >>
> >> Jonah
> >>
> >>>> -Siwei
> >>>>
> >>>>
> >>>>>> Thanks,
> >>>>>> -Siwei
> >>>>>>> Thanks!
> >>>>>>>
> >>>>>>>> Of course,
> >>>>>>>> memory_region_from_host() won't search out of the guest memory space 
> >>>>>>>> for
> >>>>>>>> sure. As this could be on the hot data path I have a little bit
> >>>>>>>> hesitance over the potential cost or performance regression this 
> >>>>>>>> change
> >>>>>>>> could bring in, but maybe I'm overthinking it too much...
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> -Siwei
> >>>>>>>>
> >>>>>>>>>> Thanks,
> >>>>>>>>>> -Siwei
> >>>>>>>>>>>                return false;
> >>>>>>>>>>>            }
> >>>>>>>>>>>
>


Reply via email to