On Tue, Apr 2, 2024 at 8:19 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
>
>
> On 2/14/2024 11:11 AM, Eugenio Perez Martin wrote:
> > On Wed, Feb 14, 2024 at 7:29 PM Si-Wei Liu <si-wei....@oracle.com> wrote:
> >> Hi Michael,
> >>
> >> On 2/13/2024 2:22 AM, Michael S. Tsirkin wrote:
> >>> On Mon, Feb 05, 2024 at 05:10:36PM -0800, Si-Wei Liu wrote:
> >>>> Hi Eugenio,
> >>>>
> >>>> I thought this new code looks good to me and the original issue I saw 
> >>>> with
> >>>> x-svq=on should be gone. However, after rebase my tree on top of this,
> >>>> there's a new failure I found around setting up guest mappings at early
> >>>> boot, please see attached the specific QEMU config and corresponding 
> >>>> event
> >>>> traces. Haven't checked into the detail yet, thinking you would need to 
> >>>> be
> >>>> aware of ahead.
> >>>>
> >>>> Regards,
> >>>> -Siwei
> >>> Eugenio were you able to reproduce? Siwei did you have time to
> >>> look into this?
> >> Didn't get a chance to look into the detail yet in the past week, but
> >> thought it may have something to do with the (internals of) iova tree
> >> range allocation and the lookup routine. It started to fall apart at the
> >> first vhost_vdpa_dma_unmap call showing up in the trace events, where it
> >> should've gotten IOVA=0x2000001000,  but an incorrect IOVA address
> >> 0x1000 was ended up returning from the iova tree lookup routine.
> >>
> >> HVA                    GPA                IOVA
> >> -------------------------------------------------------------------------------------------------------------------------
> >> Map
> >> [0x7f7903e00000, 0x7f7983e00000)    [0x0, 0x80000000) [0x1000, 0x80000000)
> >> [0x7f7983e00000, 0x7f9903e00000)    [0x100000000, 0x2080000000)
> >> [0x80001000, 0x2000001000)
> >> [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000)
> >> [0x2000001000, 0x2000021000)
> >>
> >> Unmap
> >> [0x7f7903ea0000, 0x7f7903ec0000)    [0xfeda0000, 0xfedc0000) [0x1000,
> >> 0x20000) ???
> >>                                       shouldn't it be [0x2000001000,
> >> 0x2000021000) ???
> >>
> It looks the SVQ iova tree lookup routine vhost_iova_tree_find_iova(),
> which is called from vhost_vdpa_listener_region_del(), can't properly
> deal with overlapped region. Specifically, q35's mch_realize() has the
> following:
>
> 579     memory_region_init_alias(&mch->open_high_smram, OBJECT(mch),
> "smram-open-high",
> 580                              mch->ram_memory,
> MCH_HOST_BRIDGE_SMRAM_C_BASE,
> 581                              MCH_HOST_BRIDGE_SMRAM_C_SIZE);
> 582     memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
> 583 &mch->open_high_smram, 1);
> 584     memory_region_set_enabled(&mch->open_high_smram, false);
>
> #0  0x0000564c30bf6980 in iova_tree_find_address_iterator
> (key=0x564c331cf8e0, value=0x564c331cf8e0, data=0x7fffb6d749b0) at
> ../util/iova-tree.c:96
> #1  0x00007f5f66479654 in g_tree_foreach () at /lib64/libglib-2.0.so.0
> #2  0x0000564c30bf6b53 in iova_tree_find_iova (tree=<optimized out>,
> map=map@entry=0x7fffb6d74a00) at ../util/iova-tree.c:114
> #3  0x0000564c309da0a9 in vhost_iova_tree_find_iova (tree=<optimized
> out>, map=map@entry=0x7fffb6d74a00) at ../hw/virtio/vhost-iova-tree.c:70
> #4  0x0000564c3085e49d in vhost_vdpa_listener_region_del
> (listener=0x564c331024c8, section=0x7fffb6d74aa0) at
> ../hw/virtio/vhost-vdpa.c:444
> #5  0x0000564c309f4931 in address_space_update_topology_pass
> (as=as@entry=0x564c31ab1840 <address_space_memory>,
> old_view=old_view@entry=0x564c33364cc0,
> new_view=new_view@entry=0x564c333640f0, adding=adding@entry=false) at
> ../system/memory.c:977
> #6  0x0000564c309f4dcd in address_space_set_flatview (as=0x564c31ab1840
> <address_space_memory>) at ../system/memory.c:1079
> #7  0x0000564c309f86d0 in memory_region_transaction_commit () at
> ../system/memory.c:1132
> #8  0x0000564c309f86d0 in memory_region_transaction_commit () at
> ../system/memory.c:1117
> #9  0x0000564c307cce64 in mch_realize (d=<optimized out>,
> errp=<optimized out>) at ../hw/pci-host/q35.c:584
>
> However, it looks like iova_tree_find_address_iterator() only check if
> the translated address (HVA) falls in to the range when trying to locate
> the desired IOVA, causing the first DMAMap that happens to overlap in
> the translated address (HVA) space to be returned prematurely:
>
>   89 static gboolean iova_tree_find_address_iterator(gpointer key,
> gpointer value,
>   90                                                 gpointer data)
>   91 {
>   :
>   :
>   99     if (map->translated_addr + map->size < needle->translated_addr ||
> 100         needle->translated_addr + needle->size < map->translated_addr) {
> 101         return false;
> 102     }
> 103
> 104     args->result = map;
> 105     return true;
> 106 }
>
> In the QEMU trace file, it reveals that the first DMAMap as below gets
> returned incorrectly instead the second, the latter of which is what the
> actual IOVA corresponds to:
>
> HVA                                                             GPA           
>                                   IOVA
> [0x7f7903e00000, 0x7f7983e00000)        [0x0, 0x80000000)                     
>   [0x1000, 0x80001000)
> [0x7f7903ea0000, 0x7f7903ec0000)        [0xfeda0000, 0xfedc0000)        
> [0x2000001000, 0x2000021000)
>

I think the analysis is totally accurate as no code expects to unmap /
map overlapping regions. In particular, vdpa kernel does not expect it
either.

Since it is issued at _realize, it should be ok to unmap all the
region range and then map the right range again, even if that implies
a lot of unpin / pin.

>
> Maybe other than check the HVA range, we should also match GPA, or at
> least the size should exactly match?
>

The safe actions here would be to unmap all the memory chunk and then
map the overlap memory? Or am I missing something?

Another thing I don't get, is this reproducible in previous versions?
As far as I understand, this bug was never found before. I guess this
path of q35's mch_realize is recent?

Thanks!

> > Yes, I'm still not able to reproduce. In particular, I don't know how
> > how the memory listener adds a region and then release a region with a
> > different size. I'm talking about these log entries:
> >
> > 1706854838.154394:vhost_vdpa_listener_region_add vdpa: 0x556d45c75140
> > iova 0x0 llend 0x80000000 vaddr: 0x7f7903e00000 read-only: 0
> > 452:vhost_vdpa_listener_region_del vdpa: 0x556d45c75140 iova 0x0 llend
> > 0x7fffffff
> Didn't see a different size here, though if you referred to the
> discrepancy in the traces around llend, I thought the two between _add()
> and _del() would have to be interpreted differently due to:
>
> 3d1e4d34 "vhost_vdpa: fix the input in
> trace_vhost_vdpa_listener_region_del()"
>
> Regards,
> -Siwei
> > Is it possible for you to also trace the skipped regions? We should
> > add a debug trace there too...
> >
> > Thanks!
> >
> >> PS, I will be taking off from today and for the next two weeks. Will try
> >> to help out looking more closely after I get back.
> >>
> >> -Siwei
> >>>    Can't merge patches which are known to break things ...
>


Reply via email to