On Tue, Apr 2, 2024 at 8:19 AM Si-Wei Liu <si-wei....@oracle.com> wrote:
>
>
> On 2/14/2024 11:11 AM, Eugenio Perez Martin wrote:
> > On Wed, Feb 14, 2024 at 7:29 PM Si-Wei Liu <si-wei....@oracle.com> wrote:
> >> Hi Michael,
> >>
> >> On 2/13/2024 2:22 AM, Michael S. Tsirkin wrote:
> >>> On Mon, Feb 05, 2024 at 05:10:36PM -0800, Si-Wei Liu wrote:
> >>>> Hi Eugenio,
> >>>>
> >>>> I thought this new code looks good to me and the original issue I saw
> >>>> with x-svq=on should be gone. However, after rebasing my tree on top
> >>>> of this, there's a new failure I found around setting up guest
> >>>> mappings at early boot; please see attached the specific QEMU config
> >>>> and the corresponding event traces. I haven't checked into the detail
> >>>> yet, but thought you would want to be aware of it ahead of time.
> >>>>
> >>>> Regards,
> >>>> -Siwei
> >>> Eugenio, were you able to reproduce? Siwei, did you have time to
> >>> look into this?
> >> I didn't get a chance to look into the detail in the past week, but I
> >> thought it may have something to do with the internals of the iova tree
> >> range allocation and the lookup routine. It started to fall apart at the
> >> first vhost_vdpa_dma_unmap call showing up in the trace events, where it
> >> should have gotten IOVA=0x2000001000, but an incorrect IOVA address
> >> 0x1000 ended up being returned from the iova tree lookup routine.
> >>
> >> HVA                                GPA                          IOVA
> >> -----------------------------------------------------------------------------------
> >> Map
> >> [0x7f7903e00000, 0x7f7983e00000)   [0x0, 0x80000000)            [0x1000, 0x80001000)
> >> [0x7f7983e00000, 0x7f9903e00000)   [0x100000000, 0x2080000000)  [0x80001000, 0x2000001000)
> >> [0x7f7903ea0000, 0x7f7903ec0000)   [0xfeda0000, 0xfedc0000)     [0x2000001000, 0x2000021000)
> >>
> >> Unmap
> >> [0x7f7903ea0000, 0x7f7903ec0000)   [0xfeda0000, 0xfedc0000)     [0x1000, 0x20000) ???
> >>                                shouldn't it be [0x2000001000, 0x2000021000) ???
> >>
> It looks like the SVQ iova tree lookup routine, vhost_iova_tree_find_iova(),
> which is called from vhost_vdpa_listener_region_del(), can't properly deal
> with overlapping regions.
> Specifically, q35's mch_realize() has the following:
>
> 579     memory_region_init_alias(&mch->open_high_smram, OBJECT(mch), "smram-open-high",
> 580                              mch->ram_memory, MCH_HOST_BRIDGE_SMRAM_C_BASE,
> 581                              MCH_HOST_BRIDGE_SMRAM_C_SIZE);
> 582     memory_region_add_subregion_overlap(mch->system_memory, 0xfeda0000,
> 583                                         &mch->open_high_smram, 1);
> 584     memory_region_set_enabled(&mch->open_high_smram, false);
>
> #0  0x0000564c30bf6980 in iova_tree_find_address_iterator
>     (key=0x564c331cf8e0, value=0x564c331cf8e0, data=0x7fffb6d749b0)
>     at ../util/iova-tree.c:96
> #1  0x00007f5f66479654 in g_tree_foreach () at /lib64/libglib-2.0.so.0
> #2  0x0000564c30bf6b53 in iova_tree_find_iova (tree=<optimized out>,
>     map=map@entry=0x7fffb6d74a00) at ../util/iova-tree.c:114
> #3  0x0000564c309da0a9 in vhost_iova_tree_find_iova (tree=<optimized out>,
>     map=map@entry=0x7fffb6d74a00) at ../hw/virtio/vhost-iova-tree.c:70
> #4  0x0000564c3085e49d in vhost_vdpa_listener_region_del
>     (listener=0x564c331024c8, section=0x7fffb6d74aa0)
>     at ../hw/virtio/vhost-vdpa.c:444
> #5  0x0000564c309f4931 in address_space_update_topology_pass
>     (as=as@entry=0x564c31ab1840 <address_space_memory>,
>     old_view=old_view@entry=0x564c33364cc0,
>     new_view=new_view@entry=0x564c333640f0, adding=adding@entry=false)
>     at ../system/memory.c:977
> #6  0x0000564c309f4dcd in address_space_set_flatview
>     (as=0x564c31ab1840 <address_space_memory>) at ../system/memory.c:1079
> #7  0x0000564c309f86d0 in memory_region_transaction_commit ()
>     at ../system/memory.c:1132
> #8  0x0000564c309f86d0 in memory_region_transaction_commit ()
>     at ../system/memory.c:1117
> #9  0x0000564c307cce64 in mch_realize (d=<optimized out>,
>     errp=<optimized out>) at ../hw/pci-host/q35.c:584
>
> However, it looks like iova_tree_find_address_iterator() only checks
> whether the translated address (HVA) falls into the range when trying to
> locate the desired IOVA, causing the first DMAMap that happens to overlap
> in the translated address (HVA) space to be returned prematurely:
>
>  89 static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
>  90                                                 gpointer data)
>  91 {
>   :
>   :
>  99     if (map->translated_addr + map->size < needle->translated_addr ||
> 100         needle->translated_addr + needle->size < map->translated_addr) {
> 101         return false;
> 102     }
> 103
> 104     args->result = map;
> 105     return true;
> 106 }
>
> The QEMU trace file reveals that the first DMAMap below gets returned
> incorrectly instead of the second, the latter of which is what the actual
> IOVA corresponds to:
>
> HVA                                GPA                         IOVA
> [0x7f7903e00000, 0x7f7983e00000)   [0x0, 0x80000000)           [0x1000, 0x80001000)
> [0x7f7903ea0000, 0x7f7903ec0000)   [0xfeda0000, 0xfedc0000)    [0x2000001000, 0x2000021000)
>
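To double-check that failure mode, here is a minimal standalone sketch
(hypothetical stand-in types, not QEMU's actual DMAMap / GTree machinery;
the addresses are the HVA ranges from your trace). Both tree entries pass
the quoted overlap test for the section being unmapped, so whichever entry
the traversal visits first wins:

/*
 * Self-contained illustration -- not QEMU code. Mimics the HVA-only
 * overlap test from iova_tree_find_address_iterator().
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t iova;
    uint64_t translated_addr;
    uint64_t size;              /* inclusive, i.e. last byte offset */
} FakeDMAMap;

/* Same shape as the check at iova-tree.c:99-102: HVA overlap only */
static bool hva_overlaps(const FakeDMAMap *map, const FakeDMAMap *needle)
{
    return !(map->translated_addr + map->size < needle->translated_addr ||
             needle->translated_addr + needle->size < map->translated_addr);
}

int main(void)
{
    /* The RAM and SMRAM-alias entries from the trace (inclusive sizes) */
    FakeDMAMap ram    = { 0x1000,       0x7f7903e00000, 0x7fffffff };
    FakeDMAMap smram  = { 0x2000001000, 0x7f7903ea0000, 0x1ffff };
    /* Section being unmapped: HVA [0x7f7903ea0000, 0x7f7903ec0000) */
    FakeDMAMap needle = { 0,            0x7f7903ea0000, 0x1ffff };

    printf("ram overlaps needle:   %d\n", hva_overlaps(&ram, &needle));
    printf("smram overlaps needle: %d\n", hva_overlaps(&smram, &needle));
    /* Both print 1; since the tree is walked in IOVA order, the RAM
     * entry (iova 0x1000) is visited first and wins, which matches the
     * bogus unmap IOVA in the trace. */
    return 0;
}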
I think the analysis is totally accurate, as no code expects to unmap /
map overlapping regions. In particular, the vdpa kernel does not expect it
either. Since it is issued at _realize, it should be ok to unmap the whole
region range and then map the right range again, even if that implies a
lot of unpin / pin.

> Maybe other than checking the HVA range, we should also match the GPA,
> or at least the size should exactly match?
>

The safe action here would be to unmap the whole memory chunk and then map
the overlapping memory again? Or am I missing something?

Another thing I don't get: is this reproducible in previous versions? As
far as I understand, this bug was never found before. I guess this path of
q35's mch_realize is recent?

Thanks!

> > Yes, I'm still not able to reproduce. In particular, I don't know how
> > the memory listener adds a region and then releases a region with a
> > different size. I'm talking about these log entries:
> >
> > 1706854838.154394:vhost_vdpa_listener_region_add vdpa: 0x556d45c75140
> > iova 0x0 llend 0x80000000 vaddr: 0x7f7903e00000 read-only: 0
> > 452:vhost_vdpa_listener_region_del vdpa: 0x556d45c75140 iova 0x0 llend
> > 0x7fffffff
> I didn't see a different size here, though if you referred to the
> discrepancy in the traces around llend, I thought the two values between
> _add() and _del() would have to be interpreted differently due to:
>
> 3d1e4d34 "vhost_vdpa: fix the input in
> trace_vhost_vdpa_listener_region_del()"
>
> Regards,
> -Siwei
> > Is it possible for you to also trace the skipped regions? We should
> > add a debug trace there too...
> >
> > Thanks!
> >
> >> PS, I will be taking off from today and for the next two weeks. I will
> >> try to help out looking more closely after I get back.
> >>
> >> -Siwei
> >>> Can't merge patches which are known to break things ...
>
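PS: for concreteness, the "at least the size should exactly match" idea
could look something like the sketch below in
iova_tree_find_address_iterator(). This is rough and untested; the args
struct and field names are inferred from the snippet quoted above, so
treat it as pseudocode rather than a patch:

static gboolean iova_tree_find_address_iterator(gpointer key, gpointer value,
                                                gpointer data)
{
    const DMAMap *map = key;
    struct IOVATreeFindIOVAArgs *args = data;   /* name assumed from context */
    const DMAMap *needle = args->needle;

    /* Existing test: needle and map overlap in HVA space */
    if (map->translated_addr + map->size < needle->translated_addr ||
        needle->translated_addr + needle->size < map->translated_addr) {
        return false;
    }

    /* New: skip entries whose size differs, so the unmap of the SMRAM
     * alias cannot be satisfied by the big RAM entry that merely
     * overlaps it in HVA space. */
    if (map->size != needle->size) {
        return false;
    }

    args->result = map;
    return true;
}

Two mappings with the same HVA and size but different GPAs would still be
ambiguous, though, so matching the GPA as you suggest may end up being the
more complete fix.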