> -----Original Message----- > From: Don Slutz [mailto:dsl...@verizon.com] > Sent: 09 June 2015 15:28 > To: Paul Durrant; Slutz, Donald Christopher; qemu-devel@nongnu.org; xen- > de...@lists.xen.org > Cc: Michael S. Tsirkin; Stefano Stabellini; Don Slutz > Subject: Re: [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server > > On 06/09/15 10:03, Paul Durrant wrote: > >> -----Original Message----- > >> From: Don Slutz [mailto:dsl...@verizon.com] > >> Sent: 09 June 2015 14:51 > >> To: Paul Durrant; Slutz, Donald Christopher; qemu-devel@nongnu.org; > xen- > >> de...@lists.xen.org > >> Cc: Michael S. Tsirkin; Stefano Stabellini; Don Slutz > >> Subject: Re: [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server > >> > >> On 06/09/15 05:05, Paul Durrant wrote: > >>>> -----Original Message----- > >>>> From: Don Slutz [mailto:dsl...@verizon.com] > >>>> Sent: 08 June 2015 22:19 > >>>> To: qemu-devel@nongnu.org; xen-de...@lists.xen.org > >>>> Cc: Michael S. Tsirkin; Paul Durrant; Stefano Stabellini; Don Slutz; Don > Slutz > >>>> Subject: [PATCH v2 4/4] xen: Fix map/unmap of pcidev to ioreq server > >>>> > >>>> The commit 707ff80021ccd7a68f4b3d2c44eebf87efbb41c4 added usage > of > >>>> xc_hvm_map_pcidev_from_ioreq_server() and > >>>> xc_hvm_unmap_pcidev_from_ioreq_server(). These routines only > >>>> correctly work if the PCI BDF of a PCI device is unique. The 3 > >>>> parts (bus, device, and function) of a PCI BDF are not required to > >>>> be unique. > >>>> > >>>> The PCI BDF is accessed in QEMU: > >>>> bus pci_bus_num() > >>>> device PCI_SLOT() > >>>> function PCI_FUNC() > >>>> > >>>> Add a hash table to track the current set of PCI BDFs and only call > >>>> on xc_hvm_map_pcidev_from_ioreq_server() and > >>>> xc_hvm_unmap_pcidev_from_ioreq_server() when needed. > >>>> > >>> > >>> This seems to imply that the devices are being realized multiple times > >> without being unrealized? > >> > >> Not directly. The devices are not being "realized" in QEMU terms. The > >> PCI to PCI brige is changing state, and the BDF for all PCI devices on > >> the secondary bus changes when the PCI config named > >> PCI_SECONDARY_BUS on > >> the PCI to PCI bridge is changed. This is what patch #2 is all about. > >> > >>> Is that really the case? > >>> > >> > >> That is what it looks like to Xen. But the device listener callbacks > >> are not called. > > > > Then what's calling xc_hvm_map_pcidev_from_ioreq_server()? > > > > Currently (without these patches) the calls on > xc_hvm_map_pcidev_from_ioreq_server() are the ones you added. These > are > from QEMU's device realize. However since on PCI reset (aka start up) > the seconary bus number is zero. So the call on > pci_bus_num(pci_dev->bus) will return zero in that case. So using the > subset trace output (copied from below) and adjusted to be the old way: > > xen_map_pcidev id: 1 bdf: 00.01.00 > xen_map_pcidev id: 1 bdf: 00.01.00 > xen_map_pcidev id: 1 bdf: 00.01.00 > > Is part of what you see. It may help to have the Xen config that matters: > > device_model_args_hvm = [ > "-monitor", > "pty", > "-boot", > "menu=on", > "-device", > "pci-bridge,chassis_nr=1,msi=on,id=pciBridge1.0,addr=0x11.0", > "-device", > "pci- > bridge,chassis_nr=2,msi=on,id=pciBridge5.0,multifunction=on,addr=0x15.0", > "-device", > "pci- > bridge,chassis_nr=3,msi=on,id=pciBridge6.0,multifunction=on,addr=0x16.0", > "-device", > "pci- > bridge,chassis_nr=4,msi=on,id=pciBridge7.0,multifunction=on,bus=pciBridge > 1.0,addr=0x17.0", > "-device", > "pci- > bridge,chassis_nr=5,msi=on,id=pciBridge8.0,multifunction=on,addr=0x18.0", > "-device", > "pvscsi,id=scsi0,bus=pciBridge5.0,addr=0x1.0x0", > "-drive", > "if=none,id=disk0-0,format=raw,file=/dev/etherd/e500.1", > "-device", > "scsi-disk,vendor=VMware,ver=1.0,product=Virtual > disk,bus=scsi0.0,scsi-id=0,drive=disk0-0", > "-device", > "pvscsi,id=sas1,bus=pciBridge7.0,addr=0x1.0x0", > "-drive", > "if=none,id=disk1-1,format=raw,file=/dev/etherd/e500.2", > "-device", > "scsi-disk,vendor=VMware,ver=1.0,product=Virtual > disk,bus=sas1.0,scsi-id=1,drive=disk1-1", > "-device", > "pvscsi,id=scsi2,bus=pciBridge1.0,addr=0x3.0x0", > "-drive", > "if=none,id=disk2-2,format=raw,file=/dev/etherd/e500.3", > "-device", > "scsi-disk,vendor=VMware,ver=1.0,product=Virtual > disk,bus=scsi2.0,scsi-id=2,drive=disk2-2", > "-device", > > "e1000,id=nic2,netdev=net2,mac=00:0c:29:86:44:b4,bus=pciBridge5.0,addr=0 > x3.0x0", > "-netdev", > "type=tap,id=net2,ifname=vif.2-emu,script=/etc/qemu- > ifup,downscript=no", > "-device", > > "vmxnet3,id=nic3,netdev=net3,mac=00:0c:29:86:44:be,bus=pciBridge5.0,add > r=0x4.0x0", > "-netdev", > "type=tap,id=net3,ifname=vif.3-emu,script=/etc/qemu- > ifup,downscript=no", > ] > > > If you want to try this, you would also need a bootable disk listed in > disk= > > > I don't understand why you are having to introduce this to avoid > > duplicate calls? > > > > That is because in general a PCI device's BDF is not fixed. The only > fixed ones are the ones on the root (host bus).
Yes, the code was written with that assumption, which is correct if you use the libxl toolstack. You are bypassing the toolstack and hence running into problems. > It might help to look > at the code for pci_bus_num(): > > int pci_bus_num(PCIBus *s) > { > if (pci_bus_is_root(s)) > return 0; /* pci host bridge */ > return s->parent_dev->config[PCI_SECONDARY_BUS]; > } > > So the bus number this routine returns depends on what the guest has set > in the PCI config space of the PCI to PCI bridge. > So, if the ioreq server registration interface is to remain based on SBDF (and I'm not sure what alternative there is), I think you are going to have to make Xen aware of PCI<->PCI bridges so it can track the bus number changes to already-registered devices. The alternative of broadcasting config. space accesses is somewhat unpalatable and also means you'd need a fundamental change in the way ioreqs are handled. Paul > The starting (and reset) value is 0. So for the config above at QEMU > device realized time there are 3 PCI devices (PIIX4, 2 pvscsi disk > controllers). > > -Don Slutz > > > Paul > > > > >> > >> -Don Slutz > >> > >>> Paul > >>> > >>>> Also fix the PCI BDF default stderr trace output: bus is seperated > >>>> by ':', and function is only 1 digit. > >>>> > >>>> Here is an example of stderr trace output: > >>>> > >>>> xen_map_pcidev id: 1 bdf: 00:00.0 > >>>> xen_map_pcidev id: 1 bdf: 00:01.0 > >>>> xen_map_pcidev id: 1 bdf: 00:01.1 > >>>> xen_map_pcidev id: 1 bdf: 00:01.3 > >>>> xen_map_pcidev id: 1 bdf: 00:02.0 > >>>> xen_map_pcidev id: 1 bdf: 00:03.0 > >>>> xen_map_pcidev id: 1 bdf: 00:04.0 > >>>> xen_map_pcidev id: 1 bdf: 00:05.0 > >>>> xen_map_pcidev id: 1 bdf: 00:11.0 > >>>> xen_map_pcidev id: 1 bdf: 00:15.0 > >>>> xen_map_pcidev id: 1 bdf: 00:16.0 > >>>> xen_map_pcidev id: 1 bdf: 00:17.0 > >>>> xen_map_pcidev id: 1 bdf: 00:18.0 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 2 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 3 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 2 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 3 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:04.0 dup: 2 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 2 > >>>> xen_map_pcidev id: 1 bdf: ff:03.0 > >>>> xen_unmap_pcidev id: 1 bdf: 00:17.0 > >>>> xen_map_pcidev id: 1 bdf: ff:17.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 2 > >>>> xen_map_pcidev id: 1 bdf: ff:01.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 1 > >>>> xen_map_pcidev_dup id: 1 bdf: ff:03.0 dup: 2 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:04.0 dup: 1 > >>>> xen_map_pcidev id: 1 bdf: ff:04.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: ff:03.0 dup: 1 > >>>> xen_map_pcidev id: 1 bdf: 01:03.0 > >>>> xen_unmap_pcidev id: 1 bdf: ff:17.0 > >>>> xen_map_pcidev id: 1 bdf: 01:17.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 1 > >>>> xen_map_pcidev_dup id: 1 bdf: ff:01.0 dup: 2 > >>>> xen_unmap_pcidev_dup id: 1 bdf: ff:01.0 dup: 1 > >>>> xen_map_pcidev id: 1 bdf: 02:01.0 > >>>> xen_unmap_pcidev id: 1 bdf: ff:01.0 > >>>> xen_map_pcidev id: 1 bdf: 03:01.0 > >>>> xen_unmap_pcidev id: 1 bdf: ff:03.0 > >>>> xen_map_pcidev id: 1 bdf: 03:03.0 > >>>> xen_unmap_pcidev id: 1 bdf: ff:04.0 > >>>> xen_map_pcidev id: 1 bdf: 03:04.0 > >>>> xen_unmap_pcidev id: 1 bdf: 00:04.0 > >>>> xen_unmap_pcidev id: 1 bdf: 00:05.0 > >>>> xen_unmap_pcidev id: 1 bdf: 01:03.0 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 2 > >>>> xen_unmap_pcidev id: 1 bdf: 01:17.0 > >>>> xen_map_pcidev id: 1 bdf: 00:17.0 > >>>> xen_unmap_pcidev id: 1 bdf: 03:01.0 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 2 > >>>> xen_unmap_pcidev id: 1 bdf: 03:03.0 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:03.0 dup: 3 > >>>> xen_unmap_pcidev id: 1 bdf: 03:04.0 > >>>> xen_map_pcidev id: 1 bdf: 00:04.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 2 > >>>> xen_map_pcidev id: 1 bdf: 01:03.0 > >>>> xen_unmap_pcidev id: 1 bdf: 00:17.0 > >>>> xen_map_pcidev id: 1 bdf: 01:17.0 > >>>> xen_unmap_pcidev id: 1 bdf: 02:01.0 > >>>> xen_map_pcidev_dup id: 1 bdf: 00:01.0 dup: 3 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 2 > >>>> xen_map_pcidev id: 1 bdf: 02:01.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:01.0 dup: 1 > >>>> xen_map_pcidev id: 1 bdf: 03:01.0 > >>>> xen_unmap_pcidev_dup id: 1 bdf: 00:03.0 dup: 1 > >>>> xen_map_pcidev id: 1 bdf: 03:03.0 > >>>> xen_unmap_pcidev id: 1 bdf: 00:04.0 > >>>> xen_map_pcidev id: 1 bdf: 03:04.0 > >>>> > >>>> Signed-off-by: Don Slutz <dsl...@verizon.com> > >>>> CC: Don Slutz <don.sl...@gmail.com> > >>>> --- > >>>> include/hw/xen/xen_common.h | 53 > >>>> +++++++++++++++++++++++++++++++++++---------- > >>>> trace-events | 6 +++-- > >>>> xen-hvm.c | 15 ++++++++----- > >>>> 3 files changed, 55 insertions(+), 19 deletions(-) > >>>> > >>>> diff --git a/include/hw/xen/xen_common.h > >>>> b/include/hw/xen/xen_common.h > >>>> index 6579b78..260ee58 100644 > >>>> --- a/include/hw/xen/xen_common.h > >>>> +++ b/include/hw/xen/xen_common.h > >>>> @@ -223,12 +223,14 @@ static inline void > xen_unmap_io_section(XenXC > >> xc, > >>>> domid_t dom, > >>>> > >>>> static inline void xen_map_pcidev(XenXC xc, domid_t dom, > >>>> ioservid_t ioservid, > >>>> + GHashTable *pci_bdf, > >>>> PCIDevice *pci_dev) > >>>> { > >>>> } > >>>> > >>>> static inline void xen_unmap_pcidev(XenXC xc, domid_t dom, > >>>> ioservid_t ioservid, > >>>> + GHashTable *pci_bdf, > >>>> PCIDevice *pci_dev, > >>>> uint8_t oldbus) > >>>> { > >>>> @@ -346,27 +348,54 @@ static inline void > xen_unmap_io_section(XenXC > >> xc, > >>>> domid_t dom, > >>>> > >>>> static inline void xen_map_pcidev(XenXC xc, domid_t dom, > >>>> ioservid_t ioservid, > >>>> + GHashTable *realizedpci_bdf, > >>>> PCIDevice *pci_dev) > >>>> { > >>>> - trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus), > >>>> - PCI_SLOT(pci_dev->devfn), > >>>> PCI_FUNC(pci_dev->devfn)); > >>>> - xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid, > >>>> - 0, pci_bus_num(pci_dev->bus), > >>>> - PCI_SLOT(pci_dev->devfn), > >>>> - PCI_FUNC(pci_dev->devfn)); > >>>> + gpointer bdf_key = GINT_TO_POINTER((pci_bus_num(pci_dev- > >bus) > >> << > >>>> 8) | > >>>> + pci_dev->devfn); > >>>> + gint bdf_cnt = GPOINTER_TO_INT(g_hash_table_lookup(pci_bdf, > >>>> bdf_key)); > >>>> + > >>>> + if (!bdf_cnt) { > >>>> + trace_xen_map_pcidev(ioservid, pci_bus_num(pci_dev->bus), > >>>> + PCI_SLOT(pci_dev->devfn), > >>>> + PCI_FUNC(pci_dev->devfn)); > >>>> + g_hash_table_insert(pci_bdf, bdf_key, GINT_TO_POINTER(1)); > >>>> + xc_hvm_map_pcidev_to_ioreq_server(xc, dom, ioservid, > >>>> + 0, pci_bus_num(pci_dev->bus), > >>>> + PCI_SLOT(pci_dev->devfn), > >>>> + PCI_FUNC(pci_dev->devfn)); > >>>> + } else { > >>>> + trace_xen_map_pcidev_dup(ioservid, pci_bus_num(pci_dev- > >bus), > >>>> + PCI_SLOT(pci_dev->devfn), > >>>> + PCI_FUNC(pci_dev->devfn), > >>>> + bdf_cnt + 1); > >>>> + g_hash_table_replace(pci_bdf, bdf_key, > >> GINT_TO_POINTER(bdf_cnt + > >>>> 1)); > >>>> + } > >>>> } > >>>> > >>>> static inline void xen_unmap_pcidev(XenXC xc, domid_t dom, > >>>> ioservid_t ioservid, > >>>> + GHashTable *pci_bdf, > >>>> PCIDevice *pci_dev, > >>>> uint8_t oldbus) > >>>> { > >>>> - trace_xen_unmap_pcidev(ioservid, oldbus, > >>>> - PCI_SLOT(pci_dev->devfn), PCI_FUNC(pci_dev- > >devfn)); > >>>> - xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid, > >>>> - 0, oldbus, > >>>> - PCI_SLOT(pci_dev->devfn), > >>>> - PCI_FUNC(pci_dev->devfn)); > >>>> + gpointer bdf_key = GINT_TO_POINTER((oldbus << 8) | pci_dev- > >>> devfn); > >>>> + gint bdf_cnt = GPOINTER_TO_INT(g_hash_table_lookup(pci_bdf, > >>>> bdf_key)); > >>>> + > >>>> + if (bdf_cnt == 1) { > >>>> + trace_xen_unmap_pcidev(ioservid, oldbus, > >>>> + PCI_SLOT(pci_dev->devfn), > >>>> + PCI_FUNC(pci_dev->devfn)); > >>>> + g_hash_table_remove(pci_bdf, bdf_key); > >>>> + xc_hvm_unmap_pcidev_from_ioreq_server(xc, dom, ioservid, > >>>> + 0, oldbus, > >>>> + PCI_SLOT(pci_dev->devfn), > >>>> + PCI_FUNC(pci_dev->devfn)); > >>>> + } else { > >>>> + trace_xen_unmap_pcidev_dup(ioservid, oldbus, > PCI_SLOT(pci_dev- > >>>>> devfn), > >>>> + PCI_FUNC(pci_dev->devfn), bdf_cnt - > >>>> 1); > >>>> + g_hash_table_replace(pci_bdf, bdf_key, > >> GINT_TO_POINTER(bdf_cnt - > >>>> 1)); > >>>> + } > >>>> } > >>>> > >>>> static inline int xen_create_ioreq_server(XenXC xc, domid_t dom, > >>>> diff --git a/trace-events b/trace-events > >>>> index a589650..58deeaf 100644 > >>>> --- a/trace-events > >>>> +++ b/trace-events > >>>> @@ -930,8 +930,10 @@ xen_map_mmio_range(uint32_t id, uint64_t > >>>> start_addr, uint64_t end_addr) "id: %u > >>>> xen_unmap_mmio_range(uint32_t id, uint64_t start_addr, uint64_t > >>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64 > >>>> xen_map_portio_range(uint32_t id, uint64_t start_addr, uint64_t > >> end_addr) > >>>> "id: %u start: %#"PRIx64" end: %#"PRIx64 > >>>> xen_unmap_portio_range(uint32_t id, uint64_t start_addr, uint64_t > >>>> end_addr) "id: %u start: %#"PRIx64" end: %#"PRIx64 > >>>> -xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) > "id: > >> %u > >>>> bdf: %02x.%02x.%02x" > >>>> -xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t > func) > >> "id: > >>>> %u bdf: %02x.%02x.%02x" > >>>> +xen_map_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t func) > "id: > >> %u > >>>> bdf: %02x:%02x.%x" > >>>> +xen_map_pcidev_dup(uint32_t id, uint8_t bus, uint8_t dev, uint8_t > >> func, > >>>> int dup) "id: %u bdf: %02x:%02x.%x dup: %d" > >>>> +xen_unmap_pcidev(uint32_t id, uint8_t bus, uint8_t dev, uint8_t > func) > >> "id: > >>>> %u bdf: %02x:%02x.%x" > >>>> +xen_unmap_pcidev_dup(uint32_t id, uint8_t bus, uint8_t dev, > uint8_t > >> func, > >>>> int dup) "id: %u bdf: %02x:%02x.%x dup: %d" > >>>> > >>>> # xen-mapcache.c > >>>> xen_map_cache(uint64_t phys_addr) "want %#"PRIx64 > >>>> diff --git a/xen-hvm.c b/xen-hvm.c > >>>> index 7b6d8f1..f041909 100644 > >>>> --- a/xen-hvm.c > >>>> +++ b/xen-hvm.c > >>>> @@ -123,6 +123,7 @@ typedef struct XenIOState { > >>>> MemoryListener memory_listener; > >>>> MemoryListener io_listener; > >>>> DeviceListener device_listener; > >>>> + GHashTable *pci_bdf; > >>>> QLIST_HEAD(, XenPhysmap) physmap; > >>>> hwaddr free_phys_offset; > >>>> const XenPhysmap *log_for_dirtybit; > >>>> @@ -578,7 +579,8 @@ static void xen_device_realize(DeviceListener > >>>> *listener, > >>>> if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > >>>> PCIDevice *pci_dev = PCI_DEVICE(dev); > >>>> > >>>> - xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev); > >>>> + xen_map_pcidev(xen_xc, xen_domid, state->ioservid, state- > >>> pci_bdf, > >>>> + pci_dev); > >>>> } > >>>> } > >>>> > >>>> @@ -590,8 +592,8 @@ static void xen_device_unrealize(DeviceListener > >>>> *listener, > >>>> if (object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) { > >>>> PCIDevice *pci_dev = PCI_DEVICE(dev); > >>>> > >>>> - xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev, > >>>> - pci_bus_num(pci_dev->bus)); > >>>> + xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, state- > >>>>> pci_bdf, > >>>> + pci_dev, pci_bus_num(pci_dev->bus)); > >>>> } > >>>> } > >>>> > >>>> @@ -600,8 +602,10 @@ static void > >>>> xen_device_change_pci_bus_num(DeviceListener *listener, > >>>> { > >>>> XenIOState *state = container_of(listener, XenIOState, > >> device_listener); > >>>> > >>>> - xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev, > >>>> oldbus); > >>>> - xen_map_pcidev(xen_xc, xen_domid, state->ioservid, pci_dev); > >>>> + xen_unmap_pcidev(xen_xc, xen_domid, state->ioservid, state- > >>> pci_bdf, > >>>> + pci_dev, oldbus); > >>>> + xen_map_pcidev(xen_xc, xen_domid, state->ioservid, state- > >>> pci_bdf, > >>>> + pci_dev); > >>>> } > >>>> > >>>> static void xen_sync_dirty_bitmap(XenIOState *state, > >>>> @@ -1318,6 +1322,7 @@ int xen_hvm_init(ram_addr_t > >>>> *below_4g_mem_size, ram_addr_t *above_4g_mem_size, > >>>> memory_listener_register(&state->io_listener, > &address_space_io); > >>>> > >>>> state->device_listener = xen_device_listener; > >>>> + state->pci_bdf = g_hash_table_new(NULL, NULL); > >>>> device_listener_register(&state->device_listener); > >>>> > >>>> /* Initialize backend core & drivers */ > >>>> -- > >>>> 1.8.4 > >>>