Re: [PATCH 0/3] pci: Fix ARI next function numbers

2023-07-02 Thread Ani Sinha



> On 02-Jul-2023, at 2:03 PM, Akihiko Odaki  wrote:
> 
> The ARI next function number field is undefined for VF. The PF should
> end the linked list formed with the field by specifying 0.
> 
> Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
> ("[PATCH 0/4] pci: Compare function number and ARI next function number")

Normally we do this using patch version numbers because otherwise over email, 
its hard to track the various iterations of the patches trying to solve the 
same thing.

> 
> Akihiko Odaki (3):
>  docs: Fix next function numbers in SR/IOV documentation
>  hw/nvme: Fix ARI next function numbers
>  igb: Fix ARI next function numbers
> 
> docs/pcie_sriov.txt | 4 ++--
> hw/net/igb.c| 2 +-
> hw/nvme/ctrl.c  | 2 +-
> 3 files changed, 4 insertions(+), 4 deletions(-)
> 
> -- 
> 2.41.0
> 




Re: [PATCH v3 0/2] pcie: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki

On 2023/07/03 13:52, Michael S. Tsirkin wrote:

On Mon, Jul 03, 2023 at 12:17:16PM +0900, Akihiko Odaki wrote:

On 2023/07/02 21:43, Michael S. Tsirkin wrote:

On Sun, Jul 02, 2023 at 09:02:25PM +0900, Akihiko Odaki wrote:

The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
("[PATCH 0/4] pci: Compare function number and ARI next function number")


Thanks! How was this patch tested?


I brought VFs up with igb and performed link up for a fresh VM and a VM
migrated from 8.0.2.


Which guest?


I use Fedora 38.







V2 -> V3:
Moved the logic to PCI common infrastucture (Michael S. Tsirkin)

V1 -> V2:
Fixed migration. (Michael S. Tsirkin)
Added a caveat comment. (Michael S. Tsirkin)

Akihiko Odaki (2):
pcie: Use common ARI next function number
pcie: Specify 0 for ARI next function numbers

   docs/pcie_sriov.txt   | 4 ++--
   include/hw/pci/pci.h  | 2 ++
   include/hw/pci/pcie.h | 2 +-
   hw/core/machine.c | 1 +
   hw/net/igb.c  | 2 +-
   hw/net/igbvf.c| 2 +-
   hw/nvme/ctrl.c| 2 +-
   hw/pci/pci.c  | 2 ++
   hw/pci/pcie.c | 4 +++-
   9 files changed, 14 insertions(+), 7 deletions(-)

--
2.41.0








Re: [PATCH v3 0/2] pcie: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Mon, Jul 03, 2023 at 12:17:16PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 21:43, Michael S. Tsirkin wrote:
> > On Sun, Jul 02, 2023 at 09:02:25PM +0900, Akihiko Odaki wrote:
> > > The ARI next function number field is undefined for VF. The PF should
> > > end the linked list formed with the field by specifying 0.
> > > 
> > > Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
> > > ("[PATCH 0/4] pci: Compare function number and ARI next function number")
> > 
> > Thanks! How was this patch tested?
> 
> I brought VFs up with igb and performed link up for a fresh VM and a VM
> migrated from 8.0.2.

Which guest?

> > 
> > 
> > > V2 -> V3:
> > >Moved the logic to PCI common infrastucture (Michael S. Tsirkin)
> > > 
> > > V1 -> V2:
> > >Fixed migration. (Michael S. Tsirkin)
> > >Added a caveat comment. (Michael S. Tsirkin)
> > > 
> > > Akihiko Odaki (2):
> > >pcie: Use common ARI next function number
> > >pcie: Specify 0 for ARI next function numbers
> > > 
> > >   docs/pcie_sriov.txt   | 4 ++--
> > >   include/hw/pci/pci.h  | 2 ++
> > >   include/hw/pci/pcie.h | 2 +-
> > >   hw/core/machine.c | 1 +
> > >   hw/net/igb.c  | 2 +-
> > >   hw/net/igbvf.c| 2 +-
> > >   hw/nvme/ctrl.c| 2 +-
> > >   hw/pci/pci.c  | 2 ++
> > >   hw/pci/pcie.c | 4 +++-
> > >   9 files changed, 14 insertions(+), 7 deletions(-)
> > > 
> > > -- 
> > > 2.41.0
> > 




Re: [PATCH v3 0/2] pcie: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki

On 2023/07/02 21:43, Michael S. Tsirkin wrote:

On Sun, Jul 02, 2023 at 09:02:25PM +0900, Akihiko Odaki wrote:

The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
("[PATCH 0/4] pci: Compare function number and ARI next function number")


Thanks! How was this patch tested?


I brought VFs up with igb and performed link up for a fresh VM and a VM 
migrated from 8.0.2.






V2 -> V3:
   Moved the logic to PCI common infrastucture (Michael S. Tsirkin)

V1 -> V2:
   Fixed migration. (Michael S. Tsirkin)
   Added a caveat comment. (Michael S. Tsirkin)

Akihiko Odaki (2):
   pcie: Use common ARI next function number
   pcie: Specify 0 for ARI next function numbers

  docs/pcie_sriov.txt   | 4 ++--
  include/hw/pci/pci.h  | 2 ++
  include/hw/pci/pcie.h | 2 +-
  hw/core/machine.c | 1 +
  hw/net/igb.c  | 2 +-
  hw/net/igbvf.c| 2 +-
  hw/nvme/ctrl.c| 2 +-
  hw/pci/pci.c  | 2 ++
  hw/pci/pcie.c | 4 +++-
  9 files changed, 14 insertions(+), 7 deletions(-)

--
2.41.0






Re: [PATCH v2] hw/ide/piix: properly initialize the BMIBA register

2023-07-02 Thread Bernhard Beschow



Am 1. Juli 2023 17:46:59 UTC schrieb Olaf Hering :
>According to the 82371FB documentation (82371FB.pdf, 2.3.9. BMIBA—BUS
>MASTER INTERFACE BASE ADDRESS REGISTER, April 1997), the register is
>32bit wide. To properly reset it to default values, all 32bit need to be
>cleared. Bit #0 "Resource Type Indicator (RTE)" needs to be enabled.
>
>The initial change wrote just the lower 8 bit, leaving parts of the "Bus
>Master Interface Base Address" address at bit 15:4 unchanged.
>
>This bug went unnoticed until commit ee358e919e38 ("hw/ide/piix: Convert
>reset handler to DeviceReset"). After this change, piix_ide_reset is
>exercised after the "unplug" command from a Xen HVM domU, 

Do you know if that command calls pci_device_reset() (which would eventually 
call piix_ide_reset())? Or does it call the DeviceReset handler directly? I'm 
asking because pci_device_reset() would take care of resetting the BMIBA 
register as well as disabling the BAR. IOW I'm wondering if the Xen command 
does the right thing.

Best regards,
Bernhard

>which was not
>the case prior that commit. This function resets the command register.
>As a result the ata_piix driver inside the domU will see a disabled PCI
>device. The generic PCI code will reenable the PCI device. On the qemu
>side, this runs pci_default_write_config/pci_update_mappings. Here a
>changed address is returned by pci_bar_address, this is the address
>which was truncated in piix_ide_reset. In case of a Xen HVM domU, the
>address changes from 0xc120 to 0xc100.
>
>While the unplug is supposed to hide the IDE disks, the changed BMIBA
>address breaks the UHCI device. In case the domU has an USB tablet
>configured, to recive absolute pointer coordinates for the GUI, it will
>cause a hang during device discovery of the partly discovered USB hid
>device. Reading the USBSTS word size register will fail. The access ends
>up in the QEMU piix-bmdma device, instead of the expected uhci device.
>Here a byte size request is expected, and a value of ~0 is returned. As
>a result the UCHI driver sees an error state in the register, and turns
>off the UHCI controller.
>
>Fixes: e6a71ae327 ("Add support for 82371FB (Step A1) and Improved support for 
>82371SB (Function 1)")
>
>Signed-off-by: Olaf Hering 
>---
> hw/ide/piix.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/hw/ide/piix.c b/hw/ide/piix.c
>index 41d60921e3..1e346b1b1d 100644
>--- a/hw/ide/piix.c
>+++ b/hw/ide/piix.c
>@@ -118,7 +118,7 @@ static void piix_ide_reset(DeviceState *dev)
> pci_set_word(pci_conf + PCI_COMMAND, 0x);
> pci_set_word(pci_conf + PCI_STATUS,
>  PCI_STATUS_DEVSEL_MEDIUM | PCI_STATUS_FAST_BACK);
>-pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
>+pci_set_long(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
> }
> 
> static bool pci_piix_init_bus(PCIIDEState *d, unsigned i, Error **errp)
>
>base-commit: d145c0da22cde391d8c6672d33146ce306e8bf75
>



Re: [PATCH v2 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 08:19:43PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 19:40, Michael S. Tsirkin wrote:
> > On Sun, Jul 02, 2023 at 06:46:25PM +0900, Akihiko Odaki wrote:
> > > The ARI next function number field is undefined for VF so the PF should
> > > end the linked list formed with the field by specifying 0.
> > > 
> > > This also changes the value of the field for VF; it seems to imply the
> > > value has some meaning if it differs from one of the PF, but it doesn't.
> > > 
> > > Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
> > > docs/pcie_sriov.txt")
> > > Signed-off-by: Akihiko Odaki 
> > > ---
> > >   docs/pcie_sriov.txt | 4 ++--
> > >   1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> > > index 7eff7f2703..2b7094dc47 100644
> > > --- a/docs/pcie_sriov.txt
> > > +++ b/docs/pcie_sriov.txt
> > > @@ -48,7 +48,7 @@ setting up a BAR for a VF.
> > > ...
> > > int ret = pcie_endpoint_cap_init(d, 0x70);
> > > ...
> > > -  pcie_ari_init(d, 0x100, 1);
> > > +  pcie_ari_init(d, 0x100, 0);
> > > ...
> > > /* Add and initialize the SR/IOV capability */
> > > @@ -78,7 +78,7 @@ setting up a BAR for a VF.
> > > ...
> > > int ret = pcie_endpoint_cap_init(d, 0x60);
> > > ...
> > > -  pcie_ari_init(d, 0x100, 1);
> > > +  pcie_ari_init(d, 0x100, 0);
> > > ...
> > > memory_region_init(mr, ... )
> > > pcie_sriov_vf_register_bar(d, bar_nr, mr);
> > 
> > 
> > So now code does not match docs.
> 
> Can you elaborate more?

your new revision addresses this.

-- 
MST




Re: [PATCH v3 0/2] pcie: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 09:02:25PM +0900, Akihiko Odaki wrote:
> The ARI next function number field is undefined for VF. The PF should
> end the linked list formed with the field by specifying 0.
> 
> Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
> ("[PATCH 0/4] pci: Compare function number and ARI next function number")

Thanks! How was this patch tested?


> V2 -> V3:
>   Moved the logic to PCI common infrastucture (Michael S. Tsirkin)
> 
> V1 -> V2:
>   Fixed migration. (Michael S. Tsirkin)
>   Added a caveat comment. (Michael S. Tsirkin)
> 
> Akihiko Odaki (2):
>   pcie: Use common ARI next function number
>   pcie: Specify 0 for ARI next function numbers
> 
>  docs/pcie_sriov.txt   | 4 ++--
>  include/hw/pci/pci.h  | 2 ++
>  include/hw/pci/pcie.h | 2 +-
>  hw/core/machine.c | 1 +
>  hw/net/igb.c  | 2 +-
>  hw/net/igbvf.c| 2 +-
>  hw/nvme/ctrl.c| 2 +-
>  hw/pci/pci.c  | 2 ++
>  hw/pci/pcie.c | 4 +++-
>  9 files changed, 14 insertions(+), 7 deletions(-)
> 
> -- 
> 2.41.0




[PATCH v3 1/2] pcie: Use common ARI next function number

2023-07-02 Thread Akihiko Odaki
Currently the only implementers of ARI is SR-IOV devices, and they
behave similar. Share the ARI next function number.

Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt   | 4 ++--
 include/hw/pci/pcie.h | 2 +-
 hw/net/igb.c  | 2 +-
 hw/net/igbvf.c| 2 +-
 hw/nvme/ctrl.c| 2 +-
 hw/pci/pcie.c | 4 +++-
 6 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..a47aad0bfa 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x70);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100);
   ...
 
   /* Add and initialize the SR/IOV capability */
@@ -78,7 +78,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x60);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100);
   ...
   memory_region_init(mr, ... )
   pcie_sriov_vf_register_bar(d, bar_nr, mr);
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 3cc2b15957..bf7dc5d685 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -134,7 +134,7 @@ void pcie_sync_bridge_lnk(PCIDevice *dev);
 void pcie_acs_init(PCIDevice *dev, uint16_t offset);
 void pcie_acs_reset(PCIDevice *dev);
 
-void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+void pcie_ari_init(PCIDevice *dev, uint16_t offset);
 void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 void pcie_ats_init(PCIDevice *dev, uint16_t offset, bool aligned);
 
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..8ff832acfc 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -431,7 +431,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(pci_dev, 0x150, 1);
+pcie_ari_init(pci_dev, 0x150);
 
 pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
index 284ea61184..d55e1e8a6a 100644
--- a/hw/net/igbvf.c
+++ b/hw/net/igbvf.c
@@ -270,7 +270,7 @@ static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(dev, 0x150, 1);
+pcie_ari_init(dev, 0x150);
 }
 
 static void igbvf_pci_uninit(PCIDevice *dev)
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index fd917fcda1..8b7168a266 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8088,7 +8088,7 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
-pcie_ari_init(pci_dev, 0x100, 1);
+pcie_ari_init(pci_dev, 0x100);
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index b8c24cf45f..9a3f6430e8 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1028,8 +1028,10 @@ void pcie_sync_bridge_lnk(PCIDevice *bridge_dev)
  */
 
 /* ARI */
-void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn)
+void pcie_ari_init(PCIDevice *dev, uint16_t offset)
 {
+uint16_t nextfn = 1;
+
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ARI, PCI_ARI_VER,
 offset, PCI_ARI_SIZEOF);
 pci_set_long(dev->config + offset + PCI_ARI_CAP, (nextfn & 0xff) << 8);
-- 
2.41.0




[PATCH v3 2/2] pcie: Specify 0 for ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The current implementers of ARI are all SR-IOV devices. The ARI next
function number field is undefined for VF. The PF should end the linked
list formed with the field by specifying 0.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pci.h | 2 ++
 hw/core/machine.c| 1 +
 hw/pci/pci.c | 2 ++
 hw/pci/pcie.c| 2 +-
 4 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index e6d0574a29..9c5b5eb206 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -209,6 +209,8 @@ enum {
 QEMU_PCIE_CAP_CXL = (1 << QEMU_PCIE_CXL_BITNR),
 #define QEMU_PCIE_ERR_UNC_MASK_BITNR 11
 QEMU_PCIE_ERR_UNC_MASK = (1 << QEMU_PCIE_ERR_UNC_MASK_BITNR),
+#define QEMU_PCIE_ARI_NEXTFN_1_BITNR 12
+QEMU_PCIE_ARI_NEXTFN_1 = (1 << QEMU_PCIE_ARI_NEXTFN_1_BITNR),
 };
 
 typedef struct PCIINTxRoute {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 46f8f9a2b0..f0d35c6401 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,6 +41,7 @@
 
 GlobalProperty hw_compat_8_0[] = {
 { "migration", "multifd-flush-after-each-section", "on"},
+{ TYPE_PCI_DEVICE, "x-pcie-ari-nextfn-1", "on" },
 };
 const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
 
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..f2ce1dbfd0 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -82,6 +82,8 @@ static Property pci_props[] = {
 DEFINE_PROP_UINT32("acpi-index",  PCIDevice, acpi_index, 0),
 DEFINE_PROP_BIT("x-pcie-err-unc-mask", PCIDevice, cap_present,
 QEMU_PCIE_ERR_UNC_MASK_BITNR, true),
+DEFINE_PROP_BIT("x-pcie-ari-nextfn-1", PCIDevice, cap_present,
+QEMU_PCIE_ARI_NEXTFN_1_BITNR, true),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 9a3f6430e8..cf09e03a10 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1030,7 +1030,7 @@ void pcie_sync_bridge_lnk(PCIDevice *bridge_dev)
 /* ARI */
 void pcie_ari_init(PCIDevice *dev, uint16_t offset)
 {
-uint16_t nextfn = 1;
+uint16_t nextfn = dev->cap_present & QEMU_PCIE_ARI_NEXTFN_1 ? 1 : 0;
 
 pcie_add_capability(dev, PCI_EXT_CAP_ID_ARI, PCI_ARI_VER,
 offset, PCI_ARI_SIZEOF);
-- 
2.41.0




[PATCH v3 0/2] pcie: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
("[PATCH 0/4] pci: Compare function number and ARI next function number")

V2 -> V3:
  Moved the logic to PCI common infrastucture (Michael S. Tsirkin)

V1 -> V2:
  Fixed migration. (Michael S. Tsirkin)
  Added a caveat comment. (Michael S. Tsirkin)

Akihiko Odaki (2):
  pcie: Use common ARI next function number
  pcie: Specify 0 for ARI next function numbers

 docs/pcie_sriov.txt   | 4 ++--
 include/hw/pci/pci.h  | 2 ++
 include/hw/pci/pcie.h | 2 +-
 hw/core/machine.c | 1 +
 hw/net/igb.c  | 2 +-
 hw/net/igbvf.c| 2 +-
 hw/nvme/ctrl.c| 2 +-
 hw/pci/pci.c  | 2 ++
 hw/pci/pcie.c | 4 +++-
 9 files changed, 14 insertions(+), 7 deletions(-)

-- 
2.41.0




Re: [PATCH v2 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-02 Thread Akihiko Odaki

On 2023/07/02 19:40, Michael S. Tsirkin wrote:

On Sun, Jul 02, 2023 at 06:46:25PM +0900, Akihiko Odaki wrote:

The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

This also changes the value of the field for VF; it seems to imply the
value has some meaning if it differs from one of the PF, but it doesn't.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Signed-off-by: Akihiko Odaki 
---
  docs/pcie_sriov.txt | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..2b7094dc47 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
...
int ret = pcie_endpoint_cap_init(d, 0x70);
...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
...
  
/* Add and initialize the SR/IOV capability */

@@ -78,7 +78,7 @@ setting up a BAR for a VF.
...
int ret = pcie_endpoint_cap_init(d, 0x60);
...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
...
memory_region_init(mr, ... )
pcie_sriov_vf_register_bar(d, bar_nr, mr);



So now code does not match docs.


Can you elaborate more?



Re: [PATCH v2 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 06:46:25PM +0900, Akihiko Odaki wrote:
> The ARI next function number field is undefined for VF so the PF should
> end the linked list formed with the field by specifying 0.
> 
> This also changes the value of the field for VF; it seems to imply the
> value has some meaning if it differs from one of the PF, but it doesn't.
> 
> Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
> docs/pcie_sriov.txt")
> Signed-off-by: Akihiko Odaki 
> ---
>  docs/pcie_sriov.txt | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
> index 7eff7f2703..2b7094dc47 100644
> --- a/docs/pcie_sriov.txt
> +++ b/docs/pcie_sriov.txt
> @@ -48,7 +48,7 @@ setting up a BAR for a VF.
>...
>int ret = pcie_endpoint_cap_init(d, 0x70);
>...
> -  pcie_ari_init(d, 0x100, 1);
> +  pcie_ari_init(d, 0x100, 0);
>...
>  
>/* Add and initialize the SR/IOV capability */
> @@ -78,7 +78,7 @@ setting up a BAR for a VF.
>...
>int ret = pcie_endpoint_cap_init(d, 0x60);
>...
> -  pcie_ari_init(d, 0x100, 1);
> +  pcie_ari_init(d, 0x100, 0);
>...
>memory_region_init(mr, ... )
>pcie_sriov_vf_register_bar(d, bar_nr, mr);


So now code does not match docs.


> -- 
> 2.41.0




Re: [PATCH 3/3] igb: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 06:49:50PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 18:00, Michael S. Tsirkin wrote:
> > On Sun, Jul 02, 2023 at 05:33:56PM +0900, Akihiko Odaki wrote:
> > > The ARI next function number field is undefined for VF so the PF should
> > > end the linked list formed with the field by specifying 0.
> > > 
> > > Fixes: 3a977deebe ("Intrdocue igb device emulation")
> > > Signed-off-by: Akihiko Odaki 
> > 
> > 
> > I would also change it for the VF just so people don't wonder
> > what's the magic value. Do document in commit log though.
> > 
> > Maybe just drop this parameter from pcie_ari_init completely
> > for now?
> 
> I sent v2, but it doesn't change the field for VFs either to save code for
> migration. The parameter for pcie_ari_init() also remains to migrate from
> older versions.

For migration, stick the boolean in PCIDevice::cap_present. Use 
QEMU_PCIE_ERR_UNC_MASK_BITNR as
a template.


> > 
> > 
> > > ---
> > >   hw/net/igb.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/hw/net/igb.c b/hw/net/igb.c
> > > index 1c989d7677..897386fc09 100644
> > > --- a/hw/net/igb.c
> > > +++ b/hw/net/igb.c
> > > @@ -431,7 +431,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
> > > **errp)
> > >   hw_error("Failed to initialize AER capability");
> > >   }
> > > -pcie_ari_init(pci_dev, 0x150, 1);
> > > +pcie_ari_init(pci_dev, 0x150, 0);
> > >   pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
> > >   IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> > > -- 
> > > 2.41.0
> > 




Re: [PATCH v2 3/4] igb: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 06:46:27PM +0900, Akihiko Odaki wrote:
> The ARI next function number field is undefined for VF so the PF should
> end the linked list formed with the field by specifying 0.
> 
> Fixes: 3a977deebe ("Intrdocue igb device emulation")
> Signed-off-by: Akihiko Odaki 
> ---
>  hw/core/machine.c | 3 ++-
>  hw/net/igb.c  | 5 -
>  2 files changed, 6 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/core/machine.c b/hw/core/machine.c
> index f984a767a2..1f5aacd1dc 100644
> --- a/hw/core/machine.c
> +++ b/hw/core/machine.c
> @@ -41,7 +41,8 @@
>  
>  GlobalProperty hw_compat_8_0[] = {
>  { "migration", "multifd-flush-after-each-section", "on"},
> -{ "nvme", "ari-nextfn-1", "on"},
> +{ "igb", "ari-nextfn-1", "on" },
> +{ "nvme", "ari-nextfn-1", "on" },
>  };
>  const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
>  
> diff --git a/hw/net/igb.c b/hw/net/igb.c
> index 1c989d7677..d37d43c155 100644
> --- a/hw/net/igb.c
> +++ b/hw/net/igb.c
> @@ -78,6 +78,8 @@ struct IGBState {
>  uint32_t ioaddr;
>  
>  IGBCore core;
> +
> +bool ari_nextfn_1;

Document this field please, explaining why it's there.

>  };
>  
>  #define IGB_CAP_SRIOV_OFFSET(0x160)
> @@ -431,7 +433,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
> **errp)
>  hw_error("Failed to initialize AER capability");
>  }
>  
> -pcie_ari_init(pci_dev, 0x150, 1);
> +pcie_ari_init(pci_dev, 0x150, s->ari_nextfn_1 ? 1 : 0);

Why don't we move the logic to pci core, and drop code duplication
completely?

>  
>  pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
>  IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> @@ -582,6 +584,7 @@ static const VMStateDescription igb_vmstate = {
>  
>  static Property igb_properties[] = {
>  DEFINE_NIC_PROPERTIES(IGBState, conf),
> +DEFINE_PROP_BOOL("ari-nextfn-1", IGBState, ari_nextfn_1, false),
>  DEFINE_PROP_END_OF_LIST(),
>  };

This really needs to be a non stable property (prefix with "x-").



> -- 
> 2.41.0




Re: [PATCH 3/3] igb: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki

On 2023/07/02 18:00, Michael S. Tsirkin wrote:

On Sun, Jul 02, 2023 at 05:33:56PM +0900, Akihiko Odaki wrote:

The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 



I would also change it for the VF just so people don't wonder
what's the magic value. Do document in commit log though.

Maybe just drop this parameter from pcie_ari_init completely
for now?


I sent v2, but it doesn't change the field for VFs either to save code 
for migration. The parameter for pcie_ari_init() also remains to migrate 
from older versions.






---
  hw/net/igb.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..897386fc09 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -431,7 +431,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
  hw_error("Failed to initialize AER capability");
  }
  
-pcie_ari_init(pci_dev, 0x150, 1);

+pcie_ari_init(pci_dev, 0x150, 0);
  
  pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,

  IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
--
2.41.0






[PATCH v2 2/4] hw/nvme: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Signed-off-by: Akihiko Odaki 
---
 hw/nvme/nvme.h| 1 +
 hw/core/machine.c | 1 +
 hw/nvme/ctrl.c| 3 ++-
 3 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/nvme/nvme.h b/hw/nvme/nvme.h
index 209e8f5b4c..c2ba6755ab 100644
--- a/hw/nvme/nvme.h
+++ b/hw/nvme/nvme.h
@@ -515,6 +515,7 @@ typedef struct NvmeParams {
 uint16_t sriov_vi_flexible;
 uint8_t  sriov_max_vq_per_vf;
 uint8_t  sriov_max_vi_per_vf;
+bool ari_nextfn_1;
 } NvmeParams;
 
 typedef struct NvmeCtrl {
diff --git a/hw/core/machine.c b/hw/core/machine.c
index 46f8f9a2b0..f984a767a2 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,6 +41,7 @@
 
 GlobalProperty hw_compat_8_0[] = {
 { "migration", "multifd-flush-after-each-section", "on"},
+{ "nvme", "ari-nextfn-1", "on"},
 };
 const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
 
diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index fd917fcda1..4bbafc66b5 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8088,7 +8088,7 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
-pcie_ari_init(pci_dev, 0x100, 1);
+pcie_ari_init(pci_dev, 0x100, n->params.ari_nextfn_1 ? 1 : 0);
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
@@ -8406,6 +8406,7 @@ static Property nvme_props[] = {
   params.sriov_max_vi_per_vf, 0),
 DEFINE_PROP_UINT8("sriov_max_vq_per_vf", NvmeCtrl,
   params.sriov_max_vq_per_vf, 0),
+DEFINE_PROP_BOOL("ari-nextfn-1", NvmeCtrl, params.ari_nextfn_1, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.41.0




[PATCH v2 3/4] igb: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/core/machine.c | 3 ++-
 hw/net/igb.c  | 5 -
 2 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index f984a767a2..1f5aacd1dc 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,7 +41,8 @@
 
 GlobalProperty hw_compat_8_0[] = {
 { "migration", "multifd-flush-after-each-section", "on"},
-{ "nvme", "ari-nextfn-1", "on"},
+{ "igb", "ari-nextfn-1", "on" },
+{ "nvme", "ari-nextfn-1", "on" },
 };
 const size_t hw_compat_8_0_len = G_N_ELEMENTS(hw_compat_8_0);
 
diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..d37d43c155 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -78,6 +78,8 @@ struct IGBState {
 uint32_t ioaddr;
 
 IGBCore core;
+
+bool ari_nextfn_1;
 };
 
 #define IGB_CAP_SRIOV_OFFSET(0x160)
@@ -431,7 +433,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(pci_dev, 0x150, 1);
+pcie_ari_init(pci_dev, 0x150, s->ari_nextfn_1 ? 1 : 0);
 
 pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
@@ -582,6 +584,7 @@ static const VMStateDescription igb_vmstate = {
 
 static Property igb_properties[] = {
 DEFINE_NIC_PROPERTIES(IGBState, conf),
+DEFINE_PROP_BOOL("ari-nextfn-1", IGBState, ari_nextfn_1, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.41.0




[PATCH v2 4/4] pcie: Note a caveat regarding ARI next function number

2023-07-02 Thread Akihiko Odaki
Suggested-by: Michael S. Tsirkin 
Signed-off-by: Akihiko Odaki 
---
 include/hw/pci/pcie.h | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index 3cc2b15957..16860e2216 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -134,7 +134,14 @@ void pcie_sync_bridge_lnk(PCIDevice *dev);
 void pcie_acs_init(PCIDevice *dev, uint16_t offset);
 void pcie_acs_reset(PCIDevice *dev);
 
+/*
+ * Note: for non-VFs, nextfn must be the Function Number of the next higher
+ * numbered Function in the Device, or 00h if there are no higher numbered
+ * Functions.
+ * TODO: validate this.
+ */
 void pcie_ari_init(PCIDevice *dev, uint16_t offset, uint16_t nextfn);
+
 void pcie_dev_ser_num_init(PCIDevice *dev, uint16_t offset, uint64_t ser_num);
 void pcie_ats_init(PCIDevice *dev, uint16_t offset, bool aligned);
 
-- 
2.41.0




[PATCH v2 0/4] pcie: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
("[PATCH 0/4] pci: Compare function number and ARI next function number")

V1 -> V2:
  Fixed migration. (Michael S. Tsirkin)
  Added a caveat comment. (Michael S. Tsirkin)

Akihiko Odaki (4):
  docs: Fix next function numbers in SR/IOV documentation
  hw/nvme: Fix ARI next function numbers
  igb: Fix ARI next function numbers
  pcie: Note a caveat regarding ARI next function number

 docs/pcie_sriov.txt   | 4 ++--
 hw/nvme/nvme.h| 1 +
 include/hw/pci/pcie.h | 7 +++
 hw/core/machine.c | 2 ++
 hw/net/igb.c  | 5 -
 hw/nvme/ctrl.c| 3 ++-
 6 files changed, 18 insertions(+), 4 deletions(-)

-- 
2.41.0




[PATCH v2 1/4] docs: Fix next function numbers in SR/IOV documentation

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

This also changes the value of the field for VF; it seems to imply the
value has some meaning if it differs from one of the PF, but it doesn't.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..2b7094dc47 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x70);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
   ...
 
   /* Add and initialize the SR/IOV capability */
@@ -78,7 +78,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x60);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
   ...
   memory_region_init(mr, ... )
   pcie_sriov_vf_register_bar(d, bar_nr, mr);
-- 
2.41.0




Re: [PATCH 3/3] igb: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 05:33:56PM +0900, Akihiko Odaki wrote:
> The ARI next function number field is undefined for VF so the PF should
> end the linked list formed with the field by specifying 0.
> 
> Fixes: 3a977deebe ("Intrdocue igb device emulation")
> Signed-off-by: Akihiko Odaki 


I would also change it for the VF just so people don't wonder
what's the magic value. Do document in commit log though.

Maybe just drop this parameter from pcie_ari_init completely
for now?


> ---
>  hw/net/igb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/net/igb.c b/hw/net/igb.c
> index 1c989d7677..897386fc09 100644
> --- a/hw/net/igb.c
> +++ b/hw/net/igb.c
> @@ -431,7 +431,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
> **errp)
>  hw_error("Failed to initialize AER capability");
>  }
>  
> -pcie_ari_init(pci_dev, 0x150, 1);
> +pcie_ari_init(pci_dev, 0x150, 0);
>  
>  pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
>  IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> -- 
> 2.41.0




Re: [PATCH 4/4] pci: Compare function number and ARI next function number

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 04:55:48AM -0400, Michael S. Tsirkin wrote:
> On Sun, Jul 02, 2023 at 05:46:38PM +0900, Akihiko Odaki wrote:
> > On 2023/07/02 13:58, Michael S. Tsirkin wrote:
> > > On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> > > > The function number must be lower than the next function number
> > > > advertised with ARI.
> > > > 
> > > > Signed-off-by: Akihiko Odaki 
> > > 
> > > I don't get this logic at all - where is the limitation coming from?
> > > 
> > > All I see in the spec is:
> > >   Next Function Number - With non-VFs, this field indicates the Function 
> > > Number of the next higher
> > >   numbered Function in the Device, or 00h if there are no higher numbered 
> > > Functions. Function 0 starts
> > >   this linked list of Functions.
> > >   The presence of Shadow Functions does not affect this field.
> > >   For VFs, this field is undefined since VFs are located using First VF 
> > > Offset (see § Section 9.3.3.9 ) and VF
> > >   Stride (see § Section 9.3.3.10 ).
> > > 
> > > and
> > > 
> > >To improve the enumeration performance and create a more deterministic 
> > > solution, software can
> > >   enumerate Functions through a linked list of Function Numbers. The next 
> > > linked list element is
> > >   communicated through each Function’s ARI Capability Register.
> > >   i. Function 0 acts as the head of a linked list of Function Numbers. 
> > > Software detects a
> > >   non-Zero Next Function Number field within the ARI Capability Register 
> > > as the next
> > >   Function within the linked list. Software issues a configuration probe 
> > > using the Bus Number
> > >   captured by the Device and the Function Number derived from the ARI 
> > > Capability Register
> > >   to locate the next associated Function’s configuration space.
> > >   ii. Function Numbers may be sparse and non-sequential in their 
> > > consumption by an ARI
> > >   Device.
> > 
> > The statement "With non-VFs, this field indicates the Function Number of the
> > next higher numbered Function in the Device, or 00h if there are no higher
> > numbered Functions." implies the Function Number of the device should be
> > lower than the value advertised by the field (for non-VFs; this patch does
> > not check if it's VF or not.)
> 
> 
> Now I get it. Good point! I'd say if we want this check we should add
> it in pcie_ari_init, making that return int.
> But for now it's dead code since your are changing it to 0.
> So maybe a comment in pcie_ari_init is enough:
> 
> /*
>  * Note: nextfn must be the Function Number of the
>  * next higher numbered Function in the Device, or 00h if there are no higher
>  * numbered Functions.
>  * TODO: validate this.
>  */

Or add an assert, and
TODO: in case this can ever come from command line, we'll have
to replace the assert below with a runtime check.


> > > 
> > > 
> > > 
> > > 
> > > 
> > > > ---
> > > >   hw/pci/pci.c | 15 +++
> > > >   1 file changed, 15 insertions(+)
> > > > 
> > > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > > index e2eb4c3b4a..568665ee42 100644
> > > > --- a/hw/pci/pci.c
> > > > +++ b/hw/pci/pci.c
> > > > @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, 
> > > > Error **errp)
> > > >   Error *local_err = NULL;
> > > >   bool is_default_rom;
> > > >   uint16_t class_id;
> > > > +uint16_t ari;
> > > > +uint16_t nextfn;
> > > >   /*
> > > >* capped by systemd (see: udev-builtin-net_id.c)
> > > > @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, 
> > > > Error **errp)
> > > >   }
> > > >   }
> > > > +if (pci_is_express(pci_dev)) {
> > > > +ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> > > > +if (ari) {
> > > > +nextfn = (pci_get_long(pci_dev->config + ari + 
> > > > PCI_ARI_CAP) >> 8) & 0xff;
> > > > +if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> > > > +error_setg(errp, "PCI: function number %u is not lower 
> > > > than ARI next function number %u",
> > > > +   pci_dev->devfn & 0xff, nextfn);
> > > > +pci_qdev_unrealize(DEVICE(pci_dev));
> > > > +return;
> > > > +}
> > > > +}
> > > > +}
> > > > +
> > > >   if (pci_dev->failover_pair_id) {
> > > >   if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> > > >   error_setg(errp, "failover primary device must be on "
> > > > -- 
> > > > 2.41.0
> > > 




Re: [PATCH 4/4] pci: Compare function number and ARI next function number

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 05:46:38PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 13:58, Michael S. Tsirkin wrote:
> > On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:
> > > The function number must be lower than the next function number
> > > advertised with ARI.
> > > 
> > > Signed-off-by: Akihiko Odaki 
> > 
> > I don't get this logic at all - where is the limitation coming from?
> > 
> > All I see in the spec is:
> > Next Function Number - With non-VFs, this field indicates the Function 
> > Number of the next higher
> > numbered Function in the Device, or 00h if there are no higher numbered 
> > Functions. Function 0 starts
> > this linked list of Functions.
> > The presence of Shadow Functions does not affect this field.
> > For VFs, this field is undefined since VFs are located using First VF 
> > Offset (see § Section 9.3.3.9 ) and VF
> > Stride (see § Section 9.3.3.10 ).
> > 
> > and
> > 
> >  To improve the enumeration performance and create a more deterministic 
> > solution, software can
> > enumerate Functions through a linked list of Function Numbers. The next 
> > linked list element is
> > communicated through each Function’s ARI Capability Register.
> > i. Function 0 acts as the head of a linked list of Function Numbers. 
> > Software detects a
> > non-Zero Next Function Number field within the ARI Capability Register 
> > as the next
> > Function within the linked list. Software issues a configuration probe 
> > using the Bus Number
> > captured by the Device and the Function Number derived from the ARI 
> > Capability Register
> > to locate the next associated Function’s configuration space.
> > ii. Function Numbers may be sparse and non-sequential in their 
> > consumption by an ARI
> > Device.
> 
> The statement "With non-VFs, this field indicates the Function Number of the
> next higher numbered Function in the Device, or 00h if there are no higher
> numbered Functions." implies the Function Number of the device should be
> lower than the value advertised by the field (for non-VFs; this patch does
> not check if it's VF or not.)


Now I get it. Good point! I'd say if we want this check we should add
it in pcie_ari_init, making that return int.
But for now it's dead code since your are changing it to 0.
So maybe a comment in pcie_ari_init is enough:

/*
 * Note: nextfn must be the Function Number of the
 * next higher numbered Function in the Device, or 00h if there are no higher
 * numbered Functions.
 * TODO: validate this.
 */

> > 
> > 
> > 
> > 
> > 
> > > ---
> > >   hw/pci/pci.c | 15 +++
> > >   1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> > > index e2eb4c3b4a..568665ee42 100644
> > > --- a/hw/pci/pci.c
> > > +++ b/hw/pci/pci.c
> > > @@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, 
> > > Error **errp)
> > >   Error *local_err = NULL;
> > >   bool is_default_rom;
> > >   uint16_t class_id;
> > > +uint16_t ari;
> > > +uint16_t nextfn;
> > >   /*
> > >* capped by systemd (see: udev-builtin-net_id.c)
> > > @@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, 
> > > Error **errp)
> > >   }
> > >   }
> > > +if (pci_is_express(pci_dev)) {
> > > +ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
> > > +if (ari) {
> > > +nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) 
> > > >> 8) & 0xff;
> > > +if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
> > > +error_setg(errp, "PCI: function number %u is not lower 
> > > than ARI next function number %u",
> > > +   pci_dev->devfn & 0xff, nextfn);
> > > +pci_qdev_unrealize(DEVICE(pci_dev));
> > > +return;
> > > +}
> > > +}
> > > +}
> > > +
> > >   if (pci_dev->failover_pair_id) {
> > >   if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
> > >   error_setg(errp, "failover primary device must be on "
> > > -- 
> > > 2.41.0
> > 




Re: [PATCH 4/4] pci: Compare function number and ARI next function number

2023-07-02 Thread Akihiko Odaki

On 2023/07/02 13:58, Michael S. Tsirkin wrote:

On Sat, Jul 01, 2023 at 04:01:22PM +0900, Akihiko Odaki wrote:

The function number must be lower than the next function number
advertised with ARI.

Signed-off-by: Akihiko Odaki 


I don't get this logic at all - where is the limitation coming from?

All I see in the spec is:
Next Function Number - With non-VFs, this field indicates the Function 
Number of the next higher
numbered Function in the Device, or 00h if there are no higher numbered 
Functions. Function 0 starts
this linked list of Functions.
The presence of Shadow Functions does not affect this field.
For VFs, this field is undefined since VFs are located using First VF 
Offset (see § Section 9.3.3.9 ) and VF
Stride (see § Section 9.3.3.10 ).

and

 To improve the enumeration performance and create a more deterministic 
solution, software can
enumerate Functions through a linked list of Function Numbers. The next 
linked list element is
communicated through each Function’s ARI Capability Register.
i. Function 0 acts as the head of a linked list of Function Numbers. 
Software detects a
non-Zero Next Function Number field within the ARI Capability Register 
as the next
Function within the linked list. Software issues a configuration probe 
using the Bus Number
captured by the Device and the Function Number derived from the ARI 
Capability Register
to locate the next associated Function’s configuration space.
ii. Function Numbers may be sparse and non-sequential in their 
consumption by an ARI
Device.


The statement "With non-VFs, this field indicates the Function Number of 
the next higher numbered Function in the Device, or 00h if there are no 
higher numbered Functions." implies the Function Number of the device 
should be lower than the value advertised by the field (for non-VFs; 
this patch does not check if it's VF or not.)









---
  hw/pci/pci.c | 15 +++
  1 file changed, 15 insertions(+)

diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index e2eb4c3b4a..568665ee42 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -2059,6 +2059,8 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
  Error *local_err = NULL;
  bool is_default_rom;
  uint16_t class_id;
+uint16_t ari;
+uint16_t nextfn;
  
  /*

   * capped by systemd (see: udev-builtin-net_id.c)
@@ -2121,6 +2123,19 @@ static void pci_qdev_realize(DeviceState *qdev, Error 
**errp)
  }
  }
  
+if (pci_is_express(pci_dev)) {

+ari = pcie_find_capability(pci_dev, PCI_EXT_CAP_ID_ARI);
+if (ari) {
+nextfn = (pci_get_long(pci_dev->config + ari + PCI_ARI_CAP) >> 8) 
& 0xff;
+if (nextfn && (pci_dev->devfn & 0xff) >= nextfn) {
+error_setg(errp, "PCI: function number %u is not lower than ARI 
next function number %u",
+   pci_dev->devfn & 0xff, nextfn);
+pci_qdev_unrealize(DEVICE(pci_dev));
+return;
+}
+}
+}
+
  if (pci_dev->failover_pair_id) {
  if (!pci_bus_is_express(pci_get_bus(pci_dev))) {
  error_setg(errp, "failover primary device must be on "
--
2.41.0






Re: [PATCH 0/3] pci: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 05:33:53PM +0900, Akihiko Odaki wrote:
> The ARI next function number field is undefined for VF. The PF should
> end the linked list formed with the field by specifying 0.
> 
> Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
> ("[PATCH 0/4] pci: Compare function number and ARI next function number")

changelog?

motivation?

effect on migration?

testing?


> Akihiko Odaki (3):
>   docs: Fix next function numbers in SR/IOV documentation
>   hw/nvme: Fix ARI next function numbers
>   igb: Fix ARI next function numbers
> 
>  docs/pcie_sriov.txt | 4 ++--
>  hw/net/igb.c| 2 +-
>  hw/nvme/ctrl.c  | 2 +-
>  3 files changed, 4 insertions(+), 4 deletions(-)
> 
> -- 
> 2.41.0




Re: [PATCH 3/4] igb: Fix ARI next function numbers

2023-07-02 Thread Michael S. Tsirkin
On Sun, Jul 02, 2023 at 05:38:42PM +0900, Akihiko Odaki wrote:
> On 2023/07/02 14:05, Michael S. Tsirkin wrote:
> > On Sat, Jul 01, 2023 at 04:01:21PM +0900, Akihiko Odaki wrote:
> > > The next function numbers are expected to form a linked list ending with
> > > 0.
> > > 
> > > Fixes: 3a977deebe ("Intrdocue igb device emulation")
> > > Signed-off-by: Akihiko Odaki 
> > > ---
> > >   hw/net/igb_core.h | 3 +++
> > >   hw/net/igb.c  | 4 +---
> > >   hw/net/igbvf.c| 5 -
> > >   3 files changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/hw/net/igb_core.h b/hw/net/igb_core.h
> > > index 9cbbfd516b..e1dab76995 100644
> > > --- a/hw/net/igb_core.h
> > > +++ b/hw/net/igb_core.h
> > > @@ -49,6 +49,9 @@
> > >   #define IGB_NUM_QUEUES  (16)
> > >   #define IGB_NUM_VM_POOLS(8)
> > > +#define IGB_VF_OFFSET   (0x80)
> > > +#define IGB_VF_STRIDE   (2)
> > > +
> > >   typedef struct IGBCore IGBCore;
> > >   enum { PHY_R = BIT(0),
> > > diff --git a/hw/net/igb.c b/hw/net/igb.c
> > > index 1c989d7677..543ca0114a 100644
> > > --- a/hw/net/igb.c
> > > +++ b/hw/net/igb.c
> > > @@ -81,8 +81,6 @@ struct IGBState {
> > >   };
> > >   #define IGB_CAP_SRIOV_OFFSET(0x160)
> > > -#define IGB_VF_OFFSET   (0x80)
> > > -#define IGB_VF_STRIDE   (2)
> > >   #define E1000E_MMIO_IDX 0
> > >   #define E1000E_FLASH_IDX1
> > > @@ -431,7 +429,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
> > > **errp)
> > >   hw_error("Failed to initialize AER capability");
> > >   }
> > > -pcie_ari_init(pci_dev, 0x150, 1);
> > > +pcie_ari_init(pci_dev, 0x150, IGB_VF_OFFSET);
> > >   pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
> > >   IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
> > 
> > 
> > I think this change would break migrations from 8.0. No?
> 
> Well, I don't have a reason to concern more for this change than other bug
> fixes with behavioral changes observable from the guest.

Try it and see it fail.


> > 
> > 
> > More importantly your commit log says linked list should end
> > with 0, but you make it point at a VF instead.
> > 
> > 
> > > diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
> > > index 284ea61184..bf2f237ab5 100644
> > > --- a/hw/net/igbvf.c
> > > +++ b/hw/net/igbvf.c
> > > @@ -240,6 +240,9 @@ static const MemoryRegionOps mmio_ops = {
> > >   static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
> > >   {
> > >   IgbVfState *s = IGBVF(dev);
> > > +uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
> > > +uint16_t nextfn = nextvfn < IGB_MAX_VF_FUNCTIONS ?
> > > +  IGB_VF_OFFSET + nextvfn * IGB_VF_STRIDE : 0;
> > >   int ret;
> > >   int i;
> > > @@ -270,7 +273,7 @@ static void igbvf_pci_realize(PCIDevice *dev, Error 
> > > **errp)
> > >   hw_error("Failed to initialize AER capability");
> > >   }
> > > -pcie_ari_init(dev, 0x150, 1);
> > > +pcie_ari_init(dev, 0x150, nextfn);
> > 
> > 
> > 
> > For this one I don't see why it matters at all:
> > 
> > The presence of Shadow Functions does not affect this field.
> > For VFs, this field is undefined since VFs are located using First VF 
> > Offset (see § Section 9.3.3.9 ) and VF
> > Stride (see § Section 9.3.3.10 ).
> 
> I missed the statements saying the field is undefined for VFs. I posted an
> alternative series ("[PATCH 0/3] pci: Fix ARI next function numbers") so
> please review it.
> 
> > 
> > 
> > 
> > 
> > >   }
> > >   static void igbvf_pci_uninit(PCIDevice *dev)
> > > -- 
> > > 2.41.0
> > 




Re: [PATCH 3/4] igb: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki

On 2023/07/02 14:05, Michael S. Tsirkin wrote:

On Sat, Jul 01, 2023 at 04:01:21PM +0900, Akihiko Odaki wrote:

The next function numbers are expected to form a linked list ending with
0.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
  hw/net/igb_core.h | 3 +++
  hw/net/igb.c  | 4 +---
  hw/net/igbvf.c| 5 -
  3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/hw/net/igb_core.h b/hw/net/igb_core.h
index 9cbbfd516b..e1dab76995 100644
--- a/hw/net/igb_core.h
+++ b/hw/net/igb_core.h
@@ -49,6 +49,9 @@
  #define IGB_NUM_QUEUES  (16)
  #define IGB_NUM_VM_POOLS(8)
  
+#define IGB_VF_OFFSET   (0x80)

+#define IGB_VF_STRIDE   (2)
+
  typedef struct IGBCore IGBCore;
  
  enum { PHY_R = BIT(0),

diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..543ca0114a 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -81,8 +81,6 @@ struct IGBState {
  };
  
  #define IGB_CAP_SRIOV_OFFSET(0x160)

-#define IGB_VF_OFFSET   (0x80)
-#define IGB_VF_STRIDE   (2)
  
  #define E1000E_MMIO_IDX 0

  #define E1000E_FLASH_IDX1
@@ -431,7 +429,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
  hw_error("Failed to initialize AER capability");
  }
  
-pcie_ari_init(pci_dev, 0x150, 1);

+pcie_ari_init(pci_dev, 0x150, IGB_VF_OFFSET);
  
  pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,

  IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,



I think this change would break migrations from 8.0. No?


Well, I don't have a reason to concern more for this change than other 
bug fixes with behavioral changes observable from the guest.





More importantly your commit log says linked list should end
with 0, but you make it point at a VF instead.



diff --git a/hw/net/igbvf.c b/hw/net/igbvf.c
index 284ea61184..bf2f237ab5 100644
--- a/hw/net/igbvf.c
+++ b/hw/net/igbvf.c
@@ -240,6 +240,9 @@ static const MemoryRegionOps mmio_ops = {
  static void igbvf_pci_realize(PCIDevice *dev, Error **errp)
  {
  IgbVfState *s = IGBVF(dev);
+uint16_t nextvfn = pcie_sriov_vf_number(dev) + 1;
+uint16_t nextfn = nextvfn < IGB_MAX_VF_FUNCTIONS ?
+  IGB_VF_OFFSET + nextvfn * IGB_VF_STRIDE : 0;
  int ret;
  int i;
  
@@ -270,7 +273,7 @@ static void igbvf_pci_realize(PCIDevice *dev, Error **errp)

  hw_error("Failed to initialize AER capability");
  }
  
-pcie_ari_init(dev, 0x150, 1);

+pcie_ari_init(dev, 0x150, nextfn);




For this one I don't see why it matters at all:

The presence of Shadow Functions does not affect this field.
For VFs, this field is undefined since VFs are located using First VF Offset 
(see § Section 9.3.3.9 ) and VF
Stride (see § Section 9.3.3.10 ).


I missed the statements saying the field is undefined for VFs. I posted 
an alternative series ("[PATCH 0/3] pci: Fix ARI next function numbers") 
so please review it.








  }
  
  static void igbvf_pci_uninit(PCIDevice *dev)

--
2.41.0






[PATCH 3/3] igb: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

Fixes: 3a977deebe ("Intrdocue igb device emulation")
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/igb.c b/hw/net/igb.c
index 1c989d7677..897386fc09 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -431,7 +431,7 @@ static void igb_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 hw_error("Failed to initialize AER capability");
 }
 
-pcie_ari_init(pci_dev, 0x150, 1);
+pcie_ari_init(pci_dev, 0x150, 0);
 
 pcie_sriov_pf_init(pci_dev, IGB_CAP_SRIOV_OFFSET, TYPE_IGBVF,
 IGB_82576_VF_DEV_ID, IGB_MAX_VF_FUNCTIONS, IGB_MAX_VF_FUNCTIONS,
-- 
2.41.0




[PATCH 1/3] docs: Fix next function numbers in SR/IOV documentation

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF so the PF should
end the linked list formed with the field by specifying 0.

This also changes the value of the field for VF; it seems to imply the
value has some meaning if it differs from one of the PF, but it doesn't.

Fixes: 2503461691 ("pcie: Add some SR/IOV API documentation in 
docs/pcie_sriov.txt")
Signed-off-by: Akihiko Odaki 
---
 docs/pcie_sriov.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/pcie_sriov.txt b/docs/pcie_sriov.txt
index 7eff7f2703..2b7094dc47 100644
--- a/docs/pcie_sriov.txt
+++ b/docs/pcie_sriov.txt
@@ -48,7 +48,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x70);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
   ...
 
   /* Add and initialize the SR/IOV capability */
@@ -78,7 +78,7 @@ setting up a BAR for a VF.
   ...
   int ret = pcie_endpoint_cap_init(d, 0x60);
   ...
-  pcie_ari_init(d, 0x100, 1);
+  pcie_ari_init(d, 0x100, 0);
   ...
   memory_region_init(mr, ... )
   pcie_sriov_vf_register_bar(d, bar_nr, mr);
-- 
2.41.0




[PATCH 2/3] hw/nvme: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Fixes: 44c2c09488 ("hw/nvme: Add support for SR-IOV")
Signed-off-by: Akihiko Odaki 
---
 hw/nvme/ctrl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index fd917fcda1..6c4809b4f6 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -8088,7 +8088,7 @@ static bool nvme_init_pci(NvmeCtrl *n, PCIDevice 
*pci_dev, Error **errp)
 pcie_endpoint_cap_init(pci_dev, 0x80);
 pcie_cap_flr_init(pci_dev);
 if (n->params.sriov_max_vfs) {
-pcie_ari_init(pci_dev, 0x100, 1);
+pcie_ari_init(pci_dev, 0x100, 0);
 }
 
 /* add one to max_ioqpairs to account for the admin queue pair */
-- 
2.41.0




[PATCH 0/3] pci: Fix ARI next function numbers

2023-07-02 Thread Akihiko Odaki
The ARI next function number field is undefined for VF. The PF should
end the linked list formed with the field by specifying 0.

Supersedes: <20230701070133.24877-1-akihiko.od...@daynix.com>
("[PATCH 0/4] pci: Compare function number and ARI next function number")

Akihiko Odaki (3):
  docs: Fix next function numbers in SR/IOV documentation
  hw/nvme: Fix ARI next function numbers
  igb: Fix ARI next function numbers

 docs/pcie_sriov.txt | 4 ++--
 hw/net/igb.c| 2 +-
 hw/nvme/ctrl.c  | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

-- 
2.41.0