RE: [PATCH v2 0/4] Support dynamic MSI-X allocation

2023-09-24 Thread Liu, Jing2
Hi Alex,

> On Sat, 9/23/2023 4:57AM, Alex Williamson  wrote:
> 
> On Mon, 18 Sep 2023 05:45:03 -0400
> Jing Liu  wrote:
> 
> > Changes since v1:
> > - v1:
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg982842.html
> > - Revise Qemu to QEMU. (Cédric)
> > - Add g_free when failure of getting MSI-X irq info. (Cédric)
> > - Apply Cédric's Reviewed-by. (Cédric)
> > - Use g_autofree to automatically release. (Cédric)
> > - Remove the failure message in vfio_enable_msix_no_vec(). (Cédric)
> >
> > Changes since RFC v1:
> > - RFC v1:
> > https://www.mail-archive.com/qemu-devel@nongnu.org/msg978637.html
> > - Revise the comments. (Alex)
> > - Report error of getting irq info and remove the trace of failure
> >   case. (Alex, Cédric)
> > - Only store dynamic allocation flag as a bool type and test
> >   accordingly. (Alex)
> > - Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
> > - Change the condition logic in vfio_msix_vector_do_use() that moving
> >   the defer_kvm_irq_routing test out and create a common place to update
> >   nr_vectors. (Alex)
> > - Consolidate the way of MSI-X enabling during device initialization and
> >   interrupt restoring that uses fd = -1 trick. Create a function doing
> >   that. (Alex)
> >
> > Before kernel v6.5, dynamic allocation of MSI-X interrupts was not
> > supported. To allocate a new interrupt, QEMU therefore has to first
> > release all previously allocated interrupts (including disabling
> > MSI-X) and re-allocate all of them, including the new one.
> >
> > The kernel series [1] adds support for dynamic MSI-X allocation to
> > vfio-pci and reuses the existing flag VFIO_IRQ_INFO_NORESIZE to guide
> > user space: when dynamic MSI-X allocation is supported, the flag is
> > cleared.
> >
> > This series implements the QEMU behavior for VFIO PCI devices when
> > dynamic MSI-X allocation is supported. When the guest unmasks an
> > interrupt, QEMU can directly allocate a new interrupt on the host
> > without touching the previously allocated ones. The host therefore
> > only allocates interrupts for the vectors that are unmasked (enabled)
> > in the guest.
> >
> > When a guest enables MSI-X with all of the vectors masked, QEMU needs
> > to match that state by enabling MSI-X with no vector enabled. During
> > migration restore, QEMU likewise needs to enable MSI-X first in
> > dynamic allocation mode, to avoid allocating vectors on the host that
> > the guest never uses. To consolidate both cases, we use vector 0 with
> > an invalid fd to get MSI-X enabled and create a common function for
> > this. This is cleaner than enabling userspace triggering and
> > immediately releasing it.
> >
> > Any feedback is appreciated.
> >
> > Jing
> >
> > [1] https://lwn.net/Articles/931679/
> >
> > Jing Liu (4):
> >   vfio/pci: detect the support of dynamic MSI-X allocation
> >   vfio/pci: enable vector on dynamic MSI-X allocation
> >   vfio/pci: use an invalid fd to enable MSI-X
> >   vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
> >
> >  hw/vfio/pci.c| 121 +--
> >  hw/vfio/pci.h|   1 +
> >  hw/vfio/trace-events |   2 +-
> >  3 files changed, 96 insertions(+), 28 deletions(-)
> >
> 
> Some minor comments on 2/ but otherwise:
> 
> Reviewed-by: Alex Williamson 

Thank you very much for the feedback. I will apply it in v3, along with the fix for 2/4.

Jing



RE: [PATCH v2 2/4] vfio/pci: enable vector on dynamic MSI-X allocation

2023-09-24 Thread Liu, Jing2
Hi Alex,

> On Sat, 9/23/2023 4:55 AM, Alex Williamson  wrote:
> On Mon, 18 Sep 2023 05:45:05 -0400
> Jing Liu  wrote:
> 
> > The vector_use callback is used to enable vector that is unmasked in
> > guest. The kernel used to only support static MSI-X allocation. When
> > allocating a new interrupt using "static MSI-X allocation" kernels,
> > QEMU first disables all previously allocated vectors and then
> > re-allocates all including the new one. The nr_vectors of
> > VFIOPCIDevice indicates that all vectors from 0 to nr_vectors are
> > allocated (and may be enabled), which is used to to loop all the
> > possibly used vectors
>  ^^ ^^
> 
> s/to to/to/

Will change.
> 
> > When, e.g., disabling MSI-X interrupts.
> >
> > Extend the vector_use function to support dynamic MSI-X allocation
> > when host supports the capability. QEMU therefore can individually
> > allocate and enable a new interrupt without affecting others or
> > causing interrupts to be lost during runtime.
> >
> > Utilize nr_vectors to calculate the upper bound of enabled vectors in
> > dynamic MSI-X allocation mode since looping all msix_entries_nr is not
> > efficient and unnecessary.
> >
> > Signed-off-by: Jing Liu 
> > Tested-by: Reinette Chatre 
> > ---
> > Changes since v1:
> > - Revise Qemu to QEMU.
> >
> > Changes since RFC v1:
> > - Test vdev->msix->noresize to identify the allocation mode. (Alex)
> > - Move defer_kvm_irq_routing test out and update nr_vectors in a
> >   common place before vfio_enable_vectors(). (Alex)
> > - Revise the comments. (Alex)
> > ---
> >  hw/vfio/pci.c | 44 +++-
> >  1 file changed, 27 insertions(+), 17 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 60654ca28ab8..84987e46fd7a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -470,6 +470,7 @@ static int vfio_msix_vector_do_use(PCIDevice *pdev,
> unsigned int nr,
> >  VFIOPCIDevice *vdev = VFIO_PCI(pdev);
> >  VFIOMSIVector *vector;
> >  int ret;
> > +int old_nr_vecs = vdev->nr_vectors;
> 
> Minor suggestion, it reads slightly better below if this were something
> like:
> 
> bool resizing = !!(vdev->nr_vectors < nr + 1);
> 
> Then use the bool in place of the nr+1 tests below.  Thanks,
> 
Got it. This change makes it nicer to read; a rough sketch of how the
condition would look in v3 is below. Thanks for the advice. I will send
v3 later.
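(Sketch only, combining Alex's suggested bool with the v2 diff quoted
further down; not the final v3 code.)

    bool resizing = !!(vdev->nr_vectors < nr + 1);
    ...
    if (resizing) {
        vdev->nr_vectors = nr + 1;
    }

    if (!vdev->defer_kvm_irq_routing) {
        if (vdev->msix->noresize && resizing) {
            vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
            ret = vfio_enable_vectors(vdev, true);
            if (ret) {
                error_report("vfio: failed to enable vectors, %d", ret);
            }
        } else {
            /* per-vector signaling via vfio_set_irq_signaling(), as in
             * the hunk below */
        }
    }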

Thanks,
Jing

> Alex
> 
> >
> >  trace_vfio_msix_vector_do_use(vdev->vbasedev.name, nr);
> >
> > @@ -512,33 +513,42 @@ static int vfio_msix_vector_do_use(PCIDevice
> *pdev, unsigned int nr,
> >  }
> >
> >  /*
> > - * We don't want to have the host allocate all possible MSI vectors
> > - * for a device if they're not in use, so we shutdown and incrementally
> > - * increase them as needed.
> > + * When dynamic allocation is not supported, we don't want to have the
> > + * host allocate all possible MSI vectors for a device if they're not
> > + * in use, so we shutdown and incrementally increase them as needed.
> > + * nr_vectors represents the total number of vectors allocated.
> > + *
> > + * When dynamic allocation is supported, let the host only allocate
> > + * and enable a vector when it is in use in guest. nr_vectors 
> > represents
> > + * the upper bound of vectors being enabled (but not all of the ranges
> > + * is allocated or enabled).
> >   */
> >  if (vdev->nr_vectors < nr + 1) {
> >  vdev->nr_vectors = nr + 1;
> > -if (!vdev->defer_kvm_irq_routing) {
> > +}
> > +
> > +if (!vdev->defer_kvm_irq_routing) {
> > +if (vdev->msix->noresize && (old_nr_vecs < nr + 1)) {
> >  vfio_disable_irqindex(&vdev->vbasedev, 
> > VFIO_PCI_MSIX_IRQ_INDEX);
> >  ret = vfio_enable_vectors(vdev, true);
> >  if (ret) {
> >  error_report("vfio: failed to enable vectors, %d", ret);
> >  }
> > -}
> > -} else {
> > -Error *err = NULL;
> > -int32_t fd;
> > -
> > -if (vector->virq >= 0) {
> > -fd = event_notifier_get_fd(&vector->kvm_interrupt);
> >  } else {
> > -fd = event_notifier_get_fd(&vector->interrupt);
> > -}
> > +Error *err = NULL;
> > +int32_t fd;
> >
> > -if (vfio_set_irq_signaling(&vdev->vbasedev,
> > - VFIO_PCI_MSIX_IRQ_INDEX, nr,
> > - VFIO_IRQ_SET_ACTION_TRIGGER, fd, 
> > &err)) {
> > -error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> > +if (vector->virq >= 0) {
> > +fd = event_notifier_get_fd(&vector->kvm_interrupt);
> > +} else {
> > +fd = event_notifier_get_fd(&vector->interrupt);
> > +}
> > +
> > +if (vfio_set_irq_signaling(&vdev->vbasedev,
> > +   VFIO_PCI_MSIX_IRQ_INDEX, nr,
> > +  

RE: [PATCH v2 4/4] vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation

2023-09-20 Thread Liu, Jing2
Hi Cédric,
> On 9/19/2023 11:21 PM, Cédric Le Goater wrote:
> 
> On 9/18/23 11:45, Jing Liu wrote:
> > During migration restore, vfio_enable_vectors() is called to re-enable
> > MSI-X interrupts for assigned devices. It passes the range from 0 to
> > nr_vectors to the kernel to enable MSI-X and the vectors unmasked in
> > the guest. During the MSI-X enabling, all the vectors within the range
> > are allocated according to the VFIO_DEVICE_SET_IRQS ioctl.
> >
> > When dynamic MSI-X allocation is supported, we only want the vectors
> > unmasked by the guest to be allocated and enabled. Use vector 0 with
> > an invalid fd to get MSI-X enabled; after that, all the vectors can
> > be allocated as needed.
> >
> > Signed-off-by: Jing Liu 
> 
> 
> Reviewed-by: Cédric Le Goater 

Thanks very much for your feedback.

Jing

> 
> Thanks,
> 
> C.
> 
> 
> > ---
> > Changes since v1:
> > - No change.
> >
> > Changes since RFC v1:
> > - Revise the comments. (Alex)
> > - Call the new helper function in previous patch to enable MSI-X.
> > (Alex)
> > ---
> >   hw/vfio/pci.c | 17 +
> >   1 file changed, 17 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 0117f230e934..f5f891dc0792 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -402,6 +402,23 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev,
> bool msix)
> >   int ret = 0, i, argsz;
> >   int32_t *fds;
> >
> > +/*
> > + * If dynamic MSI-X allocation is supported, the vectors to be 
> > allocated
> > + * and enabled can be scattered. Before kernel enabling MSI-X, setting
> > + * nr_vectors causes all these vectors to be allocated on host.
> > + *
> > + * To keep allocation as needed, use vector 0 with an invalid fd to get
> > + * MSI-X enabled first, then set vectors with a potentially sparse set 
> > of
> > + * eventfds to enable interrupts only when enabled in guest.
> > + */
> > +if (msix && !vdev->msix->noresize) {
> > +ret = vfio_enable_msix_no_vec(vdev);
> > +
> > +if (ret) {
> > +return ret;
> > +}
> > +}
> > +
> >   argsz = sizeof(*irq_set) + (vdev->nr_vectors * sizeof(*fds));
> >
> >   irq_set = g_malloc0(argsz);



RE: [PATCH v1 0/4] Support dynamic MSI-X allocation

2023-09-15 Thread Liu, Jing2
Hi Cédric,
Thanks for this information. I'll send v2 later.

Jing

> On 9/15/2023 3:42 PM, Cédric Le Goater  wrote:
> 
> On 9/15/23 09:40, Liu, Jing2 wrote:
> > Friendly ping to have your valuable inputs and comments.
> > Thanks very much.
> 
> I think that was done. We are waiting for the v2.
> 
> Thanks,
> 
> C.
> 
> 
> >
> > BRs,
> > Jing
> >
> >> On 8/22/2023 3:29 PM, Jing Liu wrote:
> >> Changes since RFC v1:
> >> - RFC v1: https://www.mail-archive.com/qemu-
> >> de...@nongnu.org/msg978637.html
> >> - Revise the comments. (Alex)
> >> - Report error of getting irq info and remove the trace of failure
> >>case. (Alex, Cédric)
> >> - Only store dynamic allocation flag as a bool type and test
> >>accordingly. (Alex)
> >> - Move dynamic allocation detection to vfio_msix_early_setup().
> >> (Alex)
> >> - Change the condition logic in vfio_msix_vector_do_use() that moving
> >>the defer_kvm_irq_routing test out and create a common place to update
> >>nr_vectors. (Alex)
> >> - Consolidate the way of MSI-X enabling during device initialization and
> >>interrupt restoring that uses fd = -1 trick. Create a function doing
> >>that. (Alex)
> >>
> >> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not
> >> supported. To allocate a new interrupt, Qemu therefore has to first
> >> release all previously allocated interrupts (including disabling
> >> MSI-X) and re-allocate all of them, including the new one.
> >>
> >> The kernel series [1] adds support for dynamic MSI-X allocation to
> >> vfio-pci and reuses the existing flag VFIO_IRQ_INFO_NORESIZE to guide
> >> user space: when dynamic MSI-X allocation is supported, the flag is
> >> cleared.
> >>
> >> This series implements the Qemu behavior for VFIO PCI devices when
> >> dynamic MSI-X allocation is supported. When the guest unmasks an
> >> interrupt, Qemu can directly allocate a new interrupt on the host
> >> without touching the previously allocated ones. The host therefore
> >> only allocates interrupts for the vectors that are unmasked (enabled)
> >> in the guest.
> >>
> >> When a guest enables MSI-X with all of the vectors masked, Qemu needs
> >> to match that state by enabling MSI-X with no vector enabled. During
> >> migration restore, Qemu likewise needs to enable MSI-X first in
> >> dynamic allocation mode, to avoid allocating vectors on the host that
> >> the guest never uses. To consolidate both cases, we use vector 0 with
> >> an invalid fd to get MSI-X enabled and create a common function for
> >> this. This is cleaner than enabling userspace triggering and
> >> immediately releasing it.
> >>
> >> Any feedback is appreciated.
> >>
> >> Jing
> >>
> >> [1] https://lwn.net/Articles/931679/
> >>
> >> Jing Liu (4):
> >>vfio/pci: detect the support of dynamic MSI-X allocation
> >>vfio/pci: enable vector on dynamic MSI-X allocation
> >>vfio/pci: use an invalid fd to enable MSI-X
> >>vfio/pci: enable MSI-X in interrupt restoring on dynamic
> >> allocation
> >>
> >>   hw/vfio/pci.c| 126 +--
> >>   hw/vfio/pci.h|   1 +
> >>   hw/vfio/trace-events |   2 +-
> >>   3 files changed, 101 insertions(+), 28 deletions(-)
> >>
> >> --
> >> 2.27.0
> >



RE: [PATCH v1 0/4] Support dynamic MSI-X allocation

2023-09-15 Thread Liu, Jing2
Friendly ping to have your valuable inputs and comments. 
Thanks very much.

BRs,
Jing

> On 8/22/2023 3:29 PM, Jing Liu wrote:
> Changes since RFC v1:
> - RFC v1: https://www.mail-archive.com/qemu-
> de...@nongnu.org/msg978637.html
> - Revise the comments. (Alex)
> - Report error of getting irq info and remove the trace of failure
>   case. (Alex, Cédric)
> - Only store dynamic allocation flag as a bool type and test
>   accordingly. (Alex)
> - Move dynamic allocation detection to vfio_msix_early_setup(). (Alex)
> - Change the condition logic in vfio_msix_vector_do_use() that moving
>   the defer_kvm_irq_routing test out and create a common place to update
>   nr_vectors. (Alex)
> - Consolidate the way of MSI-X enabling during device initialization and
>   interrupt restoring that uses fd = -1 trick. Create a function doing
>   that. (Alex)
> 
> Before kernel v6.5, dynamic allocation of MSI-X interrupts was not supported.
> To allocate a new interrupt, Qemu therefore has to first release all
> previously allocated interrupts (including disabling MSI-X) and re-allocate
> all of them, including the new one.
> 
> The kernel series [1] adds support for dynamic MSI-X allocation to vfio-pci
> and reuses the existing flag VFIO_IRQ_INFO_NORESIZE to guide user space:
> when dynamic MSI-X allocation is supported, the flag is cleared.
> 
> This series implements the Qemu behavior for VFIO PCI devices when dynamic
> MSI-X allocation is supported. When the guest unmasks an interrupt, Qemu can
> directly allocate a new interrupt on the host without touching the previously
> allocated ones. The host therefore only allocates interrupts for the vectors
> that are unmasked (enabled) in the guest.
> 
> When a guest enables MSI-X with all of the vectors masked, Qemu needs to
> match that state by enabling MSI-X with no vector enabled. During migration
> restore, Qemu likewise needs to enable MSI-X first in dynamic allocation
> mode, to avoid allocating vectors on the host that the guest never uses. To
> consolidate both cases, we use vector 0 with an invalid fd to get MSI-X
> enabled and create a common function for this. This is cleaner than enabling
> userspace triggering and immediately releasing it.
> 
> Any feedback is appreciated.
> 
> Jing
> 
> [1] https://lwn.net/Articles/931679/
> 
> Jing Liu (4):
>   vfio/pci: detect the support of dynamic MSI-X allocation
>   vfio/pci: enable vector on dynamic MSI-X allocation
>   vfio/pci: use an invalid fd to enable MSI-X
>   vfio/pci: enable MSI-X in interrupt restoring on dynamic allocation
> 
>  hw/vfio/pci.c| 126 +--
>  hw/vfio/pci.h|   1 +
>  hw/vfio/trace-events |   2 +-
>  3 files changed, 101 insertions(+), 28 deletions(-)
> 
> --
> 2.27.0



RE: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X

2023-09-04 Thread Liu, Jing2
Hi Igor,

On Wed, August 30, 2023 6:49 PM, Igor Mammedov  wrote:
> 
> On Wed, 30 Aug 2023 10:03:33 +0000
> "Liu, Jing2"  wrote:
> 
...
> > > > +/*
> > > > + * Get MSI-X enabled, but no vector enabled, by setting vector 0
> > > > +with an invalid
> > > > + * fd to kernel.
> > > > + */
> > > > +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> > > > +struct vfio_irq_set *irq_set;
> > >
> > > This could be a 'g_autofree' variable.
> >
> > Thanks for pointing this out to me. I just realized the QEMU docs
> > recommend g_autofree/g_autoptr for automatic memory deallocation.
> >
> > I will switch to the g_autofree style in the next version.
> >
> > I have a question about a specific example (not related to this patch),
> > and I failed to find an example in the current QEMU code to answer it.
> > If a g_autofree structure contains a pointer whose target we also
> > allocate separately, will g_autofree release that inner allocation too?
> 
> it might be too cumbersome for 1-off use; see the following for how 'auto' works:
> https://docs.gtk.org/glib/auto-cleanup.html

Thank you for the guide. It's now clear to me that for such a type, one
needs to define a specific free function via the macros, and g_auto can
then call it automatically.
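For the record, a minimal sketch of that for the dummy1/p example from
the earlier mail (my own sketch of the glib auto-cleanup macros, not
QEMU code):

typedef struct p {
    char member[16];
} p;

typedef struct dummy1 {
    int data;
    p *p;
} dummy1;

/* free function that also releases the inner allocation */
static void dummy1_free(dummy1 *d)
{
    g_free(d->p);
    g_free(d);
}
G_DEFINE_AUTOPTR_CLEANUP_FUNC(dummy1, dummy1_free)

void func(void)
{
    g_autoptr(dummy1) d = g_malloc0(sizeof(*d));

    d->p = g_malloc0(sizeof(*d->p));
    /* on return, dummy1_free() runs and frees both allocations */
}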

Jing

> 
> > struct p {
> > char member[16];
> > };
> > struct dummy1 {
> > int data;
> > struct p *p;
> > };
> > void func(void) {
> > g_autofree struct dummy1 *dummy1 = NULL;
> >
> > dummy1 = g_malloc0(sizeof(*dummy1));
> > dummy1->p = g_malloc0(sizeof(*dummy1->p));
> > ...
> > return; /* is this correct, or is g_free(dummy1->p) needed? */
> > }
> >
> > >
> > > > +int ret = 0, argsz;
> > > > +int32_t *fd;
> > > > +
> > > > +argsz = sizeof(*irq_set) + sizeof(*fd);
> > > > +
> > > > +irq_set = g_malloc0(argsz);
> > > > +irq_set->argsz = argsz;
> > > > +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > > > + VFIO_IRQ_SET_ACTION_TRIGGER;
> > > > +irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > > > +irq_set->start = 0;
> > > > +irq_set->count = 1;
> > > > +fd = (int32_t *)&irq_set->data;
> > > > +*fd = -1;
> > > > +
> > > > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > > > +if (ret) {
> > > > +error_report("vfio: failed to enable MSI-X with vector 0 
> > > > trick, %d",
> > > > + ret);
> > >
> > > The above message seems redundant. I would simply return 'ret' and
> > > let the caller report the error. Same as vfio_enable_vectors().
> >
> > Understood. Will change.
> >
> > Thanks,
> > Jing
> >
> > > Thanks,
> > >
> > > C.



RE: [PATCH v1 3/4] vfio/pci: use an invalid fd to enable MSI-X

2023-08-30 Thread Liu, Jing2
Hi Cédric,

On 8/29/2023 10:04 PM, Cédric Le Goater wrote:
> On 8/22/23 09:29, Jing Liu wrote:
> > Guests typically enable MSI-X with all of the vectors masked in the
> > MSI-X vector table. To match the guest state of device, Qemu enables
> > MSI-X by
> 
> QEMU is preferred to Qemu.
Got it. 

> 
> > enabling vector 0 with userspace triggering and immediately releasing
> > it. However, the release function does not actually release it, since
> > the vector is already in userspace mode.
> >
> > There is no need to enable triggering on the host and rely on the mask
> > bit to avoid spurious interrupts. Using an invalid fd (i.e. fd = -1)
> > is enough to get MSI-X enabled.
> >
> > Once dynamic MSI-X allocation is supported, interrupt restoring also
> > needs to enable MSI-X this way; therefore, create a function for it.
> >
> > Suggested-by: Alex Williamson 
> > Signed-off-by: Jing Liu 
> > ---
> > Changes since RFC v1:
> > - A new patch. Use an invalid fd to get MSI-X enabled instead of using
> >userspace triggering. (Alex)
> > ---
> >   hw/vfio/pci.c | 50 ++
> >   1 file changed, 42 insertions(+), 8 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 31f36d68bb19..e24c21241a0c 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -369,6 +369,39 @@ static void vfio_msi_interrupt(void *opaque)
> >   notify(&vdev->pdev, nr);
> >   }
> >
> > +/*
> > + * Get MSI-X enabled, but no vector enabled, by setting vector 0 with
> > +an invalid
> > + * fd to kernel.
> > + */
> > +static int vfio_enable_msix_no_vec(VFIOPCIDevice *vdev)> +{
> > +struct vfio_irq_set *irq_set;
> 
> This could be a 'g_autofree' variable.

Thanks for pointing this out to me. I just realized the QEMU docs
recommend g_autofree/g_autoptr for automatic memory deallocation.

I will switch to the g_autofree style in the next version.

I have a question about a specific example (not related to this patch),
and I failed to find an example in the current QEMU code to answer it.
If a g_autofree structure contains a pointer whose target we also
allocate separately, will g_autofree release that inner allocation too?

struct p {
char member[16];
};
struct dummy1 {
int data;
struct p *p;
};
void func(void) {
g_autofree struct dummy1 *dummy1 = NULL;

dummy1 = g_malloc0(sizeof(*dummy1));
dummy1->p = g_malloc0(sizeof(*dummy1->p));
...
return; /* is this correct, or is g_free(dummy1->p) needed? */
}

> 
> > +int ret = 0, argsz;
> > +int32_t *fd;
> > +
> > +argsz = sizeof(*irq_set) + sizeof(*fd);
> > +
> > +irq_set = g_malloc0(argsz);
> > +irq_set->argsz = argsz;
> > +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > + VFIO_IRQ_SET_ACTION_TRIGGER;
> > +irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
> > +irq_set->start = 0;
> > +irq_set->count = 1;
> > +fd = (int32_t *)&irq_set->data;
> > +*fd = -1;
> > +
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);
> > +if (ret) {
> > +error_report("vfio: failed to enable MSI-X with vector 0 trick, 
> > %d",
> > + ret);
> 
> The above message seems redundant. I would simply return 'ret' and let the
> caller report the error. Same as vfio_enable_vectors().

Understood. Will change.

Thanks,
Jing

> Thanks,
> 
> C.
> 
> 
> > +}
> > +
> > +g_free(irq_set);
> > +
> > +return ret;
> > +}
> > +
> >   static int vfio_enable_vectors(VFIOPCIDevice *vdev, bool msix)
> >   {
> >   struct vfio_irq_set *irq_set;
> > @@ -618,6 +651,8 @@ static void
> > vfio_commit_kvm_msi_virq_batch(VFIOPCIDevice *vdev)
> >
> >   static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >   {
> > +int ret;
> > +
> >   vfio_disable_interrupts(vdev);
> >
> >   vdev->msi_vectors = g_new0(VFIOMSIVector, vdev->msix->entries);
> > @@ -640,8 +675,6 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >   vfio_commit_kvm_msi_virq_batch(vdev);
> >
> >   if (vdev->nr_vectors) {
> > -int ret;
> > -
> >   ret = vfio_enable_vectors(vdev, true);
> >   if (ret) {
> >   error_report("vfio: failed to enable vectors, %d", ret);
> > @@ -655,13 +688,14 @@ static void vfio_msix_enable(VFIOPCIDevice *vdev)
> >* MSI-X capability, but leaves the vector table masked.  We 
> > therefore
> >* can't rely on a vector_use callback (from request_irq() in the 
> > guest)
> >* to switch the physical device into MSI-X mode because that may 
> > come
> a
> > - * long time after pci_enable_msix().  This code enables vector 0 
> > with
> > - * triggering to userspace, then immediately release the vector, 
> > leaving
> > - * the physical device with no vectors enabled, but MSI-X enabled, 
> > just
> > - * like the guest view.
> > + * long time after pci_enable_msix().  This code sets vector 0 
> > with an
> > + * invalid fd to make the physical device MSI-X enabled, but with 
> > no
> > + 

RE: [PATCH v1 1/4] vfio/pci: detect the support of dynamic MSI-X allocation

2023-08-30 Thread Liu, Jing2
Hi Cédric,

Thank you for your reviewing.

On 8/29/2023 9:33 PM, Cédric Le Goater wrote:
> Hello Jing,
> 
> On 8/22/23 09:29, Jing Liu wrote:
> > Kernel provides the guidance of dynamic MSI-X allocation support of
> > passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
> > guide user space.
> >
> > Fetch the flags from host to determine if dynamic MSI-X allocation is
> > supported.
> >
> > Originally-by: Reinette Chatre 
> > Signed-off-by: Jing Liu 
> > ---
> > Changes since RFC v1:
> > - Filter the dynamic MSI-X allocation flag and store as a bool type.
> >(Alex)
> > - Move the detection to vfio_msix_early_setup(). (Alex)
> > - Report error of getting irq info and remove the trace of failure
> >case. (Alex, Cédric)
> > ---
> >   hw/vfio/pci.c| 15 +--
> >   hw/vfio/pci.h|  1 +
> >   hw/vfio/trace-events |  2 +-
> >   3 files changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > a205c6b1130f..8a3b34f3c196 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1493,7 +1493,9 @@ static void vfio_msix_early_setup(VFIOPCIDevice
> *vdev, Error **errp)
> >   uint8_t pos;
> >   uint16_t ctrl;
> >   uint32_t table, pba;
> > -int fd = vdev->vbasedev.fd;
> > +int ret, fd = vdev->vbasedev.fd;
> > +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info),
> > +  .index =
> > + VFIO_PCI_MSIX_IRQ_INDEX };
> >   VFIOMSIXInfo *msix;
> >
> >   pos = pci_find_capability(&vdev->pdev, PCI_CAP_ID_MSIX); @@
> > -1530,6 +1532,14 @@ static void vfio_msix_early_setup(VFIOPCIDevice *vdev,
> Error **errp)
> >   msix->pba_offset = pba & ~PCI_MSIX_FLAGS_BIRMASK;
> >   msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> >
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > +if (ret < 0) {
> > +error_setg_errno(errp, -ret, "failed to get MSI-X irq info");
> 
> Missing :
>  g_free(msix);
> 
Oh, yes. 

> 
> With this fixed,
> 
> Reviewed-by: Cédric Le Goater 
> 
Thank you. I'll apply it in next version with the fix. 

Jing

> Thanks,
> 
> C.
> 
> 
> 
> > +return;
> > +}
> > +
> > +msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
> > +
> >   /*
> >* Test the size of the pba_offset variable and catch if it extends 
> > outside
> >* of the specified BAR. If it is the case, we need to apply a
> > hardware @@ -1562,7 +1572,8 @@ static void
> vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
> >   }
> >
> >   trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
> > -msix->table_offset, msix->entries);
> > +msix->table_offset, msix->entries,
> > +msix->noresize);
> >   vdev->msix = msix;
> >
> >   vfio_pci_fixup_msix_region(vdev); diff --git a/hw/vfio/pci.h
> > b/hw/vfio/pci.h index a2771b9ff3cc..0717574d79e9 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
> >   uint32_t table_offset;
> >   uint32_t pba_offset;
> >   unsigned long *pending;
> > +bool noresize;
> >   } VFIOMSIXInfo;
> >
> >   #define TYPE_VFIO_PCI "vfio-pci"
> > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index
> > ee7509e68e4f..6de5d9ba8e46 100644
> > --- a/hw/vfio/trace-events
> > +++ b/hw/vfio/trace-events
> > @@ -27,7 +27,7 @@ vfio_vga_read(uint64_t addr, int size, uint64_t data) "
> (0x%"PRIx64", %d) = 0x%"
> >   vfio_pci_read_config(const char *name, int addr, int len, int val) " (%s,
> @0x%x, len=0x%x) 0x%x"
> >   vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s,
> @0x%x, 0x%x, len=0x%x)"
> >   vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
> > -vfio_msix_early_setup(const char *name, int pos, int table_bar, int 
> > offset, int
> entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
> > +vfio_msix_early_setup(const char *name, int pos, int table_bar, int 
> > offset, int
> entries, bool noresize) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x,
> entries %d, noresize %d"
> >   vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
> >   vfio_check_pm_reset(const char *name) "%s Supports PM reset"
> >   vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"



RE: [PATCH RFC v1 3/3] vfio/pci: dynamic MSI-X allocation in interrupt restoring

2023-08-01 Thread Liu, Jing2
Hi Alex,

> On July 28, 2023 1:25 AM, Alex Williamson  wrote:
> 
> On Thu, 27 Jul 2023 03:24:10 -0400
> Jing Liu  wrote:
> 
> > During migration restore, vfio_enable_vectors() is called to re-enable
> > MSI-X interrupts for assigned devices. It passes the range from 0 to
> > nr_vectors to the kernel to enable MSI-X and the vectors unmasked in
> > the guest. During the MSI-X enabling, all the vectors within the range
> > are allocated according to the ioctl().
> >
> > When dynamic MSI-X allocation is supported, we only want the vectors
> > unmasked by the guest to be allocated and enabled. Therefore, Qemu can
> > first set vector 0 to enable MSI-X, and after that all the vectors can
> > be allocated as needed.
> >
> > Signed-off-by: Jing Liu 
> > ---
> >  hw/vfio/pci.c | 32 
> >  1 file changed, 32 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 8c485636445c..43ffacd5b36a 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -375,6 +375,38 @@ static int vfio_enable_vectors(VFIOPCIDevice *vdev,
> bool msix)
> >  int ret = 0, i, argsz;
> >  int32_t *fds;
> >
> > +/*
> > + * If dynamic MSI-X allocation is supported, the vectors to be 
> > allocated
> > + * and enabled can be scattered. Before kernel enabling MSI-X, setting
> > + * nr_vectors causes all these vectors being allocated on host.
> 
> s/being/to be/
Will change.

> 
> > + *
> > + * To keep allocation as needed, first setup vector 0 with an invalid
> > + * fd to make MSI-X enabled, then enable vectors by setting all so that
> > + * kernel allocates and enables interrupts only when enabled in guest.
> > + */
> > +if (msix && !(vdev->msix->irq_info_flags &
> > + VFIO_IRQ_INFO_NORESIZE)) {
> 
> !vdev->msix->noresize again seems cleaner.
Sure, will change.
> 
> > +argsz = sizeof(*irq_set) + sizeof(*fds);
> > +
> > +irq_set = g_malloc0(argsz);
> > +irq_set->argsz = argsz;
> > +irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD |
> > + VFIO_IRQ_SET_ACTION_TRIGGER;
> > +irq_set->index = msix ? VFIO_PCI_MSIX_IRQ_INDEX :
> > + VFIO_PCI_MSI_IRQ_INDEX;
> 
> Why are we testing msix again within a branch that requires msix?
Ah, yes. Will remove the test.

> 
> > +irq_set->start = 0;
> > +irq_set->count = 1;
> > +fds = (int32_t *)&irq_set->data;
> > +fds[0] = -1;
> > +
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS,
> > + irq_set);
> > +
> > +g_free(irq_set);
> > +
> > +if (ret) {
> > +return ret;
> > +}
> > +}
> 
> So your goal here is simply to get the kernel to call vfio_msi_enable() with 
> nvec
> = 1 to get MSI-X enabled on the device, which then allows the kernel to use 
> the
> dynamic expansion when we call SET_IRQS again with a potentially sparse set of
> eventfds to vector mappings.  

Yes, that's the approach I came up with to get MSI-X enabled first. The
only question is that, when getting the kernel to call vfio_msi_enable()
with nvec=1, the kernel will allocate one interrupt along with enabling
MSI-X, which cannot be avoided.

Therefore, if we set vector 0, for example, the IRQ for vector 0 will be
allocated in the kernel. If vector 0 is later unmasked in the guest, it
is enabled as normal; but if vector 0 stays masked in the guest, we
leave an allocated (though unenabled) IRQ there until MSI-X is disabled.
I'm not sure whether this is okay, but I cannot think of a cleaner way.
I also wonder whether that case is even possible, or whether vector 0 is
always enabled?
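To make the sequence concrete, a rough sketch of the two SET_IRQS steps
under discussion (illustrative only; this is the shape the later
vfio_enable_msix_no_vec() patch in this digest takes):

/* step 1: enable MSI-X by setting only vector 0, backed by fd = -1 */
g_autofree struct vfio_irq_set *irq_set =
    g_malloc0(sizeof(*irq_set) + sizeof(int32_t));
irq_set->argsz = sizeof(*irq_set) + sizeof(int32_t);
irq_set->flags = VFIO_IRQ_SET_DATA_EVENTFD | VFIO_IRQ_SET_ACTION_TRIGGER;
irq_set->index = VFIO_PCI_MSIX_IRQ_INDEX;
irq_set->start = 0;
irq_set->count = 1;
*(int32_t *)&irq_set->data = -1;
ioctl(vdev->vbasedev.fd, VFIO_DEVICE_SET_IRQS, irq_set);

/* step 2: issue SET_IRQS again with the (possibly sparse) eventfds of
 * the vectors the guest actually unmasked; with NORESIZE cleared, the
 * kernel allocates those dynamically. */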


This seems very similar to the nr_vectors == 0
> branch of vfio_msix_enable() where it uses a do_use and release call to
> accomplish getting MSI-X enabled.  

They are similar. Using a do_use call to set up userspace triggering
also leaves one allocated IRQ in the kernel. And my understanding is
that the subsequent release function actually won't release it if it is
a userspace trigger.

static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
{
/*
 * There are still old guests that mask and unmask vectors on every
 * interrupt.  If we're using QEMU bypass with a KVM irqfd, leave all of
 * the KVM setup in place, simply switch VFIO to use the non-bypass
 * eventfd.  We'll then fire the interrupt through QEMU and the MSI-X
 * core will mask the interrupt and set pending bits, allowing it to
 * be re-asserted on unmask.  Nothing to do if already using QEMU mode.
 */
...
}

 
We should consolidate, probably by pulling
> this out into a function since it seems cleaner to use the fd = -1 trick than 
> to
> setup userspace triggering and immediately release.  Thanks,

Oh, yes, I agree that using the fd=-1 trick is cleaner, and we don't
need to depend on the maskable bit in QEMU. Following your suggestion, I
will create a function, e.g. vfio_enable_msix_no_vec(vdev), which only
sets vector 0 with fd=-1 to the kernel and returns the ret.

RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-31 Thread Liu, Jing2
Hi C.

> On July 31, 2023 3:26 PM, Cédric Le Goater  wrote:
> 
> On 7/31/23 05:57, Liu, Jing2 wrote:
> > Hi C.
> >
> >> On July 28, 2023 4:44 PM, Cédric Le Goater  wrote:
> >>
> >> [ ... ]
> >>
> >>> Sorry I didn't quite understand "info.flags be tested against
> >> VFIO_IRQ_INFO_NORESIZE".
> >>> I saw kernel < 6.4 simply added NORESIZE to info.flags and latest
> >>> kernel adds
> >> if has_dyn_msix.
> >>> Would you please kindly describe more on your point?
> >>
> >> I was trying to find the conditions to detect safely that the kernel
> >> didn't have dynamic MSI-X support. Testing VFIO_IRQ_INFO_NORESIZE
> seems enough.
> >>
> > Oh, I see.
> >
> >>>> In that case, QEMU should report an error and the trace event is
> >>>> not
> >> needed.
> >>>
> >>> I replied an email with new error handling draft code based on my
> >>> understanding, which reports the error and need no trace. Could you
> >>> please
> >> help review if that is what we want?
> >>
> >> yes. It looked good. Please send a v1 !
> >
> > Thanks for reviewing that. I guess you mean v2 for next version 😊
> 
> Well, if you remove the RFC status, I think you should, this would still be a 
> v1.
> 
Oh, got it. Thanks for telling me this.

Jing

> Thanks,
> 
> C.



RE: [PATCH RFC v1 2/3] vfio/pci: enable vector on dynamic MSI-X allocation

2023-07-31 Thread Liu, Jing2
Hi Alex,

> On July 28, 2023 1:25 AM,  Alex Williamson  wrote:
> 
> On Thu, 27 Jul 2023 03:24:09 -0400
> Jing Liu  wrote:
> 
> > The vector_use callback is used to enable vector that is unmasked in
> > guest. The kernel used to only support static MSI-X allocation. When
> > allocating a new interrupt using "static MSI-X allocation" kernels,
> > Qemu first disables all previously allocated vectors and then
> > re-allocates all including the new one. The nr_vectors of
> > VFIOPCIDevice indicates that all vectors from 0 to nr_vectors are
> > allocated (and may be enabled), which is used to to loop all the
> > possibly used vectors When, e.g., disabling MSI-X interrupts.
> >
> > Extend the vector_use function to support dynamic MSI-X allocation
> > when host supports the capability. Qemu therefore can individually
> > allocate and enable a new interrupt without affecting others or
> > causing interrupts to be lost during runtime.
> >
> > Utilize nr_vectors to calculate the upper bound of enabled vectors in
> > dynamic MSI-X allocation mode since looping all msix_entries_nr is not
> > efficient and unnecessary.
> >
> > Signed-off-by: Jing Liu 
> > Tested-by: Reinette Chatre 
> > ---
> >  hw/vfio/pci.c | 40 +++-
> >  1 file changed, 27 insertions(+), 13 deletions(-)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > 0c4ac0873d40..8c485636445c 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -512,12 +512,20 @@ static int vfio_msix_vector_do_use(PCIDevice
> *pdev, unsigned int nr,
> >  }
> >
> >  /*
> > - * We don't want to have the host allocate all possible MSI vectors
> > - * for a device if they're not in use, so we shutdown and incrementally
> > - * increase them as needed.
> > + * When dynamic allocation is not supported, we don't want to have the
> > + * host allocate all possible MSI vectors for a device if they're not
> > + * in use, so we shutdown and incrementally increase them as needed.
> > + * And nr_vectors stands for the number of vectors being allocated.
> 
> "nr_vectors represents the total number of vectors allocated."

Will change.

> 
> > + *
> > + * When dynamic allocation is supported, let the host only allocate
> > + * and enable a vector when it is in use in guest. nr_vectors stands
> > + * for the upper bound of vectors being enabled (but not all of the
> > + * ranges is allocated or enabled).
> 
> s/stands for/represents/
Will change.
> 
> >   */
> > -if (vdev->nr_vectors < nr + 1) {
> > +if ((vdev->msix->irq_info_flags & VFIO_IRQ_INFO_NORESIZE) &&
> 
> Testing vdev->msix->noresize would be cleaner.
> 
> > +(vdev->nr_vectors < nr + 1)) {
> >  vdev->nr_vectors = nr + 1;
> > +
> >  if (!vdev->defer_kvm_irq_routing) {
> >  vfio_disable_irqindex(&vdev->vbasedev, 
> > VFIO_PCI_MSIX_IRQ_INDEX);
> >  ret = vfio_enable_vectors(vdev, true); @@ -529,16 +537,22
> > @@ static int vfio_msix_vector_do_use(PCIDevice *pdev, unsigned int nr,
> >  Error *err = NULL;
> >  int32_t fd;
> >
> > -if (vector->virq >= 0) {
> > -fd = event_notifier_get_fd(&vector->kvm_interrupt);
> > -} else {
> > -fd = event_notifier_get_fd(&vector->interrupt);
> > -}
> > +if (!vdev->defer_kvm_irq_routing) {
> > +if (vector->virq >= 0) {
> > +fd = event_notifier_get_fd(&vector->kvm_interrupt);
> > +} else {
> > +fd = event_notifier_get_fd(&vector->interrupt);
> > +}
> >
> > -if (vfio_set_irq_signaling(&vdev->vbasedev,
> > - VFIO_PCI_MSIX_IRQ_INDEX, nr,
> > - VFIO_IRQ_SET_ACTION_TRIGGER, fd, 
> > &err)) {
> > -error_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> > +if (vfio_set_irq_signaling(&vdev->vbasedev,
> > +   VFIO_PCI_MSIX_IRQ_INDEX, nr,
> > +   VFIO_IRQ_SET_ACTION_TRIGGER, fd, 
> > &err)) {
> > +error_reportf_err(err, VFIO_MSG_PREFIX, 
> > vdev->vbasedev.name);
> > +}
> > +}
> > +/* Increase for dynamic allocation case. */
> > +if (vdev->nr_vectors < nr + 1) {
> > +vdev->nr_vectors = nr + 1;
> >  }
> 
> We now have two branches where the bulk of the code is skipped when
> defer_kvm_irq_routing is enabled and doing effectively the same update to
> nr_vectors otherwise.  This suggests we should move the
> defer_kvm_irq_routing test out and create a common place to update
> nr_vectors.  Thanks,

I drafted new logic as follows, which moves the defer_kvm_irq_routing
test out. Since vfio_enable_vectors() needs an updated nr_vectors value,
nr_vectors is updated first and the different conditions are then tested
against the old value, e.g. old_nr_vec.

int old_nr_vec = vdev->nr_vectors;
.
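(A sketch of the shape this logic took, reconstructed from the v2 diff
quoted earlier in this digest:)

int old_nr_vec = vdev->nr_vectors;
...
if (vdev->nr_vectors < nr + 1) {
    vdev->nr_vectors = nr + 1;
}

if (!vdev->defer_kvm_irq_routing) {
    if (vdev->msix->noresize && (old_nr_vec < nr + 1)) {
        vfio_disable_irqindex(&vdev->vbasedev, VFIO_PCI_MSIX_IRQ_INDEX);
        ret = vfio_enable_vectors(vdev, true);
        ...
    } else {
        /* per-vector signaling via vfio_set_irq_signaling() */
    }
}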

RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-30 Thread Liu, Jing2
Hi C.

> On July 28, 2023 4:44 PM, Cédric Le Goater  wrote:
> 
> [ ... ]
> 
> > Sorry, I didn't quite understand "info.flags be tested against
> > VFIO_IRQ_INFO_NORESIZE".
> > I saw that kernels < 6.4 simply added NORESIZE to info.flags, and the
> > latest kernel adds it depending on has_dyn_msix.
> > Would you please describe your point in more detail?
> 
> I was trying to find the conditions to detect safely that the kernel didn't 
> have
> dynamic MSI-X support. Testing VFIO_IRQ_INFO_NORESIZE seems enough.
> 
Oh, I see.

> >> In that case, QEMU should report an error and the trace event is not
> needed.
> >
> > I replied with an email containing new error-handling draft code based
> > on my understanding, which reports the error and needs no trace. Could
> > you please help review whether that is what we want?
> 
> yes. It looked good. Please send a v1 !

Thanks for reviewing that. I guess you mean v2 for the next version 😊

Jing
> 
> Thanks,
> 
> Cédric.



RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-30 Thread Liu, Jing2
Hi Alex,

> On July 28, 2023 11:41 PM, Alex Williamson  wrote:
> 
> On Fri, 28 Jul 2023 10:27:17 +0200
> Cédric Le Goater  wrote:
> 
> > On 7/28/23 10:09, Liu, Jing2 wrote:
> > > Hi Alex,
> > >
> > > Thanks very much for reviewing the patches.
> > >
> > >> On July 28, 2023 1:25 AM, Alex Williamson
>  wrote:
> > >>
> > >> On Thu, 27 Jul 2023 03:24:08 -0400
> > >> Jing Liu  wrote:
> > >>
> > >>> From: Reinette Chatre 
> > >>>
> > >>> Kernel provides the guidance of dynamic MSI-X allocation support
> > >>> of passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag
> > >>> to guide user space.
> > >>>
> > >>> Fetch and store the flags from host for later use to determine if
> > >>> specific flags are set.
> > >>>
> > >>> Signed-off-by: Reinette Chatre 
> > >>> Signed-off-by: Jing Liu 
> > >>> ---
> > >>>   hw/vfio/pci.c| 12 
> > >>>   hw/vfio/pci.h|  1 +
> > >>>   hw/vfio/trace-events |  2 ++
> > >>>   3 files changed, 15 insertions(+)
> > >>>
> > >>> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > >>> a205c6b1130f..0c4ac0873d40 100644
> > >>> --- a/hw/vfio/pci.c
> > >>> +++ b/hw/vfio/pci.c
> > >>> @@ -1572,6 +1572,7 @@ static void
> > >>> vfio_msix_early_setup(VFIOPCIDevice
> > >>> *vdev, Error **errp)
> > >>>
> > >>>   static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error
> > >>> **errp)  {
> > >>> +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info)
> > >>> + };
> > >>>   int ret;
> > >>>   Error *err = NULL;
> > >>>
> > >>> @@ -1624,6 +1625,17 @@ static int vfio_msix_setup(VFIOPCIDevice
> > >>> *vdev, int
> > >> pos, Error **errp)
> > >>>   memory_region_set_enabled(&vdev->pdev.msix_table_mmio,
> false);
> > >>>   }
> > >>>
> > >>> +irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;
> > >>> +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO,
> &irq_info);
> > >>> +if (ret) {
> > >>> +/* This can fail for an old kernel or legacy PCI dev */
> > >>> +
> > >>> + trace_vfio_msix_setup_get_irq_info_failure(strerror(errno));
> > >>
> > >> We only call vfio_msix_setup() if the device has an MSI-X
> > >> capability, so the "legacy PCI" portion of this comment seems 
> > >> unjustified.
> > >> Otherwise the GET_IRQ_INFO ioctl has always existed, so I'd also
> > >> question the "old kernel" part of this comment.
> > >
> > > Oh, yes, I just realized that only VFIO_PCI_ERR_IRQ_INDEX and
> > > VFIO_PCI_REQ_IRQ_INDEX were added later in
> > > include/uapi/linux/vfio.h. Thus, this ioctl() with the MSI-X index
> > > would not fail for the old-kernel or legacy-PCI reason.
> > > Thanks for pointing this out to me.
> > >
> > >> We don't currently sanity test the device
> > >> exposed MSI-X info versus that reported by GET_IRQ_INFO, but it
> > >> seems valid to do so.
> > >
> > > Do we want to keep the check of possible failure from kernel (e.g.,
> > > -EFAULT) and report the error code back to caller? Maybe like this,
> > >
> > > static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
> > > {
> > >  
> > >  msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;
> > >
> > >  ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > >  if (ret < 0) {
> > >  error_setg_errno(errp, -ret, "failed to get MSI-X IRQ INFO");
> > >  return;
> 
> Yes.
> 
> > >  } else {
> > >  vdev->msix->noresize = !!(irq_info.flags & 
> > > VFIO_IRQ_INFO_NORESIZE);
> > >  }
> 
> No 'else' required here since the error branch returns.

Oh, yes. I will remove the "else" and just set the "noresize" value in
the next version.

> 
> > >  ...
> > >  trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix-
> >table_bar,
> > >  

RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-28 Thread Liu, Jing2
Hi Alex,

Thanks very much for reviewing the patches.

> On July 28, 2023 1:25 AM, Alex Williamson  wrote:
> 
> On Thu, 27 Jul 2023 03:24:08 -0400
> Jing Liu  wrote:
> 
> > From: Reinette Chatre 
> >
> > Kernel provides the guidance of dynamic MSI-X allocation support of
> > passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
> > guide user space.
> >
> > Fetch and store the flags from host for later use to determine if
> > specific flags are set.
> >
> > Signed-off-by: Reinette Chatre 
> > Signed-off-by: Jing Liu 
> > ---
> >  hw/vfio/pci.c| 12 
> >  hw/vfio/pci.h|  1 +
> >  hw/vfio/trace-events |  2 ++
> >  3 files changed, 15 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > a205c6b1130f..0c4ac0873d40 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1572,6 +1572,7 @@ static void vfio_msix_early_setup(VFIOPCIDevice
> > *vdev, Error **errp)
> >
> >  static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error
> > **errp)  {
> > +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> >  int ret;
> >  Error *err = NULL;
> >
> > @@ -1624,6 +1625,17 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int
> pos, Error **errp)
> >  memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
> >  }
> >
> > +irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > +if (ret) {
> > +/* This can fail for an old kernel or legacy PCI dev */
> > +trace_vfio_msix_setup_get_irq_info_failure(strerror(errno));
> 
> We only call vfio_msix_setup() if the device has an MSI-X capability, so the
> "legacy PCI" portion of this comment seems unjustified.
> Otherwise the GET_IRQ_INFO ioctl has always existed, so I'd also question the
> "old kernel" part of this comment.  

Oh, yes, I just realized that only VFIO_PCI_ERR_IRQ_INDEX and
VFIO_PCI_REQ_IRQ_INDEX were added later in include/uapi/linux/vfio.h.
Thus, this ioctl() with the MSI-X index would not fail for the
old-kernel or legacy-PCI reason.
Thanks for pointing this out to me.

> We don't currently sanity test the device
> exposed MSI-X info versus that reported by GET_IRQ_INFO, but it seems valid to
> do so.

Do we want to keep the check for a possible failure from the kernel
(e.g., -EFAULT) and report the error code back to the caller? Maybe
like this,

static void vfio_msix_early_setup(VFIOPCIDevice *vdev, Error **errp)
{

msix->entries = (ctrl & PCI_MSIX_FLAGS_QSIZE) + 1;

ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
if (ret < 0) {
error_setg_errno(errp, -ret, "failed to get MSI-X IRQ INFO");
return;
} else {
vdev->msix->noresize = !!(irq_info.flags & VFIO_IRQ_INFO_NORESIZE);
}
...
trace_vfio_msix_early_setup(vdev->vbasedev.name, pos, msix->table_bar,
msix->table_offset, msix->entries, 
vdev->msix->noresize);

}

> I'd expect this to happen in vfio_msix_early_setup() though, especially
> since that's where the remainder of VFIOMSIXInfo is setup.

> 
> > +} else {
> > +vdev->msix->irq_info_flags = irq_info.flags;
> > +}
> > +trace_vfio_msix_setup_irq_info_flags(vdev->vbasedev.name,
> > + vdev->msix->irq_info_flags);
> > +
> >  return 0;
> >  }
> >
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index
> > a2771b9ff3cc..ad34ec56d0ae 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
> >  uint32_t table_offset;
> >  uint32_t pba_offset;
> >  unsigned long *pending;
> > +uint32_t irq_info_flags;
> 
> Why not simply pull out a "noresize" bool?  Thanks,
> 
Will change to a bool type.

Thanks,
Jing

> Alex
> 
> >  } VFIOMSIXInfo;
> >
> >  #define TYPE_VFIO_PCI "vfio-pci"
> > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index
> > ee7509e68e4f..7d4a398f044d 100644
> > --- a/hw/vfio/trace-events
> > +++ b/hw/vfio/trace-events
> > @@ -28,6 +28,8 @@ vfio_pci_read_config(const char *name, int addr, int
> > len, int val) " (%s, @0x%x,  vfio_pci_write_config(const char *name, int 
> > addr, int
> val, int len) " (%s, @0x%x, 0x%x, len=0x%x)"
> >  vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
> >  vfio_msix_early_setup(const char *name, int pos, int table_bar, int 
> > offset, int
> entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
> > +vfio_msix_setup_get_irq_info_failure(const char *errstr)
> "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
> > +vfio_msix_setup_irq_info_flags(const char *name, uint32_t flags) " (%s) 
> > MSI-X
> irq info flags 0x%x"
> >  vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
> >  vfio_check_pm_reset(const char *name) "%s Supports PM reset"
> >  vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"




RE: [PATCH RFC v1 1/3] vfio/pci: detect the support of dynamic MSI-X allocation

2023-07-28 Thread Liu, Jing2
Hi C.,

Thanks very much for reviewing the patches.

> On July 28, 2023 12:58 AM, Cédric Le Goater  wrote:
> 
> Hello Jing,
> 
> On 7/27/23 09:24, Jing Liu wrote:
> > From: Reinette Chatre 
> >
> > Kernel provides the guidance of dynamic MSI-X allocation support of
> > passthrough device, by clearing the VFIO_IRQ_INFO_NORESIZE flag to
> > guide user space.
> >
> > Fetch and store the flags from host for later use to determine if
> > specific flags are set.
> >
> > Signed-off-by: Reinette Chatre 
> > Signed-off-by: Jing Liu 
> > ---
> >   hw/vfio/pci.c| 12 
> >   hw/vfio/pci.h|  1 +
> >   hw/vfio/trace-events |  2 ++
> >   3 files changed, 15 insertions(+)
> >
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c index
> > a205c6b1130f..0c4ac0873d40 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -1572,6 +1572,7 @@ static void vfio_msix_early_setup(VFIOPCIDevice
> > *vdev, Error **errp)
> >
> >   static int vfio_msix_setup(VFIOPCIDevice *vdev, int pos, Error **errp)
> >   {
> > +struct vfio_irq_info irq_info = { .argsz = sizeof(irq_info) };
> >   int ret;
> >   Error *err = NULL;
> >
> > @@ -1624,6 +1625,17 @@ static int vfio_msix_setup(VFIOPCIDevice *vdev, int
> pos, Error **errp)
> >   memory_region_set_enabled(&vdev->pdev.msix_table_mmio, false);
> >   }
> >
> > +irq_info.index = VFIO_PCI_MSIX_IRQ_INDEX;
> > +ret = ioctl(vdev->vbasedev.fd, VFIO_DEVICE_GET_IRQ_INFO, &irq_info);
> > +if (ret) {
> > +/* This can fail for an old kernel or legacy PCI dev */
> > +trace_vfio_msix_setup_get_irq_info_failure(strerror(errno));
> 
> Is it possible to detect the error reported by a kernel (< 6.4) when dynamic 
> MSI-X
> are not supported. 

I just realized that ioctl(VFIO_DEVICE_GET_IRQ_INFO) with
VFIO_PCI_MSIX_IRQ_INDEX should always exist and would not fail for the
older-kernel reason, after reading Alex's comment on this patch (if my
understanding is correct).

So a possible failure would be for some reason other than an old kernel
or a legacy PCI device.

> Looking at vfio_pci_ioctl_get_irq_info() in the kernel, could
> info.flags be tested against VFIO_IRQ_INFO_NORESIZE ?
> 
Sorry, I didn't quite understand "info.flags be tested against
VFIO_IRQ_INFO_NORESIZE".
I saw that kernels < 6.4 simply added NORESIZE to info.flags, and the
latest kernel adds it depending on has_dyn_msix.
Would you please describe your point in more detail?

> In that case, QEMU should report an error and the trace event is not needed.

I replied with an email containing new error-handling draft code based
on my understanding, which reports the error and needs no trace. Could
you please help review whether that is what we want?

Thanks,
Jing

> 
> Thanks,
> 
> C.
> 
> > +} else {
> > +vdev->msix->irq_info_flags = irq_info.flags;
> > +}
> > +trace_vfio_msix_setup_irq_info_flags(vdev->vbasedev.name,
> > + vdev->msix->irq_info_flags);
> > +
> >   return 0;
> >   }
> >
> > diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h index
> > a2771b9ff3cc..ad34ec56d0ae 100644
> > --- a/hw/vfio/pci.h
> > +++ b/hw/vfio/pci.h
> > @@ -113,6 +113,7 @@ typedef struct VFIOMSIXInfo {
> >   uint32_t table_offset;
> >   uint32_t pba_offset;
> >   unsigned long *pending;
> > +uint32_t irq_info_flags;
> >   } VFIOMSIXInfo;
> >
> >   #define TYPE_VFIO_PCI "vfio-pci"
> > diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events index
> > ee7509e68e4f..7d4a398f044d 100644
> > --- a/hw/vfio/trace-events
> > +++ b/hw/vfio/trace-events
> > @@ -28,6 +28,8 @@ vfio_pci_read_config(const char *name, int addr, int len,
> int val) " (%s, @0x%x,
> >   vfio_pci_write_config(const char *name, int addr, int val, int len) " (%s,
> @0x%x, 0x%x, len=0x%x)"
> >   vfio_msi_setup(const char *name, int pos) "%s PCI MSI CAP @0x%x"
> >   vfio_msix_early_setup(const char *name, int pos, int table_bar, int 
> > offset, int
> entries) "%s PCI MSI-X CAP @0x%x, BAR %d, offset 0x%x, entries %d"
> > +vfio_msix_setup_get_irq_info_failure(const char *errstr)
> "VFIO_DEVICE_GET_IRQ_INFO failure: %s"
> > +vfio_msix_setup_irq_info_flags(const char *name, uint32_t flags) " (%s) 
> > MSI-X
> irq info flags 0x%x"
> >   vfio_check_pcie_flr(const char *name) "%s Supports FLR via PCIe cap"
> >   vfio_check_pm_reset(const char *name) "%s Supports PM reset"
> >   vfio_check_af_flr(const char *name) "%s Supports FLR via AF cap"



Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

2020-02-11 Thread Liu, Jing2


On 2/11/2020 3:40 PM, Jason Wang wrote:


On 2020/2/11 2:02 PM, Liu, Jing2 wrote:



On 2/11/2020 12:02 PM, Jason Wang wrote:


On 2020/2/11 11:35 AM, Liu, Jing2 wrote:


On 2/11/2020 11:17 AM, Jason Wang wrote:


On 2020/2/10 5:05 PM, Zha Bin wrote:

From: Liu Jiang

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
virtio over mmio devices as a lightweight machine model for modern
cloud. The standard virtio over MMIO transport layer only supports one
legacy interrupt, which is much heavier than virtio over PCI transport
layer using MSI. Legacy interrupt has a long work path and causes
specific VMExits in the following cases, which would considerably slow
down performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register

We proposed to add MSI support for virtio over MMIO via new feature
bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.

With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
uses msi_sharing[1] to indicate the event and vector mapping.
Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
Bit 1 is 1: device uses sharing mode and dynamic mapping.



I believe dynamic mapping should cover the case of fixed vector?


Actually this bit *aims* at distinguishing MSI sharing from MSI
non-sharing.

It means that when the msi_sharing bit is 1, the device doesn't want a
vector per queue (it wants MSI vector sharing, as the name says) and
doesn't expect a high interrupt rate.

So the driver turns to !per_vq_vectors and has to do dynamic mapping.

So they are opposites, not a superset.

Thanks!

Jing



I think you need to add more comments on the command.

E.g. if I want to map vector 0 to queue 1, what do I need to do?

write(1, queue_sel);
write(0, vector_sel);


That's true. Besides, two commands are used for MSI sharing mode,
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.

"To set up the event and vector mapping for MSI sharing mode, driver
SHOULD write a valid MsiVecSel followed by
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command
to map the configuration change/selected queue events respectively."
(See spec patch 5/5.)

So if the driver detects MSI sharing mode, then when it sets up a vq it
writes queue_sel (this already exists in vq setup), then the vector
selector, and then the MAP_QUEUE command to do the queue event mapping,
roughly as sketched below.
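(A sketch only; MsiVecSel at 0x0d0 is from the spec patch, while the
command register name/offset below is my placeholder:)

/* map the currently selected queue's event to MSI vector `vec` */
writel(qidx, base + VIRTIO_MMIO_QUEUE_SEL);   /* existing vq setup step */
writel(vec, base + VIRTIO_MMIO_MSI_VEC_SEL);  /* MsiVecSel, 0x0d0 */
writel(VIRTIO_MMIO_MSI_CMD_MAP_QUEUE, base + VIRTIO_MMIO_MSI_CMD);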




So actually the per vq msix could be done through this. 


Right, per-vq MSI-X can also be mapped by the two commands if we want.

The current design benefits devices requesting per-vq MSI in that the
driver doesn't have to do the mapping while setting up each queue,
since we define the relationship by default.


I don't get why you need to introduce MSI_SHARING_MASK, which is in the
charge of the driver instead of the device.


MSI_SHARING_MASK is just for identifying the msi_sharing bit in
readl(MsiState) (0x0c4). The device tells us whether it wants MSI
sharing.


MsiState register: R

le32 {
    msi_enabled : 1;
    msi_sharing: 1;
    reserved : 30;
};
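In driver terms, checking the state is just (a sketch; the MsiState
offset 0x0c4 is from this thread, the mask macro names are mine):

u32 state = readl(vm_dev->base + VIRTIO_MMIO_MSI_STATE);  /* 0x0c4 */
bool msi_enabled = !!(state & MSI_ENABLED_MASK);          /* bit 0 */
bool msi_sharing = !!(state & MSI_SHARING_MASK);          /* bit 1 */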

The interrupt rate should have no direct relationship with whether it 
has been shared or not.




Btw, you introduce mask/unmask without pending; how do you deal with an
interrupt lost while masked, then?



For MSI non-sharing mode, no special action is needed because we make
the rule of per_vq_vector with a fixed relationship.

Correct me if this is not clear enough in the spec/code comments.



The ABI is not as straightforward as PCI's. Why not just reuse the PCI
layout?


E.g having

queue_sel
queue_msix_vector
msix_config

for configuring map between msi vector and queues/config


Thanks for the advice. :)

Actually, looking into PCI, queue_msix_vector/msix_config hold the MSI
vector index, which is the same as the MMIO register MsiVecSel (0x0d0).


So we don't introduce two extra registers for mapping even in sharing mode.

What do you think?




Then

vector_sel
address
data
pending
mask
unmask

for configuring msi table?


A PCI-like MSI-X table is not introduced to the device; instead, we
simply use commands to tell it the mask/configure/enable operations.


Thanks!

Jing



Thanks



Thanks!

Jing





Thanks






Thanks












Re: [virtio-dev] Re: [PATCH v2 2/5] virtio-mmio: refactor common functionality

2020-02-11 Thread Liu, Jing2



On 2/11/2020 7:19 PM, Michael S. Tsirkin wrote:

On Mon, Feb 10, 2020 at 05:05:18PM +0800, Zha Bin wrote:

From: Liu Jiang 

Common functionality is refactored into virtio_mmio_common.h
in order to support MSI in a later patch set.

Signed-off-by: Liu Jiang 
Co-developed-by: Zha Bin 
Signed-off-by: Zha Bin 
Co-developed-by: Jing Liu 
Signed-off-by: Jing Liu 
Co-developed-by: Chao Peng 
Signed-off-by: Chao Peng 

What does this proliferation of header files achieve?
Common with what?


We're considering that the virtio-mmio structure is useful for the
virtio-mmio MSI file, so we refactored it out,


e.g. to get the base of virtio_mmio_device from a struct msi_desc *desc.
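
A sketch of what that lookup could look like, assuming the probe path
stores the device with platform_set_drvdata() (the helper name below is
hypothetical; msi_desc_to_dev() and to_platform_device() are existing
kernel helpers):

#include <linux/msi.h>
#include <linux/platform_device.h>
#include "virtio_mmio_common.h"

/* Hypothetical helper: recover the virtio_mmio_device behind an MSI desc. */
static struct virtio_mmio_device *vm_dev_from_desc(struct msi_desc *desc)
{
        struct platform_device *pdev =
                to_platform_device(msi_desc_to_dev(desc));

        return platform_get_drvdata(pdev);
}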

Jing



---
  drivers/virtio/virtio_mmio.c        | 21 +
  drivers/virtio/virtio_mmio_common.h | 31 +++
  2 files changed, 32 insertions(+), 20 deletions(-)
  create mode 100644 drivers/virtio/virtio_mmio_common.h

diff --git a/drivers/virtio/virtio_mmio.c b/drivers/virtio/virtio_mmio.c
index 1733ab97..41e1c93 100644
--- a/drivers/virtio/virtio_mmio.c
+++ b/drivers/virtio/virtio_mmio.c
@@ -61,13 +61,12 @@
  #include 
  #include 
  #include 
-#include 
  #include 
  #include 
-#include 
  #include 
  #include 
  #include 
+#include "virtio_mmio_common.h"
  
  
  
@@ -77,24 +76,6 @@
  
  
  
-#define to_virtio_mmio_device(_plat_dev) \
-   container_of(_plat_dev, struct virtio_mmio_device, vdev)
-
-struct virtio_mmio_device {
-   struct virtio_device vdev;
-   struct platform_device *pdev;
-
-   void __iomem *base;
-   unsigned long version;
-
-   /* a list of queues so we can dispatch IRQs */
-   spinlock_t lock;
-   struct list_head virtqueues;
-
-   unsigned short notify_base;
-   unsigned short notify_multiplier;
-};
-
  struct virtio_mmio_vq_info {
/* the actual virtqueue */
struct virtqueue *vq;
diff --git a/drivers/virtio/virtio_mmio_common.h b/drivers/virtio/virtio_mmio_common.h
new file mode 100644
index 000..90cb304
--- /dev/null
+++ b/drivers/virtio/virtio_mmio_common.h
@@ -0,0 +1,31 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+#ifndef _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
+#define _DRIVERS_VIRTIO_VIRTIO_MMIO_COMMON_H
+/*
+ * Virtio MMIO driver - common functionality for all device versions
+ *
+ * This module allows virtio devices to be used over a memory-mapped device.
+ */
+
+#include 
+#include 
+
+#define to_virtio_mmio_device(_plat_dev) \
+   container_of(_plat_dev, struct virtio_mmio_device, vdev)
+
+struct virtio_mmio_device {
+   struct virtio_device vdev;
+   struct platform_device *pdev;
+
+   void __iomem *base;
+   unsigned long version;
+
+   /* a list of queues so we can dispatch IRQs */
+   spinlock_t lock;
+   struct list_head virtqueues;
+
+   unsigned short notify_base;
+   unsigned short notify_multiplier;
+};
+
+#endif
--
1.8.3.1







Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

2020-02-10 Thread Liu, Jing2


On 2/11/2020 12:02 PM, Jason Wang wrote:


On 2020/2/11 上午11:35, Liu, Jing2 wrote:


On 2/11/2020 11:17 AM, Jason Wang wrote:


On 2020/2/10 下午5:05, Zha Bin wrote:

From: Liu Jiang

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
virtio over mmio devices as a lightweight machine model for modern
cloud. The standard virtio over MMIO transport layer only supports one
legacy interrupt, which is much heavier than virtio over PCI transport
layer using MSI. Legacy interrupt has long work path and causes specific
VMExits in following cases, which would considerably slow down the
performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register

We proposed to add MSI support for virtio over MMIO via new feature
bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.

With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
uses msi_sharing[1] to indicate the event and vector mapping.
Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
Bit 1 is 1: device uses sharing mode and dynamic mapping.



I believe dynamic mapping should cover the case of fixed vector?


Actually this bit *aims* for MSI sharing or MSI non-sharing.

It means, when the MSI sharing bit is 1, the device doesn't want a vector
per queue (it wants MSI vector sharing, as the name says) and doesn't want
a high interrupt rate.

So the driver turns to !per_vq_vectors and has to do dynamic mapping.

So they are opposites, not a superset.

Thanks!

Jing



I think you need to add more comments on the command.

E.g. if I want to map vector 0 to queue 1, what do I need to do?

write(1, queue_sel);
write(0, vector_sel);


That's true. Besides, two commands are used for MSI sharing mode,
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG and VIRTIO_MMIO_MSI_CMD_MAP_QUEUE.

"To set up the event and vector mapping for MSI sharing mode, driver
SHOULD write a valid MsiVecSel followed by
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE command to
map the configuration change/selected queue events respectively." (See
spec patch 5/5)

So if the driver detects MSI sharing mode, then when it sets up a vq it
writes the queue_sel (this already exists in vq setup), then the vector
sel, and then the MAP_QUEUE command to do the queue event mapping.
MAP_QUEUE command to do the queue event mapping.


For MSI non-sharing mode, no special action is needed because we make the
rule of per_vq_vector and a fixed relationship.


Correct me if this is not clear enough in the spec/code comments.

Thanks!

Jing




?

Thanks






Thanks










Re: [virtio-dev] Re: [PATCH v2 4/5] virtio-mmio: add MSI interrupt feature support

2020-02-10 Thread Liu, Jing2



On 2/11/2020 11:17 AM, Jason Wang wrote:


On 2020/2/10 下午5:05, Zha Bin wrote:

From: Liu Jiang

Userspace VMMs (e.g. Qemu microvm, Firecracker) take advantage of using
virtio over mmio devices as a lightweight machine model for modern
cloud. The standard virtio over MMIO transport layer only supports one
legacy interrupt, which is much heavier than virtio over PCI transport
layer using MSI. Legacy interrupt has long work path and causes specific
VMExits in following cases, which would considerably slow down the
performance:

1) read interrupt status register
2) update interrupt status register
3) write IOAPIC EOI register

We proposed to add MSI support for virtio over MMIO via new feature
bit VIRTIO_F_MMIO_MSI[1] which increases the interrupt performance.

With the VIRTIO_F_MMIO_MSI feature bit supported, the virtio-mmio MSI
uses msi_sharing[1] to indicate the event and vector mapping.
Bit 1 is 0: device uses non-sharing and fixed vector per event mapping.
Bit 1 is 1: device uses sharing mode and dynamic mapping.



I believe dynamic mapping should cover the case of fixed vector?


Actually this bit *aims* for MSI sharing or MSI non-sharing.

It means, when the MSI sharing bit is 1, the device doesn't want a vector
per queue (it wants MSI vector sharing, as the name says) and doesn't want
a high interrupt rate.

So the driver turns to !per_vq_vectors and has to do dynamic mapping.

So they are opposites, not a superset.

Thanks!

Jing



Thanks








Re: [virtio-dev] [PATCH v2 5/5] virtio-mmio: MSI vector and event mapping

2020-02-10 Thread Liu, Jing2


On 1/29/2020 6:14 PM, Michael S. Tsirkin wrote:

On Tue, Jan 21, 2020 at 09:54:33PM +0800, Jing Liu wrote:

Bit 1 msi_sharing reported in the MsiState register indicates the mapping
mode the device uses.

Bit 1 is 0 - device uses MSI non-sharing mode. This indicates a vector per
event and a fixed, static vectors-and-events relationship. It fits devices
with a high interrupt rate and best performance;
Bit 1 is 1 - device uses MSI sharing mode. This indicates dynamic
vectors-and-events mapping and fits devices not requiring a high interrupt
rate.


Thanks for reviewing! Sorry for the late reply.

For MSI sharing/non-sharing mode, we are trying to define a rule that
simply uses one bit to indicate what the device wants and what the driver
should do. Let me try to explain the details.

If it's sharing mode (bit=1), it means the device doesn't want a vector
per queue or a high interrupt rate.

The driver should meet such a request.

So we define that the device should support at least 2 MSI vectors and the
driver should use the
VIRTIO_MMIO_MSI_CMD_MAP_CONFIG/VIRTIO_MMIO_MSI_CMD_MAP_QUEUE commands
to map the configuration change/selected queue events.

In the implementation, the driver will detect bit=1, turn to
!per_vq_vectors, and map 1 vector for config and 1 for all queues to meet
the request. (Maybe we need to restrict this in the spec too.)
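
Roughly, that driver policy could be sketched like this (both setup helpers
are hypothetical stand-ins for the code in the driver patches; the sharing
check is as sketched earlier in this digest):

/* Sketch of the vector-allocation policy; helper names are hypothetical. */
static int vm_request_vectors(struct virtio_mmio_device *vm_dev,
                              unsigned int nvqs)
{
        if (vm_msi_is_sharing(vm_dev->base)) {
                /*
                 * Sharing mode: at least 2 vectors, one mapped to config
                 * change events with MAP_CONFIG and one shared by all
                 * queues with MAP_QUEUE.
                 */
                return vm_setup_shared_vectors(vm_dev, nvqs);
        }

        /* Non-sharing mode: fixed vector per event, no mapping commands. */
        return vm_setup_per_vq_vectors(vm_dev, nvqs);
}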



It seems that sharing mode is a superset of non-sharing mode.


I think sharing mode is not a superset, because the driver should not map
1:1, which would go against the request from the device that it doesn't
want 1:1 (non-sharing mode).

In other words, the device chooses the way it wants: a high interrupt rate
with 1:1, or not.



Isn't that right? E.g. with sharing mode drivers
can still avoid sharing if they like.

Drivers should not avoid MSI sharing if that bit is 1.

Maybe it should just be a hint to drivers whether to share
interrupts,

The sharing mode bit is the hint to indicate whether to share interrupts. :)

instead of a completely different layout?


Except for the bit, there is no other different register layout for this
feature.


Thanks!

Jing


diff --git a/msi-state.c b/msi-state.c
index b1fa0c1..d470be4 100644
--- a/msi-state.c
+++ b/msi-state.c
@@ -1,4 +1,5 @@
  le32 {
  msi_enabled : 1;
-reserved : 31;
+msi_sharing: 1;
+reserved : 30;
  };
--
2.7.4




Re: [Qemu-devel] [Issues] PCI hotplug does not work well on pc platform?

2019-02-17 Thread Liu, Jing2

Hi Marcel and Igor,

Thanks very much for your help!

On 2/14/2019 10:42 PM, Marcel Apfelbaum wrote:




[...]

I have two questions.
1. PCI hotplug on pci.0 must manually rescan in guest. The ACPI hotplug
handler sends the GPE event to guest but it seems guest doesn't receive
it? I tried to open ACPI debug level/layer to 0x, in order to
see if there is any message after device_add in monitor, but no message
comes out until I manually rescan. Also tried printk in
acpi_ev_gpe_xrupt_handler() and acpi_ev_sci_xrupt_handler(). No output
in dmesg.
(I'm sure that CONFIG_HOTPLUG_PCI_PCIE=y, CONFIG_HOTPLUG_PCI_CPCI=y,
CONFIG_HOTPLUG_PCI=y, CONFIG_HOTPLUG_PCI_ACPI=y)

What about CONFIG_HOTPLUG_PCI_SHPC=y?



Thanks, and I finally found it is because I had set hardware-reduced ACPI
before; now it is correct.



Is this a kind of design or a known issue? Does the guest receive
the request, where can I find the

does it work with kernels known to work (RHEL7)?

Also, sharing the used QEMU version and command line could help.
Is there any key kernel config needed in the guest, besides those I listed
above?

Maybe Marcel knows something about it
(CCed)

May I ask why you need SHPC hotplug and not the ACPI-based hotplug?
Oh, it is a customer's request. And the method of setting the PIIX4_PM
property works.



Thanks!
Jing



Re: [Qemu-devel] [Issues] PCI hotplug does not work well on pc platform?

2019-02-12 Thread Liu, Jing2

Hi Igor,

Thanks for your reply!

On 2/5/2019 11:47 PM, Igor Mammedov wrote:

On Wed, 30 Jan 2019 21:02:10 +0800
"Liu, Jing2"  wrote:


Hi everyone,

I have two questions.
1. PCI hotplug on pci.0 must manually rescan in guest. The ACPI hotplug
handler sends the GPE event to guest but it seems guest doesn't receive
it? I tried to open ACPI debug level/layer to 0x, in order to
see if there is any message after device_add in monitor, but no message
comes out until I manually rescan. Also tried printk in
acpi_ev_gpe_xrupt_handler() and acpi_ev_sci_xrupt_handler(). No output
in dmesg.
(I'm sure that CONFIG_HOTPLUG_PCI_PCIE=y, CONFIG_HOTPLUG_PCI_CPCI=y,
CONFIG_HOTPLUG_PCI=y, CONFIG_HOTPLUG_PCI_ACPI=y)

Is this a kind of design or a known issue? Does the guest receive
the request, where can I find the

does it work with kernels known to work (RHEL7)?

Also, sharing the used QEMU version and command line could help.


Is there any key kernel config needed in the guest, besides those I listed above?

I used guest kernel v4.18 and qemu upstream v3.1.0
Command line:
sudo /home/xxx/qemu/x86_64-softmmu/qemu-system-x86_64 \
-machine pc,accel=kvm,kernel_irqchip -cpu host -m 1G,slots=2,maxmem=10G \
-nographic -no-user-config -nodefaults -vga none \
-drive file=/home/xxx/image/clear-24690-kvm.img,if=virtio,format=raw \
-smp sockets=1,cpus=4,cores=2,maxcpus=8 \
-device virtio-serial-pci,id=virtio-serial0,disable-modern,addr=0x5 \
-monitor tcp:0.0.0.0:5000,nowait,server \
-chardev stdio,id=charconsole0 \
-device virtconsole,chardev=charconsole0,id=console0 \
-kernel /home/xxx/linux-stable/arch/x86/boot/bzImage \
-append 'root=/dev/vda3 rw rootfstype=ext4 data=ordered rcupdate.rcu_expedited=1 pci=lastbus=0 pci=realloc=on tsc=reliable no_timer_check reboot=t noapictimer console=hvc0 iommu=off panic=1 initcall_debug acpi.debug_layer=0x4 acpi.debug_level=4' \
-device pci-bridge,bus=pci.0,id=br1,chassis_nr=1,shpc=on,addr=6



2. I want to try hotplugging on a pci-bridge on the pc platform, using SHPC.
I set shpc=on, but when I do device_add, QEMU still calls
acpi_pcihp_device_plug_cb? Why does it not call pci_bridge_dev_hotplug_cb?
(CONFIG_HOTPLUG_PCI_SHPC=y)


try to disable ACPI hotplug for bridges
  -global PIIX4_PM.acpi-pci-hotplug-with-bridge-support=off


I'll try it!

Thanks!
Jing






[Qemu-devel] [Issues] PCI hotplug does not work well on pc platform?

2019-01-30 Thread Liu, Jing2

Hi everyone,

I have two questions.
1. PCI hotplug on pci.0 must manually rescan in guest. The ACPI hotplug 
handler sends the GPE event to guest but it seems guest doesn't receive 
it? I tried to open ACPI debug level/layer to 0x, in order to 
see if there is any message after device_add in monitor, but no message 
comes out until I manually rescan. Also tried printk in 
acpi_ev_gpe_xrupt_handler() and acpi_ev_sci_xrupt_handler(). No output 
in dmesg.
(I'm sure that CONFIG_HOTPLUG_PCI_PCIE=y, CONFIG_HOTPLUG_PCI_CPCI=y, 
CONFIG_HOTPLUG_PCI=y, CONFIG_HOTPLUG_PCI_ACPI=y)


Is this a kind of design or a known issue? Does the guest receive
the request, where can I find the


2. I want to try hotplugging on a pci-bridge on the pc platform, using SHPC.
I set shpc=on, but when I do device_add, QEMU still calls
acpi_pcihp_device_plug_cb? Why does it not call pci_bridge_dev_hotplug_cb?

(CONFIG_HOTPLUG_PCI_SHPC=y)

Thanks very much!
Jing



[Qemu-devel] [Question] Hotplug rescan/remove by manual

2018-11-28 Thread Liu, Jing2

Hi guys,
When I tested hotplug on pci.0 on the pc platform, it seems that we have to
echo 1 > /sys/bus/pci/rescan manually in the guest? Is this an incomplete
feature because of something like a GPE interrupt issue?

For hot-unplug, I use echo 1 > /sys/bus/pci//remove, but this
seems to only remove the device in the guest; we can still see it in the
QEMU monitor via "info pci". I guess this is not the right way to unplug.
I'm not sure if I missed something?

Looking forwards to your response.
Thanks!

Jing



Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-09-19 Thread Liu, Jing2

Hi Marcel and Michael,

Re-ping if this is not in the PR list :)

Thanks,
Jing

On 9/6/2018 10:16 AM, Liu, Jing2 wrote:

Hi Marcel,

On 9/6/2018 12:36 AM, Marcel Apfelbaum wrote:



On 09/05/2018 05:08 AM, Liu, Jing2 wrote:

Hi Marcel and Michael,

Got no response, so I would like to ask if I need to do something more for
this series? :)



Hi Jing,

Maybe Michael is on PTO; let's wait a few more days.


Thank you, I got it :)

BRs,
Jing

Michael, I can send a pull request for this series if you are busy.
Thanks,
Marcel


Thanks,
Jing

On 8/30/2018 10:58 AM, Liu, Jing2 wrote:

Ping Michael :)

Thanks,
Jing

On 8/25/2018 12:51 AM, Marcel Apfelbaum wrote:

On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)

[...]

Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!


Please ping Michael if your code is not merged in a week or so.


BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/ 



I plan to have a look this weekend.

Thanks,
Marcel











Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-09-05 Thread Liu, Jing2

Hi Marcel,

On 9/6/2018 12:36 AM, Marcel Apfelbaum wrote:



On 09/05/2018 05:08 AM, Liu, Jing2 wrote:

Hi Marcel and Michael,

Got no response, so I would like to ask if I need to do something more for
this series? :)



Hi Jing,

Maybe Michael is on PTO; let's wait a few more days.


Thank you, I got it :)

BRs,
Jing

Michael, I can send a pull request for this series if you are busy.
Thanks,
Marcel


Thanks,
Jing

On 8/30/2018 10:58 AM, Liu, Jing2 wrote:

Ping Michael :)

Thanks,
Jing

On 8/25/2018 12:51 AM, Marcel Apfelbaum wrote:

On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)

[...]

Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!


Please ping Michael if your code is not merged in a week or so.


BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/ 



I plan to have a look this weekend.

Thanks,
Marcel









Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-09-04 Thread Liu, Jing2

Hi Marcel and Michael,

Got no response, so I would like to ask if I need to do something more for
this series? :)

Thanks,
Jing

On 8/30/2018 10:58 AM, Liu, Jing2 wrote:

Ping Michael :)

Thanks,
Jing

On 8/25/2018 12:51 AM, Marcel Apfelbaum wrote:

On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)

[...]

Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!


Please ping Michael if your code is not merged in a week or so.


BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/ 



I plan to have a look this weekend.

Thanks,
Marcel






Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-08-29 Thread Liu, Jing2

Ping Michael :)

Thanks,
Jing

On 8/25/2018 12:51 AM, Marcel Apfelbaum wrote:

On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)

[...]

Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!


Please ping Michael if your code is not merged in a week or so.


BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/ 



I plan to have a look this weekend.

Thanks,
Marcel




Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-08-26 Thread Liu, Jing2

Hi Marcel,

On 8/25/2018 12:51 AM, Marcel Apfelbaum wrote:

On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)


[...]


Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!


Please ping Michael if your code is not merged in a week or so.


Got it, I will check by "git pull" to see if master branch has that.


BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/ 



I plan to have a look this weekend.


Glad to see your comments!

Thanks very much,
Jing


Thanks,
Marcel


Jing

Thanks,
Marcel


Thanks,
Jing


Thanks,
Marcel











Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-08-23 Thread Liu, Jing2

Hi Marcel,

On 8/22/2018 2:58 PM, Marcel Apfelbaum wrote:

Hi Jing,

On 08/22/2018 04:53 AM, Liu, Jing2 wrote:

Hi Marcel,

On 8/21/2018 5:59 PM, Marcel Apfelbaum wrote:



On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)


[...]


Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?



You just need to wait until Michael adds the series
to his next pull request. Nothing more to do.


OK, got it! Thanks!
BTW, do you have any suggestions on the SeaBIOS counterpart patches?
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/


Jing

Thanks,
Marcel


Thanks,
Jing


Thanks,
Marcel








Re: [Qemu-devel] [PATCH v3 2/2] hw/pci: add PCI resource reserve capability to legacy PCI bridge

2018-08-21 Thread Liu, Jing2

Hi Marcel,

On 8/21/2018 5:59 PM, Marcel Apfelbaum wrote:



On 08/21/2018 06:18 AM, Jing Liu wrote:

Add hint to firmware (e.g. SeaBIOS) to reserve additional
BUS/IO/MEM/PREF resource for legacy pci-pci bridge. Add the
resource reserve capability deleting in pci_bridge_dev_exitfn.

Signed-off-by: Jing Liu 
---
  hw/pci-bridge/pci_bridge_dev.c | 24 
  1 file changed, 24 insertions(+)


[...]


Reviewed-by: Marcel Apfelbaum


Thanks for the quick reviewing and feedback.
So could I ask what I should do now: update a new version
with your R-b, just wait for the push, or something else?

Thanks,
Jing


Thanks,
Marcel





Re: [Qemu-devel] [PATCH v2 2/3] hw/pci: add teardown function for PCI resource reserve capability

2018-08-20 Thread Liu, Jing2



Hi Marcel,

On 8/20/2018 9:38 PM, Marcel Apfelbaum wrote:

Hi Jing,

On 08/20/2018 05:58 AM, Liu, Jing2 wrote:

Hi Marcel,

On 8/18/2018 12:10 AM, Marcel Apfelbaum wrote:

Hi Jing,

On 08/16/2018 12:28 PM, Jing Liu wrote:

Clean up the PCI config space of the resource reserve capability.

Signed-off-by: Jing Liu 
---
  hw/pci/pci_bridge.c | 9 +
  include/hw/pci/pci_bridge.h | 1 +
  2 files changed, 10 insertions(+)

diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 15b055e..dbcee90 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -465,6 +465,15 @@ int pci_bridge_qemu_reserve_cap_init(PCIDevice *dev, int cap_offset,

  return 0;
  }
+void pci_bridge_qemu_reserve_cap_uninit(PCIDevice *dev)
+{
+    uint8_t pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+
+    pci_del_capability(dev, PCI_CAP_ID_VNDR, sizeof(PCIBridgeQemuCap));


I think that you only need to call pci_del_capability,


+    memset(dev->config + pos + PCI_CAP_FLAGS, 0,
+   sizeof(PCIBridgeQemuCap) - PCI_CAP_FLAGS);
+}


... no need for the above line. The reason is pci_del_capability
will "unlink" the capability, and even if the data remains in
the configuration space array, it will not be used.


I think I got it: pci_del_capability "unlinks" by setting the tag
pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
so that pdev->config will not be used, right?


If it is the latest capability in the list, yes.
Otherwise it will simply link 'prev' with 'next' using config array offsets.
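
A simplified sketch of that unlink logic (illustrative only; QEMU's real
pci_del_capability() also clears the wmask/cmask arrays, and the 'prev'
offset here stands in for the internal capability-list walk):

#include "qemu/osdep.h"
#include "hw/pci/pci.h"

/* Illustration of deleting a capability from the config-space list. */
static void del_cap_sketch(PCIDevice *pdev, uint8_t cap_id, uint8_t size,
                           uint8_t prev /* from a hypothetical list walk */)
{
    uint8_t offset = pci_find_capability(pdev, cap_id);
    uint8_t next = pdev->config[offset + PCI_CAP_LIST_NEXT];

    if (prev) {
        /* Link 'prev' to 'next', skipping the deleted capability. */
        pdev->config[prev + PCI_CAP_LIST_NEXT] = next;
    } else {
        /* Deleting the head: re-point the capability list pointer. */
        pdev->config[PCI_CAPABILITY_LIST] = next;
    }
    if (!pdev->config[PCI_CAPABILITY_LIST]) {
        /* List became empty: clear the capability-list status bit. */
        pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
    }
    memset(pdev->config + offset, 0, size);
}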

I got it! Thanks very much for the details!

Jing



Thanks,
Marcel




Do you agree? If yes, just call pci_del_capability and you don't need
this patch.


Yup, I agree with you. And let me remove this patch in next version.

Thanks,
Jing



Thanks,
Marcel



[...]






Re: [Qemu-devel] [PATCH v2 1/3] hw/pci: factor PCI reserve resources to a separate structure

2018-08-19 Thread Liu, Jing2

Hi Marcel,

On 8/17/2018 11:49 PM, Marcel Apfelbaum wrote:

Hi Jing,


[...]

+/*
+ * additional resources to reserve on firmware init
+ */
+typedef struct PCIResReserve {
+    uint32_t bus_reserve;
+    uint64_t io_reserve;
+    uint64_t mem_reserve;


The patch looks good to me. I noticed you renamed
'mem_no_pref_reserve' to 'mem_reserve'.
I remember we had a lot of discussions about the naming, so it should
be clear and consistent with the firmware counterpart.


OK, will change 'mem_no_pref_reserve' to 'mem_no_pref', and likewise for
the others.

Please add at least a comment in the PCIResReserve.

Will add a comment to the structure definition, and where it's called.


Also, since you encapsulated the fields into a new struct,
you could remove the "_reserve" suffix so we
remain with the clear "bus", "io", "mem" ...


Got it.

Thanks,
Jing


Thanks,
Marcel


+    uint64_t pref32_reserve;
+    uint64_t pref64_reserve;
+} PCIResReserve;
+
  int pci_bridge_qemu_reserve_cap_init(PCIDevice *dev, int cap_offset,
-  uint32_t bus_reserve, uint64_t io_reserve,
-  uint64_t mem_non_pref_reserve,
-  uint64_t mem_pref_32_reserve,
-  uint64_t mem_pref_64_reserve,
-  Error **errp);
+   PCIResReserve res_reserve, Error **errp);
  #endif /* QEMU_PCI_BRIDGE_H */
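
A call site then passes the aggregated struct in one go; a sketch with
example values (KiB/MiB come from "qemu/units.h", and the field names are
taken from the quoted diff; the values themselves are illustrative, not
defaults):

#include "qemu/osdep.h"
#include "qemu/units.h"
#include "hw/pci/pci_bridge.h"

/* Sketch: initializing the reserve capability with the new struct. */
static int reserve_cap_example(PCIDevice *dev, int cap_offset, Error **errp)
{
    PCIResReserve res_reserve = {
        .bus_reserve    = 4,         /* example values, not defaults */
        .io_reserve     = 8 * KiB,
        .mem_reserve    = 32 * MiB,
        .pref32_reserve = 0,
        .pref64_reserve = 64 * MiB,
    };

    return pci_bridge_qemu_reserve_cap_init(dev, cap_offset,
                                            res_reserve, errp);
}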






Re: [Qemu-devel] [PATCH v2 2/3] hw/pci: add teardown function for PCI resource reserve capability

2018-08-19 Thread Liu, Jing2

Hi Marcel,

On 8/18/2018 12:10 AM, Marcel Apfelbaum wrote:

Hi Jing,

On 08/16/2018 12:28 PM, Jing Liu wrote:

Clean up the PCI config space of the resource reserve capability.

Signed-off-by: Jing Liu 
---
  hw/pci/pci_bridge.c | 9 +
  include/hw/pci/pci_bridge.h | 1 +
  2 files changed, 10 insertions(+)

diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 15b055e..dbcee90 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -465,6 +465,15 @@ int pci_bridge_qemu_reserve_cap_init(PCIDevice *dev, int cap_offset,

  return 0;
  }
+void pci_bridge_qemu_reserve_cap_uninit(PCIDevice *dev)
+{
+    uint8_t pos = pci_find_capability(dev, PCI_CAP_ID_VNDR);
+
+    pci_del_capability(dev, PCI_CAP_ID_VNDR, sizeof(PCIBridgeQemuCap));


I think that you only need to call pci_del_capability,


+    memset(dev->config + pos + PCI_CAP_FLAGS, 0,
+   sizeof(PCIBridgeQemuCap) - PCI_CAP_FLAGS);
+}


... no need for the above line. The reason is pci_del_capability
will "unlink" the capability, and even if the data remains in
the configuration space array, it will not be used.


I think I got it: pci_del_capability "unlinks" by setting the tag
pdev->config[PCI_STATUS] &= ~PCI_STATUS_CAP_LIST;
so that pdev->config will not be used, right?


Do you agree? If yes, just call pci_del_capability and you don't need
this patch.


Yup, I agree with you. And let me remove this patch in next version.

Thanks,
Jing



Thanks,
Marcel



[...]



Re: [Qemu-devel] [PATCH v2 0/3] hw/pci: PCI resource reserve capability

2018-08-19 Thread Liu, Jing2

Hi Marcel,

On 8/18/2018 12:18 AM, Marcel Apfelbaum wrote:

Hi Jing,

On 08/16/2018 12:28 PM, Jing Liu wrote:

This patch series is about the PCI resource reserve capability.

The first patch refactors the resource reserve fields of the GenPCIERootPort
structure out into a new structure called "PCIResReserve", and modifies
the parameter list of pci_bridge_qemu_reserve_cap_init().

Then we add the teardown function, called
pci_bridge_qemu_reserve_cap_uninit().


Lastly, we enable the resource reserve capability for the legacy PCI bridge
so that firmware can reserve additional resources for the bridge.


The series looks good to me, please see some minor comments
in the patches.


Thanks very much for your review. I will improve the
code and send a new version then.


Can you please point me to the SeaBIOS / OVMF counterpart?


Sure. The SeaBIOS patch series is here:
https://patchew.org/Seabios/1534386737-8131-1-git-send-email-jing2@linux.intel.com/

Thanks,
Jing


Thanks,
Marcel



Change Log:
v2 -> v1
* add refactoring patch
* add teardown function
* some other fixes

Jing Liu (3):
   hw/pci: factor PCI reserve resources to a separate structure
   hw/pci: add teardown function for PCI resource reserve capability
   hw/pci: add PCI resource reserve capability to legacy PCI bridge

  hw/pci-bridge/gen_pcie_root_port.c | 32 +-
  hw/pci-bridge/pci_bridge_dev.c     | 25 
  hw/pci/pci_bridge.c                | 47 +-
  include/hw/pci/pci_bridge.h        | 18 +++
  4 files changed, 80 insertions(+), 42 deletions(-)







Re: [Qemu-devel] [PATCH v2 0/3] hw/pci: PCI resource reserve capability

2018-08-16 Thread Liu, Jing2

Hi Laszlo,
Thanks very much for your reminder.
Looking forward to comments from all!

Thanks,
Jing

On 8/17/2018 12:17 AM, Laszlo Ersek wrote:

Hi,

On 08/16/18 11:28, Jing Liu wrote:

This patch series is about the PCI resource reserve capability.

The first patch refactors the resource reserve fields of the GenPCIERootPort
structure out into a new structure called "PCIResReserve", and modifies
the parameter list of pci_bridge_qemu_reserve_cap_init().

Then we add the teardown function, called pci_bridge_qemu_reserve_cap_uninit().

Lastly, we enable the resource reserve capability for the legacy PCI bridge
so that firmware can reserve additional resources for the bridge.

Change Log:
v2 -> v1
* add refactoring patch
* add teardown function
* some other fixes

Jing Liu (3):
   hw/pci: factor PCI reserve resources to a separate structure
   hw/pci: add teardown function for PCI resource reserve capability
   hw/pci: add PCI resource reserve capability to legacy PCI bridge

  hw/pci-bridge/gen_pcie_root_port.c | 32 +-
  hw/pci-bridge/pci_bridge_dev.c | 25 
  hw/pci/pci_bridge.c| 47 +-
  include/hw/pci/pci_bridge.h| 18 +++
  4 files changed, 80 insertions(+), 42 deletions(-)



just some meta comments for now:

- I've added Marcel; please keep him CC'd on this set (and the SeaBIOS
counterpart, [SeaBIOS] [PATCH v2 0/3] pci: resource reserve capability
found)

- my task queue has blown up and I'm unsure when I'll get to reviewing
this set. The same applies to the SeaBIOS counterpart.

This is just to say that you should feel free to go ahead and work with
the (sub)maintainers; I'll try to get back to this as time allows, but
don't wait for me.

If further versions are necessary, I'd appreciate being CC'd on those,
just so I know what to look at when I find the time again.

Thanks!
Laszlo





Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-16 Thread Liu, Jing2




On 8/14/2018 9:27 PM, Laszlo Ersek wrote:
[...]

   +cap_error:
+    msi_uninit(dev);


(4) This error handler doesn't look entirely correct; we can reach it
without having initialized MSI. (MSI setup is conditional; and even if
we attempt it, it is permitted to fail with "msi=auto".) Therefore,
msi_uninit() should be wrapped into "if (msi_present())", same as you
see in pci_bridge_dev_exitfn().


Sure, I can add a pre-check. But I don't understand why we need that;
msi_uninit() will check msi_present()?


Thanks for pointing that out.

So, I grepped the source code for msi_uninit(). It seems that only one
location gates it with msi_present() -- in pci_bridge_dev_exitfn() --,
while 13 other locations just call it unconditionally.

I don't know what the consistent approach is here. I'll have to defer to
the PCI maintainers; their taste will decide. Technically your code
looks correct, then.
Yes, I also grepped that. I guess if we use msi_uninit directly, it is
because msi_init is unconditional too.
Anyway, I added the check in my version two. BTW, the other comments are
addressed in v2.


Thanks,
Jing



Thanks
Laszlo





Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2




On 8/12/2018 3:11 PM, Marcel Apfelbaum wrote:
[...]


I don't know what to suggest for the new capability's
teardown, in pci_bridge_dev_exitfn()  -- should we just ignore it (as
suggested by this patch)?


No, we should remove it properly.

I think it is not considered a "big" issue, since adding/removing PCI
capabilities is not an operation that deals with resources; we only edit
an array (the config space) that will not be used anyway if the device
init sequence failed.

That does not mean the code should not be clean.

I just searched some PCI_CAP_ID_* uses to see what they do in exitfn/uninit.
It's weird that some more capabilities aren't being deleted, e.g. in
pci_ich9_uninit().


Thanks,
Jing



Thanks,
Marcel



Thanks
Laszlo







Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2

Hi Marcel,

On 8/12/2018 3:11 PM, Marcel Apfelbaum wrote:

Hi Laszlo,


[...]

  hw/pci-bridge/pci_bridge_dev.c | 20 
  1 file changed, 20 insertions(+)
+cap_error:
+    msi_uninit(dev);

(4) This error handler doesn't look entirely correct; we can reach it
without having initialized MSI. (MSI setup is conditional; and even if
we attempt it, it is permitted to fail with "msi=auto".) Therefore,
msi_uninit() should be wrapped into "if (msi_present())", same as you
see in pci_bridge_dev_exitfn().


You are right. msi_present should be checked.


I looked at the code calling this function. The check needs to be added to
be robust. But could I ask why we need to check twice? msi_uninit() helps
check again.





  msi_error:
  slotid_cap_cleanup(dev);
  slotid_error:

I tried to understand the error handling a bit better. I'm confused.


[...]

Second, msi_uninit() and shpc_cleanup() are internally inconsistent
between each other. The former removes the respective capability from
config space -- with pci_del_capability() --,


Right


  but the latter only has a
comment, "/* TODO: cleanup config space changes? */".


But it also disables QEMU_PCI_CAP_SLOTID (though no code checks it anyway).
I agree it should call pci_del_capability to delete the SHPC capability;
maybe it is some "debt" from early development stages.


  The same comment
is present in the slotid_cap_cleanup() function. Given this
inconsistency,


Here we also miss a call to pci_del_capability.


I don't know what to suggest for the new capability's
teardown, in pci_bridge_dev_exitfn()  

Aha, yes, the teardown needs to be added here.
Will add that in next version.

-- should we just ignore it (as

suggested by this patch)?


No, we should remove it properly.

I think it is not considered a "big" issue, since adding/removing PCI
capabilities is not an operation that deals with resources; we only edit
an array (the config space) that will not be used anyway if the device
init sequence failed.

That does not mean the code should not be clean.

Do we need to set up another patch series (separate from this
one) to add pci_del_capability() for slotid_cap_cleanup() and
shpc_cleanup()?


Thanks,
Jing

[...]



Re: [Qemu-devel] [PATCH] hw/pci: add pci capability to pci-pci bridge

2018-08-14 Thread Liu, Jing2

Hi Laszlo,
Sorry for the late reply. I missed these mails because of a wrong filter.
And thanks very much for the comments. My replies are below.

On 8/7/2018 8:19 PM, Laszlo Ersek wrote:

On 08/07/18 09:04, Jing Liu wrote:

[...]

@@ -46,6 +46,12 @@ struct PCIBridgeDev {
  uint32_t flags;
  
  OnOffAuto msi;

+
+/* additional resources to reserve on firmware init */
+uint64_t io_reserve;
+uint64_t mem_reserve;
+uint64_t pref32_reserve;
+uint64_t pref64_reserve;
  };
  typedef struct PCIBridgeDev PCIBridgeDev;


(1) Please evaluate the following idea:


This is really good, and I'm only not sure whether a DEFINE_PROP_ like
DEFINE_PROP_UINT32("bus-reserve", GenPCIERootPort,
res_reserve.bus_reserve, -1) is a right/good way?
I mean the third parameter "_f".
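
For reference, the resulting property list would look roughly like this
sketch (property names from this thread, field names from the quoted diff;
the _f parameter is simply the field path inside the state struct):

/* Sketch: gen_rp_props referencing fields of the embedded res_reserve. */
static Property gen_rp_props[] = {
    DEFINE_PROP_UINT32("bus-reserve", GenPCIERootPort,
                       res_reserve.bus_reserve, -1),
    DEFINE_PROP_SIZE("io-reserve", GenPCIERootPort,
                     res_reserve.io_reserve, -1),
    DEFINE_PROP_SIZE("mem-reserve", GenPCIERootPort,
                     res_reserve.mem_reserve, -1),
    DEFINE_PROP_SIZE("pref32-reserve", GenPCIERootPort,
                     res_reserve.pref32_reserve, -1),
    DEFINE_PROP_SIZE("pref64-reserve", GenPCIERootPort,
                     res_reserve.pref64_reserve, -1),
    DEFINE_PROP_END_OF_LIST(),
};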

[...]


- gen_rp_props should be updated accordingly. The property names should
stay the same, but they should reference fields of the "res_reserve" field.

(2) Then, in a separate patch, you can add another "res_reserve" field
to "PCIBridgeDev". (As a consequence, "bus_reserve" will also be added,
which I think is the right thing to do.)

I am considering whether we need to limit the bus_reserve of the pci-pci
bridge. With the old code, we could add an "unlimited" number of devices
(but less than the max) onto a pci-pci bridge.



(3) There is a class that is derived from TYPE_PCI_BRIDGE_DEV:
TYPE_PCI_BRIDGE_SEAT_DEV. Is it correct to add the new properties /
fields to the latter as well?


Umm, I looked into the code; it doesn't have the hotplug property?
So do we need to add resource reservation for it?

  
@@ -95,6 +101,13 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error **errp)

  error_free(local_err);
  }
  
+err = pci_bridge_qemu_reserve_cap_init(dev, 0, 0,
+  bridge_dev->io_reserve, bridge_dev->mem_reserve,
+  bridge_dev->pref32_reserve, bridge_dev->pref64_reserve, errp);
+if (err) {
+goto cap_error;
+}
+
  if (shpc_present(dev)) {
  /* TODO: spec recommends using 64 bit prefetcheable BAR.
   * Check whether that works well. */
@@ -103,6 +116,8 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error **errp)
  }
  return;
  
+cap_error:
+msi_uninit(dev);


(4) This error handler doesn't look entirely correct; we can reach it
without having initialized MSI. (MSI setup is conditional; and even if
we attempt it, it is permitted to fail with "msi=auto".) Therefore,
msi_uninit() should be wrapped into "if (msi_present())", same as you
see in pci_bridge_dev_exitfn().


Sure, I can add a pre-check. But I don't understand why we need that;
msi_uninit() will check msi_present()?




  msi_error:
  slotid_cap_cleanup(dev);
  slotid_error:
@@ -162,6 +177,11 @@ static Property pci_bridge_dev_properties[] = {
  ON_OFF_AUTO_AUTO),
  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
  PCI_BRIDGE_DEV_F_SHPC_REQ, true),
+DEFINE_PROP_SIZE("io-reserve", PCIBridgeDev, io_reserve, -1),
+DEFINE_PROP_SIZE("mem-reserve", PCIBridgeDev, mem_reserve, -1),
+DEFINE_PROP_SIZE("pref32-reserve", PCIBridgeDev, pref32_reserve, -1),
+DEFINE_PROP_SIZE("pref64-reserve", PCIBridgeDev, pref64_reserve, -1),
+
  DEFINE_PROP_END_OF_LIST(),
  };
  



(5) Similarly to (2), this property list should cover "bus-reserve" too.
If we are exposing the same capability in PCI config space, then I think
we should let the user control all fields of it as well.

(6) Please clarify the capability ID in the subject line of the patch.

OK, will add the details.


Thanks very much.
Jing


For example:

   hw/pci: support resource reservation capability on "pci-bridge"

Thanks
Laszlo





Re: [Qemu-devel] [Consult] nfs-vsocks support

2018-04-15 Thread Liu, Jing2



On 4/4/2018 5:45 PM, Stefan Hajnoczi wrote:

On Wed, Mar 28, 2018 at 07:25:21PM +0800, Liu, Jing2 wrote:

On 3/26/2018 6:12 PM, Liu, Jing2 wrote:

Hi Stefan,

Thank you very much for the response! It truly gave me much help.

On 3/24/2018 12:02 AM, Stefan Hajnoczi wrote:

[...]

Could you give me some help and thanks in advance!


Here is a script that launches nfsd and runs a guest:
https://github.com/stefanha/linux/commit/38cbc15661a6dd44b69c4f318091f2047d707035#diff-05f3fe8941076453942a8c059b409009



Using the script, I met some problems making the
initramfs.gz. It confused me a lot... I don't have busybox under
/usr/bin or /usr/sbin. Where can I get it?





It may be easier to use a full guest disk image instead of the disk
image generation in my go.sh script.  The initramfs is very specific to
the development machine I was using at the time.  The main thing to look
at the in the go.sh script is how the NFS services are launched on the
host.

Busybox is a package that is available on most Linux distributions.


I realized that RHEL 7.x doesn't have busybox anymore. And yes, I used
some other image to continue.



Before you spend more time on this, NFS over AF_VSOCK is only suitable
for developers right now.  It is not ready for production and I am not
supporting it for end users.  So if you want to hack on NFS, please go
ahead, but otherwise it's probably not what you want.

Yes, so far we haven't planned for production. And thank you very much for
your kind response.


Jing


Stefan


(host)$./go.sh nfs_tcp

File /usr/sbin/busybox could not be opened for reading
  line 55
File /lib64/libtirpc.so.3 could not be opened for reading
  line 306
File nc-vsock could not be opened for reading
  line 321
File ../netperf-2.7.0/src/netserver could not be opened for reading
  line 325

It results in guest kernel_panic when gdb launches the guest:

[0.425738] Failed to execute /init (error -2)
[0.426074] Kernel panic - not syncing: No working init found.  Try
passing .
[0.427077] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0+ #3
[0.427507] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-4
[0.428354] Call Trace:
[0.428538]  dump_stack+0x63/0x89
[0.428774]  ? rest_init+0x60/0xc0
[0.429017]  panic+0xeb/0x245
[0.429225]  ? putname+0x53/0x60
[0.429455]  ? rest_init+0xc0/0xc0
[0.429722]  kernel_init+0xf1/0x104
[0.430017]  ret_from_fork+0x25/0x30

Looking forward to the kind reply.
BTW, for the following questions that I sent two days ago, I have some
ideas in mind and it could be ignored now.

Jing


I have updated both host and guest kernel with vsock-nfsd repo and
installed nfs-utils in both host and guest successfully.

There're two different manuals in
https://www.spinics.net/lists/linux-nfs/msg64563.html (Quickstart 3~7)
and nfs-utils (README step 3 DAEMON STARTUP ORDER).
Which one is the best to use?

For the first quickstart manual, step 7 - Mount the export from the guest,
two questions as follows.
1. Why is the hypervisor's CID 2? I didn't notice where it is specified?
2. Though I did steps 3~7 successfully, I can't see the synchronized
change between the host folder /export/ and the guest /mnt? So how can I
test the communication between host and guest on the folder?

Thanks again!

Jing

[...]


Stefan









Re: [Qemu-devel] [Consult] nfs-vsocks support

2018-03-28 Thread Liu, Jing2



On 3/26/2018 6:12 PM, Liu, Jing2 wrote:

Hi Stefan,

Thank you very much for the response! It truly gave me much help.

On 3/24/2018 12:02 AM, Stefan Hajnoczi wrote:

[...]

Could you give me some help and thanks in advance!


Here is a script that launches nfsd and runs a guest:
https://github.com/stefanha/linux/commit/38cbc15661a6dd44b69c4f318091f2047d707035#diff-05f3fe8941076453942a8c059b409009 



Using the script, I met some problems making the
initramfs.gz. It confused me a lot... I don't have busybox under
/usr/bin or /usr/sbin. Where can I get it?

(host)$./go.sh nfs_tcp

File /usr/sbin/busybox could not be opened for reading
 line 55
File /lib64/libtirpc.so.3 could not be opened for reading
 line 306
File nc-vsock could not be opened for reading
 line 321
File ../netperf-2.7.0/src/netserver could not be opened for reading
 line 325

It results in guest kernel_panic when gdb launches the guest:

[0.425738] Failed to execute /init (error -2)
[0.426074] Kernel panic - not syncing: No working init found.  Try 
passing .

[0.427077] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0+ #3
[0.427507] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
BIOS rel-4

[0.428354] Call Trace:
[0.428538]  dump_stack+0x63/0x89
[0.428774]  ? rest_init+0x60/0xc0
[0.429017]  panic+0xeb/0x245
[0.429225]  ? putname+0x53/0x60
[0.429455]  ? rest_init+0xc0/0xc0
[0.429722]  kernel_init+0xf1/0x104
[0.430017]  ret_from_fork+0x25/0x30

Looking forward to the kind reply.
BTW, for the following questions that I sent two days ago, I have some
ideas in mind and it could be ignored now.

Jing

I have updated both host and guest kernels with the vsock-nfsd repo and
installed nfs-utils on both host and guest successfully.


There're two different manuals in
https://www.spinics.net/lists/linux-nfs/msg64563.html (Quickstart 3~7)
and nfs-utils (README step 3 DAEMON STARTUP ORDER).
Which one is the best to use?
Which one should be the best usage?

For the first quickstart manual, step 7 - Mount the export from the guest,
two questions as follows.

1. Why is the hypervisor's CID 2? I didn't notice where it is specified?
2. Though I did steps 3~7 successfully, I can't see the synchronized
change between the host folder /export/ and the guest /mnt? So how can I
test the communication between host and guest on the folder?


Thanks again!

Jing

[...]


Stefan







Re: [Qemu-devel] [Consult] nfs-vsocks support

2018-03-26 Thread Liu, Jing2

Hi Stefan,

Thank you very much for the response! It truly gave me much help.

On 3/24/2018 12:02 AM, Stefan Hajnoczi wrote:

On Fri, Mar 23, 2018 at 05:54:53PM +0800, Liu, Jing2 wrote:

I am currently trying to use nfs-vsocks on x86 for a virtual machine
filesystem, following some manuals on
https://www.spinics.net/lists/linux-nfs/msg64563.html
and https://lwn.net/Articles/647516/


[...]



3. For the 5th step (Start nfsd), I'd like to know which code enables
rpc.nfsd --vsock. Is it the CONFIG_VHOST_VSOCK kernel module on the host?
I installed the kernel from the vsock-nfs repo with the specified config
options but failed to run rpc.nfsd --vsock.


Could you give me some help and thanks in advance!


Here is a script that launches nfsd and runs a guest:
https://github.com/stefanha/linux/commit/38cbc15661a6dd44b69c4f318091f2047d707035#diff-05f3fe8941076453942a8c059b409009


I have updated both host and guest kernels with the vsock-nfsd repo and
installed nfs-utils on both host and guest successfully.


There're two different manuals in
https://www.spinics.net/lists/linux-nfs/msg64563.html (Quickstart 3~7)
and nfs-utils (README step 3 DAEMON STARTUP ORDER).

Which one is the best to use?

For the first quickstart manual, step 7 - Mount the export from the guest,
two questions as follows.

1. Why is the hypervisor's CID 2? I didn't notice where it is specified?
(See the sketch below.)
2. Though I did steps 3~7 successfully, I can't see the synchronized
change between the host folder /export/ and the guest /mnt? So how can I
test the communication between host and guest on the folder?
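
On question 1: AF_VSOCK reserves CID 2 for the host (VMADDR_CID_HOST in
<linux/vm_sockets.h>), so the hypervisor is always addressable as 2 without
any explicit configuration. A minimal sketch of a guest-side connection to
a host vsock service (port 2049 is just the NFS port used in this thread):

#include <stdio.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

int main(void)
{
        /* VMADDR_CID_HOST (2) is the well-known CID of the host. */
        struct sockaddr_vm addr = {
                .svm_family = AF_VSOCK,
                .svm_cid    = VMADDR_CID_HOST,
                .svm_port   = 2049,    /* NFS, as in this thread */
        };
        int fd = socket(AF_VSOCK, SOCK_STREAM, 0);

        if (fd < 0 || connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
                perror("vsock connect");
                return 1;
        }
        printf("connected to host CID %u port %u\n",
               addr.svm_cid, addr.svm_port);
        return 0;
}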


Thanks again!

Jing



Regarding rpc.nfsd, on my Fedora 27 system rpc.nfsd is launched by
/usr/lib/systemd/system/nfs.service.  When developing NFS over AF_VSOCK
I disable the NFS service and launch nfsd manually from the script
above.

Stefan





[Qemu-devel] [Consult] nfs-vsocks support

2018-03-23 Thread Liu, Jing2

Hello,

I am currently trying to use nfs-vsocks on x86 for a virtual machine
filesystem, following some manuals on
https://www.spinics.net/lists/linux-nfs/msg64563.html

and https://lwn.net/Articles/647516/

It gives the quickstart steps with the following repos, but I got some
problems, as listed.

 * Linux kernel: https://github.com/stefanha/linux.git vsock-nfs
 * QEMU virtio-vsock device: https://github.com/stefanha/qemu.git vsock
 * nfs-utils vsock: https://github.com/stefanha/nfs-utils.git vsock

1. I would like to ask what the current status of this code is. Are there
any updates? :)
2. The nfs-utils code could not compile for me... I referred to the README
but failed both ways. There seems to be no configure script and no
autoheader/automake... So I don't know how to install it.
3. For the 5th step (Start nfsd), I'd like to know which code enables
rpc.nfsd --vsock. Is it the CONFIG_VHOST_VSOCK kernel module on the
host? I installed the kernel from the vsock-nfs repo with the specified
config options but failed to run rpc.nfsd --vsock.



Could you give me some help and thanks in advance!

BRs,
Jing Liu