from:"Venu Busireddy"

Re: [PATCH v4] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-11 Thread Venu Busireddy

On 2022-10-11 12:34:56 +0200, Paolo Bonzini wrote:
> Queued, thanks.

Thank you!

Venu

> 
> Paolo
>

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-07 Thread Venu Busireddy

On 2022-10-07 06:55:15 -0400, Paolo Bonzini wrote:
> Il gio 6 ott 2022, 15:25 Venu Busireddy  ha
> scritto:
> 
> > I do see that the Solaris driver does send the 0x1a command during
> > the initialization, perhaps (?) seeking the value of UA_INTLCK_CTRL.
> > Since QEMU currently does not support it, QEMU sends back a
> > key/asc/ascq=0x05/0x24/0x00 response, indicating that 0x1a is an Illegal
> > Request.
> 
> 
> What is your QEMU command line and what is the full CDB (apart from 0x1a)?
> 
> I am assuming that the Solaris driver does not handle that
> > response well (I still don't have access to the source code to verify
> > that), confuses itself about the value of UA_INTLCK_CTRL, and hence does
> > not handle the response to the REPORT_LUNS command correctly.
> 
> 
> No this has nothing to do with what's happening. The most likely reason for
> the bug IMO is simple: the event is causing the driver to send the REPORT
> LUNS command, but it does so in a way that does not handle the unit
> attention when it is reported.

I had a developer with access to the Solaris code review how the response
to REPORT_LUNS is being handled. And they do see that the response to
REPORT_LUNS is mishandled.

With the fix proposed in v4, and fixing the handling of REPORT_LUNS
on the Solaris side, we believe we will have a complete working
solution. Therefore, I believe we can conclude this thread on v3.
Do you agree?

Venu

> 
> Maybe the
> > Solaris driver assumes that QEMU will retain the unit attention condition
> > (UA_INTLCK_CTRL = 10b?), and will respond with a REPORTED_LUNS_CHANGED
> > for a subsequent command?
> >
> > Based on your confirmation that we want to handle the REPORT_LUNS command
> > as if UA_INTLCK_CTRL is set to 0, I will proceed with the assumption
> > that the Solaris driver is at fault, and will work with the Solaris
> > driver folks.
> >
> > In the meantime, as you suggested, I will post v4 with the bus unit
> > attention mechanism implemented. We still need that.
> >
> > Venu
> >
> >

[PATCH v4] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-06 Thread Venu Busireddy

Section 5.6.6.3 of VirtIO specification states, "Events will also
be reported via sense codes..." However, no sense data is sent when
VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
are reported (when disk hotplug/hotunplug events occur). SCSI layer
on Solaris depends on this sense data, and hence does not handle disk
hotplug/hotunplug events.

When the disk inventory changes, use the bus unit attention mechanism
to return a CHECK_CONDITION status with sense data of 0x06/0x3F/0x0E
(sense code REPORTED_LUNS_CHANGED).

Signed-off-by: Venu Busireddy 

v3 -> v4:
- As suggested by Paolo Bonzini, use the bus unit attention mechanism
  instead of the device unit attention mechanism.

v2 -> v3:
- Implement the suggestion from Paolo Bonzini .

v1 -> v2:
- Send the sense data for VIRTIO_SCSI_EVT_RESET_REMOVED event too.
---
 hw/scsi/scsi-bus.c | 18 ++
 hw/scsi/virtio-scsi.c  |  2 ++
 include/hw/scsi/scsi.h |  1 +
 3 files changed, 21 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4403717c4aaf..ceceafb2cdf3 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -1616,6 +1616,24 @@ static int scsi_ua_precedence(SCSISense sense)
 return (sense.asc << 8) | sense.ascq;
 }
 
+void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense)
+{
+int prec1, prec2;
+if (sense.key != UNIT_ATTENTION) {
+return;
+}
+
+/*
+ * Override a pre-existing unit attention condition, except for a more
+ * important reset condition.
+ */
+prec1 = scsi_ua_precedence(bus->unit_attention);
+prec2 = scsi_ua_precedence(sense);
+if (prec2 < prec1) {
+bus->unit_attention = sense;
+}
+}
+
 void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense)
 {
 int prec1, prec2;
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 41f2a5630173..cf2721aa46c0 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -956,6 +956,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_RESCAN);
+scsi_bus_set_ua(>bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
 virtio_scsi_release(s);
 }
 }
@@ -973,6 +974,7 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_REMOVED);
+scsi_bus_set_ua(>bus, SENSE_CODE(REPORTED_LUNS_CHANGED));
 virtio_scsi_release(s);
 }
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 001103488c23..3b1b3d278ee8 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -186,6 +186,7 @@ SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, 
BlockBackend *blk,
   BlockdevOnError rerror,
   BlockdevOnError werror,
   const char *serial, Error **errp);
+void scsi_bus_set_ua(SCSIBus *bus, SCSISense sense);
 void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
 void scsi_legacy_handle_cmdline(void);

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-06 Thread Venu Busireddy

On 2022-10-05 23:37:33 +0200, Paolo Bonzini wrote:
> On 10/4/22 01:13, Venu Busireddy wrote:
> > > script? Something must be putting the SCSI command in the queue.
> > > Perhaps the driver is doing so when it sees an event? And if it is
> > > bypassing the normal submission mechanism, the REPORT LUNS commands is
> > > hidden in scsitrac; that in turn retruns a unit attention and steals
> > 
> > While SAM does say "if a REPORT LUNS command enters the enabled command
> > state, the device server shall process the REPORT LUNS command and shall
> > not report any unit attention condition;," it also says that the unit
> > attention condition will not be cleared if the UA_INTLCK_CTRL is set to
> > 10b or 11b in the "Control mode page."
> > 
> > It doesn't appear to me that virtio-scsi supports "Control mode pages."
> > Does it? If it doesn't, is the expected handling of REPORT LUNS command
> > be same as the case of UA_INTLCK_CTRL being set to 00b?
> 
> In QEMU, all HBAs except for esp.c and lsi53c895a.c support autosense. As in
> the comment below, 00b is the right value for virtio-scsi.
> 
> The code to build the 0Ah (control) mode page would be in scsi-disk.c for
> example.  Nobody ever wrote it because the values mentioned in the comment
> below (00b if HBA supports autosense and therefore calls scsi_req_get_sense;
> 10b for HBAs with no autosense, typically very old emulated parallel-SCSI
> hardware) are the ones that make the most sense and OSes will just assume
> them.
> 
> 00b is also the default UA_INTLCK_CTRL value, so the mode page is not needed
> at all for virtio-scsi.

I do see that the Solaris driver does send the 0x1a command during
the initialization, perhaps (?) seeking the value of UA_INTLCK_CTRL.
Since QEMU currently does not support it, QEMU sends back a
key/asc/ascq=0x05/0x24/0x00 response, indicating that 0x1a is an Illegal
Request. I am assuming that the Solaris driver does not handle that
response well (I still don't have access to the source code to verify
that), confuses itself about the value of UA_INTLCK_CTRL, and hence does
not handle the response to the REPORT_LUNS command correctly. Maybe the
Solaris driver assumes that QEMU will retain the unit attention condition
(UA_INTLCK_CTRL = 10b?), and will respond with a REPORTED_LUNS_CHANGED
for a subsequent command?

Based on your confirmation that we want to handle the REPORT_LUNS command
as if UA_INTLCK_CTRL is set to 0, I will proceed with the assumption
that the Solaris driver is at fault, and will work with the Solaris
driver folks.

In the meantime, as you suggested, I will post v4 with the bus unit
attention mechanism implemented. We still need that.

Venu

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-03 Thread Venu Busireddy

On 2022-10-03 18:13:06 -0500, Venu Busireddy wrote:
> On 2022-09-30 18:25:48 +0200, Paolo Bonzini wrote:
> > On Fri, Sep 30, 2022 at 4:42 PM Venu Busireddy
> >  wrote:
> > > > > Immediately after a hotunplug event, qemu (without any action from
> > > > > the guest) processes a REPORT_LUNS command on the lun 0 of the device
> > > > > (haven't figured out what causes this).
> > > >
> > > > There is only one call to virtio_scsi_handle_cmd_req_prepare and it
> > > > takes the command from the guest, are you sure it is without any
> > > > action from the guest?
> > >
> > > I am sure, based on what I am observing. I am running the scsitrace
> > > (scsitrace -n vtioscsi -v) command on the Solaris guest, and I see no
> > > output there.
> > 
> > Do you have the sources to the driver and/or to the scsitrace dtrace
> 
> I do not have access to the source code. I am working on gaining access.
> 
> > script? Something must be putting the SCSI command in the queue.
> > Perhaps the driver is doing so when it sees an event? And if it is
> > bypassing the normal submission mechanism, the REPORT LUNS commands is
> > hidden in scsitrac; that in turn retruns a unit attention and steals
> 
> While SAM does say "if a REPORT LUNS command enters the enabled command
> state, the device server shall process the REPORT LUNS command and shall
> not report any unit attention condition;," it also says that the unit
> attention condition will not be cleared if the UA_INTLCK_CTRL is set to
> 10b or 11b in the "Control mode page."
> 
> It doesn't appear to me that virtio-scsi supports "Control mode pages."

Just to clarify, I am referring the mode pages with page code 0x0a (and
any subpage codes).

> Does it? If it doesn't, is the expected handling of REPORT LUNS command
> be same as the case of UA_INTLCK_CTRL being set to 00b?
> 
> And while trying to understand this, and reading the code regarding
> the handling of UA_INTLCK_CTRL, I ran across the following comment in
> scsi_req_get_sense():
> 
> /*
>  * FIXME: clearing unit attention conditions upon autosense should be done
>  * only if the UA_INTLCK_CTRL field in the Control mode page is set to 00b
>  * (SAM-5, 5.14).
>  *
>  * We assume UA_INTLCK_CTRL to be 00b for HBAs that support autosense, and
>  * 10b for HBAs that do not support it (do not call scsi_req_get_sense).
>  * Here we handle unit attention clearing for UA_INTLCK_CTRL == 00b.
>  */
> 
> If virtio-scsi doesn't support "Control mode pages," why does the above
> comment even say "assume UA_INTLCK_CTRL to be 00b" or address the case
> of 10b? Also, other than the reference to it in the above comment,
> UA_INTLCK_CTRL is not used anywhere else in the code. This comment
> confused me. Is the comment just wrong, or am I missing something? I am
> just trying to understand this better so that I am better prepared when
> the client driver folks start asking me questions about the qemu support.
> 
> Venu
> 
> > it from the other commands such as TEST UNIT READY, but that's a guest
> > driver bug.
> > 
> > But QEMU cannot just return the unit attention twice. I would start
> > with the patch to use the bus unit attention mechanism. It would be
> > even better to have two unit tests that check the behavior prescribed
> > by the standard: 1) UNIT ATTENTION from TEST UNIT READY immediately
> > after a hotunplug notification; 2) no UNIT ATTENTION from REPORT LUNS
> > and also no UNIT ATTENTION from a subsequent TEST UNIT READY command.
> > Debugging the guest is a separate step.

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-10-03 Thread Venu Busireddy

On 2022-09-30 18:25:48 +0200, Paolo Bonzini wrote:
> On Fri, Sep 30, 2022 at 4:42 PM Venu Busireddy
>  wrote:
> > > > Immediately after a hotunplug event, qemu (without any action from
> > > > the guest) processes a REPORT_LUNS command on the lun 0 of the device
> > > > (haven't figured out what causes this).
> > >
> > > There is only one call to virtio_scsi_handle_cmd_req_prepare and it
> > > takes the command from the guest, are you sure it is without any
> > > action from the guest?
> >
> > I am sure, based on what I am observing. I am running the scsitrace
> > (scsitrace -n vtioscsi -v) command on the Solaris guest, and I see no
> > output there.
> 
> Do you have the sources to the driver and/or to the scsitrace dtrace

I do not have access to the source code. I am working on gaining access.

> script? Something must be putting the SCSI command in the queue.
> Perhaps the driver is doing so when it sees an event? And if it is
> bypassing the normal submission mechanism, the REPORT LUNS commands is
> hidden in scsitrac; that in turn retruns a unit attention and steals

While SAM does say "if a REPORT LUNS command enters the enabled command
state, the device server shall process the REPORT LUNS command and shall
not report any unit attention condition;," it also says that the unit
attention condition will not be cleared if the UA_INTLCK_CTRL is set to
10b or 11b in the "Control mode page."

It doesn't appear to me that virtio-scsi supports "Control mode pages."
Does it? If it doesn't, is the expected handling of REPORT LUNS command
be same as the case of UA_INTLCK_CTRL being set to 00b?

And while trying to understand this, and reading the code regarding
the handling of UA_INTLCK_CTRL, I ran across the following comment in
scsi_req_get_sense():

/*
 * FIXME: clearing unit attention conditions upon autosense should be done
 * only if the UA_INTLCK_CTRL field in the Control mode page is set to 00b
 * (SAM-5, 5.14).
 *
 * We assume UA_INTLCK_CTRL to be 00b for HBAs that support autosense, and
 * 10b for HBAs that do not support it (do not call scsi_req_get_sense).
 * Here we handle unit attention clearing for UA_INTLCK_CTRL == 00b.
 */

If virtio-scsi doesn't support "Control mode pages," why does the above
comment even say "assume UA_INTLCK_CTRL to be 00b" or address the case
of 10b? Also, other than the reference to it in the above comment,
UA_INTLCK_CTRL is not used anywhere else in the code. This comment
confused me. Is the comment just wrong, or am I missing something? I am
just trying to understand this better so that I am better prepared when
the client driver folks start asking me questions about the qemu support.

Venu

> it from the other commands such as TEST UNIT READY, but that's a guest
> driver bug.
> 
> But QEMU cannot just return the unit attention twice. I would start
> with the patch to use the bus unit attention mechanism. It would be
> even better to have two unit tests that check the behavior prescribed
> by the standard: 1) UNIT ATTENTION from TEST UNIT READY immediately
> after a hotunplug notification; 2) no UNIT ATTENTION from REPORT LUNS
> and also no UNIT ATTENTION from a subsequent TEST UNIT READY command.
> Debugging the guest is a separate step.

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-09-30 Thread Venu Busireddy

On 2022-09-30 10:41:03 +0200, Paolo Bonzini wrote:
> On Fri, Sep 30, 2022 at 12:31 AM Venu Busireddy
>  wrote:
> > > >*/
> > > >   !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
> > > >  ops = _unit_attention;
> > > > +d->clear_reported_luns_changed = true;
> > >
> > > Any reason to have this flag, and not just clear
> > > s->reported_luns_changed after scsi_req_new? Is it to handle the
> > > invalid opcode case?
> >
> > Immediately after a hotunplug event, qemu (without any action from
> > the guest) processes a REPORT_LUNS command on the lun 0 of the device
> > (haven't figured out what causes this).
> 
> There is only one call to virtio_scsi_handle_cmd_req_prepare and it
> takes the command from the guest, are you sure it is without any
> action from the guest?

I am sure, based on what I am observing. I am running the scsitrace
(scsitrace -n vtioscsi -v) command on the Solaris guest, and I see no
output there. I do see output when I run any scsi related commands (such
as sg_luns, sg_raw, etc) on the guest. But no output when I hotunplug
the disk, either while virtio_scsi_handle_cmd_req_prepare() is running
or after that, until I run any sg_* commands on the guest later.

However, for whatever it's worth, if I have two or more luns
on a virtio-scsi adapter, the spurious REPORT_LUNS processing
(virtio_scsi_handle_cmd_req_prepare() call) occurs only when
I hotunplug a lun while the other luns are still plugged in,
until the last lun is unplugged. I do not see the spurious call to
virtio_scsi_handle_cmd_req_prepare() when the last lun is unplugged,
whether that was the only lun present, or if it was the last of many.

Venu

Re: [PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-09-29 Thread Venu Busireddy

On 2022-09-29 12:49:51 +0200, Paolo Bonzini wrote:
> On Wed, Sep 28, 2022 at 8:06 PM Venu Busireddy
>  wrote:
> >
> > Section 5.6.6.3 of VirtIO specification states, "Events will also
> > be reported via sense codes..." However, no sense data is sent when
> > VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
> > are reported (when disk hotplug/hotunplug events occur). SCSI layer
> > on Solaris depends on this sense data, and hence does not handle disk
> > hotplug/hotunplug events.
> >
> > When disk inventory changes, return a CHECK_CONDITION status with sense
> > data of 0x06/0x3F/0x0E (sense code REPORTED_LUNS_CHANGED), as per the
> > specifications in Section 5.14 (h) of SAM-4.
> >
> > Signed-off-by: Venu Busireddy 
> >
> > v2 -> v3:
> > - Implement the suggestion from Paolo Bonzini .
> >
> > v1 -> v2:
> > - Send the sense data for VIRTIO_SCSI_EVT_RESET_REMOVED event too.
> > ---
> >  hw/scsi/scsi-bus.c  |  1 +
> >  hw/scsi/virtio-scsi.c   | 16 
> >  include/hw/scsi/scsi.h  |  6 ++
> >  include/hw/virtio/virtio-scsi.h |  1 +
> >  4 files changed, 24 insertions(+)
> >
> > diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
> > index 4403717c4aaf..b7cb249f2eab 100644
> > --- a/hw/scsi/scsi-bus.c
> > +++ b/hw/scsi/scsi-bus.c
> > @@ -730,6 +730,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
> > uint32_t lun,
> >*/
> >   !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
> >  ops = _unit_attention;
> > +d->clear_reported_luns_changed = true;
> 
> Any reason to have this flag, and not just clear
> s->reported_luns_changed after scsi_req_new? Is it to handle the
> invalid opcode case?

Immediately after a hotunplug event, qemu (without any action from
the guest) processes a REPORT_LUNS command on the lun 0 of the device
(haven't figured out what causes this). If we unconditionally clear
the s->reported_luns_changed flag right after calling scsi_req_new(),
the action taken in scsi_device_set_ua() is undone by the eventual call
to scsi_clear_unit_attention(). Here is the sequence of the events:

(Note: SCSIDevice = 0x7ff180005010 is lun 1, and SCSIDevice = 0x557fed88fd40 is 
lun 0)

virtio_scsi_hotunplug(): Entered, reported_luns_changed = 0, VirtIOSCSI = 
0x557feda9f750, SCSIDevice = 0x7ff180005010, bus = 0x557feda9f9c0
virtio_scsi_hotunplug(): Exiting, reported_luns_changed = 1, VirtIOSCSI = 
0x557feda9f750, SCSIDevice = 0x7ff180005010, bus = 0x557feda9f9c0
virtio_scsi_handle_cmd_req_prepare(): Entered, reported_luns_changed = 1, 
VirtIOSCSI = 0x557feda9f750, SCSIDevice = 0x557fed88fd40, cdb[0] = 0xa0
scsi_device_set_ua(): Entered, SCSIDevice = 0x557fed88fd40
scsi_device_set_ua(): prec1 = 0x7fff,  sdev->key = 0,  sdev->asc = 0x00,  
sdev->ascq = 0x00
scsi_device_set_ua(): prec2 = 0x3f0e, sense->key = 6, sense->asc = 0x3f, 
sense->ascq = 0x0e
scsi_req_new(): SCSIDevice = 0x557fed88fd40, bus = 0x557feda9f9c0, buf[0] = a0
scsi_req_new(): sdev.key = 6, sdev.asc = 0x3f, sdev.ascq = 0x0e
virtio_scsi_handle_cmd_req_prepare(): Exiting, reported_luns_changed = 0, 
VirtIOSCSI = 0x557feda9f750, SCSIDevice = 0x557fed88fd40, cdb[0] = 0xa0
scsi_clear_unit_attention(): Entered, buf[0] = 0xa0, SCSIDevice = 
0x557fed88fd40, key = 6, asc = 0x3f, ascq = 0x0e
scsi_clear_unit_attention(): Exiting, buf[0] = 0xa0, SCSIDevice = 
0x557fed88fd40, key = 0, asc = 0x00, ascq = 0x00

As can be seen, before the guest does anything, we cleared the
reported_luns_changed flag as well as the unit attention condition.

Therefore, when the guest eventually sends a TEST_UNIT_READY command,
we don't report anything back, as can be seen by the traces below:

virtio_scsi_handle_cmd_req_prepare(): Entered, reported_luns_changed = 0, 
VirtIOSCSI = 0x557feda9f750, SCSIDevice = 0x557fed88fd40, cdb[0] = 0x00
scsi_req_new(): SCSIDevice = 0x557fed88fd40, bus = 0x557feda9f9c0, buf[0] = 00
scsi_req_new(): sdev.key = 0, sdev.asc = 0x00, sdev.ascq = 0x00
virtio_scsi_handle_cmd_req_prepare(): Exiting, reported_luns_changed = 0, 
VirtIOSCSI = 0x557feda9f750, SCSIDevice = 0x557fed88fd40, cdb[0] = 0x00
scsi_clear_unit_attention(): Entered, buf[0] = 0x00, SCSIDevice = 
0x557fed88fd40, key = 0, asc = 0x00, ascq = 0x00

That is why we need the d->clear_reported_luns_changed flag, to know
when we genuinely processed a command from the guest and only then clear
the reported_luns_changed flag.

> 
> I just reread the code and noticed that there is also a *bus* unit
> attention mechanism, which is unused but seems perfect for this
> usecase. The first device on the bus to execute a command successfully
> will consume it.
> 
> You need s

[PATCH v3] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-09-28 Thread Venu Busireddy

Section 5.6.6.3 of VirtIO specification states, "Events will also
be reported via sense codes..." However, no sense data is sent when
VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
are reported (when disk hotplug/hotunplug events occur). SCSI layer
on Solaris depends on this sense data, and hence does not handle disk
hotplug/hotunplug events.

When disk inventory changes, return a CHECK_CONDITION status with sense
data of 0x06/0x3F/0x0E (sense code REPORTED_LUNS_CHANGED), as per the
specifications in Section 5.14 (h) of SAM-4.

Signed-off-by: Venu Busireddy 

v2 -> v3:
- Implement the suggestion from Paolo Bonzini .

v1 -> v2:
- Send the sense data for VIRTIO_SCSI_EVT_RESET_REMOVED event too.
---
 hw/scsi/scsi-bus.c  |  1 +
 hw/scsi/virtio-scsi.c   | 16 
 include/hw/scsi/scsi.h  |  6 ++
 include/hw/virtio/virtio-scsi.h |  1 +
 4 files changed, 24 insertions(+)

diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c
index 4403717c4aaf..b7cb249f2eab 100644
--- a/hw/scsi/scsi-bus.c
+++ b/hw/scsi/scsi-bus.c
@@ -730,6 +730,7 @@ SCSIRequest *scsi_req_new(SCSIDevice *d, uint32_t tag, 
uint32_t lun,
   */
  !(buf[0] == REQUEST_SENSE && d->sense_is_ua))) {
 ops = _unit_attention;
+d->clear_reported_luns_changed = true;
 } else if (lun != d->lun ||
buf[0] == REPORT_LUNS ||
(buf[0] == REQUEST_SENSE && d->sense_len)) {
diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 41f2a5630173..b7f66d366fcb 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -695,9 +695,23 @@ static int virtio_scsi_handle_cmd_req_prepare(VirtIOSCSI 
*s, VirtIOSCSIReq *req)
 return -ENOENT;
 }
 virtio_scsi_ctx_check(s, d);
+
+if (s->reported_luns_changed) {
+scsi_device_set_ua(d, SENSE_CODE(REPORTED_LUNS_CHANGED));
+}
+
+/*
+ * set d->clear_reported_luns_changed.
+ * scsi_req_new() will clear it if the required conditions are met.
+ */
+d->clear_reported_luns_changed = false;
 req->sreq = scsi_req_new(d, req->req.cmd.tag,
  virtio_scsi_get_lun(req->req.cmd.lun),
  req->req.cmd.cdb, vs->cdb_size, req);
+if (d->clear_reported_luns_changed) {
+s->reported_luns_changed = false;
+d->clear_reported_luns_changed = false;
+}
 
 if (req->sreq->cmd.mode != SCSI_XFER_NONE
 && (req->sreq->cmd.mode != req->mode ||
@@ -956,6 +970,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_RESCAN);
+s->reported_luns_changed = true;
 virtio_scsi_release(s);
 }
 }
@@ -973,6 +988,7 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_REMOVED);
+s->reported_luns_changed = true;
 virtio_scsi_release(s);
 }
 
diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h
index 001103488c23..ea81c6f89e74 100644
--- a/include/hw/scsi/scsi.h
+++ b/include/hw/scsi/scsi.h
@@ -89,6 +89,12 @@ struct SCSIDevice
 uint32_t io_timeout;
 bool needs_vpd_bl_emulation;
 bool hba_supports_iothread;
+/*
+ * clear_reported_luns_changed is used, if the required
+ * conditions are met, to inform the virtio-scsi layer that
+ * any pending REPORTED_LUNS_CHANGED condition can be cleared.
+ */
+bool clear_reported_luns_changed;
 };
 
 extern const VMStateDescription vmstate_scsi_device;
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index a36aad9c8695..efbcf9ba069a 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -81,6 +81,7 @@ struct VirtIOSCSI {
 SCSIBus bus;
 int resetting;
 bool events_dropped;
+bool reported_luns_changed;
 
 /* Fields for dataplane below */
 AioContext *ctx; /* one iothread per virtio-scsi-pci for now */

Re: [PATCH v2] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-09-21 Thread Venu Busireddy

On 2022-09-21 16:33:35 +0200, Paolo Bonzini wrote:
> On Fri, Sep 16, 2022 at 3:44 AM Venu Busireddy
>  wrote:
> > diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> > index 41f2a5630173..69194c7ae23c 100644
> > --- a/hw/scsi/virtio-scsi.c
> > +++ b/hw/scsi/virtio-scsi.c
> > @@ -608,7 +608,19 @@ static void virtio_scsi_command_complete(SCSIRequest 
> > *r, size_t resid)
> >
> >  req->resp.cmd.response = VIRTIO_SCSI_S_OK;
> >  req->resp.cmd.status = r->status;
> > -if (req->resp.cmd.status == GOOD) {
> > +if (req->dev->reported_luns_changed &&
> > +(req->req.cmd.cdb[0] != INQUIRY) &&
> > +(req->req.cmd.cdb[0] != REPORT_LUNS) &&
> > +(req->req.cmd.cdb[0] != REQUEST_SENSE)) {
> > +req->dev->reported_luns_changed = false;
> > +req->resp.cmd.resid = 0;
> > +req->resp.cmd.status_qualifier = 0;
> > +req->resp.cmd.status = CHECK_CONDITION;
> > +sense_len = scsi_build_sense(sense, 
> > SENSE_CODE(REPORTED_LUNS_CHANGED));
> > +qemu_iovec_from_buf(>resp_iov, sizeof(req->resp.cmd),
> > +sense, sense_len);
> > +req->resp.cmd.sense_len = virtio_tswap32(vdev, sense_len);
> > +} else if (req->resp.cmd.status == GOOD) {
> >  req->resp.cmd.resid = virtio_tswap32(vdev, resid);
> >  } else {
> >  req->resp.cmd.resid = 0;
> 
> Hi,
> 
> a unit attention sense must be sent _instead_ of executing the command.
> 
> QEMU already has a function scsi_device_set_ua() that handles
> everything; you have to call it, if reported_luns_changed is true,
> from virtio_scsi_handle_cmd_req_prepare() before scsi_req_new().
> 
> It will also skip GET_CONFIGURATION and GET_EVENT_STATUS_NOTIFICATION
> commands which are further special-cased in 4.1.6.1 of the MMC
> specification.

Thanks, Paolo. I will test your suggestion (as soon as I finish what I
am working currently), and get back with either more questions, or with
a v3 post.

Venu

> > @@ -956,6 +968,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  virtio_scsi_push_event(s, sd,
> > VIRTIO_SCSI_T_TRANSPORT_RESET,
> > VIRTIO_SCSI_EVT_RESET_RESCAN);
> > +s->reported_luns_changed = true;
> >  virtio_scsi_release(s);
> >  }
> >  }
> > @@ -973,6 +986,7 @@ static void virtio_scsi_hotunplug(HotplugHandler 
> > *hotplug_dev, DeviceState *dev,
> >  virtio_scsi_push_event(s, sd,
> > VIRTIO_SCSI_T_TRANSPORT_RESET,
> > VIRTIO_SCSI_EVT_RESET_REMOVED);
> > +s->reported_luns_changed = true;
> >  virtio_scsi_release(s);
> >  }
> >
> > diff --git a/include/hw/virtio/virtio-scsi.h 
> > b/include/hw/virtio/virtio-scsi.h
> > index a36aad9c8695..efbcf9ba069a 100644
> > --- a/include/hw/virtio/virtio-scsi.h
> > +++ b/include/hw/virtio/virtio-scsi.h
> > @@ -81,6 +81,7 @@ struct VirtIOSCSI {
> >  SCSIBus bus;
> >  int resetting;
> >  bool events_dropped;
> > +bool reported_luns_changed;
> >
> >  /* Fields for dataplane below */
> >  AioContext *ctx; /* one iothread per virtio-scsi-pci for now */
> >
>

[PATCH v2] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon disk hotplug events.

2022-09-15 Thread Venu Busireddy

Section 5.6.6.3 of VirtIO specification states, "Events will also
be reported via sense codes..." However, no sense data is sent when
VIRTIO_SCSI_EVT_RESET_RESCAN or VIRTIO_SCSI_EVT_RESET_REMOVED events
are reported (when disk hotplug/hotunplug events occur). SCSI layer
on Solaris depends on this sense data, and hence does not handle disk
hotplug/hotunplug events.

As specified in SAM-4, Section 5.14 (h), return a CHECK_CONDITION status
with sense data of 0x06/0x3F/0x0E (sense code REPORTED_LUNS_CHANGED) when
the disk inventory changes and a command other than INQUIRY, REPORT_LUNS,
or REQUEST_SENSE is received.

Signed-off-by: Venu Busireddy 

v1 -> v2:
- Send the sense data for VIRTIO_SCSI_EVT_RESET_REMOVED event too.
---
 hw/scsi/virtio-scsi.c   | 16 +++-
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 41f2a5630173..69194c7ae23c 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -608,7 +608,19 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
size_t resid)
 
 req->resp.cmd.response = VIRTIO_SCSI_S_OK;
 req->resp.cmd.status = r->status;
-if (req->resp.cmd.status == GOOD) {
+if (req->dev->reported_luns_changed &&
+(req->req.cmd.cdb[0] != INQUIRY) &&
+(req->req.cmd.cdb[0] != REPORT_LUNS) &&
+(req->req.cmd.cdb[0] != REQUEST_SENSE)) {
+req->dev->reported_luns_changed = false;
+req->resp.cmd.resid = 0;
+req->resp.cmd.status_qualifier = 0;
+req->resp.cmd.status = CHECK_CONDITION;
+sense_len = scsi_build_sense(sense, SENSE_CODE(REPORTED_LUNS_CHANGED));
+qemu_iovec_from_buf(>resp_iov, sizeof(req->resp.cmd),
+sense, sense_len);
+req->resp.cmd.sense_len = virtio_tswap32(vdev, sense_len);
+} else if (req->resp.cmd.status == GOOD) {
 req->resp.cmd.resid = virtio_tswap32(vdev, resid);
 } else {
 req->resp.cmd.resid = 0;
@@ -956,6 +968,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_RESCAN);
+s->reported_luns_changed = true;
 virtio_scsi_release(s);
 }
 }
@@ -973,6 +986,7 @@ static void virtio_scsi_hotunplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
 virtio_scsi_push_event(s, sd,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_REMOVED);
+s->reported_luns_changed = true;
 virtio_scsi_release(s);
 }
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index a36aad9c8695..efbcf9ba069a 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -81,6 +81,7 @@ struct VirtIOSCSI {
 SCSIBus bus;
 int resetting;
 bool events_dropped;
+bool reported_luns_changed;
 
 /* Fields for dataplane below */
 AioContext *ctx; /* one iothread per virtio-scsi-pci for now */

Re: [Qemu-devel] [PATCH v1] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon a disk hotplug.

2022-07-08 Thread Venu Busireddy



Ping?

On 2022-05-31 15:22:37 -0500, Venu Busireddy wrote:
> When a disk is hotplugged, QEMU reports a VIRTIO_SCSI_EVT_RESET_RESCAN
> event, but does not send the "REPORTED LUNS CHANGED" sense data. This
> does not conform to Section 5.6.6.3 of the VirtIO specification, which
> states "Events will also be reported via sense codes..." SCSI layer on
> Solaris depends on this sense data, and hence does not recognize the
> hotplugged disks (until a reboot).
> 
> As specified in SAM-4, Section 5.14, return a CHECK_CONDITION status with
> a sense data of 0x06/0x3F/0x0E, whenever a command other than INQUIRY,
> REPORT_LUNS, or REQUEST_SENSE is received.
> 
> Signed-off-by: Venu Busireddy 
> ---
>  hw/scsi/virtio-scsi.c   | 15 ++-
>  include/hw/virtio/virtio-scsi.h |  1 +
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 4141517a..7ae1cfa6e584 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -608,7 +608,19 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
> size_t resid)
>  
>  req->resp.cmd.response = VIRTIO_SCSI_S_OK;
>  req->resp.cmd.status = r->status;
> -if (req->resp.cmd.status == GOOD) {
> +if (req->dev->reported_luns_changed &&
> +(req->req.cmd.cdb[0] != INQUIRY) &&
> +(req->req.cmd.cdb[0] != REPORT_LUNS) &&
> +(req->req.cmd.cdb[0] != REQUEST_SENSE)) {
> +req->dev->reported_luns_changed = false;
> +req->resp.cmd.resid = 0;
> +req->resp.cmd.status_qualifier = 0;
> +req->resp.cmd.status = CHECK_CONDITION;
> +sense_len = scsi_build_sense(sense, 
> SENSE_CODE(REPORTED_LUNS_CHANGED));
> +qemu_iovec_from_buf(>resp_iov, sizeof(req->resp.cmd),
> +sense, sense_len);
> +req->resp.cmd.sense_len = virtio_tswap32(vdev, sense_len);
> +} else if (req->resp.cmd.status == GOOD) {
>  req->resp.cmd.resid = virtio_tswap32(vdev, resid);
>  } else {
>  req->resp.cmd.resid = 0;
> @@ -956,6 +968,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
> VIRTIO_SCSI_T_TRANSPORT_RESET,
> VIRTIO_SCSI_EVT_RESET_RESCAN);
>  virtio_scsi_release(s);
> +s->reported_luns_changed = true;
>  }
>  }
>  
> diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
> index a36aad9c8695..efbcf9ba069a 100644
> --- a/include/hw/virtio/virtio-scsi.h
> +++ b/include/hw/virtio/virtio-scsi.h
> @@ -81,6 +81,7 @@ struct VirtIOSCSI {
>  SCSIBus bus;
>  int resetting;
>  bool events_dropped;
> +bool reported_luns_changed;
>  
>  /* Fields for dataplane below */
>  AioContext *ctx; /* one iothread per virtio-scsi-pci for now */

Re: [Qemu-devel] [PATCH v1] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon a disk hotplug.

2022-06-16 Thread Venu Busireddy



Ping?

On 2022-05-31 15:22:37 -0500, Venu Busireddy wrote:
> When a disk is hotplugged, QEMU reports a VIRTIO_SCSI_EVT_RESET_RESCAN
> event, but does not send the "REPORTED LUNS CHANGED" sense data. This
> does not conform to Section 5.6.6.3 of the VirtIO specification, which
> states "Events will also be reported via sense codes..." SCSI layer on
> Solaris depends on this sense data, and hence does not recognize the
> hotplugged disks (until a reboot).
> 
> As specified in SAM-4, Section 5.14, return a CHECK_CONDITION status with
> a sense data of 0x06/0x3F/0x0E, whenever a command other than INQUIRY,
> REPORT_LUNS, or REQUEST_SENSE is received.
> 
> Signed-off-by: Venu Busireddy 
> ---
>  hw/scsi/virtio-scsi.c   | 15 ++-
>  include/hw/virtio/virtio-scsi.h |  1 +
>  2 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
> index 4141517a..7ae1cfa6e584 100644
> --- a/hw/scsi/virtio-scsi.c
> +++ b/hw/scsi/virtio-scsi.c
> @@ -608,7 +608,19 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
> size_t resid)
>  
>  req->resp.cmd.response = VIRTIO_SCSI_S_OK;
>  req->resp.cmd.status = r->status;
> -if (req->resp.cmd.status == GOOD) {
> +if (req->dev->reported_luns_changed &&
> +(req->req.cmd.cdb[0] != INQUIRY) &&
> +(req->req.cmd.cdb[0] != REPORT_LUNS) &&
> +(req->req.cmd.cdb[0] != REQUEST_SENSE)) {
> +req->dev->reported_luns_changed = false;
> +req->resp.cmd.resid = 0;
> +req->resp.cmd.status_qualifier = 0;
> +req->resp.cmd.status = CHECK_CONDITION;
> +sense_len = scsi_build_sense(sense, 
> SENSE_CODE(REPORTED_LUNS_CHANGED));
> +qemu_iovec_from_buf(>resp_iov, sizeof(req->resp.cmd),
> +sense, sense_len);
> +req->resp.cmd.sense_len = virtio_tswap32(vdev, sense_len);
> +} else if (req->resp.cmd.status == GOOD) {
>  req->resp.cmd.resid = virtio_tswap32(vdev, resid);
>  } else {
>  req->resp.cmd.resid = 0;
> @@ -956,6 +968,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
> VIRTIO_SCSI_T_TRANSPORT_RESET,
> VIRTIO_SCSI_EVT_RESET_RESCAN);
>  virtio_scsi_release(s);
> +s->reported_luns_changed = true;
>  }
>  }
>  
> diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
> index a36aad9c8695..efbcf9ba069a 100644
> --- a/include/hw/virtio/virtio-scsi.h
> +++ b/include/hw/virtio/virtio-scsi.h
> @@ -81,6 +81,7 @@ struct VirtIOSCSI {
>  SCSIBus bus;
>  int resetting;
>  bool events_dropped;
> +bool reported_luns_changed;
>  
>  /* Fields for dataplane below */
>  AioContext *ctx; /* one iothread per virtio-scsi-pci for now */

[Qemu-devel] [PATCH v1] virtio-scsi: Send "REPORTED LUNS CHANGED" sense data upon a disk hotplug.

2022-05-31 Thread Venu Busireddy

When a disk is hotplugged, QEMU reports a VIRTIO_SCSI_EVT_RESET_RESCAN
event, but does not send the "REPORTED LUNS CHANGED" sense data. This
does not conform to Section 5.6.6.3 of the VirtIO specification, which
states "Events will also be reported via sense codes..." SCSI layer on
Solaris depends on this sense data, and hence does not recognize the
hotplugged disks (until a reboot).

As specified in SAM-4, Section 5.14, return a CHECK_CONDITION status with
a sense data of 0x06/0x3F/0x0E, whenever a command other than INQUIRY,
REPORT_LUNS, or REQUEST_SENSE is received.

Signed-off-by: Venu Busireddy 
---
 hw/scsi/virtio-scsi.c   | 15 ++-
 include/hw/virtio/virtio-scsi.h |  1 +
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c
index 4141517a..7ae1cfa6e584 100644
--- a/hw/scsi/virtio-scsi.c
+++ b/hw/scsi/virtio-scsi.c
@@ -608,7 +608,19 @@ static void virtio_scsi_command_complete(SCSIRequest *r, 
size_t resid)
 
 req->resp.cmd.response = VIRTIO_SCSI_S_OK;
 req->resp.cmd.status = r->status;
-if (req->resp.cmd.status == GOOD) {
+if (req->dev->reported_luns_changed &&
+(req->req.cmd.cdb[0] != INQUIRY) &&
+(req->req.cmd.cdb[0] != REPORT_LUNS) &&
+(req->req.cmd.cdb[0] != REQUEST_SENSE)) {
+req->dev->reported_luns_changed = false;
+req->resp.cmd.resid = 0;
+req->resp.cmd.status_qualifier = 0;
+req->resp.cmd.status = CHECK_CONDITION;
+sense_len = scsi_build_sense(sense, SENSE_CODE(REPORTED_LUNS_CHANGED));
+qemu_iovec_from_buf(>resp_iov, sizeof(req->resp.cmd),
+sense, sense_len);
+req->resp.cmd.sense_len = virtio_tswap32(vdev, sense_len);
+} else if (req->resp.cmd.status == GOOD) {
 req->resp.cmd.resid = virtio_tswap32(vdev, resid);
 } else {
 req->resp.cmd.resid = 0;
@@ -956,6 +968,7 @@ static void virtio_scsi_hotplug(HotplugHandler 
*hotplug_dev, DeviceState *dev,
VIRTIO_SCSI_T_TRANSPORT_RESET,
VIRTIO_SCSI_EVT_RESET_RESCAN);
 virtio_scsi_release(s);
+s->reported_luns_changed = true;
 }
 }
 
diff --git a/include/hw/virtio/virtio-scsi.h b/include/hw/virtio/virtio-scsi.h
index a36aad9c8695..efbcf9ba069a 100644
--- a/include/hw/virtio/virtio-scsi.h
+++ b/include/hw/virtio/virtio-scsi.h
@@ -81,6 +81,7 @@ struct VirtIOSCSI {
 SCSIBus bus;
 int resetting;
 bool events_dropped;
+bool reported_luns_changed;
 
 /* Fields for dataplane below */
 AioContext *ctx; /* one iothread per virtio-scsi-pci for now */

Re: [PATCH v8 07/13] confidential guest support: Introduce cgs "ready" flag

2021-02-10 Thread Venu Busireddy

On 2021-02-02 15:13:09 +1100, David Gibson wrote:
> The platform specific details of mechanisms for implementing
> confidential guest support may require setup at various points during
> initialization.  Thus, it's not really feasible to have a single cgs
> initialization hook, but instead each mechanism needs its own
> initialization calls in arch or machine specific code.
> 
> However, to make it harder to have a bug where a mechanism isn't
> properly initialized under some circumstances, we want to have a
> common place, late in boot, where we verify that cgs has been
> initialized if it was requested.
> 
> This patch introduces a ready flag to the ConfidentialGuestSupport
> base type to accomplish this, which we verify in
> qemu_machine_creation_done().
> 
> Signed-off-by: David Gibson 
> Reviewed-by: Dr. David Alan Gilbert 
> Reviewed-by: Greg Kurz 
> ---
>  include/exec/confidential-guest-support.h | 24 +++
>  softmmu/vl.c  | 10 ++
>  target/i386/sev.c |  2 ++
>  3 files changed, 36 insertions(+)
> 
> diff --git a/include/exec/confidential-guest-support.h 
> b/include/exec/confidential-guest-support.h
> index 3db6380e63..5dcf602047 100644
> --- a/include/exec/confidential-guest-support.h
> +++ b/include/exec/confidential-guest-support.h
> @@ -27,6 +27,30 @@ OBJECT_DECLARE_SIMPLE_TYPE(ConfidentialGuestSupport, 
> CONFIDENTIAL_GUEST_SUPPORT)
>  
>  struct ConfidentialGuestSupport {
>  Object parent;
> +
> +/*
> + * ready: flag set by CGS initialization code once it's ready to
> + *start executing instructions in a potentially-secure
> + *guest
> + *
> + * The definition here is a bit fuzzy, because this is essentially
> + * part of a self-sanity-check, rather than a strict mechanism.
> + *
> + * It's not fasible to have a single point in the common machine

Just a nit pick.

s/fasible/feasible/

> + * init path to configure confidential guest support, because
> + * different mechanisms have different interdependencies requiring
> + * initialization in different places, often in arch or machine
> + * type specific code.  It's also usually not possible to check
> + * for invalid configurations until that initialization code.
> + * That means it would be very easy to have a bug allowing CGS
> + * init to be bypassed entirely in certain configurations.
> + *
> + * Silently ignoring a requested security feature would be bad, so
> + * to avoid that we check late in init that this 'ready' flag is
> + * set if CGS was requested.  If the CGS init hasn't happened, and
> + * so 'ready' is not set, we'll abort.
> + */
> +bool ready;
>  };
>  
>  typedef struct ConfidentialGuestSupportClass {
> diff --git a/softmmu/vl.c b/softmmu/vl.c
> index 1b464e3474..1869ed54a9 100644
> --- a/softmmu/vl.c
> +++ b/softmmu/vl.c
> @@ -101,6 +101,7 @@
>  #include "qemu/plugin.h"
>  #include "qemu/queue.h"
>  #include "sysemu/arch_init.h"
> +#include "exec/confidential-guest-support.h"
>  
>  #include "ui/qemu-spice.h"
>  #include "qapi/string-input-visitor.h"
> @@ -2497,6 +2498,8 @@ static void qemu_create_cli_devices(void)
>  
>  static void qemu_machine_creation_done(void)
>  {
> +MachineState *machine = MACHINE(qdev_get_machine());
> +
>  /* Did we create any drives that we failed to create a device for? */
>  drive_check_orphaned();
>  
> @@ -2516,6 +2519,13 @@ static void qemu_machine_creation_done(void)
>  
>  qdev_machine_creation_done();
>  
> +if (machine->cgs) {
> +/*
> + * Verify that Confidential Guest Support has actually been 
> initialized
> + */
> +assert(machine->cgs->ready);
> +}
> +
>  if (foreach_device_config(DEV_GDB, gdbserver_start) < 0) {
>  exit(1);
>  }
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index 590cb31fa8..f9e9b5d8ae 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -737,6 +737,8 @@ int sev_kvm_init(ConfidentialGuestSupport *cgs, Error 
> **errp)
>  qemu_add_machine_init_done_notifier(_machine_done_notify);
>  qemu_add_vm_change_state_handler(sev_vm_state_change, sev);
>  
> +cgs->ready = true;
> +
>  return 0;
>  err:
>  sev_guest = NULL;
> -- 
> 2.29.2

Re: [PATCH v6 6/6] sev/i386: Enable an SEV-ES guest based on SEV policy

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:49 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Update the sev_es_enabled() function return value to be based on the SEV
> policy that has been specified. SEV-ES is enabled if SEV is enabled and
> the SEV-ES policy bit is set in the policy object.
> 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Venu Busireddy 

> ---
>  target/i386/sev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index badc141554..62ecc28cf6 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -371,7 +371,7 @@ sev_enabled(void)
>  bool
>  sev_es_enabled(void)
>  {
> -return false;
> +return sev_enabled() && (sev_guest->policy & SEV_POLICY_ES);
>  }
>  
>  uint64_t
> -- 
> 2.30.0
>

Re: [PATCH v6 1/6] sev/i386: Add initial support for SEV-ES

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:44 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> Provide initial support for SEV-ES. This includes creating a function to
> indicate the guest is an SEV-ES guest (which will return false until all
> support is in place), performing the proper SEV initialization and
> ensuring that the guest CPU state is measured as part of the launch.
> 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Reviewed-by: Dr. David Alan Gilbert 
> Co-developed-by: Jiri Slaby 
> Signed-off-by: Jiri Slaby 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Venu Busireddy 

> ---
>  target/i386/cpu.c  |  1 +
>  target/i386/sev-stub.c |  6 ++
>  target/i386/sev.c  | 44 --
>  target/i386/sev_i386.h |  1 +
>  4 files changed, 50 insertions(+), 2 deletions(-)
> 
> diff --git a/target/i386/cpu.c b/target/i386/cpu.c
> index 72a79e6019..0415d8a99c 100644
> --- a/target/i386/cpu.c
> +++ b/target/i386/cpu.c
> @@ -5987,6 +5987,7 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, 
> uint32_t count,
>  break;
>  case 0x801F:
>  *eax = sev_enabled() ? 0x2 : 0;
> +*eax |= sev_es_enabled() ? 0x8 : 0;
>  *ebx = sev_get_cbit_position();
>  *ebx |= sev_get_reduced_phys_bits() << 6;
>  *ecx = 0;
> diff --git a/target/i386/sev-stub.c b/target/i386/sev-stub.c
> index c1fecc2101..229a2ee77b 100644
> --- a/target/i386/sev-stub.c
> +++ b/target/i386/sev-stub.c
> @@ -49,8 +49,14 @@ SevCapability *sev_get_capabilities(Error **errp)
>  error_setg(errp, "SEV is not available in this QEMU");
>  return NULL;
>  }
> +
>  int sev_inject_launch_secret(const char *hdr, const char *secret,
>   uint64_t gpa, Error **errp)
>  {
>  return 1;
>  }
> +
> +bool sev_es_enabled(void)
> +{
> +return false;
> +}
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index 1546606811..fce2128c07 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -360,6 +360,12 @@ sev_enabled(void)
>  return !!sev_guest;
>  }
>  
> +bool
> +sev_es_enabled(void)
> +{
> +return false;
> +}
> +
>  uint64_t
>  sev_get_me_mask(void)
>  {
> @@ -580,6 +586,20 @@ sev_launch_update_data(SevGuestState *sev, uint8_t 
> *addr, uint64_t len)
>  return ret;
>  }
>  
> +static int
> +sev_launch_update_vmsa(SevGuestState *sev)
> +{
> +int ret, fw_error;
> +
> +ret = sev_ioctl(sev->sev_fd, KVM_SEV_LAUNCH_UPDATE_VMSA, NULL, 
> _error);
> +if (ret) {
> +error_report("%s: LAUNCH_UPDATE_VMSA ret=%d fw_error=%d '%s'",
> +__func__, ret, fw_error, fw_error_to_str(fw_error));
> +}
> +
> +return ret;
> +}
> +
>  static void
>  sev_launch_get_measure(Notifier *notifier, void *unused)
>  {
> @@ -592,6 +612,14 @@ sev_launch_get_measure(Notifier *notifier, void *unused)
>  return;
>  }
>  
> +if (sev_es_enabled()) {
> +/* measure all the VM save areas before getting launch_measure */
> +ret = sev_launch_update_vmsa(sev);
> +if (ret) {
> +exit(1);
> +}
> +}
> +
>  measurement = g_new0(struct kvm_sev_launch_measure, 1);
>  
>  /* query the measurement blob length */
> @@ -686,7 +714,7 @@ sev_guest_init(const char *id)
>  {
>  SevGuestState *sev;
>  char *devname;
> -int ret, fw_error;
> +int ret, fw_error, cmd;
>  uint32_t ebx;
>  uint32_t host_cbitpos;
>  struct sev_user_data_status status = {};
> @@ -747,8 +775,20 @@ sev_guest_init(const char *id)
>  sev->api_major = status.api_major;
>  sev->api_minor = status.api_minor;
>  
> +if (sev_es_enabled()) {
> +if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
> +error_report("%s: guest policy requires SEV-ES, but "
> + "host SEV-ES support unavailable",
> + __func__);
> +goto err;
> +}
> +cmd = KVM_SEV_ES_INIT;
> +} else {
> +cmd = KVM_SEV_INIT;
> +}
> +
>  trace_kvm_sev_init();
> -ret = sev_ioctl(sev->sev_fd, KVM_SEV_INIT, NULL, _error);
> +ret = sev_ioctl(sev->sev_fd, cmd, NULL, _error);
>  if (ret) {
>  error_report("%s: failed to initialize ret=%d fw_error=%d '%s'",
>   __func__, ret, fw_error, fw_error_to_str(fw_error));
> diff --git a/target/i386/sev_i386.h b/target/i386/sev_i386.h
> index 4db6960f60..4f9a5e9b21 100644
> --- a/target/i386/sev_i386.h
> +++ b/target/i386/sev_i386.h
> @@ -29,6 +29,7 @@
>  #define SEV_POLICY_SEV  0x20
>  
>  extern bool sev_enabled(void);
> +extern bool sev_es_enabled(void);
>  extern uint64_t sev_get_me_mask(void);
>  extern SevInfo *sev_get_info(void);
>  extern uint32_t sev_get_cbit_position(void);
> -- 
> 2.30.0
>

Re: [PATCH v6 4/6] sev/i386: Don't allow a system reset under an SEV-ES guest

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:47 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> An SEV-ES guest does not allow register state to be altered once it has
> been measured. When an SEV-ES guest issues a reboot command, Qemu will
> reset the vCPU state and resume the guest. This will cause failures under
> SEV-ES. Prevent that from occuring by introducing an arch-specific
> callback that returns a boolean indicating whether vCPUs are resettable.
> 
> Cc: Peter Maydell 
> Cc: Aurelien Jarno 
> Cc: Jiaxun Yang 
> Cc: Aleksandar Rikalo 
> Cc: David Gibson 
> Cc: David Hildenbrand 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Venu Busireddy 

> ---
>  accel/kvm/kvm-all.c   |  5 +
>  include/sysemu/cpus.h |  2 ++
>  include/sysemu/hw_accel.h |  5 +
>  include/sysemu/kvm.h  | 10 ++
>  softmmu/cpus.c|  5 +
>  softmmu/runstate.c|  3 +++
>  target/arm/kvm.c  |  5 +
>  target/i386/kvm/kvm.c |  6 ++
>  target/mips/kvm.c |  5 +
>  target/ppc/kvm.c  |  5 +
>  target/s390x/kvm.c|  5 +
>  11 files changed, 56 insertions(+)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 410879cf94..6c099a3869 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -2414,6 +2414,11 @@ void kvm_flush_coalesced_mmio_buffer(void)
>  s->coalesced_flush_in_progress = false;
>  }
>  
> +bool kvm_cpu_check_are_resettable(void)
> +{
> +return kvm_arch_cpu_check_are_resettable();
> +}
> +
>  static void do_kvm_cpu_synchronize_state(CPUState *cpu, run_on_cpu_data arg)
>  {
>  if (!cpu->vcpu_dirty) {
> diff --git a/include/sysemu/cpus.h b/include/sysemu/cpus.h
> index e8156728c6..1cb4f9dbeb 100644
> --- a/include/sysemu/cpus.h
> +++ b/include/sysemu/cpus.h
> @@ -57,6 +57,8 @@ extern int icount_align_option;
>  /* Unblock cpu */
>  void qemu_cpu_kick_self(void);
>  
> +bool cpus_are_resettable(void);
> +
>  void cpu_synchronize_all_states(void);
>  void cpu_synchronize_all_post_reset(void);
>  void cpu_synchronize_all_post_init(void);
> diff --git a/include/sysemu/hw_accel.h b/include/sysemu/hw_accel.h
> index ffed6192a3..61672f9b32 100644
> --- a/include/sysemu/hw_accel.h
> +++ b/include/sysemu/hw_accel.h
> @@ -22,4 +22,9 @@ void cpu_synchronize_post_reset(CPUState *cpu);
>  void cpu_synchronize_post_init(CPUState *cpu);
>  void cpu_synchronize_pre_loadvm(CPUState *cpu);
>  
> +static inline bool cpu_check_are_resettable(void)
> +{
> +return kvm_enabled() ? kvm_cpu_check_are_resettable() : true;
> +}
> +
>  #endif /* QEMU_HW_ACCEL_H */
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 875ca101e3..3e265cea3d 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -573,4 +573,14 @@ int kvm_get_max_memslots(void);
>  /* Notify resamplefd for EOI of specific interrupts. */
>  void kvm_resample_fd_notify(int gsi);
>  
> +/**
> + * kvm_cpu_check_are_resettable - return whether CPUs can be reset
> + *
> + * Returns: true: CPUs are resettable
> + *  false: CPUs are not resettable
> + */
> +bool kvm_cpu_check_are_resettable(void);
> +
> +bool kvm_arch_cpu_check_are_resettable(void);
> +
>  #endif
> diff --git a/softmmu/cpus.c b/softmmu/cpus.c
> index 1dc20b9dc3..89de46eae0 100644
> --- a/softmmu/cpus.c
> +++ b/softmmu/cpus.c
> @@ -194,6 +194,11 @@ void cpu_synchronize_pre_loadvm(CPUState *cpu)
>  }
>  }
>  
> +bool cpus_are_resettable(void)
> +{
> +return cpu_check_are_resettable();
> +}
> +
>  int64_t cpus_get_virtual_clock(void)
>  {
>  /*
> diff --git a/softmmu/runstate.c b/softmmu/runstate.c
> index beee050815..1813691898 100644
> --- a/softmmu/runstate.c
> +++ b/softmmu/runstate.c
> @@ -527,6 +527,9 @@ void qemu_system_reset_request(ShutdownCause reason)
>  if (reboot_action == REBOOT_ACTION_SHUTDOWN &&
>  reason != SHUTDOWN_CAUSE_SUBSYSTEM_RESET) {
>  shutdown_requested = reason;
> +} else if (!cpus_are_resettable()) {
> +error_report("cpus are not resettable, terminating");
> +shutdown_requested = reason;
>  } else {
>  reset_requested = reason;
>  }
> diff --git a/target/arm/kvm.c b/target/arm/kvm.c
> index ffe186de8d..00e124c812 100644
> --- a/target/arm/kvm.c
> +++ b/target/arm/kvm.c
> @@ -1045,3 +1045,8 @@ int kvm_arch_msi_data_to_gsi(uint32_t data)
>  {
>  return (data - 32) & 0x;
>  }
> +
> +bool kvm_arch_cpu_check_are_resettable(void)
> +{
> +return true;
> +}
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kv

Re: [PATCH v6 5/6] kvm/i386: Use a per-VM check for SMM capability

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:48 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> SMM is not currently supported for an SEV-ES guest by KVM. Change the SMM
> capability check from a KVM-wide check to a per-VM check in order to have
> a finer-grained SMM capability check.
> 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Suggested-by: Sean Christopherson 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Venu Busireddy 

> ---
>  target/i386/kvm/kvm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
> index bb6bfc19de..37fca43cd9 100644
> --- a/target/i386/kvm/kvm.c
> +++ b/target/i386/kvm/kvm.c
> @@ -135,7 +135,7 @@ int kvm_has_pit_state2(void)
>  
>  bool kvm_has_smm(void)
>  {
> -return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM);
> +return kvm_vm_check_extension(kvm_state, KVM_CAP_X86_SMM);
>  }
>  
>  bool kvm_has_adjust_clock_stable(void)
> -- 
> 2.30.0
>

Re: [PATCH v6 3/6] sev/i386: Allow AP booting under SEV-ES

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:46 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> When SEV-ES is enabled, it is not possible modify the guests register
> state after it has been initially created, encrypted and measured.
> 
> Normally, an INIT-SIPI-SIPI request is used to boot the AP. However, the
> hypervisor cannot emulate this because it cannot update the AP register
> state. For the very first boot by an AP, the reset vector CS segment
> value and the EIP value must be programmed before the register has been
> encrypted and measured. Search the guest firmware for the guest for a
> specific GUID that tells Qemu the value of the reset vector to use.
> 
> Cc: Paolo Bonzini 
> Cc: "Michael S. Tsirkin" 
> Cc: Marcel Apfelbaum 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Cc: Marcelo Tosatti 
> Signed-off-by: Tom Lendacky 
> ---
>  accel/kvm/kvm-all.c| 64 
>  accel/stubs/kvm-stub.c |  5 +++
>  hw/i386/pc_sysfw.c | 10 +-
>  include/sysemu/kvm.h   | 16 +
>  include/sysemu/sev.h   |  3 ++
>  target/i386/kvm/kvm.c  |  2 ++
>  target/i386/sev.c  | 74 ++
>  7 files changed, 173 insertions(+), 1 deletion(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index 3feb17d965..410879cf94 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -39,6 +39,7 @@
>  #include "qemu/main-loop.h"
>  #include "trace.h"
>  #include "hw/irq.h"
> +#include "sysemu/kvm.h"
>  #include "sysemu/sev.h"
>  #include "qapi/visitor.h"
>  #include "qapi/qapi-types-common.h"
> @@ -126,6 +127,12 @@ struct KVMState
>  /* memory encryption */
>  void *memcrypt_handle;
>  int (*memcrypt_encrypt_data)(void *handle, uint8_t *ptr, uint64_t len);
> +int (*memcrypt_save_reset_vector)(void *handle, void *flash_ptr,
> +  uint64_t flash_size, uint32_t *addr);
> +
> +uint32_t reset_cs;
> +uint32_t reset_ip;
> +bool reset_data_valid;
>  
>  /* For "info mtree -f" to tell if an MR is registered in KVM */
>  int nr_as;
> @@ -245,6 +252,62 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>  return 1;
>  }
>  
> +void kvm_memcrypt_set_reset_vector(CPUState *cpu)
> +{
> +X86CPU *x86;
> +CPUX86State *env;
> +
> +/* Only update if we have valid reset information */
> +if (!kvm_state->reset_data_valid) {
> +return;
> +}
> +
> +/* Do not update the BSP reset state */
> +if (cpu->cpu_index == 0) {
> +return;
> +}
> +
> +x86 = X86_CPU(cpu);
> +env = >env;
> +
> +cpu_x86_load_seg_cache(env, R_CS, 0xf000, kvm_state->reset_cs, 0x,
> +   DESC_P_MASK | DESC_S_MASK | DESC_CS_MASK |
> +   DESC_R_MASK | DESC_A_MASK);
> +
> +env->eip = kvm_state->reset_ip;
> +}
> +
> +int kvm_memcrypt_save_reset_vector(void *flash_ptr, uint64_t flash_size)
> +{
> +CPUState *cpu;
> +uint32_t addr;
> +int ret;
> +
> +if (kvm_memcrypt_enabled() &&
> +kvm_state->memcrypt_save_reset_vector) {
> +
> +addr = 0;
> +ret = 
> kvm_state->memcrypt_save_reset_vector(kvm_state->memcrypt_handle,
> +flash_ptr, flash_size,
> +);
> +if (ret) {
> +return ret;
> +}
> +
> +if (addr) {
> +kvm_state->reset_cs = addr & 0x;
> +kvm_state->reset_ip = addr & 0x;
> +kvm_state->reset_data_valid = true;
> +
> +CPU_FOREACH(cpu) {
> +kvm_memcrypt_set_reset_vector(cpu);
> +}
> +}
> +}
> +
> +return 0;
> +}
> +
>  /* Called with KVMMemoryListener.slots_lock held */
>  static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
>  {
> @@ -2216,6 +2279,7 @@ static int kvm_init(MachineState *ms)
>  }
>  
>  kvm_state->memcrypt_encrypt_data = sev_encrypt_data;
> +kvm_state->memcrypt_save_reset_vector = sev_es_save_reset_vector;
>  }
>  
>  ret = kvm_arch_init(ms, s);
> diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
> index 680e099463..162c28429e 100644
> --- a/accel/stubs/kvm-stub.c
> +++ b/accel/stubs/kvm-stub.c
> @@ -91,6 +91,11 @@ int kvm_memcrypt_encrypt_data(uint8_t *ptr, uint64_t len)
>return 1;
>  }
>  
> +int kvm_memcrypt_save_reset_vector(void *flash_ptr, uint64_t flash_size)
> +{
> +return -ENOSYS;
> +}
> +
>  #ifndef CONFIG_USER_ONLY
>  int kvm_irqchip_add_msi_route(KVMState *s, int vector, PCIDevice *dev)
>  {
> diff --git a/hw/i386/pc_sysfw.c b/hw/i386/pc_sysfw.c
> index 436b78c587..edec28842d 100644
> --- a/hw/i386/pc_sysfw.c
> +++ b/hw/i386/pc_sysfw.c
> @@ -248,7 +248,8 @@ static void pc_system_flash_map(PCMachineState *pcms,
>  PFlashCFI01 *system_flash;
>  MemoryRegion *flash_mem;
>  void *flash_ptr;
> -int ret, flash_size;
> +uint64_t

Re: [PATCH v6 2/6] sev/i386: Require in-kernel irqchip support for SEV-ES guests

2021-01-29 Thread Venu Busireddy

On 2021-01-26 11:36:45 -0600, Tom Lendacky wrote:
> From: Tom Lendacky 
> 
> In prep for AP booting, require the use of in-kernel irqchip support. This
> lessens the Qemu support burden required to boot APs.
> 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Reviewed-by: Dr. David Alan Gilbert 
> Signed-off-by: Tom Lendacky 

Reviewed-by: Venu Busireddy 

> ---
>  target/i386/sev.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/target/i386/sev.c b/target/i386/sev.c
> index fce2128c07..ddec7ebaa7 100644
> --- a/target/i386/sev.c
> +++ b/target/i386/sev.c
> @@ -776,6 +776,12 @@ sev_guest_init(const char *id)
>  sev->api_minor = status.api_minor;
>  
>  if (sev_es_enabled()) {
> +if (!kvm_kernel_irqchip_allowed()) {
> +error_report("%s: SEV-ES guests require in-kernel irqchip 
> support",
> + __func__);
> +goto err;
> +}
> +
>  if (!(status.flags & SEV_STATUS_FLAGS_CONFIG_ES)) {
>  error_report("%s: guest policy requires SEV-ES, but "
>   "host SEV-ES support unavailable",
> -- 
> 2.30.0
>

Re: [Qemu-devel] [PATCH v2 06/13] doc: update AMD SEV to include Live migration flow

2019-07-24 Thread Venu Busireddy

On 2019-07-10 20:23:03 +, Singh, Brijesh wrote:
> Signed-off-by: Brijesh Singh 
> ---
>  docs/amd-memory-encryption.txt | 42 +-
>  1 file changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/amd-memory-encryption.txt b/docs/amd-memory-encryption.txt
> index abb9a976f5..374f4b0a94 100644
> --- a/docs/amd-memory-encryption.txt
> +++ b/docs/amd-memory-encryption.txt
> @@ -89,7 +89,47 @@ TODO
>  
>  Live Migration
>  
> -TODO
> +AMD SEV encrypts the memory of VMs and because a different key is used
> +in each VM, the hypervisor will be unable to simply copy the
> +ciphertext from one VM to another to migrate the VM. Instead the AMD SEV Key
> +Management API provides sets of function which the hypervisor can use
> +to package a guest page for migration, while maintaining the confidentiality
> +provided by AMD SEV.
> +
> +SEV guest VMs have the concept of private and shared memory. The private
> +memory is encrypted with the guest-specific key, while shared memory may
> +be encrypted with the hypervisor key. The migration APIs provided by the
> +SEV API spec should be used for migrating the private pages. The
> +KVM_GET_PAGE_ENC_BITMAP ioctl can be used to get the guest page encryption
> +bitmap. The bitmap can be used to check if the given guest page is
> +private or shared.
> +
> +Before initiating the migration, we need to know the targets machine's public
> +Diffie-Hellman key (PDH) and certificate chain. It can be retrieved
> +with the 'query-sev-capabilities' QMP command or using the sev-tool. The
> +migrate-set-sev-info object can be used to pass the target machine's PDH and
> +certificate chain.
> +
> +e.g
> +(QMP) migrate-sev-set-info pdh= plat-cert= \

'migrate-sev-set-info' needs to be changed to 'migrate-set-sev-info'.

> +   amd-cert=
> +(QMP) migrate tcp:0:
> +
> +
> +During the migration flow, the SEND_START is called on the source hypervisor
> +to create outgoing encryption context. The SEV guest policy dectates whether
> +the certificate passed through the migrate-sev-set-info command will be

Same here.

> +validate. SEND_UPDATE_DATA is called to encrypt the guest private pages.
> +After migration is completed, SEND_FINISH is called to destroy the encryption
> +context and make the VM non-runnable to protect it against the cloning.
> +
> +On the target machine, RECEIVE_START is called first to create an
> +incoming encryption context. The RECEIVE_UPDATE_DATA is called to copy
> +the receieved encrypted page into guest memory. After migration has
> +completed, RECEIVE_FINISH is called to make the VM runnable.
> +
> +For more information about the migration see SEV API Appendix A
> +Usage flow (Live migration section).
>  
>  References
>  -
> -- 
> 2.17.1
> 
>

Re: [Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

2019-01-08 Thread Venu Busireddy

On 2019-01-09 00:56:38 +0800, Dongli Zhang wrote:
> I am not familiar with libvirt and I would like to play with this only with 
> qemu.
> 
> With failover, we need to hotplug the VF on destination server to VM after 
> live
> migration. However, the VF on destination server would have different mac 
> address.
> 
> How can we specify the mac for the new VF to hotplug via qemu, as VF is only a
> vfio pci device?

How is the VF device on the destination host any different from the VF
on the source host?

As you do on the source host, you first assign the MAC address of
00:00:00:00:00:00 to the VF. After the migration, you assign the same
MAC address as that of the virtio_net device to the VF, and hotadd the VF
device to the VM. And then, after you receive the FAILOVER_PRIMARY_CHANGED
event, set the macvtap device to down state.

Venu

> 
> I am trying to play with this with only qemu (w/o libvirt).
> 
> Thank you very much!
> 
> Dongli Zhang
> 
> On 01/08/2019 06:29 AM, Venu Busireddy wrote:
> > From: Sridhar Samudrala 
> > 
> > This feature bit can be used by a hypervisor to indicate to the virtio_net
> > device that it can act as a standby for another device with the same MAC
> > address.
> > 
> > Signed-off-by: Sridhar Samudrala 
> > Signed-off-by: Venu Busireddy 
> > ---
> >  hw/net/virtio-net.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 385b1a0..411f8fb 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
> >   true),
> >  DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
> >  DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
> > +DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
> > VIRTIO_NET_F_STANDBY,
> > +  false),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> >

[Qemu-devel] [PATCH v3 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

2019-01-07 Thread Venu Busireddy

Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
state of the virtio_net devices.

Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c| 16 +++
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h  |  2 ++
 net/net.c  | 61 ++
 qapi/net.json  | 46 +++
 5 files changed, 126 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, 
uint8_t status)
  */
 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+n->standby_enabled = true;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
 } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+n->standby_enabled = false;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
 }
 }
@@ -448,6 +450,19 @@ static RxFilterInfo 
*virtio_net_query_rxfilter(NetClientState *nc)
 return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+StandbyStatusInfo *info;
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+info = g_malloc0(sizeof(*info));
+info->device = g_strdup(n->netclient_name);
+info->path = g_strdup(object_get_canonical_path(OBJECT(n->qdev)));
+info->enabled = n->standby_enabled;
+
+return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
 .receive = virtio_net_receive,
 .link_status_changed = virtio_net_set_link_status,
 .query_rx_filter = virtio_net_query_rxfilter,
+.query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
 int announce_counter;
 bool needs_vnet_hdr_swap;
 bool mtu_bypass_backend;
+bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
+QueryStandbyStatus *query_standby_status;
 NetPoll *poll;
 HasUfo *has_ufo;
 HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..fbf288e 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,67 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, 
const char *name,
 return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+const char *device,
+Error **errp)
+{
+NetClientState *nc;
+StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+QTAILQ_FOREACH(nc, _clients, next) {
+StandbyStatusInfoList *entry;
+StandbyStatusInfo *info;
+
+if (has_device && strcmp(nc->name, device) != 0) {
+continue;
+}
+
+/* only query standby status information of NIC */
+if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+if (has_device) {
+error_setg(errp, "net client(%s) isn't a NIC", device);
+return NULL;
+}
+continue;
+}
+
+/*
+ * only query information on queue 0 since the info is per nic,
+ * not per queue.
+ */
+if (nc->queue_index != 0) {
+continue;
+}
+
+if (nc->info->query_standby_status) {
+info = nc->info->query_standby_status(nc);
+entry = g_malloc0(sizeof(*entry));
+entry->value = info;
+
+if (!status_li

[Qemu-devel] [PATCH v3 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hmp.c  | 5 +
 hw/pci/pci.c   | 5 +
 qapi/misc.json | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 7828f93..7a75c93 100644
--- a/hmp.c
+++ b/hmp.c
@@ -890,6 +890,11 @@ static void hmp_info_pci_device(Monitor *mon, const 
PciDeviceInfo *dev)
 }
 }
 
+if (dev->has_failover_status) {
+monitor_printf(mon, "  Failover primary, bus master %s.\n",
+   dev->failover_status ? "enabled" : "disabled");
+}
+
 monitor_printf(mon, "  id \"%s\"\n", dev->qdev_id);
 
 if (dev->has_pci_bridge) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 56b13b3..9da49fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1761,6 +1761,11 @@ static PciDeviceInfo *qmp_query_pci_device(PCIDevice 
*dev, PCIBus *bus,
 pci_get_word(dev->config + PCI_CB_SUBSYSTEM_VENDOR_ID);
 }
 
+if (dev->failover_primary) {
+info->has_failover_status = true;
+info->failover_status = dev->bus_master_enable_region.enabled;
+}
+
 return info;
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 6c1c5c0..05f003e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -865,6 +865,9 @@
 #
 # @regions: a list of the PCI I/O regions associated with the device
 #
+# @failover_status: if 'failover-primary' property is 'true', true if PCI
+#   bus master bit on the device is enabled
+#
 # Notes: the contents of @class_info.desc are not stable and should only be
 #treated as informational.
 #
@@ -874,7 +877,7 @@
   'data': {'bus': 'int', 'slot': 'int', 'function': 'int',
'class_info': 'PciDeviceClass', 'id': 'PciDeviceId',
'*irq': 'int', 'qdev_id': 'str', '*pci_bridge': 'PciBridgeInfo',
-   'regions': ['PciMemoryRegion']} }
+   'regions': ['PciMemoryRegion'], '*failover_status': 'bool'} }
 
 ##
 # @PciInfo:

[Qemu-devel] [PATCH v3 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

When a VF is hotplugged into the guest, datapath switching will be
performed immediately, which is sub-optimal in terms of timing, and
could end up with substantial network downtime. One of ways to shorten
this downtime is to switch the datapath only after the VF is seen to get
enabled by guest, indicated by the bus master bit in VF's PCI config
space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
at that time to indicate this condition. Then management stack can kick
off datapath switching upon receiving the event.

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hw/vfio/pci.c | 57 +
 qapi/net.json | 26 ++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 uint32_t val_le = cpu_to_le32(val);
+bool may_notify = false;
+bool master_was = false;
 
 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
  __func__, vdev->vbasedev.name, addr, val, len);
 }
 
+/* Bus Master Enabling/Disabling */
+if (pdev->failover_primary && current_cpu &&
+range_covers_byte(addr, len, PCI_COMMAND)) {
+master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+PCI_COMMAND_MASTER);
+may_notify = true;
+}
+
 /* MSI/MSI-X Enabling/Disabling */
 if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
 ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
 /* Write everything to QEMU to keep emulated bits correct */
 pci_default_write_config(pdev, addr, val, len);
 }
+
+if (may_notify) {
+bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+ PCI_COMMAND_MASTER);
+if (master_was != master_now) {
+vfio_failover_notify(vdev, master_now);
+}
+}
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+PCIDevice *pdev = >pdev;
+const char *n;
+gchar *path;
+
+n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+path = object_get_canonical_path(OBJECT(vdev));
+qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
 vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = >pdev;
+
+/*
+ * Guest driver may not get the chance to disable bus mastering
+ * before the device object gets to be unrealized. In that event,
+ * send out a "disabled" notification on behalf of guest driver.
+ */
+if (pdev->failover_primary &&
+pdev->bus_master_enable_region.enabled) {
+vfio_failover_notify(vdev, false);
+}
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+/*
+ * During the guest reboot sequence, it is sometimes possible that
+ * the guest may not get sufficient time to complete the entire driver
+ * removal sequence, near the end of which a PCI config space write to
+ * disable bus mastering can be intercepted by device. In such cases,
+ * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+ * is imperative to generate the event on the guest's behalf if the
+ * guest fails to make it.
+ */
+vfio_exit_failover_notify(vdev);
+
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(>pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 633ac87..a5b8d70 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -757,3 +757,29 @@
 ##
 { 'command': 'query-standby-status', 'data': { '*device': 'str' },
   'returns': ['StandbyStatusInfo'] }

[Qemu-devel] [PATCH v3 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.

2019-01-07 Thread Venu Busireddy

Added a new event, FAILOVER_STANDBY_CHANGED, which is emitted whenever
the status of the virtio_net driver in the guest changes (either the
guest successfully loads the driver after the F_STANDBY feature bit
is negotiated, or the guest unloads the driver or reboots). Management
stack can use this event to determine when to plug/unplug the VF device
to/from the guest.

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.

Signed-off-by: Venu Busireddy 
---
 hw/acpi/pcihp.c  | 27 +++
 hw/net/virtio-net.c  | 24 
 hw/vfio/pci.c|  3 +++
 hw/vfio/pci.h|  1 +
 include/hw/pci/pci.h |  1 +
 qapi/net.json| 28 
 6 files changed, 84 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, 
unsigned bsel, unsigned slo
 }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+BusChild *kid, *next;
+PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+if (!bus) {
+return;
+}
+QTAILQ_FOREACH_SAFE(kid, >qbus.children, sibling, next) {
+DeviceState *qdev = kid->child;
+PCIDevice *pdev = PCI_DEVICE(qdev);
+int slot = PCI_SLOT(pdev->devfn);
+
+if (pdev->failover_primary) {
+s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+}
+}
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
 BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
 int i;
 
 for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+/*
+ * Set the acpi_pcihp_pci_status[].down bits of all the
+ * failover_primary devices so that the devices are ejected
+ * from the guest. We can't use the qdev_unplug() as well as the
+ * hotplug_handler to unplug the devices, because the guest may
+ * not be in a state to cooperate.
+ */
+acpi_pcihp_cleanup_failover_primary(s, i);
 acpi_pcihp_update_hotplug_bus(s, i);
 }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..7b1bcde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,29 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice 
*vdev, VirtQueue *vq)
 }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+const char *ncn = n->netclient_name;
+gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+/*
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=true
+ *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=false
+ *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+ */
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+(!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
+} else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
+}
+}
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +279,7 @@ static void virtio_net_set_status(struct VirtIODevice 
*vdev, uint8_t status)
 uint8_t queue_status;
 
 virtio_net_vnet_endian_status(n, status);
+virtio_net_failover_notify_event(n, status);
 virtio_net_vhost_status(n, status);
 
 for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..bd83b58 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
+pdev->failover_primary = vdev->failover_primary;
 
 return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
 DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
 OFF_AUTOPCIBAR_OFF),

[Qemu-devel] [PATCH v3 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

2019-01-07 Thread Venu Busireddy

From: Sridhar Samudrala 

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
  true),
 DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
 DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
 DEFINE_PROP_END_OF_LIST(),
 };

[Qemu-devel] [PATCH v3 0/5] Support for datapath switching during live migration

2019-01-07 Thread Venu Busireddy

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
   to orchestrate all the hot plug and datapath switching activities.
   This scheme has the least QEMU modifications while allowing userspace
   software to build its own intelligence to control the whole process
   of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model but just offers simplicity of implementation.


Si-Wei Liu (2):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during 
failover
  pci: query command extension to check the bus master enabling status of the 
failover-primary device

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (2):
  virtio_net: Add support for "Data Path Switching" during Live Migration.
  virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

---
Changes in v3:
  Fix issues with coding style in patch 3/5.

Changes in v2:
  Added a query command for FAILOVER_STANDBY_CHANGED event.
  Added a query command for FAILOVER_PRIMARY_CHANGED event.

 hmp.c  |   5 +++
 hw/acpi/pcihp.c|  27 +++
 hw/net/virtio-net.c|  42 +
 hw/pci/pci.c   |   5 +++
 hw/vfio/pci.c  |  60 +
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   1 +
 include/hw/virtio/virtio-net.h |   1 +
 include/net/net.h  |   2 +
 net/net.c  |  61 +
 qapi/misc.json |   5 ++-
 qapi/net.json  | 100 +
 12 files changed, 309 insertions(+), 1 deletion(-)

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Venu Busireddy

On 2018-12-10 12:31:43 -0500, Michael S. Tsirkin wrote:
> On Mon, Dec 10, 2018 at 11:15:48AM -0500, Venu Busireddy wrote:
> > From: Si-Wei Liu 
> > 
> > When a VF is hotplugged into the guest, datapath switching will be
> > performed immediately, which is sub-optimal in terms of timing, and
> > could end up with substantial network downtime. One of ways to shorten
> > this downtime is to switch the datapath only after the VF is seen to get
> > enabled by guest, indicated by the bus master bit in VF's PCI config
> > space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
> > at that time to indicate this condition. Then management stack can kick
> > off datapath switching upon receiving the event.
> > 
> > Signed-off-by: Si-Wei Liu 
> > Signed-off-by: Venu Busireddy 
> 
> As management stacks can lose events, it's necessary
> to also have a query command to check device status.

Thanks for the feedback. Implemented the changes, and posted v2:

https://lists.oasis-open.org/archives/virtio-dev/201901/msg00046.html


> > ---
> >  hw/vfio/pci.c | 57 
> > +
> >  qapi/net.json | 26 ++
> >  2 files changed, 83 insertions(+)
> > 
> > diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> > index ce1f33c..ea24ca2 100644
> > --- a/hw/vfio/pci.c
> > +++ b/hw/vfio/pci.c
> > @@ -34,6 +34,7 @@
> >  #include "pci.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "qapi/qapi-events-net.h"
> >  
> >  #define MSIX_CAP_LENGTH 12
> >  
> > @@ -42,6 +43,7 @@
> >  
> >  static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
> >  static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
> >  
> >  /*
> >   * Disabling BAR mmaping can be slow, but toggling it around INTx can
> > @@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  {
> >  VFIOPCIDevice *vdev = PCI_VFIO(pdev);
> >  uint32_t val_le = cpu_to_le32(val);
> > +bool may_notify = false;
> > +bool master_was = false;
> >  
> >  trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
> >  
> > @@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >   __func__, vdev->vbasedev.name, addr, val, len);
> >  }
> >  
> > +/* Bus Master Enabling/Disabling */
> > +if (pdev->failover_primary && current_cpu &&
> > +range_covers_byte(addr, len, PCI_COMMAND)) {
> > +master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > +PCI_COMMAND_MASTER);
> > +may_notify = true;
> > +}
> > +
> >  /* MSI/MSI-X Enabling/Disabling */
> >  if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
> >  ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
> > @@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >  /* Write everything to QEMU to keep emulated bits correct */
> >  pci_default_write_config(pdev, addr, val, len);
> >  }
> > +
> > +if (may_notify) {
> > +bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
> > + PCI_COMMAND_MASTER);
> > +if (master_was != master_now) {
> > +vfio_failover_notify(vdev, master_now);
> > +}
> > +}
> >  }
> >  
> >  /*
> 
> It's very easy to have guest trigger a high load of events by playing
> with the bus master enable bits.  How about instead sending an event
> that just says "something changed" without the current status and have
> management issue a query command to check the status. QEMU then does not
> need to re-send an event until management issues a query command.
> 
> 
> > @@ -2801,6 +2821,17 @@ static void 
> > vfio_unregister_req_notifier(VFIOPCIDevice *vdev)
> >  vdev->req_enabled = false;
> >  }
> >  
> > +static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
> > +{
> > +PCIDevice *pdev = >pdev;
> > +const char *n;
> > +gchar *path;
> > +
> > +n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
> > +path = object_get_canonical_path(OBJECT(vdev));
> > +qapi_event_send_failover_primary_changed(!!n, n, path, status);
> > +}
> >

[Qemu-devel] [PATCH v2 3/5] virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

2019-01-07 Thread Venu Busireddy

Add a query command to check the status of the FAILOVER_STANDBY_CHANGED
state of the virtio_net devices.

Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c| 16 
 include/hw/virtio/virtio-net.h |  1 +
 include/net/net.h  |  2 ++
 net/net.c  | 59 ++
 qapi/net.json  | 46 
 5 files changed, 124 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7b1bcde..a4e07ac 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -263,9 +263,11 @@ static void virtio_net_failover_notify_event(VirtIONet *n, 
uint8_t status)
  */
 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+n->standby_enabled = true;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
 } else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
 (vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+n->standby_enabled = false;
 qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
 }
 }
@@ -448,6 +450,19 @@ static RxFilterInfo 
*virtio_net_query_rxfilter(NetClientState *nc)
 return info;
 }
 
+static StandbyStatusInfo *virtio_net_query_standby_status(NetClientState *nc)
+{
+StandbyStatusInfo *info;
+VirtIONet *n = qemu_get_nic_opaque(nc);
+
+info = g_malloc0(sizeof(*info));
+info->device = g_strdup(n->netclient_name);
+info->path = g_strdup(object_get_canonical_path(OBJECT(n->qdev)));
+info->enabled = n->standby_enabled;
+
+return info;
+}
+
 static void virtio_net_reset(VirtIODevice *vdev)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -1923,6 +1938,7 @@ static NetClientInfo net_virtio_info = {
 .receive = virtio_net_receive,
 .link_status_changed = virtio_net_set_link_status,
 .query_rx_filter = virtio_net_query_rxfilter,
+.query_standby_status = virtio_net_query_standby_status,
 };
 
 static bool virtio_net_guest_notifier_pending(VirtIODevice *vdev, int idx)
diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index 4d7f3c8..9071e96 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -103,6 +103,7 @@ typedef struct VirtIONet {
 int announce_counter;
 bool needs_vnet_hdr_swap;
 bool mtu_bypass_backend;
+bool standby_enabled;
 } VirtIONet;
 
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
diff --git a/include/net/net.h b/include/net/net.h
index ec13702..61e8513 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -50,6 +50,7 @@ typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
 typedef RxFilterInfo *(QueryRxFilter)(NetClientState *);
+typedef StandbyStatusInfo *(QueryStandbyStatus)(NetClientState *);
 typedef bool (HasUfo)(NetClientState *);
 typedef bool (HasVnetHdr)(NetClientState *);
 typedef bool (HasVnetHdrLen)(NetClientState *, int);
@@ -71,6 +72,7 @@ typedef struct NetClientInfo {
 NetCleanup *cleanup;
 LinkStatusChanged *link_status_changed;
 QueryRxFilter *query_rx_filter;
+QueryStandbyStatus *query_standby_status;
 NetPoll *poll;
 HasUfo *has_ufo;
 HasVnetHdr *has_vnet_hdr;
diff --git a/net/net.c b/net/net.c
index 1f7d626..a6d8e73 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1320,6 +1320,65 @@ RxFilterInfoList *qmp_query_rx_filter(bool has_name, 
const char *name,
 return filter_list;
 }
 
+StandbyStatusInfoList *qmp_query_standby_status(bool has_device,
+const char *device,
+Error **errp)
+{
+NetClientState *nc;
+StandbyStatusInfoList *status_list = NULL, *last_entry = NULL;
+
+QTAILQ_FOREACH(nc, _clients, next) {
+StandbyStatusInfoList *entry;
+StandbyStatusInfo *info;
+
+if (has_device && strcmp(nc->name, device) != 0) {
+continue;
+}
+
+/* only query standby status information of NIC */
+if (nc->info->type != NET_CLIENT_DRIVER_NIC) {
+if (has_device) {
+error_setg(errp, "net client(%s) isn't a NIC", device);
+return NULL;
+}
+continue;
+}
+
+/* only query information on queue 0 since the info is per nic,
+ * not per queue
+ */
+if (nc->queue_index != 0)
+continue;
+
+if (nc->info->query_standby_status) {
+info = nc->info->query_standby_status(nc);
+entry = g_malloc0(sizeof(*entry));
+entry->value = info;
+
+if (!status_list) {
+status_list = entry;
+

Re: [Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.

2019-01-07 Thread Venu Busireddy

On 2018-12-10 13:28:42 -0600, Eric Blake wrote:
> On 12/10/18 10:15 AM, Venu Busireddy wrote:
> > Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
> > The first is emitted when the guest negotiates the F_STANDBY feature
> > bit. The second is emitted when the virtio_net driver is removed, either
> > manually or as a result of guest reboot. Management stack can use these
> > events to determine when to plug/unplug the VF device to/from the guest.
> > 
> > Also, the Virtual Functions will be automatically removed from the guest
> > if the guest is rebooted. To properly identify the VFIO devices that
> > must be removed, a new property named "x-failover-primary" is added to
> > the vfio-pci devices. Only the vfio-pci devices that have this property
> > enabled are removed from the guest upon reboot.
> > 
> > Signed-off-by: Venu Busireddy 
> > ---
> 
> > +++ b/qapi/net.json
> > @@ -683,3 +683,47 @@
> >   ##
> >   { 'event': 'NIC_RX_FILTER_CHANGED',
> > 'data': { '*name': 'str', 'path': 'str' } }
> > +
> > +##
> > +# @FAILOVER_PLUG_PRIMARY:
> > +#
> > +# Emitted when the guest successfully loads the driver after the STANDBY
> > +# feature bit is negotiated.
> > +#
> > +# @device: Indicates the virtio_net device.
> > +#
> > +# @path: Indicates the device path.
> > +#
> > +# Since: 3.0
> 
> You've missed 3.0, and even 3.1.  This should be 4.0.

Sorry for the delay in responding...

Posted v2 including your suggestion at
https://lists.oasis-open.org/archives/virtio-dev/201901/msg00046.html

Thanks!

> 
> > +#
> > +# Example:
> > +#
> > +# <- {"timestamp": {"seconds": 1432121972, "microseconds": 744001},
> > +# "event": "FAILOVER_PLUG_PRIMARY",
> > +# "data": {"device": "net0", "path": 
> > "/machine/peripheral/net0/virtio-backend"} }
> > +#
> > +##
> > +{ 'event': 'FAILOVER_PLUG_PRIMARY',
> > +  'data': {'*device': 'str', 'path': 'str'} }
> > +
> > +##
> > +# @FAILOVER_UNPLUG_PRIMARY:
> > +#
> > +# Emitted when the guest resets the virtio_net driver.
> > +# The reset can be the result of either unloading the driver or a reboot.
> > +#
> > +# @device: Indicates the virtio_net device.
> > +#
> > +# @path: Indicates the device path.
> > +#
> > +# Since: 3.0
> 
> and again
> 
> 
> -- 
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3266
> Virtualization:  qemu.org | libvirt.org

[Qemu-devel] [PATCH v2 5/5] pci: query command extension to check the bus master enabling status of the failover-primary device

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hmp.c  | 5 +
 hw/pci/pci.c   | 5 +
 qapi/misc.json | 5 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/hmp.c b/hmp.c
index 7828f93..7a75c93 100644
--- a/hmp.c
+++ b/hmp.c
@@ -890,6 +890,11 @@ static void hmp_info_pci_device(Monitor *mon, const 
PciDeviceInfo *dev)
 }
 }
 
+if (dev->has_failover_status) {
+monitor_printf(mon, "  Failover primary, bus master %s.\n",
+   dev->failover_status ? "enabled" : "disabled");
+}
+
 monitor_printf(mon, "  id \"%s\"\n", dev->qdev_id);
 
 if (dev->has_pci_bridge) {
diff --git a/hw/pci/pci.c b/hw/pci/pci.c
index 56b13b3..9da49fd 100644
--- a/hw/pci/pci.c
+++ b/hw/pci/pci.c
@@ -1761,6 +1761,11 @@ static PciDeviceInfo *qmp_query_pci_device(PCIDevice 
*dev, PCIBus *bus,
 pci_get_word(dev->config + PCI_CB_SUBSYSTEM_VENDOR_ID);
 }
 
+if (dev->failover_primary) {
+info->has_failover_status = true;
+info->failover_status = dev->bus_master_enable_region.enabled;
+}
+
 return info;
 }
 
diff --git a/qapi/misc.json b/qapi/misc.json
index 6c1c5c0..05f003e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -865,6 +865,9 @@
 #
 # @regions: a list of the PCI I/O regions associated with the device
 #
+# @failover_status: if 'failover-primary' property is 'true', true if PCI
+#   bus master bit on the device is enabled
+#
 # Notes: the contents of @class_info.desc are not stable and should only be
 #treated as informational.
 #
@@ -874,7 +877,7 @@
   'data': {'bus': 'int', 'slot': 'int', 'function': 'int',
'class_info': 'PciDeviceClass', 'id': 'PciDeviceId',
'*irq': 'int', 'qdev_id': 'str', '*pci_bridge': 'PciBridgeInfo',
-   'regions': ['PciMemoryRegion']} }
+   'regions': ['PciMemoryRegion'], '*failover_status': 'bool'} }
 
 ##
 # @PciInfo:

[Qemu-devel] [PATCH v2 1/5] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

2019-01-07 Thread Venu Busireddy

From: Sridhar Samudrala 

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
  true),
 DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
 DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
 DEFINE_PROP_END_OF_LIST(),
 };

[Qemu-devel] [PATCH v2 2/5] virtio_net: Add support for "Data Path Switching" during Live Migration.

2019-01-07 Thread Venu Busireddy

Added a new event, FAILOVER_STANDBY_CHANGED, which is emitted whenever
the status of the virtio_net driver in the guest changes (either the
guest successfully loads the driver after the F_STANDBY feature bit
is negotiated, or the guest unloads the driver or reboots). Management
stack can use this event to determine when to plug/unplug the VF device
to/from the guest.

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.

Signed-off-by: Venu Busireddy 
---
 hw/acpi/pcihp.c  | 27 +++
 hw/net/virtio-net.c  | 24 
 hw/vfio/pci.c|  3 +++
 hw/vfio/pci.h|  1 +
 include/hw/pci/pci.h |  1 +
 qapi/net.json| 28 
 6 files changed, 84 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, 
unsigned bsel, unsigned slo
 }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+BusChild *kid, *next;
+PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+if (!bus) {
+return;
+}
+QTAILQ_FOREACH_SAFE(kid, >qbus.children, sibling, next) {
+DeviceState *qdev = kid->child;
+PCIDevice *pdev = PCI_DEVICE(qdev);
+int slot = PCI_SLOT(pdev->devfn);
+
+if (pdev->failover_primary) {
+s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+}
+}
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
 BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
 int i;
 
 for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+/*
+ * Set the acpi_pcihp_pci_status[].down bits of all the
+ * failover_primary devices so that the devices are ejected
+ * from the guest. We can't use the qdev_unplug() as well as the
+ * hotplug_handler to unplug the devices, because the guest may
+ * not be in a state to cooperate.
+ */
+acpi_pcihp_cleanup_failover_primary(s, i);
 acpi_pcihp_update_hotplug_bus(s, i);
 }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..7b1bcde 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,29 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice 
*vdev, VirtQueue *vq)
 }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+const char *ncn = n->netclient_name;
+gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+/*
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=true
+ *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+ * Emit FAILOVER_STANDBY_CHANGED event with enabled=false
+ *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+ */
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+(!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, true);
+} else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+qapi_event_send_failover_standby_changed(!!ncn, ncn, path, false);
+}
+}
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +279,7 @@ static void virtio_net_set_status(struct VirtIODevice 
*vdev, uint8_t status)
 uint8_t queue_status;
 
 virtio_net_vnet_endian_status(n, status);
+virtio_net_failover_notify_event(n, status);
 virtio_net_vhost_status(n, status);
 
 for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..bd83b58 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
+pdev->failover_primary = vdev->failover_primary;
 
 return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
 DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
 OFF_AUTOPCIBAR_OFF),

[Qemu-devel] [PATCH v2 0/5] Support for datapath switching during live migration

2019-01-07 Thread Venu Busireddy

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
   to orchestrate all the hot plug and datapath switching activities.
   This scheme has the least QEMU modifications while allowing userspace
   software to build its own intelligence to control the whole process
   of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model but just offers simplicity of implementation.


Si-Wei Liu (2):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during 
failover
  pci: query command extension to check the bus master enabling status of the 
failover-primary device

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (2):
  virtio_net: Add support for "Data Path Switching" during Live Migration.
  virtio_net: Add a query command for FAILOVER_STANDBY_CHANGED event.

---
Changes in v2:
  Added a query command for FAILOVER_STANDBY_CHANGED event.
  Added a query command for FAILOVER_PRIMARY_CHANGED event.

 hmp.c  |   5 +++
 hw/acpi/pcihp.c|  27 +++
 hw/net/virtio-net.c|  42 +
 hw/pci/pci.c   |   5 +++
 hw/vfio/pci.c  |  60 +
 hw/vfio/pci.h  |   1 +
 include/hw/pci/pci.h   |   1 +
 include/hw/virtio/virtio-net.h |   1 +
 include/net/net.h  |   2 +
 net/net.c  |  59 
 qapi/misc.json |   5 ++-
 qapi/net.json  | 100 +
 12 files changed, 307 insertions(+), 1 deletion(-)

[Qemu-devel] [PATCH v2 4/5] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2019-01-07 Thread Venu Busireddy

From: Si-Wei Liu 

When a VF is hotplugged into the guest, datapath switching will be
performed immediately, which is sub-optimal in terms of timing, and
could end up with substantial network downtime. One of ways to shorten
this downtime is to switch the datapath only after the VF is seen to get
enabled by guest, indicated by the bus master bit in VF's PCI config
space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
at that time to indicate this condition. Then management stack can kick
off datapath switching upon receiving the event.

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hw/vfio/pci.c | 57 +
 qapi/net.json | 26 ++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index bd83b58..adcc95a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 uint32_t val_le = cpu_to_le32(val);
+bool may_notify = false;
+bool master_was = false;
 
 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
  __func__, vdev->vbasedev.name, addr, val, len);
 }
 
+/* Bus Master Enabling/Disabling */
+if (pdev->failover_primary && current_cpu &&
+range_covers_byte(addr, len, PCI_COMMAND)) {
+master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+PCI_COMMAND_MASTER);
+may_notify = true;
+}
+
 /* MSI/MSI-X Enabling/Disabling */
 if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
 ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
 /* Write everything to QEMU to keep emulated bits correct */
 pci_default_write_config(pdev, addr, val, len);
 }
+
+if (may_notify) {
+bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+ PCI_COMMAND_MASTER);
+if (master_was != master_now) {
+vfio_failover_notify(vdev, master_now);
+}
+}
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+PCIDevice *pdev = >pdev;
+const char *n;
+gchar *path;
+
+n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+path = object_get_canonical_path(OBJECT(vdev));
+qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
 vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = >pdev;
+
+/*
+ * Guest driver may not get the chance to disable bus mastering
+ * before the device object gets to be unrealized. In that event,
+ * send out a "disabled" notification on behalf of guest driver.
+ */
+if (pdev->failover_primary &&
+pdev->bus_master_enable_region.enabled) {
+vfio_failover_notify(vdev, false);
+}
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+/*
+ * During the guest reboot sequence, it is sometimes possible that
+ * the guest may not get sufficient time to complete the entire driver
+ * removal sequence, near the end of which a PCI config space write to
+ * disable bus mastering can be intercepted by device. In such cases,
+ * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+ * is imperative to generate the event on the guest's behalf if the
+ * guest fails to make it.
+ */
+vfio_exit_failover_notify(vdev);
+
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(>pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 633ac87..a5b8d70 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -757,3 +757,29 @@
 ##
 { 'command': 'query-standby-status', 'data': { '*device': 'str' },
   'returns': ['StandbyStatusInfo'] }

[Qemu-devel] [PATCH 3/3] vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during failover

2018-12-10 Thread Venu Busireddy

From: Si-Wei Liu 

When a VF is hotplugged into the guest, datapath switching will be
performed immediately, which is sub-optimal in terms of timing, and
could end up with substantial network downtime. One of ways to shorten
this downtime is to switch the datapath only after the VF is seen to get
enabled by guest, indicated by the bus master bit in VF's PCI config
space getting enabled. The FAILOVER_PRIMARY_CHANGED event is emitted
at that time to indicate this condition. Then management stack can kick
off datapath switching upon receiving the event.

Signed-off-by: Si-Wei Liu 
Signed-off-by: Venu Busireddy 
---
 hw/vfio/pci.c | 57 +
 qapi/net.json | 26 ++
 2 files changed, 83 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index ce1f33c..ea24ca2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -34,6 +34,7 @@
 #include "pci.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "qapi/qapi-events-net.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -42,6 +43,7 @@
 
 static void vfio_disable_interrupts(VFIOPCIDevice *vdev);
 static void vfio_mmap_set_enabled(VFIOPCIDevice *vdev, bool enabled);
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status);
 
 /*
  * Disabling BAR mmaping can be slow, but toggling it around INTx can
@@ -1170,6 +1172,8 @@ void vfio_pci_write_config(PCIDevice *pdev,
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 uint32_t val_le = cpu_to_le32(val);
+bool may_notify = false;
+bool master_was = false;
 
 trace_vfio_pci_write_config(vdev->vbasedev.name, addr, val, len);
 
@@ -1180,6 +1184,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
  __func__, vdev->vbasedev.name, addr, val, len);
 }
 
+/* Bus Master Enabling/Disabling */
+if (pdev->failover_primary && current_cpu &&
+range_covers_byte(addr, len, PCI_COMMAND)) {
+master_was = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+PCI_COMMAND_MASTER);
+may_notify = true;
+}
+
 /* MSI/MSI-X Enabling/Disabling */
 if (pdev->cap_present & QEMU_PCI_CAP_MSI &&
 ranges_overlap(addr, len, pdev->msi_cap, vdev->msi_cap_size)) {
@@ -1235,6 +1247,14 @@ void vfio_pci_write_config(PCIDevice *pdev,
 /* Write everything to QEMU to keep emulated bits correct */
 pci_default_write_config(pdev, addr, val, len);
 }
+
+if (may_notify) {
+bool master_now = !!(pci_get_word(pdev->config + PCI_COMMAND) &
+ PCI_COMMAND_MASTER);
+if (master_was != master_now) {
+vfio_failover_notify(vdev, master_now);
+}
+}
 }
 
 /*
@@ -2801,6 +2821,17 @@ static void vfio_unregister_req_notifier(VFIOPCIDevice 
*vdev)
 vdev->req_enabled = false;
 }
 
+static void vfio_failover_notify(VFIOPCIDevice *vdev, bool status)
+{
+PCIDevice *pdev = >pdev;
+const char *n;
+gchar *path;
+
+n = pdev->qdev.id ? pdev->qdev.id : vdev->vbasedev.name;
+path = object_get_canonical_path(OBJECT(vdev));
+qapi_event_send_failover_primary_changed(!!n, n, path, status);
+}
+
 static void vfio_realize(PCIDevice *pdev, Error **errp)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
@@ -3109,10 +3140,36 @@ static void vfio_instance_finalize(Object *obj)
 vfio_put_group(group);
 }
 
+static void vfio_exit_failover_notify(VFIOPCIDevice *vdev)
+{
+PCIDevice *pdev = >pdev;
+
+/*
+ * Guest driver may not get the chance to disable bus mastering
+ * before the device object gets to be unrealized. In that event,
+ * send out a "disabled" notification on behalf of guest driver.
+ */
+if (pdev->failover_primary &&
+pdev->bus_master_enable_region.enabled) {
+vfio_failover_notify(vdev, false);
+}
+}
+
 static void vfio_exitfn(PCIDevice *pdev)
 {
 VFIOPCIDevice *vdev = PCI_VFIO(pdev);
 
+/*
+ * During the guest reboot sequence, it is sometimes possible that
+ * the guest may not get sufficient time to complete the entire driver
+ * removal sequence, near the end of which a PCI config space write to
+ * disable bus mastering can be intercepted by device. In such cases,
+ * the FAILOVER_PRIMARY_CHANGED "disable" event will not be emitted. It
+ * is imperative to generate the event on the guest's behalf if the
+ * guest fails to make it.
+ */
+vfio_exit_failover_notify(vdev);
+
 vfio_unregister_req_notifier(vdev);
 vfio_unregister_err_notifier(vdev);
 pci_device_set_intx_routing_notifier(>pdev, NULL);
diff --git a/qapi/net.json b/qapi/net.json
index 04a9de9..e5992c8 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -727,3 +727,29 @@
 ##
 { 'event': 'FAILOVER_UNPLUG_PRIMARY',
   'data': {'*device': 'str', 'path': 'str'} }
+
+##
+# @FAILOVER_PRIMAR

[Qemu-devel] [PATCH 2/3] virtio_net: Add support for "Data Path Switching" during Live Migration.

2018-12-10 Thread Venu Busireddy

Added two new events, FAILOVER_PLUG_PRIMARY and FAILOVER_UNPLUG_PRIMARY.
The first is emitted when the guest negotiates the F_STANDBY feature
bit. The second is emitted when the virtio_net driver is removed, either
manually or as a result of guest reboot. Management stack can use these
events to determine when to plug/unplug the VF device to/from the guest.

Also, the Virtual Functions will be automatically removed from the guest
if the guest is rebooted. To properly identify the VFIO devices that
must be removed, a new property named "x-failover-primary" is added to
the vfio-pci devices. Only the vfio-pci devices that have this property
enabled are removed from the guest upon reboot.

Signed-off-by: Venu Busireddy 
---
 hw/acpi/pcihp.c| 27 ++
 hw/net/virtio-net.c| 23 ++
 hw/vfio/pci.c  |  3 +++
 hw/vfio/pci.h  |  1 +
 include/hw/pci/pci.h   |  1 +
 include/hw/virtio/virtio-net.h |  4 
 qapi/net.json  | 44 ++
 7 files changed, 103 insertions(+)

diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
index 80d42e1..2a3ffd3 100644
--- a/hw/acpi/pcihp.c
+++ b/hw/acpi/pcihp.c
@@ -176,6 +176,25 @@ static void acpi_pcihp_eject_slot(AcpiPciHpState *s, 
unsigned bsel, unsigned slo
 }
 }
 
+static void acpi_pcihp_cleanup_failover_primary(AcpiPciHpState *s, int bsel)
+{
+BusChild *kid, *next;
+PCIBus *bus = acpi_pcihp_find_hotplug_bus(s, bsel);
+
+if (!bus) {
+return;
+}
+QTAILQ_FOREACH_SAFE(kid, >qbus.children, sibling, next) {
+DeviceState *qdev = kid->child;
+PCIDevice *pdev = PCI_DEVICE(qdev);
+int slot = PCI_SLOT(pdev->devfn);
+
+if (pdev->failover_primary) {
+s->acpi_pcihp_pci_status[bsel].down |= (1U << slot);
+}
+}
+}
+
 static void acpi_pcihp_update_hotplug_bus(AcpiPciHpState *s, int bsel)
 {
 BusChild *kid, *next;
@@ -207,6 +226,14 @@ static void acpi_pcihp_update(AcpiPciHpState *s)
 int i;
 
 for (i = 0; i < ACPI_PCIHP_MAX_HOTPLUG_BUS; ++i) {
+/*
+ * Set the acpi_pcihp_pci_status[].down bits of all the
+ * failover_primary devices so that the devices are ejected
+ * from the guest. We can't use the qdev_unplug() as well as the
+ * hotplug_handler to unplug the devices, because the guest may
+ * not be in a state to cooperate.
+ */
+acpi_pcihp_cleanup_failover_primary(s, i);
 acpi_pcihp_update_hotplug_bus(s, i);
 }
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 411f8fb..d37f33c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -248,6 +248,28 @@ static void virtio_net_drop_tx_queue_data(VirtIODevice 
*vdev, VirtQueue *vq)
 }
 }
 
+static void virtio_net_failover_notify_event(VirtIONet *n, uint8_t status)
+{
+VirtIODevice *vdev = VIRTIO_DEVICE(n);
+
+if (virtio_has_feature(vdev->guest_features, VIRTIO_NET_F_STANDBY)) {
+gchar *path = object_get_canonical_path(OBJECT(n->qdev));
+/*
+ * Emit the FAILOVER_PLUG_PRIMARY event
+ *   when the status transitions from 0 to VIRTIO_CONFIG_S_DRIVER_OK
+ * Emit the FAILOVER_UNPLUG_PRIMARY event
+ *   when the status transitions from VIRTIO_CONFIG_S_DRIVER_OK to 0
+ */
+if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
+(!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK))) {
+FAILOVER_NOTIFY_EVENT(plug, n, path);
+} else if ((!(status & VIRTIO_CONFIG_S_DRIVER_OK)) &&
+(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) {
+FAILOVER_NOTIFY_EVENT(unplug, n, path);
+}
+}
+}
+
 static void virtio_net_set_status(struct VirtIODevice *vdev, uint8_t status)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
@@ -256,6 +278,7 @@ static void virtio_net_set_status(struct VirtIODevice 
*vdev, uint8_t status)
 uint8_t queue_status;
 
 virtio_net_vnet_endian_status(n, status);
+virtio_net_failover_notify_event(n, status);
 virtio_net_vhost_status(n, status);
 
 for (i = 0; i < n->max_queues; i++) {
diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5c7bd96..ce1f33c 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -3077,6 +3077,7 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
+pdev->failover_primary = vdev->failover_primary;
 
 return;
 
@@ -3219,6 +3220,8 @@ static Property vfio_pci_dev_properties[] = {
qdev_prop_nv_gpudirect_clique, uint8_t),
 DEFINE_PROP_OFF_AUTO_PCIBAR("x-msix-relocation", VFIOPCIDevice, msix_relo,
 OFF_AUTOPCIBAR_OFF),
+DEFINE_PROP_BOOL("x-failover-prima

[Qemu-devel] [PATCH 1/3] virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

2018-12-10 Thread Venu Busireddy

From: Sridhar Samudrala 

This feature bit can be used by a hypervisor to indicate to the virtio_net
device that it can act as a standby for another device with the same MAC
address.

Signed-off-by: Sridhar Samudrala 
Signed-off-by: Venu Busireddy 
---
 hw/net/virtio-net.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 385b1a0..411f8fb 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -2198,6 +2198,8 @@ static Property virtio_net_properties[] = {
  true),
 DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
 DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
+DEFINE_PROP_BIT64("standby", VirtIONet, host_features, 
VIRTIO_NET_F_STANDBY,
+  false),
 DEFINE_PROP_END_OF_LIST(),
 };

[Qemu-devel] [PATCH 0/3] Support for datapath switching during live migration

2018-12-10 Thread Venu Busireddy

Implement the infrastructure to support datapath switching during live
migration involving SR-IOV devices.

1. This patch is based off on the current VIRTIO_NET_F_STANDBY feature
   bit and MAC address device pairing.

2. This set of events will be consumed by userspace management software
   to orchestrate all the hot plug and datapath switching activities.
   This scheme has the least QEMU modifications while allowing userspace
   software to build its own intelligence to control the whole process
   of SR-IOV live migration.

3. While the hidden device model (viz. coupled device model) is still
   being explored for automatic hot plugging (QEMU) and automatic datapath
   switching (host-kernel), this series provides a supplemental set
   of interfaces if management software wants to drive the SR-IOV live
   migration on its own. It should not conflict with the hidden device
   model but just offers simplicity of implementation.

Sridhar Samudrala (1):
  virtio_net: Add VIRTIO_NET_F_STANDBY feature bit.

Venu Busireddy (1):
  virtio_net: Add support for "Data Path Switching" during Live Migration.

Si-Wei Liu (1):
  vfio-pci: Add FAILOVER_PRIMARY_CHANGED event to shorten downtime during 
failover

 hw/acpi/pcihp.c| 27 
 hw/net/virtio-net.c| 25 +++
 hw/vfio/pci.c  | 60 
 hw/vfio/pci.h  |  1 +
 include/hw/pci/pci.h   |  1 +
 include/hw/virtio/virtio-net.h |  4 +++
 qapi/net.json  | 70 ++
 7 files changed, 188 insertions(+)

Re: [Qemu-devel] [virtio-dev] [PATCH v3 3/3] Add "Group Identifier" support to Red Hat PCI Express bridge.

2018-07-31 Thread Venu Busireddy

On 2018-07-07 15:14:11 +0300, Marcel Apfelbaum wrote:
> Hi Venu,
> 
> On 06/30/2018 01:19 AM, Venu Busireddy wrote:
> > Add a new bridge device "pcie-downstream" with a
> > Vendor ID of PCI_VENDOR_ID_REDHAT and a Device ID of
> > PCI_DEVICE_ID_REDHAT_DOWNPORT_FAILOVER.
> 
> Can't we use the current pcie-pci-bridge device (see
> hw/pci-bridge/pcie_pci_bridge.c)
> by adding the new vendor capability to it so we will not need to support a
> new bridge?

Sorry for the delayed response. I was away on vacation.

This question is probably better answered by Michael, but me let me try.
Michael suggested adding a new device to avoid confusion with existing
bridge devices.

Venu


> 
> Thanks,
> Marcel
> 
> >   Also, add the "Vendor-Specific"
> > capability to the bridge to contain the "Group Identifier" that will be
> > used to pair a virtio device with the passthrough device attached to
> > that bridge.
> > 
> > This capability is added to the bridge iff the "failover-group-id"
> > option is specified for the bridge.
> > 
> > Signed-off-by: Venu Busireddy 
> > ---
> >   default-configs/arm-softmmu.mak|   1 +
> >   default-configs/i386-softmmu.mak   |   1 +
> >   default-configs/x86_64-softmmu.mak |   1 +
> >   hw/pci-bridge/Makefile.objs|   1 +
> >   hw/pci-bridge/pcie_downstream.c| 220 +
> >   hw/pci-bridge/pcie_downstream.h|  10 ++
> >   include/hw/pci/pci.h   |   1 +
> >   7 files changed, 235 insertions(+)
> >   create mode 100644 hw/pci-bridge/pcie_downstream.c
> >   create mode 100644 hw/pci-bridge/pcie_downstream.h
> > 
> > diff --git a/default-configs/arm-softmmu.mak 
> > b/default-configs/arm-softmmu.mak
> > index 834d45cfaf..b86c6fb122 100644
> > --- a/default-configs/arm-softmmu.mak
> > +++ b/default-configs/arm-softmmu.mak
> > @@ -139,6 +139,7 @@ CONFIG_IMX_I2C=y
> >   CONFIG_PCIE_PORT=y
> >   CONFIG_XIO3130=y
> >   CONFIG_IOH3420=y
> > +CONFIG_PCIE_DOWNSTREAM=y
> >   CONFIG_I82801B11=y
> >   CONFIG_ACPI=y
> >   CONFIG_SMBIOS=y
> > diff --git a/default-configs/i386-softmmu.mak 
> > b/default-configs/i386-softmmu.mak
> > index 8c7d4a0fa0..a900c8f052 100644
> > --- a/default-configs/i386-softmmu.mak
> > +++ b/default-configs/i386-softmmu.mak
> > @@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
> >   CONFIG_PCIE_PORT=y
> >   CONFIG_XIO3130=y
> >   CONFIG_IOH3420=y
> > +CONFIG_PCIE_DOWNSTREAM=y
> >   CONFIG_I82801B11=y
> >   CONFIG_SMBIOS=y
> >   CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
> > diff --git a/default-configs/x86_64-softmmu.mak 
> > b/default-configs/x86_64-softmmu.mak
> > index 0390b4303c..481e4764be 100644
> > --- a/default-configs/x86_64-softmmu.mak
> > +++ b/default-configs/x86_64-softmmu.mak
> > @@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
> >   CONFIG_PCIE_PORT=y
> >   CONFIG_XIO3130=y
> >   CONFIG_IOH3420=y
> > +CONFIG_PCIE_DOWNSTREAM=y
> >   CONFIG_I82801B11=y
> >   CONFIG_SMBIOS=y
> >   CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
> > diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
> > index 47065f87d9..5b42212edc 100644
> > --- a/hw/pci-bridge/Makefile.objs
> > +++ b/hw/pci-bridge/Makefile.objs
> > @@ -3,6 +3,7 @@ common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o 
> > gen_pcie_root_port.o pcie_pci
> >   common-obj-$(CONFIG_PXB) += pci_expander_bridge.o
> >   common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
> >   common-obj-$(CONFIG_IOH3420) += ioh3420.o
> > +common-obj-$(CONFIG_PCIE_DOWNSTREAM) += pcie_downstream.o
> >   common-obj-$(CONFIG_I82801B11) += i82801b11.o
> >   # NewWorld PowerMac
> >   common-obj-$(CONFIG_DEC_PCI) += dec.o
> > diff --git a/hw/pci-bridge/pcie_downstream.c 
> > b/hw/pci-bridge/pcie_downstream.c
> > new file mode 100644
> > index 00..49b0d1e933
> > --- /dev/null
> > +++ b/hw/pci-bridge/pcie_downstream.c
> > @@ -0,0 +1,220 @@
> > +/*
> > + * Red Hat PCI Express downstream port.
> > + *
> > + * pcie_downstream.c
> > + * Most of this code is copied from xio3130_downstream.c
> > + *
> > + * Copyright (c) 2018 Oracle and/or its affiliates.
> > + * Author: Venu Busireddy 
> > + *
> > + * This program is free software; you can redistribute it and/or modify
> > + * it under the terms of the GNU General Public License as published by
> > + * the Free Software Foundation; either version 2 of the License, or
> > + * (at your optio

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...

2018-07-10 Thread Venu Busireddy

On 2018-07-10 05:11:18 +0300, Michael S. Tsirkin wrote:
> On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > The current patch set includes all the feedback received for proposals [3]
> > and [4]. For the sake of completeness, patch for the virtio specification
> > is also included here. Following is the updated proposal.
> > 
> > 1. Extend the virtio specification to include a new virtio PCI capability
> >"VIRTIO_PCI_CAP_GROUP_ID_CFG".
> 
> 
> 
> >  default-configs/arm-softmmu.mak |   1 +
> >  default-configs/i386-softmmu.mak|   1 +
> >  default-configs/x86_64-softmmu.mak  |   1 +
> >  hw/pci-bridge/Makefile.objs |   1 +
> >  hw/pci-bridge/pci_bridge_dev.c  |  10 +
> >  hw/pci-bridge/pcie_downstream.c | 220 
> >  hw/pci-bridge/pcie_downstream.h |  10 +
> >  hw/pci/pci_bridge.c |  34 +++
> >  hw/virtio/virtio-pci.c  |  15 ++
> >  hw/virtio/virtio-pci.h  |   3 +-
> >  include/hw/pci/pci.h|  37 ++--
> >  include/hw/pci/pci_bridge.h |   2 +
> >  include/hw/pci/pcie.h   |   1 +
> >  include/standard-headers/linux/virtio_pci.h |   8 +
> >  14 files changed, 326 insertions(+), 18 deletions(-)
> >  create mode 100644 hw/pci-bridge/pcie_downstream.c
> >  create mode 100644 hw/pci-bridge/pcie_downstream.h
> 
> I don't see the spec patch here.

Since the spec patch is in a different repo, I could not get 'git
format-patch' to combine them together. As a result, I added the spec
patch itself to the patches, but it is not in the cover letter.

I can combine them manually, and repost. Do you want me do that? 

Thanks,

Venu

Re: [Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...

2018-07-03 Thread Venu Busireddy

On 2018-07-03 12:58:25 +0300, Roman Kagan wrote:
> On Mon, Jul 02, 2018 at 02:14:52PM -0700, si-wei liu wrote:
> > On 7/2/2018 9:14 AM, Roman Kagan wrote:
> > > On Fri, Jun 29, 2018 at 05:19:03PM -0500, Venu Busireddy wrote:
> > > > The patch set "Enable virtio_net to act as a standby for a passthru
> > > > device" [1] deals with live migration of guests that use passthrough
> > > > devices. However, that scheme uses the MAC address for pairing
> > > > the virtio device and the passthrough device. The thread "netvsc:
> > > > refactor notifier/event handling code to use the failover framework"
> > > > [2] discusses an alternate mechanism, such as using an UUID, for pairing
> > > > the devices. Based on that discussion, proposals "Add "Group Identifier"
> > > > to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
> > > > store pairing information..." [4] were made.
> > > > 
> > > > The current patch set includes all the feedback received for proposals 
> > > > [3]
> > > > and [4]. For the sake of completeness, patch for the virtio 
> > > > specification
> > > > is also included here. Following is the updated proposal.
> > > > 
> > > > 1. Extend the virtio specification to include a new virtio PCI 
> > > > capability
> > > > "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > > > 
> > > > 2. Enhance the QEMU CLI to include a "failover-group-id" option to the
> > > > virtio device. The "failover-group-id" is a 64 bit value.
> > > > 
> > > > 3. Enhance the QEMU CLI to include a "failover-group-id" option to the
> > > > Red Hat PCI bridge device (support for the i440FX model).
> > > > 
> > > > 4. Add a new "pcie-downstream" device, with the option
> > > > "failover-group-id" (support for the Q35 model).
> > > > 
> > > > 5. The operator creates a 64 bit unique identifier, failover-group-id.
> > > > 
> > > > 6. When the virtio device is created, the operator uses the
> > > > "failover-group-id" option (for example, '-device
> > > > virtio-net-pci,failover-group-id=') and specifies the
> > > > failover-group-id created in step 4.
> > > > 
> > > > QEMU stores the failover-group-id in the virtio device's 
> > > > configuration
> > > > space in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > > > 
> > > > 7. When assigning a PCI device to the guest in passthrough mode, the
> > > > operator first creates a bridge using the "failover-group-id" option
> > > > (for example, '-device 
> > > > pcie-downstream,failover-group-id=')
> > > > to specify the failover-group-id created in step 4, and then 
> > > > attaches
> > > > the passthrough device to the bridge.
> > > > 
> > > > QEMU stores the failover-group-id in the configuration space of the
> > > > bridge as Vendor-Specific capability (0x09). The "Vendor" here is
> > > > not to be confused with a specific organization. Instead, the vendor
> > > > of the bridge is QEMU.
> > > > 
> > > > 8. Patch 4 in patch series "Enable virtio_net to act as a standby for
> > > > a passthru device" [1] needs to be modified to use the UUID values
> > > > present in the bridge's configuration space and the virtio device's
> > > > configuration space instead of the MAC address for pairing the 
> > > > devices.
> > > I'm still missing a few bits in the overall scheme.
> > > 
> > > Is the guest supposed to acknowledge the support for PT-PV failover?
> > 
> > Yes. We are leveraging virtio's feature negotiation mechanism for that.
> > Guest which does not acknowledge the support will not have PT plugged in.
> > 
> > > Should the PT device be visibile to the guest before it acknowledges the
> > > support for failover?
> > No. QEMU will only expose PT device after guest acknowledges the support
> > through virtio's feature negotiation.
> > 
> > >How is this supposed to work with legacy guests that don't support it?
> > Only PV device will be exposed on legacy guest.
> 
> So how is this coordination going to work?  One possibility is that the
> PV

[Qemu-devel] [PATCH v3 3/3] Add "Group Identifier" support to Red Hat PCI Express bridge.

2018-06-29 Thread Venu Busireddy

Add a new bridge device "pcie-downstream" with a
Vendor ID of PCI_VENDOR_ID_REDHAT and a Device ID of
PCI_DEVICE_ID_REDHAT_DOWNPORT_FAILOVER. Also, add the "Vendor-Specific"
capability to the bridge to contain the "Group Identifier" that will be
used to pair a virtio device with the passthrough device attached to
that bridge.

This capability is added to the bridge iff the "failover-group-id"
option is specified for the bridge.

Signed-off-by: Venu Busireddy 
---
 default-configs/arm-softmmu.mak|   1 +
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/pci-bridge/Makefile.objs|   1 +
 hw/pci-bridge/pcie_downstream.c| 220 +
 hw/pci-bridge/pcie_downstream.h|  10 ++
 include/hw/pci/pci.h   |   1 +
 7 files changed, 235 insertions(+)
 create mode 100644 hw/pci-bridge/pcie_downstream.c
 create mode 100644 hw/pci-bridge/pcie_downstream.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 834d45cfaf..b86c6fb122 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -139,6 +139,7 @@ CONFIG_IMX_I2C=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_ACPI=y
 CONFIG_SMBIOS=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 8c7d4a0fa0..a900c8f052 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 0390b4303c..481e4764be 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
index 47065f87d9..5b42212edc 100644
--- a/hw/pci-bridge/Makefile.objs
+++ b/hw/pci-bridge/Makefile.objs
@@ -3,6 +3,7 @@ common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o 
gen_pcie_root_port.o pcie_pci
 common-obj-$(CONFIG_PXB) += pci_expander_bridge.o
 common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
 common-obj-$(CONFIG_IOH3420) += ioh3420.o
+common-obj-$(CONFIG_PCIE_DOWNSTREAM) += pcie_downstream.o
 common-obj-$(CONFIG_I82801B11) += i82801b11.o
 # NewWorld PowerMac
 common-obj-$(CONFIG_DEC_PCI) += dec.o
diff --git a/hw/pci-bridge/pcie_downstream.c b/hw/pci-bridge/pcie_downstream.c
new file mode 100644
index 00..49b0d1e933
--- /dev/null
+++ b/hw/pci-bridge/pcie_downstream.c
@@ -0,0 +1,220 @@
+/*
+ * Red Hat PCI Express downstream port.
+ *
+ * pcie_downstream.c
+ * Most of this code is copied from xio3130_downstream.c
+ *
+ * Copyright (c) 2018 Oracle and/or its affiliates.
+ * Author: Venu Busireddy 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "pcie_downstream.h"
+#include "qapi/error.h"
+
+#define REDHAT_PCIE_DOWNSTREAM_REVISION 0x1
+#define REDHAT_PCIE_DOWNSTREAM_MSI_OFFSET   0x70
+#define REDHAT_PCIE_DOWNSTREAM_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_64BIT
+#define REDHAT_PCIE_DOWNSTREAM_MSI_NR_VECTOR1
+#define REDHAT_PCIE_DOWNSTREAM_SSVID_OFFSET 0x80
+#define REDHAT_PCIE_DOWNSTREAM_SSVID_SVID   0
+#define REDHAT_PCIE_DOWNSTREAM_SSVID_SSID   0
+#define REDHAT_PCIE_DOWNSTREAM_EXP_OFFSET   0x90
+#define REDHAT_PCIE_DOWNSTREAM_VENDOR_OFFSET0xCC
+#define REDHAT_PCIE_DOWNSTREAM_AER_OFFSET   0x100
+
+static void pcie_downstream_write_config(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, address, val, len);
+

[Qemu-devel] [PATCH v3 2/3] Add "Group Identifier" support to Red Hat PCI bridge.

2018-06-29 Thread Venu Busireddy

Add the "Vendor-Specific" capability to the Red Hat PCI bridge device
"pci-bridge", to contain the "Group Identifier" that will be used to pair
a virtio device with the passthrough device attached to that bridge. Also,
change the Device ID of the bridge to PCI_DEVICE_ID_REDHAT_BRIDGE_FAILOVER
to avoid confusion with bridges that don't have this capability.

This capability is added to the bridge iff the "failover-group-id"
option is specified for the bridge.

Signed-off-by: Venu Busireddy 
---
 hw/pci-bridge/pci_bridge_dev.c | 10 ++
 hw/pci/pci_bridge.c| 34 +
 include/hw/pci/pci.h   | 35 +-
 include/hw/pci/pci_bridge.h|  2 ++
 4 files changed, 64 insertions(+), 17 deletions(-)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index b2d861d216..d3879071a8 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -71,6 +71,13 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error 
**errp)
 bridge_dev->msi = ON_OFF_AUTO_OFF;
 }
 
+err = pci_bridge_vendor_init(dev, 0, PCI_DEVICE_ID_REDHAT_BRIDGE_FAILOVER,
+errp);
+if (err < 0) {
+error_append_hint(errp, "Can't init group ID, error %d\n", err);
+goto vendor_cap_err;
+}
+
 err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0, errp);
 if (err) {
 goto slotid_error;
@@ -109,6 +116,7 @@ slotid_error:
 if (shpc_present(dev)) {
 shpc_cleanup(dev, _dev->bar);
 }
+vendor_cap_err:
 shpc_error:
 pci_bridge_exitfn(dev);
 }
@@ -162,6 +170,8 @@ static Property pci_bridge_dev_properties[] = {
 ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
 PCI_BRIDGE_DEV_F_SHPC_REQ, true),
+DEFINE_PROP_UINT64(COMPAT_PROP_FAILOVER_GROUP_ID,
+   PCIDevice, failover_group_id, ULLONG_MAX),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40a39f57cb..68cc619c20 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -40,6 +40,10 @@
 #define PCI_SSVID_SVID  4
 #define PCI_SSVID_SSID  6
 
+#define PCI_VENDOR_SIZEOF 12
+#define PCI_VENDOR_CAP_LEN_OFFSET  2
+#define PCI_VENDOR_GROUP_ID_OFFSET 4
+
 int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
   uint16_t svid, uint16_t ssid,
   Error **errp)
@@ -57,6 +61,36 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
 return pos;
 }
 
+int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset,
+   uint16_t device_id, Error **errp)
+{
+int pos;
+PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
+
+if (d->failover_group_id == ULLONG_MAX) {
+return 0;
+}
+
+pos = pci_add_capability(d, PCI_CAP_ID_VNDR, offset, PCI_VENDOR_SIZEOF,
+errp);
+if (pos < 0) {
+return pos;
+}
+
+/*
+ * Tweak the Device ID to avoid confusion
+ * with bridges that don't have the group id capability.
+ */
+dc->device_id = device_id;
+pci_set_word(d->config + PCI_DEVICE_ID, device_id);
+
+pci_set_word(d->config + pos + PCI_VENDOR_CAP_LEN_OFFSET,
+PCI_VENDOR_SIZEOF);
+memcpy(d->config + pos + PCI_VENDOR_GROUP_ID_OFFSET,
+>failover_group_id, sizeof(uint64_t));
+return pos;
+}
+
 /* Accessor function to get parent bridge device from pci bus. */
 PCIDevice *pci_bridge_get_device(PCIBus *bus)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index b59c3e7e38..bc38032761 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -86,23 +86,24 @@ extern bool pci_available;
 #define PCI_DEVICE_ID_VIRTIO_9P  0x1009
 #define PCI_DEVICE_ID_VIRTIO_VSOCK   0x1012
 
-#define PCI_VENDOR_ID_REDHAT 0x1b36
-#define PCI_DEVICE_ID_REDHAT_BRIDGE  0x0001
-#define PCI_DEVICE_ID_REDHAT_SERIAL  0x0002
-#define PCI_DEVICE_ID_REDHAT_SERIAL2 0x0003
-#define PCI_DEVICE_ID_REDHAT_SERIAL4 0x0004
-#define PCI_DEVICE_ID_REDHAT_TEST0x0005
-#define PCI_DEVICE_ID_REDHAT_ROCKER  0x0006
-#define PCI_DEVICE_ID_REDHAT_SDHCI   0x0007
-#define PCI_DEVICE_ID_REDHAT_PCIE_HOST   0x0008
-#define PCI_DEVICE_ID_REDHAT_PXB 0x0009
-#define PCI_DEVICE_ID_REDHAT_BRIDGE_SEAT 0x000a
-#define PCI_DEVICE_ID_REDHAT_PXB_PCIE0x000b
-#define PCI_DEVICE_ID_REDHAT_PCIE_RP 0x000c
-#define PCI_DEVICE_ID_REDHAT_XHCI0x000d
-#define PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE 0x000e
-#define PCI_DEVICE_ID_REDHAT_MDPY0x000f
-#define PCI_DEVICE_ID_REDHAT_QXL 0x0100
+#define PCI_VENDOR_ID_REDHAT0x1b36
+#define PCI_DEVICE_ID_REDHAT_BRIDGE 0x0001
+#define PCI_DEVICE_ID_REDHAT_SERIAL

[Qemu-devel] [PATCH v3 virtio 1/1] Add "Group Identifier" to virtio PCI capabilities.

2018-06-29 Thread Venu Busireddy

Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
virtio PCI capabilities to allow for the grouping of devices.

Signed-off-by: Venu Busireddy 
---
 content.tex | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/content.tex b/content.tex
index be18234..34627e5 100644
--- a/content.tex
+++ b/content.tex
@@ -599,6 +599,8 @@ The fields are interpreted as follows:
 #define VIRTIO_PCI_CAP_DEVICE_CFG4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG   5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
 \end{lstlisting}
 
 Any other value is reserved for future use.
@@ -997,6 +999,34 @@ address \field{cap.length} bytes within a BAR range
 specified by some other Virtio Structure PCI Capability
 of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
 
+\subsubsection{Group Identifier capability}\label{sec:Virtio Transport Options 
/ Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping
+devices together.
+
+The capability is immediately followed by an identifier of type u64
+as below:
+
+\begin{lstlisting}
+struct virtio_pci_group_id_cap {
+struct virtio_pci_cap cap;
+u64 group_id; /* Group Identifier */
+};
+\end{lstlisting}
+
+\devicenormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
+
+When this capability is present, the device must set the fields
+\field{cap.bar}, \field{cap.offset} and \field{cap.length} to 0, and
+set the group_id to a unique identifier.
+
+\drivernormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The fields \field{cap.bar}, \field{cap.offset}, \field{cap.length}
+and \field{group_id} are read-only for the driver.
+
 \subsubsection{Legacy Interfaces: A Note on PCI Device 
Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device 
Layout / Legacy Interfaces: A Note on PCI Device Layout}
 
 Transitional devices MUST present part of configuration

[Qemu-devel] [PATCH v3 0/3] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-29 Thread Venu Busireddy

The patch set "Enable virtio_net to act as a standby for a passthru
device" [1] deals with live migration of guests that use passthrough
devices. However, that scheme uses the MAC address for pairing
the virtio device and the passthrough device. The thread "netvsc:
refactor notifier/event handling code to use the failover framework"
[2] discusses an alternate mechanism, such as using an UUID, for pairing
the devices. Based on that discussion, proposals "Add "Group Identifier"
to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
store pairing information..." [4] were made.

The current patch set includes all the feedback received for proposals [3]
and [4]. For the sake of completeness, patch for the virtio specification
is also included here. Following is the updated proposal.

1. Extend the virtio specification to include a new virtio PCI capability
   "VIRTIO_PCI_CAP_GROUP_ID_CFG".

2. Enhance the QEMU CLI to include a "failover-group-id" option to the
   virtio device. The "failover-group-id" is a 64 bit value.

3. Enhance the QEMU CLI to include a "failover-group-id" option to the
   Red Hat PCI bridge device (support for the i440FX model).

4. Add a new "pcie-downstream" device, with the option
   "failover-group-id" (support for the Q35 model).

5. The operator creates a 64 bit unique identifier, failover-group-id.

6. When the virtio device is created, the operator uses the
   "failover-group-id" option (for example, '-device
   virtio-net-pci,failover-group-id=') and specifies the
   failover-group-id created in step 4.

   QEMU stores the failover-group-id in the virtio device's configuration
   space in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".

7. When assigning a PCI device to the guest in passthrough mode, the
   operator first creates a bridge using the "failover-group-id" option
   (for example, '-device pcie-downstream,failover-group-id=')
   to specify the failover-group-id created in step 4, and then attaches
   the passthrough device to the bridge.

   QEMU stores the failover-group-id in the configuration space of the
   bridge as Vendor-Specific capability (0x09). The "Vendor" here is
   not to be confused with a specific organization. Instead, the vendor
   of the bridge is QEMU.

8. Patch 4 in patch series "Enable virtio_net to act as a standby for
   a passthru device" [1] needs to be modified to use the UUID values
   present in the bridge's configuration space and the virtio device's
   configuration space instead of the MAC address for pairing the devices.

Thanks!

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00156.html
[2] https://www.spinics.net/lists/netdev/msg499011.html
[3] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00118.html
[4] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00204.html

Changes in v3:
  - As Michael Tsirkin suggested:
* Changed the failover group identifier from UUID to a 64-bit value.
* Changed the command line option from "uuid" to "failover-group-id".
* Tweaked the "pci-bridge" device's Device ID to
  PCI_DEVICE_ID_REDHAT_BRIDGE_FAILOVER.
* Added to new device "pcie-downstream" with Device ID
  PCI_DEVICE_ID_REDHAT_DOWNPORT_FAILOVER (to support the PCIe case).
  - Changed the patch for virtio specification to reflect the above
changes.

Changes in v2:
  - As Michael Tsirkin suggested, changed the virtio specification
to restrict the group identifier to be a 16-byte field, presented
entirely in the virtio device's configuration space.
  - As Michael Tsirkin suggested, instead of tweaking the ioh3420
device with Red Hat vendor ID, create a new PCIe bridge device
named "pcie-downstream" with Red Hat Vendor ID, and include the
group identifier in this device.
  - Added a new patch to enhance the "pci-bridge" device to support
the group identifier (for the i440FX model).

Venu Busireddy (3):
  Add "Group Identifier" support to virtio devices.
  Add "Group Identifier" support to Red Hat PCI bridge.
  Add "Group Identifier" support to Red Hat PCI Express bridge.

 default-configs/arm-softmmu.mak |   1 +
 default-configs/i386-softmmu.mak|   1 +
 default-configs/x86_64-softmmu.mak  |   1 +
 hw/pci-bridge/Makefile.objs |   1 +
 hw/pci-bridge/pci_bridge_dev.c  |  10 +
 hw/pci-bridge/pcie_downstream.c | 220 
 hw/pci-bridge/pcie_downstream.h |  10 +
 hw/pci/pci_bridge.c |  34 +++
 hw/virtio/virtio-pci.c  |  15 ++
 hw/virtio/virtio-pci.h  |   3 +-
 include/hw/pci/pci.h|  37 ++--
 include/hw/pci/pci_bridge.

[Qemu-devel] [PATCH v3 1/3] Add "Group Identifier" support to virtio devices.

2018-06-29 Thread Venu Busireddy

Use the virtio PCI capability "VIRTIO_PCI_CAP_GROUP_ID_CFG" to
store the "Group Identifier" specified via the command line option
"failover-group-id" for the virtio device. The capability will be
present in the virtio device's configuration space iff the
"failover-group-id" option is specified.

Group Identifier is used to pair a virtio device with a passthrough
device.

Signed-off-by: Venu Busireddy 
---
 hw/virtio/virtio-pci.c  | 15 +++
 hw/virtio/virtio-pci.h  |  3 ++-
 include/hw/pci/pci.h|  1 +
 include/hw/pci/pcie.h   |  1 +
 include/standard-headers/linux/virtio_pci.h |  8 
 5 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3a01fe90f0..cdf907e9c5 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -1638,6 +1638,10 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 .cap.cap_len = sizeof cfg,
 .cap.cfg_type = VIRTIO_PCI_CAP_PCI_CFG,
 };
+struct virtio_pci_group_id_cap group = {
+.cap.cap_len = sizeof group,
+.cap.cfg_type = VIRTIO_PCI_CAP_GROUP_ID_CFG,
+};
 struct virtio_pci_notify_cap notify_pio = {
 .cap.cap_len = sizeof notify,
 .notify_off_multiplier = cpu_to_le32(0x0),
@@ -1647,6 +1651,11 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 
 virtio_pci_modern_regions_init(proxy);
 
+if (proxy->pci_dev.failover_group_id != ULLONG_MAX) {
+group.failover_group_id = proxy->pci_dev.failover_group_id;
+virtio_pci_modern_mem_region_map(proxy, >group, );
+}
+
 virtio_pci_modern_mem_region_map(proxy, >common, );
 virtio_pci_modern_mem_region_map(proxy, >isr, );
 virtio_pci_modern_mem_region_map(proxy, >device, );
@@ -1763,6 +1772,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 proxy->device.size = 0x1000;
 proxy->device.type = VIRTIO_PCI_CAP_DEVICE_CFG;
 
+proxy->group.offset = 0;
+proxy->group.size = 0;
+proxy->group.type = VIRTIO_PCI_CAP_GROUP_ID_CFG;
+
 proxy->notify.offset = 0x3000;
 proxy->notify.size = virtio_pci_queue_mem_mult(proxy) * VIRTIO_QUEUE_MAX;
 proxy->notify.type = VIRTIO_PCI_CAP_NOTIFY_CFG;
@@ -1898,6 +1911,8 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
 DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
+DEFINE_PROP_UINT64(COMPAT_PROP_FAILOVER_GROUP_ID,
+   PCIDevice, failover_group_id, ULLONG_MAX),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..e4592e90bf 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -164,10 +164,11 @@ struct VirtIOPCIProxy {
 VirtIOPCIRegion common;
 VirtIOPCIRegion isr;
 VirtIOPCIRegion device;
+VirtIOPCIRegion group;
 VirtIOPCIRegion notify;
 VirtIOPCIRegion notify_pio;
 };
-VirtIOPCIRegion regs[5];
+VirtIOPCIRegion regs[6];
 };
 MemoryRegion modern_bar;
 MemoryRegion io_bar;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 990d6fcbde..b59c3e7e38 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -343,6 +343,7 @@ struct PCIDevice {
 bool has_rom;
 MemoryRegion rom;
 uint32_t rom_bar;
+uint64_t failover_group_id;
 
 /* INTx routing notifier */
 PCIINTxRoutingNotifier intx_routing_notifier;
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b71e369703..71cd143ee4 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -82,6 +82,7 @@ struct PCIExpressDevice {
 };
 
 #define COMPAT_PROP_PCP "power_controller_present"
+#define COMPAT_PROP_FAILOVER_GROUP_ID "failover-group-id"
 
 /* PCI express capability helper functions */
 int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type,
diff --git a/include/standard-headers/linux/virtio_pci.h 
b/include/standard-headers/linux/virtio_pci.h
index 9262acd130..e46df63e52 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -113,6 +113,8 @@
 #define VIRTIO_PCI_CAP_DEVICE_CFG  4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG 5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG6
 
 /* This is the PCI capability header: */
 struct virtio_pci_cap {
@@ -163,6 +165,12 @@ struct virtio_pci_cfg_cap {
uint8_t pci_cfg_data[4]; /* Data for BAR access. */
 };
 
+/* Fields in VIRTIO_PCI_CAP_GROUP_ID_CFG: */
+struct virtio_pci_group_id_

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-29 Thread Venu Busireddy

On 2018-06-27 22:27:33 -0500, Venu Busireddy wrote:
> On 2018-06-28 04:54:16 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jun 27, 2018 at 05:34:17PM -0500, Venu Busireddy wrote:
> > > On 2018-06-27 23:12:12 +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jun 27, 2018 at 02:59:01PM -0500, Venu Busireddy wrote:
> > > > > On 2018-06-27 22:47:05 +0300, Michael S. Tsirkin wrote:
> > > > > > On Wed, Jun 27, 2018 at 02:29:58PM -0500, Venu Busireddy wrote:
> > > > > > > On 2018-06-27 15:24:58 +0300, Roman Kagan wrote:
> > > > > > > > On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > > > > > > > > The patch set "Enable virtio_net to act as a standby for a 
> > > > > > > > > passthru
> > > > > > > > > device" [1] deals with live migration of guests that use 
> > > > > > > > > passthrough
> > > > > > > > > devices. However, that scheme uses the MAC address for pairing
> > > > > > > > > the virtio device and the passthrough device. The thread 
> > > > > > > > > "netvsc:
> > > > > > > > > refactor notifier/event handling code to use the failover 
> > > > > > > > > framework"
> > > > > > > > > [2] discusses an alternate mechanism, such as using an UUID, 
> > > > > > > > > for pairing
> > > > > > > > > the devices. Based on that discussion, proposals "Add "Group 
> > > > > > > > > Identifier"
> > > > > > > > > to virtio PCI capabilities." [3] and "RFC: Use of bridge 
> > > > > > > > > devices to
> > > > > > > > > store pairing information..." [4] were made.
> > > > > > > > 
> > > > > > > > I must have missed something in those threads, but where does 
> > > > > > > > this UUID
> > > > > > > > thing come about?  AFAICS this identifier doesn't need to be
> > > > > > > > "universally" unique, nor persistent; it only has to be unique 
> > > > > > > > across
> > > > > > > > the VM and stable throughout the VM lifetime.
> > > > > > > 
> > > > > > > The notion of using UUID came up in the thread
> > > > > > > 
> > > > > > >https://www.spinics.net/lists/netdev/msg499011.html
> > > > > > 
> > > > > > That's probably because it was expected one of standard serial 
> > > > > > number capabilities
> > > > > > (VPD or PCI Express serial #) will be used, which are expected to 
> > > > > > be unique.
> > > > > > 
> > > > > > If you are rolling your own vendor specific one, it's just an ID and
> > > > > > does not have to be unique.
> > > > > > 
> > > > > > > > FWIW Hyper-V uses a 32-bit integer for this purpose, not a UUID 
> > > > > > > > as seems
> > > > > > > > to be implied in the thread you refer to.
> > > > > > > 
> > > > > > > Yes, Hyper-V uses a serial number (but I think it is 64-bit 
> > > > > > > value).
> > > > > > > However, what we are doing is similar to that. Instead of 32 bits,
> > > > > > > we are using 128 bits.
> > > > > > 
> > > > > > That's OK. The name is confusing though. It's a failover group id,
> > > > > > not a UUID.
> > > > > 
> > > > > Sure, we can name it whatever we want. I can change it to
> > > > > "failover-group-id", if that is what we want to call it.
> > > > > 
> > > > > But what is more important is, what is represented by that name? I 
> > > > > thought
> > > > > we were going to use UUID. The QEMU command line changes in this patch
> > > > > set expect the user to specify an UUID as the value for this option
> > > > > (whatever we name it). Are we still in agreement about that, or do you
> > > > > propose something else to be used? If so, what is it? A 32-bit 
> > > > > number, a
> > > > > 64-bit number, or an arbitrary string?
> > > > > 
> &g

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 3/4] Add "Group Identifier" support to Red Hat PCI bridge.

2018-06-27 Thread Venu Busireddy

On 2018-06-28 05:14:50 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2018 at 06:07:59PM -0500, Venu Busireddy wrote:
> > On 2018-06-26 23:08:12 -0500, Venu Busireddy wrote:
> > > On 2018-06-27 07:02:36 +0300, Michael S. Tsirkin wrote:
> > > > On Tue, Jun 26, 2018 at 10:49:33PM -0500, Venu Busireddy wrote:
> > > > > Add the "Vendor-Specific" capability to the Red Hat PCI bridge device
> > > > > "pci-bridge", to contain the "Group Identifier" (UUID) that will be
> > > > > used to pair a virtio device with the passthrough device attached to
> > > > > that bridge.
> > > > > 
> > > > > This capability is added to the bridge iff the "uuid" option is 
> > > > > specified
> > > > > for the bridge.
> > > > 
> > > > I think the name should be more explicit. How about "failover-group-id"?
> > > 
> > > I can change it. But don't you think it is bit long?
> > > 
> > > > > 
> > > > > Signed-off-by: Venu Busireddy 
> > > > 
> > > > I'd like to also tweak the device id in this case,
> > > > to make it easier for guests to know it's a grouping bridge.
> > > 
> > > Could you please recommend a name for the new ID'd definition? Something
> > > in lines of PCI_DEVICE_ID_REDHAT_.
> > 
> > How about these names and values for the device IDs?
> > 
> >   PCI_DEVICE_ID_REDHAT_PCI_BRIDGE_GRP (0x0010) for pci-bridge, and 
> 
> You do not need the second PCI, and group is one of the
> functions of bridge anyway.
> 
> PCI_DEVICE_ID_REDHAT_BRIDGE_FAILOVER?

Sounds good.

> >   PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE_GRP (0x0011) for pcie-downstream
> > 
> > Thanks,
> > 
> > Venu
> 
> PCI_DEVICE_ID_REDHAT_DOWNPORT_FAILOVER?

Sounds good.

> It's becoming annoying though - we'll need one for each type :(
> 
> Ideas/options:
> - use revision ID to distinguish from regular bridge

What would we use for the revision? For example, "pci-bridge" (default
one) currently uses revision 0. Should we use revision 1 for the group
bridge? If we do that, what if in the future we need to up the revision
for the default bridge?

> - use device serial # cap for the bridge when it's an express device

We could do that. But if we are defining a new Device ID for the PCI
case, we might as well do so for the PCIe case too, and be consistent?
The other advantage in defining a new ID is, 'lspci' could right away
tell what type of bridge it is (if we update PCI IDs). No need to map
the revision to a bridge type!

Unless you have strong reservations against, I think the notion of
different Device ID for the grouping bridge sounds cleaner. Do you agree?

Venu

> 
> > > > > ---
> > > > >  hw/pci-bridge/pci_bridge_dev.c |  8 
> > > > >  hw/pci/pci_bridge.c| 26 ++
> > > > >  include/hw/pci/pcie.h  |  1 +
> > > > >  3 files changed, 35 insertions(+)
> > > > > 
> > > > > diff --git a/hw/pci-bridge/pci_bridge_dev.c 
> > > > > b/hw/pci-bridge/pci_bridge_dev.c
> > > > > index b2d861d216..bbbc6fa1c6 100644
> > > > > --- a/hw/pci-bridge/pci_bridge_dev.c
> > > > > +++ b/hw/pci-bridge/pci_bridge_dev.c
> > > > > @@ -71,6 +71,12 @@ static void pci_bridge_dev_realize(PCIDevice *dev, 
> > > > > Error **errp)
> > > > >  bridge_dev->msi = ON_OFF_AUTO_OFF;
> > > > >  }
> > > > >  
> > > > > +err = pci_bridge_vendor_init(dev, 0, errp);
> > > > > +if (err < 0) {
> > > > > +error_append_hint(errp, "Can't init group ID, error %d\n", 
> > > > > err);
> > > > > +goto vendor_cap_err;
> > > > > +}
> > > > > +
> > > > >  err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0, errp);
> > > > >  if (err) {
> > > > >  goto slotid_error;
> > > > > @@ -109,6 +115,7 @@ slotid_error:
> > > > >  if (shpc_present(dev)) {
> > > > >  shpc_cleanup(dev, _dev->bar);
> > > > >  }
> > > > > +vendor_cap_err:
> > > > >  shpc_error:
> > > > >  pci_bridge_exitfn(dev);
> > > > >  }
> > > > > @@ -162,6 +169,7 @@ static Property pci_bridge_dev_properties[] = {
> > > > >

Re: [Qemu-devel] [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-27 Thread Venu Busireddy

On 2018-06-28 04:54:16 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2018 at 05:34:17PM -0500, Venu Busireddy wrote:
> > On 2018-06-27 23:12:12 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2018 at 02:59:01PM -0500, Venu Busireddy wrote:
> > > > On 2018-06-27 22:47:05 +0300, Michael S. Tsirkin wrote:
> > > > > On Wed, Jun 27, 2018 at 02:29:58PM -0500, Venu Busireddy wrote:
> > > > > > On 2018-06-27 15:24:58 +0300, Roman Kagan wrote:
> > > > > > > On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > > > > > > > The patch set "Enable virtio_net to act as a standby for a 
> > > > > > > > passthru
> > > > > > > > device" [1] deals with live migration of guests that use 
> > > > > > > > passthrough
> > > > > > > > devices. However, that scheme uses the MAC address for pairing
> > > > > > > > the virtio device and the passthrough device. The thread 
> > > > > > > > "netvsc:
> > > > > > > > refactor notifier/event handling code to use the failover 
> > > > > > > > framework"
> > > > > > > > [2] discusses an alternate mechanism, such as using an UUID, 
> > > > > > > > for pairing
> > > > > > > > the devices. Based on that discussion, proposals "Add "Group 
> > > > > > > > Identifier"
> > > > > > > > to virtio PCI capabilities." [3] and "RFC: Use of bridge 
> > > > > > > > devices to
> > > > > > > > store pairing information..." [4] were made.
> > > > > > > 
> > > > > > > I must have missed something in those threads, but where does 
> > > > > > > this UUID
> > > > > > > thing come about?  AFAICS this identifier doesn't need to be
> > > > > > > "universally" unique, nor persistent; it only has to be unique 
> > > > > > > across
> > > > > > > the VM and stable throughout the VM lifetime.
> > > > > > 
> > > > > > The notion of using UUID came up in the thread
> > > > > > 
> > > > > >https://www.spinics.net/lists/netdev/msg499011.html
> > > > > 
> > > > > That's probably because it was expected one of standard serial number 
> > > > > capabilities
> > > > > (VPD or PCI Express serial #) will be used, which are expected to be 
> > > > > unique.
> > > > > 
> > > > > If you are rolling your own vendor specific one, it's just an ID and
> > > > > does not have to be unique.
> > > > > 
> > > > > > > FWIW Hyper-V uses a 32-bit integer for this purpose, not a UUID 
> > > > > > > as seems
> > > > > > > to be implied in the thread you refer to.
> > > > > > 
> > > > > > Yes, Hyper-V uses a serial number (but I think it is 64-bit value).
> > > > > > However, what we are doing is similar to that. Instead of 32 bits,
> > > > > > we are using 128 bits.
> > > > > 
> > > > > That's OK. The name is confusing though. It's a failover group id,
> > > > > not a UUID.
> > > > 
> > > > Sure, we can name it whatever we want. I can change it to
> > > > "failover-group-id", if that is what we want to call it.
> > > > 
> > > > But what is more important is, what is represented by that name? I 
> > > > thought
> > > > we were going to use UUID. The QEMU command line changes in this patch
> > > > set expect the user to specify an UUID as the value for this option
> > > > (whatever we name it). Are we still in agreement about that, or do you
> > > > propose something else to be used? If so, what is it? A 32-bit number, a
> > > > 64-bit number, or an arbitrary string?
> > > > 
> > > > Regards,
> > > > 
> > > > Venu
> > > 
> > > If we don't really need a UUID, I'd avoid that requirement.
> > 
> > I don't see the need for a 128-bit UUID. I just took that approach because
> > UUID was mentioned in "https://www.spinics.net/lists/netdev/msg499011.html;.
> > Since it is unlikely to have more than 4 billion devices in the system,
> > a 32-bit

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 3/4] Add "Group Identifier" support to Red Hat PCI bridge.

2018-06-27 Thread Venu Busireddy

On 2018-06-26 23:08:12 -0500, Venu Busireddy wrote:
> On 2018-06-27 07:02:36 +0300, Michael S. Tsirkin wrote:
> > On Tue, Jun 26, 2018 at 10:49:33PM -0500, Venu Busireddy wrote:
> > > Add the "Vendor-Specific" capability to the Red Hat PCI bridge device
> > > "pci-bridge", to contain the "Group Identifier" (UUID) that will be
> > > used to pair a virtio device with the passthrough device attached to
> > > that bridge.
> > > 
> > > This capability is added to the bridge iff the "uuid" option is specified
> > > for the bridge.
> > 
> > I think the name should be more explicit. How about "failover-group-id"?
> 
> I can change it. But don't you think it is bit long?
> 
> > > 
> > > Signed-off-by: Venu Busireddy 
> > 
> > I'd like to also tweak the device id in this case,
> > to make it easier for guests to know it's a grouping bridge.
> 
> Could you please recommend a name for the new ID'd definition? Something
> in lines of PCI_DEVICE_ID_REDHAT_.

How about these names and values for the device IDs?

  PCI_DEVICE_ID_REDHAT_PCI_BRIDGE_GRP (0x0010) for pci-bridge, and 
  PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE_GRP (0x0011) for pcie-downstream

Thanks,

Venu

> > > ---
> > >  hw/pci-bridge/pci_bridge_dev.c |  8 
> > >  hw/pci/pci_bridge.c| 26 ++
> > >  include/hw/pci/pcie.h  |  1 +
> > >  3 files changed, 35 insertions(+)
> > > 
> > > diff --git a/hw/pci-bridge/pci_bridge_dev.c 
> > > b/hw/pci-bridge/pci_bridge_dev.c
> > > index b2d861d216..bbbc6fa1c6 100644
> > > --- a/hw/pci-bridge/pci_bridge_dev.c
> > > +++ b/hw/pci-bridge/pci_bridge_dev.c
> > > @@ -71,6 +71,12 @@ static void pci_bridge_dev_realize(PCIDevice *dev, 
> > > Error **errp)
> > >  bridge_dev->msi = ON_OFF_AUTO_OFF;
> > >  }
> > >  
> > > +err = pci_bridge_vendor_init(dev, 0, errp);
> > > +if (err < 0) {
> > > +error_append_hint(errp, "Can't init group ID, error %d\n", err);
> > > +goto vendor_cap_err;
> > > +}
> > > +
> > >  err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0, errp);
> > >  if (err) {
> > >  goto slotid_error;
> > > @@ -109,6 +115,7 @@ slotid_error:
> > >  if (shpc_present(dev)) {
> > >  shpc_cleanup(dev, _dev->bar);
> > >  }
> > > +vendor_cap_err:
> > >  shpc_error:
> > >  pci_bridge_exitfn(dev);
> > >  }
> > > @@ -162,6 +169,7 @@ static Property pci_bridge_dev_properties[] = {
> > >  ON_OFF_AUTO_AUTO),
> > >  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
> > >  PCI_BRIDGE_DEV_F_SHPC_REQ, true),
> > > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> > >  DEFINE_PROP_END_OF_LIST(),
> > >  };
> > >  
> > > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > > index 40a39f57cb..cb8b3dad2a 100644
> > > --- a/hw/pci/pci_bridge.c
> > > +++ b/hw/pci/pci_bridge.c
> > > @@ -34,12 +34,17 @@
> > >  #include "hw/pci/pci_bus.h"
> > >  #include "qemu/range.h"
> > >  #include "qapi/error.h"
> > > +#include "qemu/uuid.h"
> > >  
> > >  /* PCI bridge subsystem vendor ID helper functions */
> > >  #define PCI_SSVID_SIZEOF8
> > >  #define PCI_SSVID_SVID  4
> > >  #define PCI_SSVID_SSID  6
> > >  
> > > +#define PCI_VENDOR_SIZEOF 20
> > > +#define PCI_VENDOR_CAP_LEN_OFFSET  2
> > > +#define PCI_VENDOR_GROUP_ID_OFFSET 4
> > > +
> > >  int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> > >uint16_t svid, uint16_t ssid,
> > >Error **errp)
> > > @@ -57,6 +62,27 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t 
> > > offset,
> > >  return pos;
> > >  }
> > >  
> > > +int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp)
> > > +{
> > > +int pos;
> > > +
> > > +if (qemu_uuid_is_null(>uuid)) {
> > > +return 0;
> > > +}
> > > +
> > > +pos = pci_add_capability(d, PCI_CAP_ID_VNDR, offset, 
> > > PCI_VENDOR_SIZEOF,
> &g

Re: [Qemu-devel] [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-27 Thread Venu Busireddy

On 2018-06-27 23:12:12 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2018 at 02:59:01PM -0500, Venu Busireddy wrote:
> > On 2018-06-27 22:47:05 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jun 27, 2018 at 02:29:58PM -0500, Venu Busireddy wrote:
> > > > On 2018-06-27 15:24:58 +0300, Roman Kagan wrote:
> > > > > On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > > > > > The patch set "Enable virtio_net to act as a standby for a passthru
> > > > > > device" [1] deals with live migration of guests that use passthrough
> > > > > > devices. However, that scheme uses the MAC address for pairing
> > > > > > the virtio device and the passthrough device. The thread "netvsc:
> > > > > > refactor notifier/event handling code to use the failover framework"
> > > > > > [2] discusses an alternate mechanism, such as using an UUID, for 
> > > > > > pairing
> > > > > > the devices. Based on that discussion, proposals "Add "Group 
> > > > > > Identifier"
> > > > > > to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
> > > > > > store pairing information..." [4] were made.
> > > > > 
> > > > > I must have missed something in those threads, but where does this 
> > > > > UUID
> > > > > thing come about?  AFAICS this identifier doesn't need to be
> > > > > "universally" unique, nor persistent; it only has to be unique across
> > > > > the VM and stable throughout the VM lifetime.
> > > > 
> > > > The notion of using UUID came up in the thread
> > > > 
> > > >https://www.spinics.net/lists/netdev/msg499011.html
> > > 
> > > That's probably because it was expected one of standard serial number 
> > > capabilities
> > > (VPD or PCI Express serial #) will be used, which are expected to be 
> > > unique.
> > > 
> > > If you are rolling your own vendor specific one, it's just an ID and
> > > does not have to be unique.
> > > 
> > > > > FWIW Hyper-V uses a 32-bit integer for this purpose, not a UUID as 
> > > > > seems
> > > > > to be implied in the thread you refer to.
> > > > 
> > > > Yes, Hyper-V uses a serial number (but I think it is 64-bit value).
> > > > However, what we are doing is similar to that. Instead of 32 bits,
> > > > we are using 128 bits.
> > > 
> > > That's OK. The name is confusing though. It's a failover group id,
> > > not a UUID.
> > 
> > Sure, we can name it whatever we want. I can change it to
> > "failover-group-id", if that is what we want to call it.
> > 
> > But what is more important is, what is represented by that name? I thought
> > we were going to use UUID. The QEMU command line changes in this patch
> > set expect the user to specify an UUID as the value for this option
> > (whatever we name it). Are we still in agreement about that, or do you
> > propose something else to be used? If so, what is it? A 32-bit number, a
> > 64-bit number, or an arbitrary string?
> > 
> > Regards,
> > 
> > Venu
> 
> If we don't really need a UUID, I'd avoid that requirement.

I don't see the need for a 128-bit UUID. I just took that approach because
UUID was mentioned in "https://www.spinics.net/lists/netdev/msg499011.html;.
Since it is unlikely to have more than 4 billion devices in the system,
a 32-bit value would be more than enough to uniquely identify devices!

I am looking for direction from you :-). Roman already opined that UUID
may be an overkill. It appears that you too are leaning that way. Would
it be acceptable if I change the group identifier ("failover-group-id")
to a 32-bit value? If you concur, I will start reworking my patch. Could
you please confirm?

Thanks,

Venu

> 
> 
> > > 
> > > > > > The current patch set includes all the feedback received for 
> > > > > > proposals [3]
> > > > > > and [4]. For the sake of completeness, patch for the virtio 
> > > > > > specification
> > > > > > is also included here. Following is the updated proposal.
> > > > > > 
> > > > > > 1. Extend the virtio specification to include a new virtio PCI 
> > > > > > capability
> > > > > >"VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > > > >

Re: [Qemu-devel] [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-27 Thread Venu Busireddy

On 2018-06-27 22:47:05 +0300, Michael S. Tsirkin wrote:
> On Wed, Jun 27, 2018 at 02:29:58PM -0500, Venu Busireddy wrote:
> > On 2018-06-27 15:24:58 +0300, Roman Kagan wrote:
> > > On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > > > The patch set "Enable virtio_net to act as a standby for a passthru
> > > > device" [1] deals with live migration of guests that use passthrough
> > > > devices. However, that scheme uses the MAC address for pairing
> > > > the virtio device and the passthrough device. The thread "netvsc:
> > > > refactor notifier/event handling code to use the failover framework"
> > > > [2] discusses an alternate mechanism, such as using an UUID, for pairing
> > > > the devices. Based on that discussion, proposals "Add "Group Identifier"
> > > > to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
> > > > store pairing information..." [4] were made.
> > > 
> > > I must have missed something in those threads, but where does this UUID
> > > thing come about?  AFAICS this identifier doesn't need to be
> > > "universally" unique, nor persistent; it only has to be unique across
> > > the VM and stable throughout the VM lifetime.
> > 
> > The notion of using UUID came up in the thread
> > 
> >https://www.spinics.net/lists/netdev/msg499011.html
> 
> That's probably because it was expected one of standard serial number 
> capabilities
> (VPD or PCI Express serial #) will be used, which are expected to be unique.
> 
> If you are rolling your own vendor specific one, it's just an ID and
> does not have to be unique.
> 
> > > FWIW Hyper-V uses a 32-bit integer for this purpose, not a UUID as seems
> > > to be implied in the thread you refer to.
> > 
> > Yes, Hyper-V uses a serial number (but I think it is 64-bit value).
> > However, what we are doing is similar to that. Instead of 32 bits,
> > we are using 128 bits.
> 
> That's OK. The name is confusing though. It's a failover group id,
> not a UUID.

Sure, we can name it whatever we want. I can change it to
"failover-group-id", if that is what we want to call it.

But what is more important is, what is represented by that name? I thought
we were going to use UUID. The QEMU command line changes in this patch
set expect the user to specify an UUID as the value for this option
(whatever we name it). Are we still in agreement about that, or do you
propose something else to be used? If so, what is it? A 32-bit number, a
64-bit number, or an arbitrary string?

Regards,

Venu

> 
> > > > The current patch set includes all the feedback received for proposals 
> > > > [3]
> > > > and [4]. For the sake of completeness, patch for the virtio 
> > > > specification
> > > > is also included here. Following is the updated proposal.
> > > > 
> > > > 1. Extend the virtio specification to include a new virtio PCI 
> > > > capability
> > > >"VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > > > 
> > > > 2. Enhance the QEMU CLI to include a "uuid" option to the virtio device.
> > > >The "uuid" is a string in UUID format.
> > > > 
> > > > 3. Enhance the QEMU CLI to include a "uuid" option to the bridge device.
> > > >The "uuid" is a string in UUID format. Currently, PCIe bridge for
> > > >the Q35 model is supported.
> > > > 
> > > > 4. The operator creates a unique identifier string using 'uuidgen'.
> > > > 
> > > > 5. When the virtio device is created, the operator uses the "uuid" 
> > > > option
> > > >(for example, '-device virtio-net-pci,uuid="string"') and specifies
> > > >the UUID created in step 4.
> > > > 
> > > >QEMU stores the UUID in the virtio device's configuration space
> > > >in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > > > 
> > > > 6. When assigning a PCI device to the guest in passthrough mode, the
> > > >operator first creates a bridge using the "uuid" option (for example,
> > > >'-device pcie-downstream,uuid="string"') to specify the UUID created
> > > >in step 4, and then attaches the passthrough device to the bridge.
> > > > 
> > > >QEMU stores the UUID in the configuration space of the bridge as
> > > >Vendor-Specific capabili

Re: [Qemu-devel] [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-27 Thread Venu Busireddy

On 2018-06-27 15:24:58 +0300, Roman Kagan wrote:
> On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > The patch set "Enable virtio_net to act as a standby for a passthru
> > device" [1] deals with live migration of guests that use passthrough
> > devices. However, that scheme uses the MAC address for pairing
> > the virtio device and the passthrough device. The thread "netvsc:
> > refactor notifier/event handling code to use the failover framework"
> > [2] discusses an alternate mechanism, such as using an UUID, for pairing
> > the devices. Based on that discussion, proposals "Add "Group Identifier"
> > to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
> > store pairing information..." [4] were made.
> 
> I must have missed something in those threads, but where does this UUID
> thing come about?  AFAICS this identifier doesn't need to be
> "universally" unique, nor persistent; it only has to be unique across
> the VM and stable throughout the VM lifetime.

The notion of using UUID came up in the thread

   https://www.spinics.net/lists/netdev/msg499011.html

> FWIW Hyper-V uses a 32-bit integer for this purpose, not a UUID as seems
> to be implied in the thread you refer to.

Yes, Hyper-V uses a serial number (but I think it is 64-bit value).
However, what we are doing is similar to that. Instead of 32 bits,
we are using 128 bits.

> > The current patch set includes all the feedback received for proposals [3]
> > and [4]. For the sake of completeness, patch for the virtio specification
> > is also included here. Following is the updated proposal.
> > 
> > 1. Extend the virtio specification to include a new virtio PCI capability
> >"VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > 
> > 2. Enhance the QEMU CLI to include a "uuid" option to the virtio device.
> >The "uuid" is a string in UUID format.
> > 
> > 3. Enhance the QEMU CLI to include a "uuid" option to the bridge device.
> >The "uuid" is a string in UUID format. Currently, PCIe bridge for
> >the Q35 model is supported.
> > 
> > 4. The operator creates a unique identifier string using 'uuidgen'.
> > 
> > 5. When the virtio device is created, the operator uses the "uuid" option
> >(for example, '-device virtio-net-pci,uuid="string"') and specifies
> >the UUID created in step 4.
> > 
> >QEMU stores the UUID in the virtio device's configuration space
> >in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > 
> > 6. When assigning a PCI device to the guest in passthrough mode, the
> >operator first creates a bridge using the "uuid" option (for example,
> >'-device pcie-downstream,uuid="string"') to specify the UUID created
> >in step 4, and then attaches the passthrough device to the bridge.
> > 
> >QEMU stores the UUID in the configuration space of the bridge as
> >Vendor-Specific capability (0x09). The "Vendor" here is not to be
> >confused with a specific organization. Instead, the vendor of the
> >bridge is QEMU. To avoid mixing up with other bridges, the bridge
> >will be created with vendor ID 0x1b36 (PCI_VENDOR_ID_REDHAT) and
> >device ID 0x000e (PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE) if the "uuid"
> >option is specified. Otherwise, current defaults are used.
> 
> I wonder if it makes more sense to drop the concept of failover groups,
> and just refer to the standby device by device-id, like 
> 
>   -device virtio-net-pci,id=foo \
>   -device pcie-downstream,failover=foo

Isn't this the same as what this patch series proposes? In your
suggestion, "foo" is the entity that connects the passthrough device
and the failover device. In this patch set, that "foo" is the UUID,
and the options "id" and "failover" are replaced by "uuid". Do you agree?

Regards,

Venu

> The bridge device will then lookup the failover device, figure out the
> common identifier to expose to the guest, and defer the visibility of
> the PT device behind the bridge until the guest acknowledged the support
> for failover on the PV device.
> 
> Roman.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-26 Thread Venu Busireddy

On 2018-06-27 07:06:42 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2018 at 10:49:30PM -0500, Venu Busireddy wrote:
> > The patch set "Enable virtio_net to act as a standby for a passthru
> > device" [1] deals with live migration of guests that use passthrough
> > devices. However, that scheme uses the MAC address for pairing
> > the virtio device and the passthrough device. The thread "netvsc:
> > refactor notifier/event handling code to use the failover framework"
> > [2] discusses an alternate mechanism, such as using an UUID, for pairing
> > the devices. Based on that discussion, proposals "Add "Group Identifier"
> > to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
> > store pairing information..." [4] were made.
> > 
> > The current patch set includes all the feedback received for proposals [3]
> > and [4]. For the sake of completeness, patch for the virtio specification
> > is also included here. Following is the updated proposal.
> > 
> > 1. Extend the virtio specification to include a new virtio PCI capability
> >"VIRTIO_PCI_CAP_GROUP_ID_CFG".
> 
> There's still discussion around whether it should be
> a virtio pci capability, a virtio net config field or
> a new kind of capability.
> 
> > 2. Enhance the QEMU CLI to include a "uuid" option to the virtio device.
> >The "uuid" is a string in UUID format.
> > 
> > 3. Enhance the QEMU CLI to include a "uuid" option to the bridge device.
> >The "uuid" is a string in UUID format. Currently, PCIe bridge for
> >the Q35 model is supported.
> > 
> > 4. The operator creates a unique identifier string using 'uuidgen'.
> > 
> > 5. When the virtio device is created, the operator uses the "uuid" option
> >(for example, '-device virtio-net-pci,uuid="string"') and specifies
> >the UUID created in step 4.
> > 
> >QEMU stores the UUID in the virtio device's configuration space
> >in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".
> > 
> > 6. When assigning a PCI device to the guest in passthrough mode, the
> >operator first creates a bridge using the "uuid" option (for example,
> >'-device pcie-downstream,uuid="string"') to specify the UUID created
> >in step 4, and then attaches the passthrough device to the bridge.
> > 
> >QEMU stores the UUID in the configuration space of the bridge as
> >Vendor-Specific capability (0x09). The "Vendor" here is not to be
> >confused with a specific organization. Instead, the vendor of the
> >bridge is QEMU. To avoid mixing up with other bridges, the bridge
> >will be created with vendor ID 0x1b36 (PCI_VENDOR_ID_REDHAT) and
> >device ID 0x000e (PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE) if the "uuid"
> >option is specified. Otherwise, current defaults are used.
> > 
> > 7. Patch 4 in patch series "Enable virtio_net to act as a standby for
> >a passthru device" [1] needs to be modified to use the UUID values
> >present in the bridge's configuration space and the virtio device's
> >configuration space instead of the MAC address for pairing the devices.
> > 
> > Thanks!
> > 
> > Venu
> 
> The part where the visibility of a vfio device is controlled by the
> virtio driver acknowledging the backup feature is missing here.

Could you please elaborate?

Thanks,

Venu

>  
> 
> > [1] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00156.html
> > [2] https://www.spinics.net/lists/netdev/msg499011.html
> > [3] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00118.html
> > [4] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00204.html
> > 
> > Changes in v2:
> >   - As Michael Tsirkin suggested, changed the virtio specification
> > to restrict the group identifier to be a 16-byte field, presented
> > entirely in the virtio device's configuration space.
> >   - As Michael Tsirkin suggested, instead of tweaking the ioh3420
> > device with Red Hat vendor ID, create a new PCIe bridge device
> > named "pcie-downstream" with Red Hat Vendor ID, and include the
> > group identifier in this device.
> >   - Added a new patch to enhance the "pci-bridge" device to support
> > the group identifier (for the i440FX model).
> > 
> > Venu Busireddy (4):
> >   Add a true or false option to the DEFINE_PROP_UUID macro.
> >   Add "Group Id

Re: [Qemu-devel] [virtio-dev] Re: [PATCH v2 3/4] Add "Group Identifier" support to Red Hat PCI bridge.

2018-06-26 Thread Venu Busireddy

On 2018-06-27 07:02:36 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 26, 2018 at 10:49:33PM -0500, Venu Busireddy wrote:
> > Add the "Vendor-Specific" capability to the Red Hat PCI bridge device
> > "pci-bridge", to contain the "Group Identifier" (UUID) that will be
> > used to pair a virtio device with the passthrough device attached to
> > that bridge.
> > 
> > This capability is added to the bridge iff the "uuid" option is specified
> > for the bridge.
> 
> I think the name should be more explicit. How about "failover-group-id"?

I can change it. But don't you think it is bit long?

> > 
> > Signed-off-by: Venu Busireddy 
> 
> I'd like to also tweak the device id in this case,
> to make it easier for guests to know it's a grouping bridge.

Could you please recommend a name for the new ID'd definition? Something
in lines of PCI_DEVICE_ID_REDHAT_.

Thanks,

Venu

> 
> > ---
> >  hw/pci-bridge/pci_bridge_dev.c |  8 
> >  hw/pci/pci_bridge.c| 26 ++
> >  include/hw/pci/pcie.h  |  1 +
> >  3 files changed, 35 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
> > index b2d861d216..bbbc6fa1c6 100644
> > --- a/hw/pci-bridge/pci_bridge_dev.c
> > +++ b/hw/pci-bridge/pci_bridge_dev.c
> > @@ -71,6 +71,12 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error 
> > **errp)
> >  bridge_dev->msi = ON_OFF_AUTO_OFF;
> >  }
> >  
> > +err = pci_bridge_vendor_init(dev, 0, errp);
> > +if (err < 0) {
> > +error_append_hint(errp, "Can't init group ID, error %d\n", err);
> > +goto vendor_cap_err;
> > +}
> > +
> >  err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0, errp);
> >  if (err) {
> >  goto slotid_error;
> > @@ -109,6 +115,7 @@ slotid_error:
> >  if (shpc_present(dev)) {
> >  shpc_cleanup(dev, _dev->bar);
> >  }
> > +vendor_cap_err:
> >  shpc_error:
> >  pci_bridge_exitfn(dev);
> >  }
> > @@ -162,6 +169,7 @@ static Property pci_bridge_dev_properties[] = {
> >  ON_OFF_AUTO_AUTO),
> >  DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
> >  PCI_BRIDGE_DEV_F_SHPC_REQ, true),
> > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> >  DEFINE_PROP_END_OF_LIST(),
> >  };
> >  
> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > index 40a39f57cb..cb8b3dad2a 100644
> > --- a/hw/pci/pci_bridge.c
> > +++ b/hw/pci/pci_bridge.c
> > @@ -34,12 +34,17 @@
> >  #include "hw/pci/pci_bus.h"
> >  #include "qemu/range.h"
> >  #include "qapi/error.h"
> > +#include "qemu/uuid.h"
> >  
> >  /* PCI bridge subsystem vendor ID helper functions */
> >  #define PCI_SSVID_SIZEOF8
> >  #define PCI_SSVID_SVID  4
> >  #define PCI_SSVID_SSID  6
> >  
> > +#define PCI_VENDOR_SIZEOF 20
> > +#define PCI_VENDOR_CAP_LEN_OFFSET  2
> > +#define PCI_VENDOR_GROUP_ID_OFFSET 4
> > +
> >  int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> >uint16_t svid, uint16_t ssid,
> >Error **errp)
> > @@ -57,6 +62,27 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
> >  return pos;
> >  }
> >  
> > +int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp)
> > +{
> > +int pos;
> > +
> > +if (qemu_uuid_is_null(>uuid)) {
> > +return 0;
> > +}
> > +
> > +pos = pci_add_capability(d, PCI_CAP_ID_VNDR, offset, PCI_VENDOR_SIZEOF,
> > +errp);
> > +if (pos < 0) {
> > +return pos;
> > +}
> > +
> > +pci_set_word(d->config + pos + PCI_VENDOR_CAP_LEN_OFFSET,
> > +PCI_VENDOR_SIZEOF);
> > +memcpy(d->config + pos + PCI_VENDOR_GROUP_ID_OFFSET, >uuid,
> > +sizeof(QemuUUID));
> > +return pos;
> > +}
> > +
> >  /* Accessor function to get parent bridge device from pci bus. */
> >  PCIDevice *pci_bridge_get_device(PCIBus *bus)
> >  {
> > diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
> > index b71e369703..b4189d0ce3 100644
> > --- a/include/hw/pci/pcie.h
> > +++ b/include/hw/pci/pcie.h
> > @@ -82,6 +82,7 @@ struct PCIExpressDevice {
> >  };
> >  
> >  #define COMPAT_PROP_PCP "power_controller_present"
> > +#define COMPAT_PROP_UUID "uuid"
> >  
> >  /* PCI express capability helper functions */
> >  int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type,
> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>

[Qemu-devel] [PATCH v2 0/4] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-26 Thread Venu Busireddy

The patch set "Enable virtio_net to act as a standby for a passthru
device" [1] deals with live migration of guests that use passthrough
devices. However, that scheme uses the MAC address for pairing
the virtio device and the passthrough device. The thread "netvsc:
refactor notifier/event handling code to use the failover framework"
[2] discusses an alternate mechanism, such as using an UUID, for pairing
the devices. Based on that discussion, proposals "Add "Group Identifier"
to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
store pairing information..." [4] were made.

The current patch set includes all the feedback received for proposals [3]
and [4]. For the sake of completeness, patch for the virtio specification
is also included here. Following is the updated proposal.

1. Extend the virtio specification to include a new virtio PCI capability
   "VIRTIO_PCI_CAP_GROUP_ID_CFG".

2. Enhance the QEMU CLI to include a "uuid" option to the virtio device.
   The "uuid" is a string in UUID format.

3. Enhance the QEMU CLI to include a "uuid" option to the bridge device.
   The "uuid" is a string in UUID format. Currently, PCIe bridge for
   the Q35 model is supported.

4. The operator creates a unique identifier string using 'uuidgen'.

5. When the virtio device is created, the operator uses the "uuid" option
   (for example, '-device virtio-net-pci,uuid="string"') and specifies
   the UUID created in step 4.

   QEMU stores the UUID in the virtio device's configuration space
   in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".

6. When assigning a PCI device to the guest in passthrough mode, the
   operator first creates a bridge using the "uuid" option (for example,
   '-device pcie-downstream,uuid="string"') to specify the UUID created
   in step 4, and then attaches the passthrough device to the bridge.

   QEMU stores the UUID in the configuration space of the bridge as
   Vendor-Specific capability (0x09). The "Vendor" here is not to be
   confused with a specific organization. Instead, the vendor of the
   bridge is QEMU. To avoid mixing up with other bridges, the bridge
   will be created with vendor ID 0x1b36 (PCI_VENDOR_ID_REDHAT) and
   device ID 0x000e (PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE) if the "uuid"
   option is specified. Otherwise, current defaults are used.

7. Patch 4 in patch series "Enable virtio_net to act as a standby for
   a passthru device" [1] needs to be modified to use the UUID values
   present in the bridge's configuration space and the virtio device's
   configuration space instead of the MAC address for pairing the devices.

Thanks!

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00156.html
[2] https://www.spinics.net/lists/netdev/msg499011.html
[3] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00118.html
[4] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00204.html

Changes in v2:
  - As Michael Tsirkin suggested, changed the virtio specification
to restrict the group identifier to be a 16-byte field, presented
entirely in the virtio device's configuration space.
  - As Michael Tsirkin suggested, instead of tweaking the ioh3420
device with Red Hat vendor ID, create a new PCIe bridge device
named "pcie-downstream" with Red Hat Vendor ID, and include the
group identifier in this device.
  - Added a new patch to enhance the "pci-bridge" device to support
the group identifier (for the i440FX model).

Venu Busireddy (4):
  Add a true or false option to the DEFINE_PROP_UUID macro.
  Add "Group Identifier" support to virtio devices.
  Add "Group Identifier" support to Red Hat PCI bridge.
  Add "Group Identifier" support to Red Hat PCI Express bridge.

 default-configs/arm-softmmu.mak |   1 +
 default-configs/i386-softmmu.mak|   1 +
 default-configs/x86_64-softmmu.mak  |   1 +
 hw/acpi/vmgenid.c   |   2 +-
 hw/pci-bridge/Makefile.objs |   1 +
 hw/pci-bridge/pci_bridge_dev.c  |   8 +
 hw/pci-bridge/pcie_downstream.c | 215 
 hw/pci-bridge/pcie_downstream.h |  10 +
 hw/pci/pci_bridge.c |  26 +++
 hw/virtio/virtio-pci.c  |  15 ++
 hw/virtio/virtio-pci.h  |   3 +-
 include/hw/pci/pci.h|   3 +
 include/hw/pci/pcie.h   |   1 +
 include/hw/qdev-properties.h|   4 +-
 include/standard-headers/linux/virtio_pci.h |   8 +
 15 files changed, 295 insertions(+), 4 deletions(-)
 create mode 100644 hw/pci-bridge/pcie_downstream.c
 create mode 100644 hw/pci-bridge/pcie_downstream.h

[Qemu-devel] [PATCH v2 4/4] Add "Group Identifier" support to Red Hat PCI Express bridge.

2018-06-26 Thread Venu Busireddy

Add a new bridge device "pcie-downstream" with a Vendor ID of
PCI_VENDOR_ID_REDHAT and Device ID of PCI_DEVICE_ID_REDHAT_DOWNSTREAM.
Also add the "Vendor-Specific" capability to the bridge to contain the
"Group Identifier" (UUID) that will be used to pair a virtio device with
the passthrough device attached to that bridge.

This capability is added to the bridge iff the "uuid" option is specified
for the bridge.

Signed-off-by: Venu Busireddy 
---
 default-configs/arm-softmmu.mak|   1 +
 default-configs/i386-softmmu.mak   |   1 +
 default-configs/x86_64-softmmu.mak |   1 +
 hw/pci-bridge/Makefile.objs|   1 +
 hw/pci-bridge/pcie_downstream.c| 215 +
 hw/pci-bridge/pcie_downstream.h|  10 ++
 include/hw/pci/pci.h   |   1 +
 7 files changed, 230 insertions(+)
 create mode 100644 hw/pci-bridge/pcie_downstream.c
 create mode 100644 hw/pci-bridge/pcie_downstream.h

diff --git a/default-configs/arm-softmmu.mak b/default-configs/arm-softmmu.mak
index 834d45cfaf..b86c6fb122 100644
--- a/default-configs/arm-softmmu.mak
+++ b/default-configs/arm-softmmu.mak
@@ -139,6 +139,7 @@ CONFIG_IMX_I2C=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_ACPI=y
 CONFIG_SMBIOS=y
diff --git a/default-configs/i386-softmmu.mak b/default-configs/i386-softmmu.mak
index 8c7d4a0fa0..a900c8f052 100644
--- a/default-configs/i386-softmmu.mak
+++ b/default-configs/i386-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/default-configs/x86_64-softmmu.mak 
b/default-configs/x86_64-softmmu.mak
index 0390b4303c..481e4764be 100644
--- a/default-configs/x86_64-softmmu.mak
+++ b/default-configs/x86_64-softmmu.mak
@@ -56,6 +56,7 @@ CONFIG_ACPI_NVDIMM=y
 CONFIG_PCIE_PORT=y
 CONFIG_XIO3130=y
 CONFIG_IOH3420=y
+CONFIG_PCIE_DOWNSTREAM=y
 CONFIG_I82801B11=y
 CONFIG_SMBIOS=y
 CONFIG_HYPERV_TESTDEV=$(CONFIG_KVM)
diff --git a/hw/pci-bridge/Makefile.objs b/hw/pci-bridge/Makefile.objs
index 47065f87d9..5b42212edc 100644
--- a/hw/pci-bridge/Makefile.objs
+++ b/hw/pci-bridge/Makefile.objs
@@ -3,6 +3,7 @@ common-obj-$(CONFIG_PCIE_PORT) += pcie_root_port.o 
gen_pcie_root_port.o pcie_pci
 common-obj-$(CONFIG_PXB) += pci_expander_bridge.o
 common-obj-$(CONFIG_XIO3130) += xio3130_upstream.o xio3130_downstream.o
 common-obj-$(CONFIG_IOH3420) += ioh3420.o
+common-obj-$(CONFIG_PCIE_DOWNSTREAM) += pcie_downstream.o
 common-obj-$(CONFIG_I82801B11) += i82801b11.o
 # NewWorld PowerMac
 common-obj-$(CONFIG_DEC_PCI) += dec.o
diff --git a/hw/pci-bridge/pcie_downstream.c b/hw/pci-bridge/pcie_downstream.c
new file mode 100644
index 00..78604504ea
--- /dev/null
+++ b/hw/pci-bridge/pcie_downstream.c
@@ -0,0 +1,215 @@
+/*
+ * Red Hat PCI Express downstream port.
+ *
+ * pcie_downstream.c
+ * Most of this code is copied from xio3130_downstream.c
+ *
+ * Copyright (c) 2018 Oracle and/or its affiliates.
+ * Author: Venu Busireddy 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#include "qemu/osdep.h"
+#include "hw/pci/pci_ids.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/pcie.h"
+#include "pcie_downstream.h"
+#include "qapi/error.h"
+
+#define REDHAT_PCIE_DS_REVISION0x1
+#define REDHAT_PCIE_DS_MSI_OFFSET  0x70
+#define REDHAT_PCIE_DS_MSI_SUPPORTED_FLAGS PCI_MSI_FLAGS_64BIT
+#define REDHAT_PCIE_DS_MSI_NR_VECTOR   1
+#define REDHAT_PCIE_DS_SSVID_OFFSET0x80
+#define REDHAT_PCIE_DS_SSVID_SVID  0
+#define REDHAT_PCIE_DS_SSVID_SSID  0
+#define REDHAT_PCIE_DS_EXP_OFFSET  0x90
+#define REDHAT_PCIE_DS_VENDOR_OFFSET   0xCC
+#define REDHAT_PCIE_DS_AER_OFFSET  0x100
+
+static void pcie_ds_write_config(PCIDevice *d, uint32_t address,
+ uint32_t val, int len)
+{
+pci_bridge_write_config(d, address, val, len);
+pcie_cap_flr_write_config(d, address, val, len);
+pcie_cap_slot_write_config(d, address, val, len);
+pcie_aer_write_config(d, address, val, len);
+}
+
+static void pcie_ds_reset(DeviceState *qdev)
+{
+PCIDevice *d = PC

[Qemu-devel] [PATCH v2 2/4] Add "Group Identifier" support to virtio devices.

2018-06-26 Thread Venu Busireddy

Use the virtio PCI capability "VIRTIO_PCI_CAP_GROUP_ID_CFG" to store the
"Group Identifier" (UUID) specified via the command line option "uuid"
for the virtio device. The capability will be present in the virtio
device's configuration space iff the "uuid" option is specified.

Group Identifier is used to pair a virtio device with a passthrough
device.

Signed-off-by: Venu Busireddy 
---
 hw/virtio/virtio-pci.c  | 15 +++
 hw/virtio/virtio-pci.h  |  3 ++-
 include/hw/pci/pci.h|  2 ++
 include/standard-headers/linux/virtio_pci.h |  8 
 4 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3a01fe90f0..42703a5567 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -36,6 +36,7 @@
 #include "qemu/range.h"
 #include "hw/virtio/virtio-bus.h"
 #include "qapi/visitor.h"
+#include "qemu/uuid.h"
 
 #define VIRTIO_PCI_REGION_SIZE(dev) 
VIRTIO_PCI_CONFIG_OFF(msix_present(dev))
 
@@ -1638,6 +1639,10 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 .cap.cap_len = sizeof cfg,
 .cap.cfg_type = VIRTIO_PCI_CAP_PCI_CFG,
 };
+struct virtio_pci_group_id_cap group = {
+.cap.cap_len = sizeof group,
+.cap.cfg_type = VIRTIO_PCI_CAP_GROUP_ID_CFG,
+};
 struct virtio_pci_notify_cap notify_pio = {
 .cap.cap_len = sizeof notify,
 .notify_off_multiplier = cpu_to_le32(0x0),
@@ -1647,6 +1652,11 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 
 virtio_pci_modern_regions_init(proxy);
 
+if (!qemu_uuid_is_null(>pci_dev.uuid)) {
+memcpy(group.uuid, >pci_dev.uuid, sizeof(QemuUUID));
+virtio_pci_modern_mem_region_map(proxy, >group, );
+}
+
 virtio_pci_modern_mem_region_map(proxy, >common, );
 virtio_pci_modern_mem_region_map(proxy, >isr, );
 virtio_pci_modern_mem_region_map(proxy, >device, );
@@ -1763,6 +1773,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 proxy->device.size = 0x1000;
 proxy->device.type = VIRTIO_PCI_CAP_DEVICE_CFG;
 
+proxy->group.offset = 0;
+proxy->group.size = 0;
+proxy->group.type = VIRTIO_PCI_CAP_GROUP_ID_CFG;
+
 proxy->notify.offset = 0x3000;
 proxy->notify.size = virtio_pci_queue_mem_mult(proxy) * VIRTIO_QUEUE_MAX;
 proxy->notify.type = VIRTIO_PCI_CAP_NOTIFY_CFG;
@@ -1898,6 +1912,7 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
 DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
+DEFINE_PROP_UUID("uuid", PCIDevice, uuid, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..e4592e90bf 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -164,10 +164,11 @@ struct VirtIOPCIProxy {
 VirtIOPCIRegion common;
 VirtIOPCIRegion isr;
 VirtIOPCIRegion device;
+VirtIOPCIRegion group;
 VirtIOPCIRegion notify;
 VirtIOPCIRegion notify_pio;
 };
-VirtIOPCIRegion regs[5];
+VirtIOPCIRegion regs[6];
 };
 MemoryRegion modern_bar;
 MemoryRegion io_bar;
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 990d6fcbde..ee234c5a6f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -4,6 +4,7 @@
 #include "hw/qdev.h"
 #include "exec/memory.h"
 #include "sysemu/dma.h"
+#include "qemu/uuid.h"
 
 /* PCI includes legacy ISA access.  */
 #include "hw/isa/isa.h"
@@ -343,6 +344,7 @@ struct PCIDevice {
 bool has_rom;
 MemoryRegion rom;
 uint32_t rom_bar;
+QemuUUID uuid;
 
 /* INTx routing notifier */
 PCIINTxRoutingNotifier intx_routing_notifier;
diff --git a/include/standard-headers/linux/virtio_pci.h 
b/include/standard-headers/linux/virtio_pci.h
index 9262acd130..f6de333f1d 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -113,6 +113,8 @@
 #define VIRTIO_PCI_CAP_DEVICE_CFG  4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG 5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG6
 
 /* This is the PCI capability header: */
 struct virtio_pci_cap {
@@ -163,6 +165,12 @@ struct virtio_pci_cfg_cap {
uint8_t pci_cfg_data[4]; /* Data for BAR access. */
 };
 
+/* Fields in VIRTIO_PCI_CAP_GROUP_ID_CFG: */
+struct virtio_pci_group_id_cap {
+   struct virtio_pci_cap cap;
+   uint8_t uuid[16];
+};
+
 /* Macro versions of offsets for the Old Timers! */
 #define VIRTIO_PCI_CAP_VNDR0
 #define VIRTIO_PCI_CAP_NEXT1

[Qemu-devel] [PATCH v2 3/4] Add "Group Identifier" support to Red Hat PCI bridge.

2018-06-26 Thread Venu Busireddy

Add the "Vendor-Specific" capability to the Red Hat PCI bridge device
"pci-bridge", to contain the "Group Identifier" (UUID) that will be
used to pair a virtio device with the passthrough device attached to
that bridge.

This capability is added to the bridge iff the "uuid" option is specified
for the bridge.

Signed-off-by: Venu Busireddy 
---
 hw/pci-bridge/pci_bridge_dev.c |  8 
 hw/pci/pci_bridge.c| 26 ++
 include/hw/pci/pcie.h  |  1 +
 3 files changed, 35 insertions(+)

diff --git a/hw/pci-bridge/pci_bridge_dev.c b/hw/pci-bridge/pci_bridge_dev.c
index b2d861d216..bbbc6fa1c6 100644
--- a/hw/pci-bridge/pci_bridge_dev.c
+++ b/hw/pci-bridge/pci_bridge_dev.c
@@ -71,6 +71,12 @@ static void pci_bridge_dev_realize(PCIDevice *dev, Error 
**errp)
 bridge_dev->msi = ON_OFF_AUTO_OFF;
 }
 
+err = pci_bridge_vendor_init(dev, 0, errp);
+if (err < 0) {
+error_append_hint(errp, "Can't init group ID, error %d\n", err);
+goto vendor_cap_err;
+}
+
 err = slotid_cap_init(dev, 0, bridge_dev->chassis_nr, 0, errp);
 if (err) {
 goto slotid_error;
@@ -109,6 +115,7 @@ slotid_error:
 if (shpc_present(dev)) {
 shpc_cleanup(dev, _dev->bar);
 }
+vendor_cap_err:
 shpc_error:
 pci_bridge_exitfn(dev);
 }
@@ -162,6 +169,7 @@ static Property pci_bridge_dev_properties[] = {
 ON_OFF_AUTO_AUTO),
 DEFINE_PROP_BIT(PCI_BRIDGE_DEV_PROP_SHPC, PCIBridgeDev, flags,
 PCI_BRIDGE_DEV_F_SHPC_REQ, true),
+DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40a39f57cb..cb8b3dad2a 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -34,12 +34,17 @@
 #include "hw/pci/pci_bus.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
+#include "qemu/uuid.h"
 
 /* PCI bridge subsystem vendor ID helper functions */
 #define PCI_SSVID_SIZEOF8
 #define PCI_SSVID_SVID  4
 #define PCI_SSVID_SSID  6
 
+#define PCI_VENDOR_SIZEOF 20
+#define PCI_VENDOR_CAP_LEN_OFFSET  2
+#define PCI_VENDOR_GROUP_ID_OFFSET 4
+
 int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
   uint16_t svid, uint16_t ssid,
   Error **errp)
@@ -57,6 +62,27 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
 return pos;
 }
 
+int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp)
+{
+int pos;
+
+if (qemu_uuid_is_null(>uuid)) {
+return 0;
+}
+
+pos = pci_add_capability(d, PCI_CAP_ID_VNDR, offset, PCI_VENDOR_SIZEOF,
+errp);
+if (pos < 0) {
+return pos;
+}
+
+pci_set_word(d->config + pos + PCI_VENDOR_CAP_LEN_OFFSET,
+PCI_VENDOR_SIZEOF);
+memcpy(d->config + pos + PCI_VENDOR_GROUP_ID_OFFSET, >uuid,
+sizeof(QemuUUID));
+return pos;
+}
+
 /* Accessor function to get parent bridge device from pci bus. */
 PCIDevice *pci_bridge_get_device(PCIBus *bus)
 {
diff --git a/include/hw/pci/pcie.h b/include/hw/pci/pcie.h
index b71e369703..b4189d0ce3 100644
--- a/include/hw/pci/pcie.h
+++ b/include/hw/pci/pcie.h
@@ -82,6 +82,7 @@ struct PCIExpressDevice {
 };
 
 #define COMPAT_PROP_PCP "power_controller_present"
+#define COMPAT_PROP_UUID "uuid"
 
 /* PCI express capability helper functions */
 int pcie_cap_init(PCIDevice *dev, uint8_t offset, uint8_t type,

[Qemu-devel] [PATCH v2 virtio 1/1] Add "Group Identifier" to virtio PCI capabilities.

2018-06-26 Thread Venu Busireddy

Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
virtio PCI capabilities to allow for the grouping of devices.

Signed-off-by: Venu Busireddy 
---
 content.tex | 36 
 1 file changed, 36 insertions(+)

diff --git a/content.tex b/content.tex
index be18234..27581c1 100644
--- a/content.tex
+++ b/content.tex
@@ -599,6 +599,8 @@ The fields are interpreted as follows:
 #define VIRTIO_PCI_CAP_DEVICE_CFG4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG   5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
 \end{lstlisting}
 
 Any other value is reserved for future use.
@@ -997,6 +999,40 @@ address \field{cap.length} bytes within a BAR range
 specified by some other Virtio Structure PCI Capability
 of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
 
+\subsubsection{Group Identifier capability}\label{sec:Virtio Transport Options 
/ Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping devices 
together.
+
+The capability is immediately followed by an identifier of arbitrary size as 
below:
+
+\begin{lstlisting}
+struct virtio_pci_group_id_cap {
+struct virtio_pci_cap cap;
+u8 group_id[]; /* Group Identifier */
+};
+\end{lstlisting}
+
+The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
+and \field{group_id} are read-only for the driver.
+
+The specification does not impose any restrictions on the structure
+or size of group_id[], except that the size must be a multiple of 4.
+Devices are free to declare this array as large as needed, as long as
+the combined size of all capabilities can be accommodated within the
+PCI configuration space.
+
+The field \field{cap.cap_len} indicates the length of the group identifier
+\field{group_id}. The fields \field{cap.bar}, \field{cap.offset} and
+\field{cap.length} should be set to 0.
+
+\devicenormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
+
+\drivernormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The driver MUST NOT write to group_id[] area.
+
 \subsubsection{Legacy Interfaces: A Note on PCI Device 
Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device 
Layout / Legacy Interfaces: A Note on PCI Device Layout}
 
 Transitional devices MUST present part of configuration

[Qemu-devel] [PATCH v2 1/4] Add a true or false option to the DEFINE_PROP_UUID macro.

2018-06-26 Thread Venu Busireddy

It may not always be desirable to have a random UUID stuffed into the
'_field' member. Add a new boolean option '_default' that will allow
the caller to specify if a random UUID needs be generated or not.

Also modified the instance where this macro is used.

Signed-off-by: Venu Busireddy 
---
 hw/acpi/vmgenid.c| 2 +-
 include/hw/qdev-properties.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index d78b579a20..6d53757ee5 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -215,7 +215,7 @@ static void vmgenid_realize(DeviceState *dev, Error **errp)
 }
 
 static Property vmgenid_device_properties[] = {
-DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid),
+DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 4f60cc88f3..7d39a4bdcd 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -218,12 +218,12 @@ extern const PropertyInfo qdev_prop_off_auto_pcibar;
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_off_auto_pcibar, \
 OffAutoPCIBAR)
 
-#define DEFINE_PROP_UUID(_name, _state, _field) {  \
+#define DEFINE_PROP_UUID(_name, _state, _field, _default) {\
 .name  = (_name),  \
 .info  = _prop_uuid,  \
 .offset= offsetof(_state, _field)  \
 + type_check(QemuUUID, typeof_field(_state, _field)),  \
-.set_default = true,   \
+.set_default = _default,   \
 }
 
 #define DEFINE_PROP_END_OF_LIST()   \

Re: [Qemu-devel] [virtio-dev] Re: [PATCH] qemu: Introduce VIRTIO_NET_F_STANDBY feature bit to virtio_net

2018-06-21 Thread Venu Busireddy

On 2018-06-21 18:21:55 -0700, Siwei Liu wrote:
> On Thu, Jun 21, 2018 at 7:59 AM, Cornelia Huck  wrote:
> > On Wed, 20 Jun 2018 22:48:58 +0300
> > "Michael S. Tsirkin"  wrote:
> >
> >> On Wed, Jun 20, 2018 at 06:06:19PM +0200, Cornelia Huck wrote:
> >> > In any case, I'm not sure anymore why we'd want the extra uuid.
> >>
> >> It's mostly so we can have e.g. multiple devices with same MAC
> >> (which some people seem to want in order to then use
> >> then with different containers).
> >>
> >> But it is also handy for when you assign a PF, since then you
> >> can't set the MAC.
> >>
> >
> > OK, so what about the following:
> >
> > - introduce a new feature bit, VIRTIO_NET_F_STANDBY_UUID that indicates
> >   that we have a new uuid field in the virtio-net config space
> > - in QEMU, add a property for virtio-net that allows to specify a uuid,
> >   offer VIRTIO_NET_F_STANDBY_UUID if set
> > - when configuring, set the property to the group UUID of the vfio-pci
> >   device
> 
> If feature negotiation fails on VIRTIO_NET_F_STANDBY_UUID, is it safe
> to still expose UUID in the config space on virtio-pci?

Why is it not safe? When we expose VIRTIO_NET_F_STANDBY_UUID, it is up
to the driver to decide if it wants to use it or not. If it does not
want to use the feature, it would also imply that the driver will not
be interested in the UUID value in the config space. So, the UUID will
be some piece of data that simply sits around; nobody cares for it.

> I'm not even sure if it's sane to expose group UUID on the PCI bridge
> where the corresponding vfio-pci device attached to for a guest which
> doesn't support the feature (legacy).

Unfortunately, you don't know beforehand if the guest will be a legacy
guest that doesn't support the feature. As is the case with the UUID in
the virtio-pci device's config space, the UUID in the bridge's config
space will/should be ignored by the legacy guest.

Venu

> > - in the guest, use the uuid from the virtio-net device's config space
> >   if applicable; else, fall back to matching by MAC as done today
> >
> > That should work for all virtio transports.

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy

On 2018-06-19 21:53:01 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 01:36:17PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > > > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain 
> > > > > > the
> > > > > > "Group Identifier" (UUID) that will be used to pair a virtio device 
> > > > > > with
> > > > > > the passthrough device attached to that bridge.
> > > > > > 
> > > > > > This capability is added to the bridge iff the "uuid" option is 
> > > > > > specified
> > > > > > for the bridge device, via the qemu command line. Also, the bridge's
> > > > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is 
> > > > > > changed
> > > > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when 
> > > > > > the
> > > > > > "uuid" option is present.
> > > > > > 
> > > > > > Signed-off-by: Venu Busireddy 
> > > > > 
> > > > > I don't see why we should add it to all bridges.
> > > > > Let's just add it to ones that already have the RH vendor ID?
> > > > 
> > > > No. I am not adding the capability to all bridges.
> > > > 
> > > > In the earlier discussions, we agreed that the bridge be left as
> > > > Intel bridge if we do not intend to use it for storing the pairing
> > > > information. If we do intend to store the pairing information in the
> > > > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > > > avoid confusion. In other words, bridge's with RH Vendor ID come into
> > > > existence only when there is an intent to store the pairing information
> > > > in the bridge.
> > > > 
> > > > Accordingly, if the "uuid" option is specified for the bridge, it
> > > > is assumed that the user intends to use the bridge for storing the
> > > > pairing information, and hence, the capability is added to the bridge,
> > > > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > > > is not specified, the bridge remains as Intel bridge, and without the
> > > > vendor-specific capability.
> > > > 
> > > > Venu
> > > 
> > > Yes but the way to do it is not to tweak the vendor and device ID,
> > > instead, just add the UUID property to bridges that already have the
> > > correct vendor and device id.
> > 
> > I was using ioh3420 as the bridge device, because that is what is
> > recommended here:
> > 
> >   https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD
> > 
> > ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
> > Vendor ID to RH Vendor ID.
> > 
> > Is there another bridge device other than ioh3420 that I should use?
> > what device do you suggest? 
> > 
> > Thanks,
> > 
> > Venu
> 
> For pci, use hw/pci-bridge/pci_bridge_dev.c
> Maybe allocate a special ID for grouping bridges.
> 
> For express, add your own downstream port.

Specifically, on the command line, what device does the user specify?
For example:

 qemu-system-x86_64 --device ${Bridge_Device},uuid="uuid string",

What does the user specify for ${Bridge_Device} from the following:

"i82801b11-bridge", bus PCI
"ioh3420", bus PCI, desc "Intel IOH device id 3420 PCIE Root Port"
"pci-bridge", bus PCI, desc "Standard PCI Bridge"
"pci-bridge-seat", bus PCI, desc "Standard PCI Bridge (multiseat)"
"pcie-pci-bridge", bus PCI
"pcie-root-port", bus PCI, desc "PCI Express Root Port"
"pxb", bus PCI, desc "PCI Expander Bridge"
"pxb-pcie", bus PCI, desc "PCI Express Expander Bridge"
"usb-host", bus usb-bus
"usb-hub", bus usb-bus
"vfio-pci-igd-lpc-bridge", bus PCI, desc "VFIO dummy ISA/LPC bridge for IGD 
assignment"
"x3130-upstream", bus PCI, desc "TI X3130 Upstream Port of PCI Express Switch"
"xio3130-downstream", bus PCI, desc "TI X3130 Downstream Port of PCI Express 
Swi

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy

On 2018-06-19 21:21:23 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 01:14:06PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > > > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > > > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > > > the passthrough device attached to that bridge.
> > > > 
> > > > This capability is added to the bridge iff the "uuid" option is 
> > > > specified
> > > > for the bridge device, via the qemu command line. Also, the bridge's
> > > > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed
> > > > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > > > "uuid" option is present.
> > > > 
> > > > Signed-off-by: Venu Busireddy 
> > > 
> > > I don't see why we should add it to all bridges.
> > > Let's just add it to ones that already have the RH vendor ID?
> > 
> > No. I am not adding the capability to all bridges.
> > 
> > In the earlier discussions, we agreed that the bridge be left as
> > Intel bridge if we do not intend to use it for storing the pairing
> > information. If we do intend to store the pairing information in the
> > bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
> > avoid confusion. In other words, bridge's with RH Vendor ID come into
> > existence only when there is an intent to store the pairing information
> > in the bridge.
> > 
> > Accordingly, if the "uuid" option is specified for the bridge, it
> > is assumed that the user intends to use the bridge for storing the
> > pairing information, and hence, the capability is added to the bridge,
> > and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
> > is not specified, the bridge remains as Intel bridge, and without the
> > vendor-specific capability.
> > 
> > Venu
> 
> Yes but the way to do it is not to tweak the vendor and device ID,
> instead, just add the UUID property to bridges that already have the
> correct vendor and device id.

I was using ioh3420 as the bridge device, because that is what is
recommended here:

  https://git.qemu.org/?p=qemu.git;a=blob_plain;f=docs/pcie.txt;hb=HEAD

ioh3420 defaults to the Intel Vendor ID. Hence the tweak to change the
Vendor ID to RH Vendor ID.

Is there another bridge device other than ioh3420 that I should use?
what device do you suggest? 

Thanks,

Venu

> 
> 
> > > 
> > > 
> > > > ---
> > > >  hw/pci-bridge/ioh3420.c|  2 ++
> > > >  hw/pci-bridge/pcie_root_port.c |  7 +++
> > > >  hw/pci/pci_bridge.c| 32 
> > > >  include/hw/pci/pci.h   |  2 ++
> > > >  include/hw/pci/pcie.h  |  1 +
> > > >  include/hw/pci/pcie_port.h |  1 +
> > > >  6 files changed, 45 insertions(+)
> > > > 
> > > > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > > > index a451d74ee6..b6b9ebc726 100644
> > > > --- a/hw/pci-bridge/ioh3420.c
> > > > +++ b/hw/pci-bridge/ioh3420.c
> > > > @@ -35,6 +35,7 @@
> > > >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> > > >  #define IOH_EP_MSI_NR_VECTOR2
> > > >  #define IOH_EP_EXP_OFFSET   0x90
> > > > +#define IOH_EP_VENDOR_OFFSET0xCC
> > > >  #define IOH_EP_AER_OFFSET   0x100
> > > >  
> > > >  /*
> > > > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, 
> > > > void *data)
> > > >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> > > >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> > > >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > > > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> > > >  rpc->ssid = IOH_EP_SSVID_SSID;
> > > >  }
> > > >  
> > > > diff --git a/hw/pci-bridge/pcie_root_port.c 
> > > > b/hw/pci-bridge/pcie_root_port.c
> > > > index 45f9e8cd4a..ba470c7fda 100644
> > > > --- a/hw/pci-bridge/pcie_root_port.c
> > > > +++ b/hw/pci-bridge/pcie_root_port.c
> > > > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> > > >  goto err_bridge;

Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.

2018-06-19 Thread Venu Busireddy

On 2018-06-19 21:12:17 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 12:54:06PM -0500, Venu Busireddy wrote:
> > On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote:
> > > On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote:
> > > > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
> > > > virtio PCI capabilities to allow for the grouping of devices.
> > > > 
> > > > Signed-off-by: Venu Busireddy 
> > > > ---
> > > >  content.tex | 43 +++
> > > >  1 file changed, 43 insertions(+)
> > > > 
> > > > diff --git a/content.tex b/content.tex
> > > > index 7a92cb1..7ea6267 100644
> > > > --- a/content.tex
> > > > +++ b/content.tex
> > > > @@ -599,6 +599,8 @@ The fields are interpreted as follows:
> > > >  #define VIRTIO_PCI_CAP_DEVICE_CFG4
> > > >  /* PCI configuration access */
> > > >  #define VIRTIO_PCI_CAP_PCI_CFG   5
> > > > +/* Group Identifier */
> > > > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
> > > >  \end{lstlisting}
> > > >  
> > > >  Any other value is reserved for future use.
> > > > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range
> > > >  specified by some other Virtio Structure PCI Capability
> > > >  of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
> > > >  
> > > > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport 
> > > > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > > > capability}
> > > > +
> > > > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping 
> > > > devices together.
> > > > +
> > > > +The capability is immediately followed by an identifier of arbitrary 
> > > > size as below:
> > > > +
> > > > +\begin{lstlisting}
> > > > +struct virtio_pci_group_id_cap {
> > > > +struct virtio_pci_cap cap;
> > > > +u8 group_id[]; /* Group Identifier */
> > > > +};
> > > > +\end{lstlisting}
> > > > +
> > > > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
> > > > +and \field{group_id} are read-only for the driver.
> > > > +
> > > > +The specification does not impose any restrictions on the size or
> > > > +structure of group_id[].
> > > 
> > > I think it must be a multiple of 4 in size, as is
> > > standard for all capabilities.
> > 
> > Sure. Would rephrasing it as below suffice?
> > 
> > The specification does not impose any restrictions on the size or
> > structure of group_id[], except that the size must be a multiple of 4.
> > 
> > > 
> > > 
> > > > Vendors
> 
> Devices

Will correct it in the next version.

> 
> > are free to declare this array as
> > > > +large as needed, as long as the combined size of all capabilities can
> > > > +be accommodated within the PCI configuration space.
> > > > +
> > > > +If there is enough room in the PCI configuration space to accommodate
> > > > +the group identifier, the fields \field{cap.bar}, \field{cap.offset}
> > > > +and \field{cap.length} should be set to 0.
> > > > +
> > > > +If there isn't enough room, some or all of the group identifier can be
> > > > +presented in the BAR region, in which case the fields \field{cap.bar},
> > > > +\field{cap.offset} and \field{cap.length} should be set appropriately.
> > > 
> > > And then how do you glue the two pieces?
> > 
> > How the user glues them up is up to the user. The specification should
> > not impose rules on that, right?
> 
> We need to define how these are matched.
> Let's assume device A has it all in config space, device B
> has part in memory. How would we compare them?

I will go with your suggestion below, and hence, this becomes obsolete.

> 
> 
> 
> > > 
> > > > +
> > > > +In either case, the field \field{cap.cap_len} indicates the length of
> > > > +the group identifier information present in the configuration space
> > > > +itself.
> > > 
> > > It seems like an overkill to me. Isn't it enough to have it in config
> > > space? This would make comparisons easier.
> > 
> > I was trying to make the propo

Re: [Qemu-devel] [virtio-dev] Re: [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy

On 2018-06-19 20:24:12 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 11:32:26AM -0500, Venu Busireddy wrote:
> > Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
> > "Group Identifier" (UUID) that will be used to pair a virtio device with
> > the passthrough device attached to that bridge.
> > 
> > This capability is added to the bridge iff the "uuid" option is specified
> > for the bridge device, via the qemu command line. Also, the bridge's
> > Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed
> > to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
> > "uuid" option is present.
> > 
> > Signed-off-by: Venu Busireddy 
> 
> I don't see why we should add it to all bridges.
> Let's just add it to ones that already have the RH vendor ID?

No. I am not adding the capability to all bridges.

In the earlier discussions, we agreed that the bridge be left as
Intel bridge if we do not intend to use it for storing the pairing
information. If we do intend to store the pairing information in the
bridge, we wanted to change the bridge's Vendor ID to RH Vendor ID to
avoid confusion. In other words, bridge's with RH Vendor ID come into
existence only when there is an intent to store the pairing information
in the bridge.

Accordingly, if the "uuid" option is specified for the bridge, it
is assumed that the user intends to use the bridge for storing the
pairing information, and hence, the capability is added to the bridge,
and the Vendor ID is changed to RH Vendor ID. If the "uuid" option
is not specified, the bridge remains as Intel bridge, and without the
vendor-specific capability.

Venu

> 
> 
> > ---
> >  hw/pci-bridge/ioh3420.c|  2 ++
> >  hw/pci-bridge/pcie_root_port.c |  7 +++
> >  hw/pci/pci_bridge.c| 32 
> >  include/hw/pci/pci.h   |  2 ++
> >  include/hw/pci/pcie.h  |  1 +
> >  include/hw/pci/pcie_port.h |  1 +
> >  6 files changed, 45 insertions(+)
> > 
> > diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
> > index a451d74ee6..b6b9ebc726 100644
> > --- a/hw/pci-bridge/ioh3420.c
> > +++ b/hw/pci-bridge/ioh3420.c
> > @@ -35,6 +35,7 @@
> >  #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
> >  #define IOH_EP_MSI_NR_VECTOR2
> >  #define IOH_EP_EXP_OFFSET   0x90
> > +#define IOH_EP_VENDOR_OFFSET0xCC
> >  #define IOH_EP_AER_OFFSET   0x100
> >  
> >  /*
> > @@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, void 
> > *data)
> >  rpc->exp_offset = IOH_EP_EXP_OFFSET;
> >  rpc->aer_offset = IOH_EP_AER_OFFSET;
> >  rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
> > +rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
> >  rpc->ssid = IOH_EP_SSVID_SSID;
> >  }
> >  
> > diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
> > index 45f9e8cd4a..ba470c7fda 100644
> > --- a/hw/pci-bridge/pcie_root_port.c
> > +++ b/hw/pci-bridge/pcie_root_port.c
> > @@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
> >  goto err_bridge;
> >  }
> >  
> > +rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
> > +if (rc < 0) {
> > +error_append_hint(errp, "Can't init group ID, error %d\n", rc);
> > +goto err_bridge;
> > +}
> > +
> >  if (rpc->interrupts_init) {
> >  rc = rpc->interrupts_init(d, errp);
> >  if (rc < 0) {
> > @@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d)
> >  static Property rp_props[] = {
> >  DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
> >  QEMU_PCIE_SLTCAP_PCP_BITNR, true),
> > +DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
> >  DEFINE_PROP_END_OF_LIST()
> >  };
> >  
> > diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
> > index 40a39f57cb..c63bc439f7 100644
> > --- a/hw/pci/pci_bridge.c
> > +++ b/hw/pci/pci_bridge.c
> > @@ -34,12 +34,17 @@
> >  #include "hw/pci/pci_bus.h"
> >  #include "qemu/range.h"
> >  #include "qapi/error.h"
> > +#include "qemu/uuid.h"
> >  
> >  /* PCI bridge subsystem vendor ID helper functions */
> >  #define PCI_SSVID_SIZEOF8
> >  #define PCI_SSVID_SVID  4
> >  #define PCI_SSVID_SSID  6
> >  
> >

Re: [Qemu-devel] [virtio-dev] Re: [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.

2018-06-19 Thread Venu Busireddy

On 2018-06-19 20:30:06 +0300, Michael S. Tsirkin wrote:
> On Tue, Jun 19, 2018 at 11:32:28AM -0500, Venu Busireddy wrote:
> > Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
> > virtio PCI capabilities to allow for the grouping of devices.
> > 
> > Signed-off-by: Venu Busireddy 
> > ---
> >  content.tex | 43 +++
> >  1 file changed, 43 insertions(+)
> > 
> > diff --git a/content.tex b/content.tex
> > index 7a92cb1..7ea6267 100644
> > --- a/content.tex
> > +++ b/content.tex
> > @@ -599,6 +599,8 @@ The fields are interpreted as follows:
> >  #define VIRTIO_PCI_CAP_DEVICE_CFG4
> >  /* PCI configuration access */
> >  #define VIRTIO_PCI_CAP_PCI_CFG   5
> > +/* Group Identifier */
> > +#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
> >  \end{lstlisting}
> >  
> >  Any other value is reserved for future use.
> > @@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range
> >  specified by some other Virtio Structure PCI Capability
> >  of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
> >  
> > +\subsubsection{Group Identifier capability}\label{sec:Virtio Transport 
> > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > capability}
> > +
> > +The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping 
> > devices together.
> > +
> > +The capability is immediately followed by an identifier of arbitrary size 
> > as below:
> > +
> > +\begin{lstlisting}
> > +struct virtio_pci_group_id_cap {
> > +struct virtio_pci_cap cap;
> > +u8 group_id[]; /* Group Identifier */
> > +};
> > +\end{lstlisting}
> > +
> > +The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
> > +and \field{group_id} are read-only for the driver.
> > +
> > +The specification does not impose any restrictions on the size or
> > +structure of group_id[].
> 
> I think it must be a multiple of 4 in size, as is
> standard for all capabilities.

Sure. Would rephrasing it as below suffice?

The specification does not impose any restrictions on the size or
structure of group_id[], except that the size must be a multiple of 4.

> 
> 
> > Vendors are free to declare this array as
> > +large as needed, as long as the combined size of all capabilities can
> > +be accommodated within the PCI configuration space.
> > +
> > +If there is enough room in the PCI configuration space to accommodate
> > +the group identifier, the fields \field{cap.bar}, \field{cap.offset}
> > +and \field{cap.length} should be set to 0.
> > +
> > +If there isn't enough room, some or all of the group identifier can be
> > +presented in the BAR region, in which case the fields \field{cap.bar},
> > +\field{cap.offset} and \field{cap.length} should be set appropriately.
> 
> And then how do you glue the two pieces?

How the user glues them up is up to the user. The specification should
not impose rules on that, right?

> 
> > +
> > +In either case, the field \field{cap.cap_len} indicates the length of
> > +the group identifier information present in the configuration space
> > +itself.
> 
> It seems like an overkill to me. Isn't it enough to have it in config
> space? This would make comparisons easier.

I was trying to make the proposal permissive for expansion, in case
the user needs the size to be larger than what can be accommodated in
the config space. Would you like me to restrict that the capability be
entirely present in the config space? I am fine with it. Please confirm,
and I will change it so.

> 
> > +
> > +\devicenormative{\paragraph}{Group Identifier capability}{Virtio Transport 
> > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > capability}
> > +
> > +The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
> > +
> > +\drivernormative{\paragraph}{Group Identifier capability}{Virtio Transport 
> > Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier 
> > capability}
> > +
> > +The driver MUST NOT write to group_id[] area or the BAR region.
> > +
> >  \subsubsection{Legacy Interfaces: A Note on PCI Device 
> > Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI 
> > Device Layout / Legacy Interfaces: A Note on PCI Device Layout}
> >  
> >  Transitional devices MUST present part of configuration
> 
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>

[Qemu-devel] [PATCH 1/3] Add a true or false option to the DEFINE_PROP_UUID macro.

2018-06-19 Thread Venu Busireddy

It may not always be desirable to have a random UUID stuffed into the
'_field' member. Add a new option '_default' to the macro, that will
allow the caller to specify if a random UUID needs be generated or not.

Also modified the instance where this macro is used.

Signed-off-by: Venu Busireddy 
---
 hw/acpi/vmgenid.c| 2 +-
 include/hw/qdev-properties.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/acpi/vmgenid.c b/hw/acpi/vmgenid.c
index d78b579a20..6d53757ee5 100644
--- a/hw/acpi/vmgenid.c
+++ b/hw/acpi/vmgenid.c
@@ -215,7 +215,7 @@ static void vmgenid_realize(DeviceState *dev, Error **errp)
 }
 
 static Property vmgenid_device_properties[] = {
-DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid),
+DEFINE_PROP_UUID(VMGENID_GUID, VmGenIdState, guid, true),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/include/hw/qdev-properties.h b/include/hw/qdev-properties.h
index 4f60cc88f3..7d39a4bdcd 100644
--- a/include/hw/qdev-properties.h
+++ b/include/hw/qdev-properties.h
@@ -218,12 +218,12 @@ extern const PropertyInfo qdev_prop_off_auto_pcibar;
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_off_auto_pcibar, \
 OffAutoPCIBAR)
 
-#define DEFINE_PROP_UUID(_name, _state, _field) {  \
+#define DEFINE_PROP_UUID(_name, _state, _field, _default) {\
 .name  = (_name),  \
 .info  = _prop_uuid,  \
 .offset= offsetof(_state, _field)  \
 + type_check(QemuUUID, typeof_field(_state, _field)),  \
-.set_default = true,   \
+.set_default = _default,   \
 }
 
 #define DEFINE_PROP_END_OF_LIST()   \

[Qemu-devel] [PATCH 3/3] Add "Group Identifier" support to virtio devices.

2018-06-19 Thread Venu Busireddy

Use the virtio PCI capability "VIRTIO_PCI_CAP_GROUP_ID_CFG" to store the
"Group Identifier" (UUID) specified via the command line option "uuid"
for the virtio device. The capability will be present in the virtio
device's configuration space iff the "uuid" option is specified.

Group Identifier is used to pair a virtio device with a passthrough
device.

Signed-off-by: Venu Busireddy 
---
 hw/virtio/virtio-pci.c  | 15 +++
 hw/virtio/virtio-pci.h  |  3 ++-
 include/standard-headers/linux/virtio_pci.h |  8 
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 3a01fe90f0..9c2ef16773 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -36,6 +36,7 @@
 #include "qemu/range.h"
 #include "hw/virtio/virtio-bus.h"
 #include "qapi/visitor.h"
+#include "qemu/uuid.h"
 
 #define VIRTIO_PCI_REGION_SIZE(dev) 
VIRTIO_PCI_CONFIG_OFF(msix_present(dev))
 
@@ -1638,6 +1639,10 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 .cap.cap_len = sizeof cfg,
 .cap.cfg_type = VIRTIO_PCI_CAP_PCI_CFG,
 };
+struct virtio_pci_group_id_cap group = {
+.cap.cap_len = sizeof group,
+.cap.cfg_type = VIRTIO_PCI_CAP_GROUP_ID_CFG,
+};
 struct virtio_pci_notify_cap notify_pio = {
 .cap.cap_len = sizeof notify,
 .notify_off_multiplier = cpu_to_le32(0x0),
@@ -1647,6 +1652,11 @@ static void virtio_pci_device_plugged(DeviceState *d, 
Error **errp)
 
 virtio_pci_modern_regions_init(proxy);
 
+if (!qemu_uuid_is_null(>pci_dev.uuid)) {
+memcpy(group.group_id, >pci_dev.uuid, sizeof(QemuUUID));
+virtio_pci_modern_mem_region_map(proxy, >group, );
+}
+
 virtio_pci_modern_mem_region_map(proxy, >common, );
 virtio_pci_modern_mem_region_map(proxy, >isr, );
 virtio_pci_modern_mem_region_map(proxy, >device, );
@@ -1763,6 +1773,10 @@ static void virtio_pci_realize(PCIDevice *pci_dev, Error 
**errp)
 proxy->device.size = 0x1000;
 proxy->device.type = VIRTIO_PCI_CAP_DEVICE_CFG;
 
+proxy->group.offset = 0;
+proxy->group.size = 0;
+proxy->group.type = VIRTIO_PCI_CAP_GROUP_ID_CFG;
+
 proxy->notify.offset = 0x3000;
 proxy->notify.size = virtio_pci_queue_mem_mult(proxy) * VIRTIO_QUEUE_MAX;
 proxy->notify.type = VIRTIO_PCI_CAP_NOTIFY_CFG;
@@ -1898,6 +1912,7 @@ static Property virtio_pci_properties[] = {
 VIRTIO_PCI_FLAG_INIT_LNKCTL_BIT, true),
 DEFINE_PROP_BIT("x-pcie-pm-init", VirtIOPCIProxy, flags,
 VIRTIO_PCI_FLAG_INIT_PM_BIT, true),
+DEFINE_PROP_UUID("uuid", PCIDevice, uuid, false),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/virtio/virtio-pci.h b/hw/virtio/virtio-pci.h
index 813082b0d7..e4592e90bf 100644
--- a/hw/virtio/virtio-pci.h
+++ b/hw/virtio/virtio-pci.h
@@ -164,10 +164,11 @@ struct VirtIOPCIProxy {
 VirtIOPCIRegion common;
 VirtIOPCIRegion isr;
 VirtIOPCIRegion device;
+VirtIOPCIRegion group;
 VirtIOPCIRegion notify;
 VirtIOPCIRegion notify_pio;
 };
-VirtIOPCIRegion regs[5];
+VirtIOPCIRegion regs[6];
 };
 MemoryRegion modern_bar;
 MemoryRegion io_bar;
diff --git a/include/standard-headers/linux/virtio_pci.h 
b/include/standard-headers/linux/virtio_pci.h
index 9262acd130..e5b6c6f3a6 100644
--- a/include/standard-headers/linux/virtio_pci.h
+++ b/include/standard-headers/linux/virtio_pci.h
@@ -113,6 +113,8 @@
 #define VIRTIO_PCI_CAP_DEVICE_CFG  4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG 5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG6
 
 /* This is the PCI capability header: */
 struct virtio_pci_cap {
@@ -163,6 +165,12 @@ struct virtio_pci_cfg_cap {
uint8_t pci_cfg_data[4]; /* Data for BAR access. */
 };
 
+/* Fields in VIRTIO_PCI_CAP_GROUP_ID_CFG: */
+struct virtio_pci_group_id_cap {
+   struct virtio_pci_cap cap;
+   uint8_t group_id[16];
+};
+
 /* Macro versions of offsets for the Old Timers! */
 #define VIRTIO_PCI_CAP_VNDR0
 #define VIRTIO_PCI_CAP_NEXT1

[Qemu-devel] [PATCH 2/3] Add "Group Identifier" support to PCIe bridges.

2018-06-19 Thread Venu Busireddy

Add a "Vendor-Specific" capability to the PCIe bridge, to contain the
"Group Identifier" (UUID) that will be used to pair a virtio device with
the passthrough device attached to that bridge.

This capability is added to the bridge iff the "uuid" option is specified
for the bridge device, via the qemu command line. Also, the bridge's
Device ID is changed to PCI_VENDOR_ID_REDHAT, and Vendor ID is changed
to PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE (from the default values), when the
"uuid" option is present.

Signed-off-by: Venu Busireddy 
---
 hw/pci-bridge/ioh3420.c|  2 ++
 hw/pci-bridge/pcie_root_port.c |  7 +++
 hw/pci/pci_bridge.c| 32 
 include/hw/pci/pci.h   |  2 ++
 include/hw/pci/pcie.h  |  1 +
 include/hw/pci/pcie_port.h |  1 +
 6 files changed, 45 insertions(+)

diff --git a/hw/pci-bridge/ioh3420.c b/hw/pci-bridge/ioh3420.c
index a451d74ee6..b6b9ebc726 100644
--- a/hw/pci-bridge/ioh3420.c
+++ b/hw/pci-bridge/ioh3420.c
@@ -35,6 +35,7 @@
 #define IOH_EP_MSI_SUPPORTED_FLAGS  PCI_MSI_FLAGS_MASKBIT
 #define IOH_EP_MSI_NR_VECTOR2
 #define IOH_EP_EXP_OFFSET   0x90
+#define IOH_EP_VENDOR_OFFSET0xCC
 #define IOH_EP_AER_OFFSET   0x100
 
 /*
@@ -111,6 +112,7 @@ static void ioh3420_class_init(ObjectClass *klass, void 
*data)
 rpc->exp_offset = IOH_EP_EXP_OFFSET;
 rpc->aer_offset = IOH_EP_AER_OFFSET;
 rpc->ssvid_offset = IOH_EP_SSVID_OFFSET;
+rpc->vendor_offset = IOH_EP_VENDOR_OFFSET;
 rpc->ssid = IOH_EP_SSVID_SSID;
 }
 
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 45f9e8cd4a..ba470c7fda 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -71,6 +71,12 @@ static void rp_realize(PCIDevice *d, Error **errp)
 goto err_bridge;
 }
 
+rc = pci_bridge_vendor_init(d, rpc->vendor_offset, errp);
+if (rc < 0) {
+error_append_hint(errp, "Can't init group ID, error %d\n", rc);
+goto err_bridge;
+}
+
 if (rpc->interrupts_init) {
 rc = rpc->interrupts_init(d, errp);
 if (rc < 0) {
@@ -137,6 +143,7 @@ static void rp_exit(PCIDevice *d)
 static Property rp_props[] = {
 DEFINE_PROP_BIT(COMPAT_PROP_PCP, PCIDevice, cap_present,
 QEMU_PCIE_SLTCAP_PCP_BITNR, true),
+DEFINE_PROP_UUID(COMPAT_PROP_UUID, PCIDevice, uuid, false),
 DEFINE_PROP_END_OF_LIST()
 };
 
diff --git a/hw/pci/pci_bridge.c b/hw/pci/pci_bridge.c
index 40a39f57cb..c63bc439f7 100644
--- a/hw/pci/pci_bridge.c
+++ b/hw/pci/pci_bridge.c
@@ -34,12 +34,17 @@
 #include "hw/pci/pci_bus.h"
 #include "qemu/range.h"
 #include "qapi/error.h"
+#include "qemu/uuid.h"
 
 /* PCI bridge subsystem vendor ID helper functions */
 #define PCI_SSVID_SIZEOF8
 #define PCI_SSVID_SVID  4
 #define PCI_SSVID_SSID  6
 
+#define PCI_VENDOR_SIZEOF 20
+#define PCI_VENDOR_CAP_LEN_OFFSET  2
+#define PCI_VENDOR_GROUP_ID_OFFSET 4
+
 int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
   uint16_t svid, uint16_t ssid,
   Error **errp)
@@ -57,6 +62,33 @@ int pci_bridge_ssvid_init(PCIDevice *dev, uint8_t offset,
 return pos;
 }
 
+int pci_bridge_vendor_init(PCIDevice *d, uint8_t offset, Error **errp)
+{
+int pos;
+PCIDeviceClass *dc = PCI_DEVICE_GET_CLASS(d);
+
+if (qemu_uuid_is_null(>uuid)) {
+return 0;
+}
+
+pos = pci_add_capability(d, PCI_CAP_ID_VNDR, offset, PCI_VENDOR_SIZEOF,
+errp);
+if (pos < 0) {
+return pos;
+}
+
+dc->vendor_id = PCI_VENDOR_ID_REDHAT;
+dc->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
+pci_set_word(d->config + PCI_VENDOR_ID, PCI_VENDOR_ID_REDHAT);
+pci_set_word(d->config + PCI_DEVICE_ID, PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE);
+
+pci_set_word(d->config + pos + PCI_VENDOR_CAP_LEN_OFFSET,
+PCI_VENDOR_SIZEOF);
+memcpy(d->config + offset + PCI_VENDOR_GROUP_ID_OFFSET, >uuid,
+sizeof(QemuUUID));
+return pos;
+}
+
 /* Accessor function to get parent bridge device from pci bus. */
 PCIDevice *pci_bridge_get_device(PCIBus *bus)
 {
diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index 990d6fcbde..ee234c5a6f 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -4,6 +4,7 @@
 #include "hw/qdev.h"
 #include "exec/memory.h"
 #include "sysemu/dma.h"
+#include "qemu/uuid.h"
 
 /* PCI includes legacy ISA access.  */
 #include "hw/isa/isa.h"
@@ -343,6 +344,7 @@ struct PCIDevice {
 bool has_rom;
 MemoryRegion rom;
 uint32_t rom_bar;
+QemuUUID uuid;
 
 /* INTx routing notifier */
 PCIINTxRoutingNotifier intx_routing_notifier;
diff --git a/include/hw/pci/

[Qemu-devel] [PATCH 0/3] Use of unique identifier for pairing virtio and passthrough devices...

2018-06-19 Thread Venu Busireddy

The patch set "Enable virtio_net to act as a standby for a passthru
device" [1] deals with live migration of guests that use passthrough
devices. However, that scheme uses the MAC address for pairing
the virtio device and the passthrough device. The thread "netvsc:
refactor notifier/event handling code to use the failover framework"
[2] discusses an alternate mechanism, such as using an UUID, for pairing
the devices. Based on that discussion, proposals "Add "Group Identifier"
to virtio PCI capabilities." [3] and "RFC: Use of bridge devices to
store pairing information..." [4] were made.

The current patch set includes all the feedback received for proposals [3]
and [4]. For the sake of completeness, patch for the virtio specification
is also included here. Following is the updated proposal.

1. Extend the virtio specification to include a new virtio PCI capability
   "VIRTIO_PCI_CAP_GROUP_ID_CFG".

2. Enhance the QEMU CLI to include a "uuid" option to the virtio device.
   The "uuid" is a string in UUID format.

3. Enhance the QEMU CLI to include a "uuid" option to the bridge device.
   The "uuid" is a string in UUID format. Currently, PCIe bridge for
   the Q35 model is supported.

4. The operator creates a unique identifier string using 'uuidgen'.

5. When the virtio device is created, the operator uses the "uuid" option
   (for example, '-device virtio-net-pci,uuid="string"') and specifies
   the UUID created in step 4.

   QEMU stores the UUID in the virtio device's configuration space
   in the capability "VIRTIO_PCI_CAP_GROUP_ID_CFG".

6. When assigning a PCI device to the guest in passthrough mode, the
   operator first creates a bridge using the "uuid" option (for example,
   '-device ioh3420,uuid="string"') to specify the UUID created in step 4,
   and then attaches the passthrough device to the bridge.

   QEMU stores the UUID in the configuration space of the bridge as
   Vendor-Specific capability (0x09). The "Vendor" here is not to be
   confused with a specific organization. Instead, the vendor of the
   bridge is QEMU. To avoid mixing up with other bridges, the bridge
   will be created with vendor ID 0x1b36 (PCI_VENDOR_ID_REDHAT) and
   device ID 0x000e (PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE) if the "uuid"
   option is specified. Otherwise, current defaults are used.

7. Patch 4 in patch series "Enable virtio_net to act as a standby for
   a passthru device" [1] needs to be modified to use the UUID values
   present in the bridge's configuration space and the virtio device's
   configuration space instead of the MAC address for pairing the devices.

Thanks!

Venu

[1] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00156.html
[2] https://www.spinics.net/lists/netdev/msg499011.html
[3] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00118.html
[4] https://lists.oasis-open.org/archives/virtio-dev/201805/msg00204.html


Venu Busireddy (3):
  Add a true or false option to the DEFINE_PROP_UUID macro.
  Add "Group Identifier" support to PCIe bridges.
  Add "Group Identifier" support to virtio devices.

 hw/acpi/vmgenid.c   |  2 +-
 hw/pci-bridge/ioh3420.c |  2 ++
 hw/pci-bridge/pcie_root_port.c  |  7 +++
 hw/pci/pci_bridge.c | 32 +
 hw/virtio/virtio-pci.c  | 15 ++
 hw/virtio/virtio-pci.h  |  3 ++-
 include/hw/pci/pci.h|  2 ++
 include/hw/pci/pcie.h   |  1 +
 include/hw/pci/pcie_port.h  |  1 +
 include/hw/qdev-properties.h|  4 ++--
 include/standard-headers/linux/virtio_pci.h |  8 
 11 files changed, 73 insertions(+), 4 deletions(-)

[Qemu-devel] [PATCH virtio 1/1] Add "Group Identifier" support to virtio PCI capabilities.

2018-06-19 Thread Venu Busireddy

Add VIRTIO_PCI_CAP_GROUP_ID_CFG (Group Identifier) capability to the
virtio PCI capabilities to allow for the grouping of devices.

Signed-off-by: Venu Busireddy 
---
 content.tex | 43 +++
 1 file changed, 43 insertions(+)

diff --git a/content.tex b/content.tex
index 7a92cb1..7ea6267 100644
--- a/content.tex
+++ b/content.tex
@@ -599,6 +599,8 @@ The fields are interpreted as follows:
 #define VIRTIO_PCI_CAP_DEVICE_CFG4
 /* PCI configuration access */
 #define VIRTIO_PCI_CAP_PCI_CFG   5
+/* Group Identifier */
+#define VIRTIO_PCI_CAP_GROUP_ID_CFG  6
 \end{lstlisting}
 
 Any other value is reserved for future use.
@@ -997,6 +999,47 @@ address \field{cap.length} bytes within a BAR range
 specified by some other Virtio Structure PCI Capability
 of type other than \field{VIRTIO_PCI_CAP_PCI_CFG}.
 
+\subsubsection{Group Identifier capability}\label{sec:Virtio Transport Options 
/ Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The VIRTIO_PCI_CAP_GROUP_ID_CFG capability provides means for grouping devices 
together.
+
+The capability is immediately followed by an identifier of arbitrary size as 
below:
+
+\begin{lstlisting}
+struct virtio_pci_group_id_cap {
+struct virtio_pci_cap cap;
+u8 group_id[]; /* Group Identifier */
+};
+\end{lstlisting}
+
+The fields \field{cap.bar}, \field{cap.length}, \field{cap.offset}
+and \field{group_id} are read-only for the driver.
+
+The specification does not impose any restrictions on the size or
+structure of group_id[]. Vendors are free to declare this array as
+large as needed, as long as the combined size of all capabilities can
+be accommodated within the PCI configuration space.
+
+If there is enough room in the PCI configuration space to accommodate
+the group identifier, the fields \field{cap.bar}, \field{cap.offset}
+and \field{cap.length} should be set to 0.
+
+If there isn't enough room, some or all of the group identifier can be
+presented in the BAR region, in which case the fields \field{cap.bar},
+\field{cap.offset} and \field{cap.length} should be set appropriately.
+
+In either case, the field \field{cap.cap_len} indicates the length of
+the group identifier information present in the configuration space
+itself.
+
+\devicenormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The device MAY present the VIRTIO_PCI_CAP_GROUP_ID_CFG capability.
+
+\drivernormative{\paragraph}{Group Identifier capability}{Virtio Transport 
Options / Virtio Over PCI Bus / PCI Device Layout / Group Identifier capability}
+
+The driver MUST NOT write to group_id[] area or the BAR region.
+
 \subsubsection{Legacy Interfaces: A Note on PCI Device 
Layout}\label{sec:Virtio Transport Options / Virtio Over PCI Bus / PCI Device 
Layout / Legacy Interfaces: A Note on PCI Device Layout}
 
 Transitional devices MUST present part of configuration

75 matches

Mail list logo