Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
I only recently saw this email. On Thu, Jun 06, 2013 at 06:10:12PM +0300, Gleb Natapov wrote: > On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote: > > For seabios itself this isn't a big issue, see pci_{readl,writel} in > > src/pci.c. When called in 16bit mode it goes into 32bit mode > > temporarily, just for accessing the mmio register. ahci driver uses it, > > xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers > > in seabios can do the same. > > > Isn't this approach broken? How can SeaBIOS be sure it restores real > mode registers to exactly same state they were before entering 32bit > mode? You are correct - SeaBIOS can't fully restore the "hidden" segment registers. So, in a way it is broken. In practice, it seems to work on modern bootloaders (eg, ntldr, grub, lilo). It definitely doesn't work with EMM386 (old DOS stuff), but does seem to work okay with FreeDOS as long as one doesn't run EMM386. The AHCI code uses this 32bit/16bit trampoline because it would not be possible to support AHCI otherwise. I haven't seen any complaints of failures with the AHCI code - probably because people using AHCI are using modern guests. I explored this a bit some time back and the only way I could think of to reliably restore the 16bit registers would be via SMM. Unfortunately, using SMM introduces a whole host of complexity and problems. -Kevin ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" writes: > > On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote: > >> "Michael S. Tsirkin" writes: > >> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: > >> >> Rusty Russell writes: > >> >> > >> >> > Anthony Liguori writes: > >> >> >> Forcing a guest driver change is a really big > >> >> >> deal and I see no reason to do that unless there's a compelling > >> >> >> reason > >> >> >> to. > >> >> >> > >> >> >> So we're stuck with the 1.0 config layout for a very long time. > >> >> > > >> >> > We definitely must not force a guest change. The explicit aim of the > >> >> > standard is that "legacy" and 1.0 be backward compatible. One > >> >> > deliverable is a document detailing how this is done (effectively a > >> >> > summary of changes between what we have and 1.0). > >> >> > >> >> If 2.0 is fully backwards compatible, great. It seems like such a > >> >> difference that that would be impossible but I need to investigate > >> >> further. > >> >> > >> >> Regards, > >> >> > >> >> Anthony Liguori > >> > > >> > If you look at my patches you'll see how it works. > >> > Basically old guests use BAR0 new ones don't, so > >> > it's easy: BAR0 access means legacy guest. > >> > Only started testing but things seem to work > >> > fine with old guests so far. > >> > > >> > I think we need a spec, not just driver code. > >> > > >> > Rusty what's the plan? Want me to write it? > >> > >> We need both, of course, but the spec work will happen in the OASIS WG. > >> A draft is good, but let's not commit anything to upstream QEMU until we > >> get the spec finalized. And that is proposed to be late this year. > > > > Well that would be quite sad really. > > > > This means we can't make virtio a spec compliant pci express device, > > and we can't add any more feature bits, so no > > flexible buffer optimizations for virtio net. > > > > There are probably more projects that will be blocked. > > > > So how about we keep extending legacy layout for a bit longer: > > - add a way to access device with MMIO > > - use feature bit 31 to signal 64 bit features > > (and shift device config accordingly) > > By my count, net still has 7 feature bits left, so I don't think the > feature bits are likely to be a limitation in the next 6 months? Yes but you wanted a generic transport feature bit for flexible SG layout. Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then? > MMIO is a bigger problem. Linux guests are happy with it: does it break > the Windows drivers? > > Thanks, > Rusty. You mean make BAR0 an MMIO BAR? Yes, it would break current windows guests. Further, as long as we use same address to notify all queues, we would also need to decode the instruction on x86 and that's measureably slower than PIO. We could go back to discussing hypercall use for notifications, but that has its own set of issues... -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote: >> "Michael S. Tsirkin" writes: >> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: >> >> Rusty Russell writes: >> >> >> >> > Anthony Liguori writes: >> >> >> Forcing a guest driver change is a really big >> >> >> deal and I see no reason to do that unless there's a compelling reason >> >> >> to. >> >> >> >> >> >> So we're stuck with the 1.0 config layout for a very long time. >> >> > >> >> > We definitely must not force a guest change. The explicit aim of the >> >> > standard is that "legacy" and 1.0 be backward compatible. One >> >> > deliverable is a document detailing how this is done (effectively a >> >> > summary of changes between what we have and 1.0). >> >> >> >> If 2.0 is fully backwards compatible, great. It seems like such a >> >> difference that that would be impossible but I need to investigate >> >> further. >> >> >> >> Regards, >> >> >> >> Anthony Liguori >> > >> > If you look at my patches you'll see how it works. >> > Basically old guests use BAR0 new ones don't, so >> > it's easy: BAR0 access means legacy guest. >> > Only started testing but things seem to work >> > fine with old guests so far. >> > >> > I think we need a spec, not just driver code. >> > >> > Rusty what's the plan? Want me to write it? >> >> We need both, of course, but the spec work will happen in the OASIS WG. >> A draft is good, but let's not commit anything to upstream QEMU until we >> get the spec finalized. And that is proposed to be late this year. > > Well that would be quite sad really. > > This means we can't make virtio a spec compliant pci express device, > and we can't add any more feature bits, so no > flexible buffer optimizations for virtio net. > > There are probably more projects that will be blocked. > > So how about we keep extending legacy layout for a bit longer: > - add a way to access device with MMIO > - use feature bit 31 to signal 64 bit features > (and shift device config accordingly) By my count, net still has 7 feature bits left, so I don't think the feature bits are likely to be a limitation in the next 6 months? MMIO is a bigger problem. Linux guests are happy with it: does it break the Windows drivers? Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote: > "Michael S. Tsirkin" writes: > > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: > >> Rusty Russell writes: > >> > >> > Anthony Liguori writes: > >> >> Forcing a guest driver change is a really big > >> >> deal and I see no reason to do that unless there's a compelling reason > >> >> to. > >> >> > >> >> So we're stuck with the 1.0 config layout for a very long time. > >> > > >> > We definitely must not force a guest change. The explicit aim of the > >> > standard is that "legacy" and 1.0 be backward compatible. One > >> > deliverable is a document detailing how this is done (effectively a > >> > summary of changes between what we have and 1.0). > >> > >> If 2.0 is fully backwards compatible, great. It seems like such a > >> difference that that would be impossible but I need to investigate > >> further. > >> > >> Regards, > >> > >> Anthony Liguori > > > > If you look at my patches you'll see how it works. > > Basically old guests use BAR0 new ones don't, so > > it's easy: BAR0 access means legacy guest. > > Only started testing but things seem to work > > fine with old guests so far. > > > > I think we need a spec, not just driver code. > > > > Rusty what's the plan? Want me to write it? > > We need both, of course, but the spec work will happen in the OASIS WG. > A draft is good, but let's not commit anything to upstream QEMU until we > get the spec finalized. And that is proposed to be late this year. Well that would be quite sad really. This means we can't make virtio a spec compliant pci express device, and we can't add any more feature bits, so no flexible buffer optimizations for virtio net. There are probably more projects that will be blocked. So how about we keep extending legacy layout for a bit longer: - add a way to access device with MMIO - use feature bit 31 to signal 64 bit features (and shift device config accordingly) No endian-ness rework, no per queue enable etc. Then when we start discussions we will have a working express and working 64 bit feature support, and it will be that much easier to make it pretty it. > Since I'm going to have to reformat the spec and adapt it into OASIS > style anyway, perhaps you should prepare a description as a standalone > text document. Easier to email and work with... > > Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we > call them; I used the term "legacy" for existing implementations in the > OASIS WG proposal), you add capabilities and don't point them into (the > start of?) BAR0. Old drivers use BAR0 as now. > > One trick to note: while drivers shouldn't use both old and new style on > the same device, you need to allow it for kexec, particularly reset via > BAR0. > > Cheers, > Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: >> Rusty Russell writes: >> >> > Anthony Liguori writes: >> >> Forcing a guest driver change is a really big >> >> deal and I see no reason to do that unless there's a compelling reason >> >> to. >> >> >> >> So we're stuck with the 1.0 config layout for a very long time. >> > >> > We definitely must not force a guest change. The explicit aim of the >> > standard is that "legacy" and 1.0 be backward compatible. One >> > deliverable is a document detailing how this is done (effectively a >> > summary of changes between what we have and 1.0). >> >> If 2.0 is fully backwards compatible, great. It seems like such a >> difference that that would be impossible but I need to investigate >> further. >> >> Regards, >> >> Anthony Liguori > > If you look at my patches you'll see how it works. > Basically old guests use BAR0 new ones don't, so > it's easy: BAR0 access means legacy guest. > Only started testing but things seem to work > fine with old guests so far. > > I think we need a spec, not just driver code. > > Rusty what's the plan? Want me to write it? We need both, of course, but the spec work will happen in the OASIS WG. A draft is good, but let's not commit anything to upstream QEMU until we get the spec finalized. And that is proposed to be late this year. Since I'm going to have to reformat the spec and adapt it into OASIS style anyway, perhaps you should prepare a description as a standalone text document. Easier to email and work with... Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we call them; I used the term "legacy" for existing implementations in the OASIS WG proposal), you add capabilities and don't point them into (the start of?) BAR0. Old drivers use BAR0 as now. One trick to note: while drivers shouldn't use both old and new style on the same device, you need to allow it for kexec, particularly reset via BAR0. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote: >> >> Yet the structure definitions are descriptive, capturing layout, size >> >> and endianness in natural a format readable by any C programmer. >> > >> >>From an API design point of view, here are the problems I see: >> > >> > 1) C makes no guarantees about structure layout beyond the first >> >member. Yes, if it's naturally aligned or has a packed attribute, >> >GCC does the right thing. But this isn't kernel land anymore, >> >portability matters and there are more compilers than GCC. >> >> [ I argued in detail here, then deleted. Damn bikeshedding... ] >> >> I think the best thing is to add the decoded integer versions as macros, >> and have a heap of BUILD_BUG_ON() in the kernel source to make sure they >> match. > > Hmm you want to have both in the header? > > That would confuse users for sure :) > > I guess we could put in a big comment explaning > why do we have both, and which version should > userspace use. I tried to write it: > > /* > * On all known C compilers, you can use the > * offsetof macro to find the correct offset of each field - > * if your compiler doesn't implement this correctly, use the > * integer versions below, instead. > * In that case please don't use the structures above, to avoid confusion. > */ My version was a little less verbose :) Subject: virtio_pci: macros for PCI layout offsets. QEMU wants it, so why not? Trust, but verify. Signed-off-by: Rusty Russell diff --git a/drivers/virtio/virtio_pci_modern.c b/drivers/virtio/virtio_pci_modern.c index 24bcd9b..6510009 100644 --- a/drivers/virtio/virtio_pci_modern.c +++ b/drivers/virtio/virtio_pci_modern.c @@ -453,6 +453,64 @@ static void __iomem *map_capability(struct pci_dev *dev, int off, size_t minlen, return p; } +/* This is part of the ABI. Don't screw with it. */ +static inline void check_offsets(void) +{ + /* Note: disk space was harmed in compilation of this function. */ + BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR != +offsetof(struct virtio_pci_cap, cap_vndr)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT != +offsetof(struct virtio_pci_cap, cap_next)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN != +offsetof(struct virtio_pci_cap, cap_len)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_TYPE_AND_BAR != +offsetof(struct virtio_pci_cap, type_and_bar)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET != +offsetof(struct virtio_pci_cap, offset)); + BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH != +offsetof(struct virtio_pci_cap, length)); + BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT != +offsetof(struct virtio_pci_notify_cap, + notify_off_multiplier)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT != +offsetof(struct virtio_pci_common_cfg, + device_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF != +offsetof(struct virtio_pci_common_cfg, device_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT != +offsetof(struct virtio_pci_common_cfg, + guest_feature_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF != +offsetof(struct virtio_pci_common_cfg, guest_feature)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX != +offsetof(struct virtio_pci_common_cfg, msix_config)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ != +offsetof(struct virtio_pci_common_cfg, num_queues)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS != +offsetof(struct virtio_pci_common_cfg, device_status)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT != +offsetof(struct virtio_pci_common_cfg, queue_select)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE != +offsetof(struct virtio_pci_common_cfg, queue_size)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX != +offsetof(struct virtio_pci_common_cfg, queue_msix_vector)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE != +offsetof(struct virtio_pci_common_cfg, queue_enable)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF != +offsetof(struct virtio_pci_common_cfg, queue_notify_off)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO != +offsetof(struct virtio_pci_common_cfg, queue_desc_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI != +offsetof(struct virtio_pci_common_cfg, queue_desc_hi)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO != +offsetof(struct virtio_pci_common_cfg, queue_avail_lo)); + BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI != +offsetof(struct virtio_pci_common_cfg, queue_avail_hi)); + BUILD_BUG_O
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote: > Rusty Russell writes: > > > Anthony Liguori writes: > >> Forcing a guest driver change is a really big > >> deal and I see no reason to do that unless there's a compelling reason > >> to. > >> > >> So we're stuck with the 1.0 config layout for a very long time. > > > > We definitely must not force a guest change. The explicit aim of the > > standard is that "legacy" and 1.0 be backward compatible. One > > deliverable is a document detailing how this is done (effectively a > > summary of changes between what we have and 1.0). > > If 2.0 is fully backwards compatible, great. It seems like such a > difference that that would be impossible but I need to investigate > further. > > Regards, > > Anthony Liguori If you look at my patches you'll see how it works. Basically old guests use BAR0 new ones don't, so it's easy: BAR0 access means legacy guest. Only started testing but things seem to work fine with old guests so far. I think we need a spec, not just driver code. Rusty what's the plan? Want me to write it? > > > > It's a delicate balancing act. My plan is to accompany any changes in > > the standard with a qemu implementation, so we can see how painful those > > changes are. And if there are performance implications, measure them. > > > > Cheers, > > Rusty. > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell writes: > Anthony Liguori writes: >> Forcing a guest driver change is a really big >> deal and I see no reason to do that unless there's a compelling reason >> to. >> >> So we're stuck with the 1.0 config layout for a very long time. > > We definitely must not force a guest change. The explicit aim of the > standard is that "legacy" and 1.0 be backward compatible. One > deliverable is a document detailing how this is done (effectively a > summary of changes between what we have and 1.0). If 2.0 is fully backwards compatible, great. It seems like such a difference that that would be impossible but I need to investigate further. Regards, Anthony Liguori > > It's a delicate balancing act. My plan is to accompany any changes in > the standard with a qemu implementation, so we can see how painful those > changes are. And if there are performance implications, measure them. > > Cheers, > Rusty. > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote: > >> Yet the structure definitions are descriptive, capturing layout, size > >> and endianness in natural a format readable by any C programmer. > > > >>From an API design point of view, here are the problems I see: > > > > 1) C makes no guarantees about structure layout beyond the first > >member. Yes, if it's naturally aligned or has a packed attribute, > >GCC does the right thing. But this isn't kernel land anymore, > >portability matters and there are more compilers than GCC. > > [ I argued in detail here, then deleted. Damn bikeshedding... ] > > I think the best thing is to add the decoded integer versions as macros, > and have a heap of BUILD_BUG_ON() in the kernel source to make sure they > match. Hmm you want to have both in the header? That would confuse users for sure :) I guess we could put in a big comment explaning why do we have both, and which version should userspace use. I tried to write it: /* * On all known C compilers, you can use the * offsetof macro to find the correct offset of each field - * if your compiler doesn't implement this correctly, use the * integer versions below, instead. * In that case please don't use the structures above, to avoid confusion. */ -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote: > Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto: > > You expect a compiler to pad this structure: > > > > struct foo { > > uint8_t a; > > uint8_t b; > > uint16_t c; > > uint32_t d; > > }; > > > > I'm guessing any compiler that decides to waste memory in this way > > will quickly get dropped by users and then we won't worry > > about building QEMU with it. > > You know the virtio-pci config structures are padded, but not all of > them are. For example, virtio_balloon_stat is not padded and indeed has > an __attribute__((__packed__)) in the spec. > > For this reason I prefer to have the attribute everywhere. So people > don't have to wonder why it's here and not there. > > Paolo BTW we don't even do this consistently everywhere in QEMU. It would have been better to have a rule to avoid packed as much as possible, then we'd have found the misaligned field bug in balloon before it's too late. That would be a good rule to adopt I think: any pack if something is misaligned, and document the reason. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote: > Anthony Liguori writes: > > Rusty Russell writes: > > > >> Anthony Liguori writes: > >>> "Michael S. Tsirkin" writes: > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > +return proxy->device_feature_select; > >>> > >>> Oh dear no... Please use defines like the rest of QEMU. > >> > >> It is pretty ugly. > > > > I think beauty is in the eye of the beholder here... > > > > Pretty much every device we have has a switch statement like this. > > Consistency wins when it comes to qualitative arguments like this. > > I was agreeing with you here, actually. > > >> Yet the structure definitions are descriptive, capturing layout, size > >> and endianness in natural a format readable by any C programmer. > > > >>From an API design point of view, here are the problems I see: > > > > 1) C makes no guarantees about structure layout beyond the first > >member. Yes, if it's naturally aligned or has a packed attribute, > >GCC does the right thing. But this isn't kernel land anymore, > >portability matters and there are more compilers than GCC. > > [ I argued in detail here, then deleted. Damn bikeshedding... ] > > I think the best thing is to add the decoded integer versions as macros, > and have a heap of BUILD_BUG_ON() in the kernel source to make sure they > match. On the qemu side, fine with me, all I want is ability to stay close to kernel headers. Let's have XXX_SIZE macros as well, so access size can be easily validated? On the kernel side, for 'struct virtio_pci_cap', since we only ever do offsetof on this struct, it might be easier to just use the numeric constants directly. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori writes: > Rusty Russell writes: > >> Anthony Liguori writes: >>> "Michael S. Tsirkin" writes: +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy->device_feature_select; >>> >>> Oh dear no... Please use defines like the rest of QEMU. >> >> It is pretty ugly. > > I think beauty is in the eye of the beholder here... > > Pretty much every device we have has a switch statement like this. > Consistency wins when it comes to qualitative arguments like this. I was agreeing with you here, actually. >> Yet the structure definitions are descriptive, capturing layout, size >> and endianness in natural a format readable by any C programmer. > >>From an API design point of view, here are the problems I see: > > 1) C makes no guarantees about structure layout beyond the first >member. Yes, if it's naturally aligned or has a packed attribute, >GCC does the right thing. But this isn't kernel land anymore, >portability matters and there are more compilers than GCC. [ I argued in detail here, then deleted. Damn bikeshedding... ] I think the best thing is to add the decoded integer versions as macros, and have a heap of BUILD_BUG_ON() in the kernel source to make sure they match. > 3) It suspect it's harder to review because a subtle change could more >easily have broad impact. If someone changed the type of a field >from u32 to u16, it changes the offset of every other field. That's >not terribly obvious in the patch itself unless you understand how >the structure is used elsewhere. MST's patch just did this, so point taken. (MST: I'm going to combine the cfg_type and bar bytes to fix this, patch coming). >> No change, but there's an open question on whether we should nail it to >> little endian (or define the endian by the transport). >> >> Of course, I can't rule out that the 1.0 standard *may* decide to frob >> the ring layout somehow, > > Well, given that virtio is widely deployed today, I would think the 1.0 > standard should strictly reflect what's deployed today, no? That will be up to the committee. I think we want to fix some obvious pain points, though qemu will not benefit from them in the next 5 years. > Any new config layout would be 2.0 material, right? > > Re: the new config layout, I don't think we would want to use it for > anything but new devices. There are enough concrete reasons that I think we want it for existing devices: 1) Negotiated ring size/alignment. Coreboot wants smaller, others want larger. 2) Remove assertion that it has to be an I/O bar. PowerPC wants this. 3) Notification location flexibility. MST wanted this for performance. 4) More feature bits. > Forcing a guest driver change is a really big > deal and I see no reason to do that unless there's a compelling reason > to. > > So we're stuck with the 1.0 config layout for a very long time. We definitely must not force a guest change. The explicit aim of the standard is that "legacy" and 1.0 be backward compatible. One deliverable is a document detailing how this is done (effectively a summary of changes between what we have and 1.0). It's a delicate balancing act. My plan is to accompany any changes in the standard with a qemu implementation, so we can see how painful those changes are. And if there are performance implications, measure them. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote: > > Not we. The BIOS can disable IO BAR: it can do this already > > but the device won't be functional. > > But the only way to expose the device over PCI express is to disable the > IO BAR, right? I think this is the source of the misunderstanding: 1.3.2.2. PCI Express Endpoint Rules ... PCI Express Endpoint must not depend on operating system allocation of I/O resources claimed through BAR(s). The real meaning here is *not* that an Endpoint should not have an I/O BAR. An Endpoint can have an I/O BAR, and PCI Express spec supports I/O transactions. The meaning is that an Endpoint should work even if *the OS* decides to disable the I/O BAR. Note: it's up to the guest. We support I/O as a device with full compatibility for old guests, but must be prepared to handle a guest which does not enable I/O. This likely means that as time goes on, future OSes will start disabling I/O BARs, relying on this not hurting functionality. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote: > "Michael S. Tsirkin" writes: > > > On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote: > >> "Michael S. Tsirkin" writes: > >> > I'm guessing any compiler that decides to waste memory in this way > >> > will quickly get dropped by users and then we won't worry > >> > about building QEMU with it. > >> > >> There are other responses in the thread here and I don't really care to > >> bikeshed on this issue. > > > > Great. Let's make the bikeshed blue then? > > It's fun to argue about stuff like this and I certainly have an opinion, > but I honestly don't care all that much about the offsetof thing. > However... > > > > > >> >> Well, given that virtio is widely deployed today, I would think the 1.0 > >> >> standard should strictly reflect what's deployed today, no? > >> >> Any new config layout would be 2.0 material, right? > >> > > >> > Not as it's currently planned. Devices can choose > >> > to support a legacy layout in addition to the new one, > >> > and if you look at the patch you will see that that > >> > is exactly what it does. > >> > >> Adding a new BAR most certainly requires bumping the revision ID or > >> changing the device ID, no? > > > > No, why would it? > > If we change the programming interface for a device in a way that is > incompatible, we are required to change the revision ID and/or device > ID. It's compatible. > > If a device dropped BAR0, that would be a good reason > > to bump revision ID. > > We don't do this yet. > > But we have to drop BAR0 to put it behind a PCI express bus, right? No, no. The PCIe spec states that failure to allocate an I/O BAR should still allow the device to function. That's *guest* allocating an I/O BAR. > If that's the case, then device that's exposed on the PCI express bus > must use a different device ID and/or revision ID. > > That means a new driver is needed in the guest. > > >> Didn't we run into this problem with the virtio-win drivers with just > >> the BAR size changing? > > > > Because they had a bug: they validated BAR0 size. AFAIK they don't care > > what happens with other bars. > > I think there's a grey area with respect to the assumptions a device can > make about the programming interface. > > But very concretely, we cannot expose virtio-pci-net via PCI express > with BAR0 disabled because that will result in existing virtio-pci Linux > drivers breaking. > > > Not we. The BIOS can disable IO BAR: it can do this already > > but the device won't be functional. > > But the only way to expose the device over PCI express is to disable the > IO BAR, right? > > Regards, > > Anthony Liguori No :) ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote: >> "Michael S. Tsirkin" writes: >> > I'm guessing any compiler that decides to waste memory in this way >> > will quickly get dropped by users and then we won't worry >> > about building QEMU with it. >> >> There are other responses in the thread here and I don't really care to >> bikeshed on this issue. > > Great. Let's make the bikeshed blue then? It's fun to argue about stuff like this and I certainly have an opinion, but I honestly don't care all that much about the offsetof thing. However... > >> >> Well, given that virtio is widely deployed today, I would think the 1.0 >> >> standard should strictly reflect what's deployed today, no? >> >> Any new config layout would be 2.0 material, right? >> > >> > Not as it's currently planned. Devices can choose >> > to support a legacy layout in addition to the new one, >> > and if you look at the patch you will see that that >> > is exactly what it does. >> >> Adding a new BAR most certainly requires bumping the revision ID or >> changing the device ID, no? > > No, why would it? If we change the programming interface for a device in a way that is incompatible, we are required to change the revision ID and/or device ID. > If a device dropped BAR0, that would be a good reason > to bump revision ID. > We don't do this yet. But we have to drop BAR0 to put it behind a PCI express bus, right? If that's the case, then device that's exposed on the PCI express bus must use a different device ID and/or revision ID. That means a new driver is needed in the guest. >> Didn't we run into this problem with the virtio-win drivers with just >> the BAR size changing? > > Because they had a bug: they validated BAR0 size. AFAIK they don't care > what happens with other bars. I think there's a grey area with respect to the assumptions a device can make about the programming interface. But very concretely, we cannot expose virtio-pci-net via PCI express with BAR0 disabled because that will result in existing virtio-pci Linux drivers breaking. > Not we. The BIOS can disable IO BAR: it can do this already > but the device won't be functional. But the only way to expose the device over PCI express is to disable the IO BAR, right? Regards, Anthony Liguori ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 04:32:31PM +0200, Paolo Bonzini wrote: > Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto: > >> > There are other responses in the thread here and I don't really care to > >> > bikeshed on this issue. > > Great. Let's make the bikeshed blue then? > > Yes, let's make it blue, but please promise to check into Peter's > register API so we can remove the case offsetof. I think it's overkill here. virtio was designed to be easy to implement with a simple switch statement. > I checked that it works on RHEL5, which is 4.1 and probably the oldest > compiler we care about (any other 4.1 lacks the __sync builtins; > upstream added them in 4.2). > > Paolo Are you sure? Documentation says 4.0 ... -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto: >> > There are other responses in the thread here and I don't really care to >> > bikeshed on this issue. > Great. Let's make the bikeshed blue then? Yes, let's make it blue, but please promise to check into Peter's register API so we can remove the case offsetof. I checked that it works on RHEL5, which is 4.1 and probably the oldest compiler we care about (any other 4.1 lacks the __sync builtins; upstream added them in 4.2). Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote: > "Michael S. Tsirkin" writes: > > > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote: > >> 1) C makes no guarantees about structure layout beyond the first > >>member. Yes, if it's naturally aligned or has a packed attribute, > >>GCC does the right thing. But this isn't kernel land anymore, > >>portability matters and there are more compilers than GCC. > > > > You expect a compiler to pad this structure: > > > > struct foo { > > uint8_t a; > > uint8_t b; > > uint16_t c; > > uint32_t d; > > }; > > > > I'm guessing any compiler that decides to waste memory in this way > > will quickly get dropped by users and then we won't worry > > about building QEMU with it. > > There are other responses in the thread here and I don't really care to > bikeshed on this issue. Great. Let's make the bikeshed blue then? > >> Well, given that virtio is widely deployed today, I would think the 1.0 > >> standard should strictly reflect what's deployed today, no? > >> Any new config layout would be 2.0 material, right? > > > > Not as it's currently planned. Devices can choose > > to support a legacy layout in addition to the new one, > > and if you look at the patch you will see that that > > is exactly what it does. > > Adding a new BAR most certainly requires bumping the revision ID or > changing the device ID, no? No, why would it? If a device dropped BAR0, that would be a good reason to bump revision ID. We don't do this yet. > Didn't we run into this problem with the virtio-win drivers with just > the BAR size changing? Because they had a bug: they validated BAR0 size. AFAIK they don't care what happens with other bars. > >> Re: the new config layout, I don't think we would want to use it for > >> anything but new devices. Forcing a guest driver change > > > > There's no forcing. > > If you look at the patches closely, you will see that > > we still support the old layout on BAR0. > > > > > >> is a really big > >> deal and I see no reason to do that unless there's a compelling reason > >> to. > > > > There are many a compelling reasons, and they are well known > > limitations of virtio PCI: > > > > - PCI spec compliance (madates device operation with IO memory > > disabled). > > PCI express spec. We are fully compliant with the PCI spec. And what's > the user visible advantage of pointing an emulated virtio device behind > a PCI-e bus verses a legacy PCI bus? Native hotplug support. > This is a very good example because if we have to disable BAR0, then > it's an ABI breaker plan and simple. Not we. The BIOS can disable IO BAR: it can do this already but the device won't be functional. > > - support 64 bit addressing > > We currently support 44-bit addressing for the ring. While I agree we > need to bump it, there's no immediate problem with 44-bit addressing. I heard developers (though not users) complaining. > > - add more than 32 feature bits. > > - individually disable queues. > > - sanely support cross-endian systems. > > - support very small (<1 PAGE) for virtio rings. > > - support a separate page for each vq kick. > > - make each device place config at flexible offset. > > None of these things are holding us back today. All of them do, to bigger or lesser degree. > I'm not saying we shouldn't introduce a new device. But adoption of > that device will be slow and realistically will be limited to new > devices only. > > We'll be supporting both devices for a very, very long time. This is true for any new feature. What are you trying to say here? We won't add new features to old config: for once, we have run out of feature bits. > Compatibility is the fundamental value that we provide. We need to go > out of our way to make sure that existing guests work and work as well > as possible. What are you trying to say? There's nothing here that breaks compatibility. Have you looked at the patch? I'm wasting my time arguing on the mailing list, but once I tear myself away from this occupation, I intend to verify that I can run an old guest on qemu with this patch without issues. > Sticking virtio devices behind a PCI-e bus just for the hell of it isn't > a compelling reason to break existing guests. > > Regards, > > Anthony Liguori That's why my patch does not break existing guests. > > > Addressing any one of these would cause us to add a substantially new > > way to operate virtio devices. > > > > And since it's a guest change anyway, it seemed like a > > good time to do the new layout and fix everything in one go. > > > > And they are needed like yesterday. > > > > > >> So we're stuck with the 1.0 config layout for a very long time. > >> > >> Regards, > >> > >> Anthony Liguori > > > > Absolutely. This patch let us support both which will allow for > > a gradual transition over the next 10 years or so. > > > >> > reason. I suggest that's 2.0 material... > >> > > >> > Cheer
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Paolo Bonzini writes: > Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto: >> You expect a compiler to pad this structure: >> >> struct foo { >> uint8_t a; >> uint8_t b; >> uint16_t c; >> uint32_t d; >> }; >> >> I'm guessing any compiler that decides to waste memory in this way >> will quickly get dropped by users and then we won't worry >> about building QEMU with it. > > You know the virtio-pci config structures are padded, but not all of > them are. For example, virtio_balloon_stat is not padded and indeed has > an __attribute__((__packed__)) in the spec. Not that these structures are actually used for something. We store the config in these structures so they are actually used for something. The proposed structures only serve as a way to express offsets. You would never actually have a variable of this type. Regards, Anthony Liguori > > For this reason I prefer to have the attribute everywhere. So people > don't have to wonder why it's here and not there. > > Paolo > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote: >> 1) C makes no guarantees about structure layout beyond the first >>member. Yes, if it's naturally aligned or has a packed attribute, >>GCC does the right thing. But this isn't kernel land anymore, >>portability matters and there are more compilers than GCC. > > You expect a compiler to pad this structure: > > struct foo { > uint8_t a; > uint8_t b; > uint16_t c; > uint32_t d; > }; > > I'm guessing any compiler that decides to waste memory in this way > will quickly get dropped by users and then we won't worry > about building QEMU with it. There are other responses in the thread here and I don't really care to bikeshed on this issue. >> Well, given that virtio is widely deployed today, I would think the 1.0 >> standard should strictly reflect what's deployed today, no? >> Any new config layout would be 2.0 material, right? > > Not as it's currently planned. Devices can choose > to support a legacy layout in addition to the new one, > and if you look at the patch you will see that that > is exactly what it does. Adding a new BAR most certainly requires bumping the revision ID or changing the device ID, no? Didn't we run into this problem with the virtio-win drivers with just the BAR size changing? >> Re: the new config layout, I don't think we would want to use it for >> anything but new devices. Forcing a guest driver change > > There's no forcing. > If you look at the patches closely, you will see that > we still support the old layout on BAR0. > > >> is a really big >> deal and I see no reason to do that unless there's a compelling reason >> to. > > There are many a compelling reasons, and they are well known > limitations of virtio PCI: > > - PCI spec compliance (madates device operation with IO memory > disabled). PCI express spec. We are fully compliant with the PCI spec. And what's the user visible advantage of pointing an emulated virtio device behind a PCI-e bus verses a legacy PCI bus? This is a very good example because if we have to disable BAR0, then it's an ABI breaker plan and simple. > - support 64 bit addressing We currently support 44-bit addressing for the ring. While I agree we need to bump it, there's no immediate problem with 44-bit addressing. > - add more than 32 feature bits. > - individually disable queues. > - sanely support cross-endian systems. > - support very small (<1 PAGE) for virtio rings. > - support a separate page for each vq kick. > - make each device place config at flexible offset. None of these things are holding us back today. I'm not saying we shouldn't introduce a new device. But adoption of that device will be slow and realistically will be limited to new devices only. We'll be supporting both devices for a very, very long time. Compatibility is the fundamental value that we provide. We need to go out of our way to make sure that existing guests work and work as well as possible. Sticking virtio devices behind a PCI-e bus just for the hell of it isn't a compelling reason to break existing guests. Regards, Anthony Liguori > Addressing any one of these would cause us to add a substantially new > way to operate virtio devices. > > And since it's a guest change anyway, it seemed like a > good time to do the new layout and fix everything in one go. > > And they are needed like yesterday. > > >> So we're stuck with the 1.0 config layout for a very long time. >> >> Regards, >> >> Anthony Liguori > > Absolutely. This patch let us support both which will allow for > a gradual transition over the next 10 years or so. > >> > reason. I suggest that's 2.0 material... >> > >> > Cheers, >> > Rusty. >> > >> > -- >> > To unsubscribe from this list: send the line "unsubscribe kvm" in >> > the body of a message to majord...@vger.kernel.org >> > More majordomo info at http://vger.kernel.org/majordomo-info.html > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote: > Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto: > > You expect a compiler to pad this structure: > > > > struct foo { > > uint8_t a; > > uint8_t b; > > uint16_t c; > > uint32_t d; > > }; > > > > I'm guessing any compiler that decides to waste memory in this way > > will quickly get dropped by users and then we won't worry > > about building QEMU with it. > > You know the virtio-pci config structures are padded, but not all of > them are. For example, virtio_balloon_stat is not padded and indeed has > an __attribute__((__packed__)) in the spec. > > For this reason I prefer to have the attribute everywhere. So people > don't have to wonder why it's here and not there. > > Paolo FWIW I think it was a mistake to lay the balloon structure out like that. We should have padded it manually. __attribute__((__packed__)) is really easy to misuse. If you get into such a situation, just use offset enums ... -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto: > You expect a compiler to pad this structure: > > struct foo { > uint8_t a; > uint8_t b; > uint16_t c; > uint32_t d; > }; > > I'm guessing any compiler that decides to waste memory in this way > will quickly get dropped by users and then we won't worry > about building QEMU with it. You know the virtio-pci config structures are padded, but not all of them are. For example, virtio_balloon_stat is not padded and indeed has an __attribute__((__packed__)) in the spec. For this reason I prefer to have the attribute everywhere. So people don't have to wonder why it's here and not there. Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 29 May 2013 14:24, Michael S. Tsirkin wrote: > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote: >> 1) C makes no guarantees about structure layout beyond the first >>member. Yes, if it's naturally aligned or has a packed attribute, >>GCC does the right thing. But this isn't kernel land anymore, >>portability matters and there are more compilers than GCC. > > You expect a compiler to pad this structure: > > struct foo { > uint8_t a; > uint8_t b; > uint16_t c; > uint32_t d; > }; > > I'm guessing any compiler that decides to waste memory in this way > will quickly get dropped by users and then we won't worry > about building QEMU with it. Structure alignment is a platform ABI choice. Oddly enough people choose operating systems on more criteria than the amount of padding their ABI mandates in structures. In any case it's really tedious to read a structure and wade through working out whether it's going to have all its members naturally aligned, especially once it's more than a handful of members in size. And getting it wrong is "silent portability problem". thanks -- PMM ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote: > Rusty Russell writes: > > > Anthony Liguori writes: > >> "Michael S. Tsirkin" writes: > >>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >>> +return proxy->device_feature_select; > >> > >> Oh dear no... Please use defines like the rest of QEMU. > > > > It is pretty ugly. > > I think beauty is in the eye of the beholder here... > > Pretty much every device we have has a switch statement like this. > Consistency wins when it comes to qualitative arguments like this. > > > Yet the structure definitions are descriptive, capturing layout, size > > and endianness in natural a format readable by any C programmer. > > >From an API design point of view, here are the problems I see: > > 1) C makes no guarantees about structure layout beyond the first >member. Yes, if it's naturally aligned or has a packed attribute, >GCC does the right thing. But this isn't kernel land anymore, >portability matters and there are more compilers than GCC. You expect a compiler to pad this structure: struct foo { uint8_t a; uint8_t b; uint16_t c; uint32_t d; }; I'm guessing any compiler that decides to waste memory in this way will quickly get dropped by users and then we won't worry about building QEMU with it. > 2) If we every introduce anything like latching, this doesn't work out >so well anymore because it's hard to express in a single C structure >the register layout at that point. Perhaps a union could be used but >padding may make it a bit challenging. Then linux won't use it either. > 3) It suspect it's harder to review because a subtle change could more >easily have broad impact. If someone changed the type of a field >from u32 to u16, it changes the offset of every other field. That's >not terribly obvious in the patch itself unless you understand how >the structure is used elsewhere. > >This may not be a problem for virtio because we all understand that >the structures are part of an ABI, but if we used this pattern more >in QEMU, it would be a lot less obvious. So let's not use it more in QEMU. > > So AFAICT the question is, do we put the required > > > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > > > etc. in the kernel headers or qemu? > > I'm pretty sure we would end up just having our own integer defines. We > carry our own virtio headers today because we can't easily import the > kernel headers. Yes we can easily import them. And at least we copy headers verbatim. > >> Haven't looked at the proposed new ring layout yet. > > > > No change, but there's an open question on whether we should nail it to > > little endian (or define the endian by the transport). > > > > Of course, I can't rule out that the 1.0 standard *may* decide to frob > > the ring layout somehow, > > Well, given that virtio is widely deployed today, I would think the 1.0 > standard should strictly reflect what's deployed today, no? > Any new config layout would be 2.0 material, right? Not as it's currently planned. Devices can choose to support a legacy layout in addition to the new one, and if you look at the patch you will see that that is exactly what it does. > Re: the new config layout, I don't think we would want to use it for > anything but new devices. Forcing a guest driver change There's no forcing. If you look at the patches closely, you will see that we still support the old layout on BAR0. > is a really big > deal and I see no reason to do that unless there's a compelling reason > to. There are many a compelling reasons, and they are well known limitations of virtio PCI: - PCI spec compliance (madates device operation with IO memory disabled). - support 64 bit addressing - add more than 32 feature bits. - individually disable queues. - sanely support cross-endian systems. - support very small (<1 PAGE) for virtio rings. - support a separate page for each vq kick. - make each device place config at flexible offset. Addressing any one of these would cause us to add a substantially new way to operate virtio devices. And since it's a guest change anyway, it seemed like a good time to do the new layout and fix everything in one go. And they are needed like yesterday. > So we're stuck with the 1.0 config layout for a very long time. > > Regards, > > Anthony Liguori Absolutely. This patch let us support both which will allow for a gradual transition over the next 10 years or so. > > reason. I suggest that's 2.0 material... > > > > Cheers, > > Rusty. > > > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell writes: > Anthony Liguori writes: >> "Michael S. Tsirkin" writes: >>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>> +return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. > > It is pretty ugly. I think beauty is in the eye of the beholder here... Pretty much every device we have has a switch statement like this. Consistency wins when it comes to qualitative arguments like this. > Yet the structure definitions are descriptive, capturing layout, size > and endianness in natural a format readable by any C programmer. >From an API design point of view, here are the problems I see: 1) C makes no guarantees about structure layout beyond the first member. Yes, if it's naturally aligned or has a packed attribute, GCC does the right thing. But this isn't kernel land anymore, portability matters and there are more compilers than GCC. 2) If we every introduce anything like latching, this doesn't work out so well anymore because it's hard to express in a single C structure the register layout at that point. Perhaps a union could be used but padding may make it a bit challenging. 3) It suspect it's harder to review because a subtle change could more easily have broad impact. If someone changed the type of a field from u32 to u16, it changes the offset of every other field. That's not terribly obvious in the patch itself unless you understand how the structure is used elsewhere. This may not be a problem for virtio because we all understand that the structures are part of an ABI, but if we used this pattern more in QEMU, it would be a lot less obvious. > So AFAICT the question is, do we put the required > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > etc. in the kernel headers or qemu? I'm pretty sure we would end up just having our own integer defines. We carry our own virtio headers today because we can't easily import the kernel headers. >> Haven't looked at the proposed new ring layout yet. > > No change, but there's an open question on whether we should nail it to > little endian (or define the endian by the transport). > > Of course, I can't rule out that the 1.0 standard *may* decide to frob > the ring layout somehow, Well, given that virtio is widely deployed today, I would think the 1.0 standard should strictly reflect what's deployed today, no? Any new config layout would be 2.0 material, right? Re: the new config layout, I don't think we would want to use it for anything but new devices. Forcing a guest driver change is a really big deal and I see no reason to do that unless there's a compelling reason to. So we're stuck with the 1.0 config layout for a very long time. Regards, Anthony Liguori > but I'd think it would require a compelling > reason. I suggest that's 2.0 material... > > Cheers, > Rusty. > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 02:28:32PM +0200, Paolo Bonzini wrote: > Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto: > > >> If you really want to use offsetof like this you're > > >> going to need to decorate the structs with QEMU_PACKED. > >> > > >>> > > Nope. > >>> > > These structs are carefully designed not to have any padding. > >> > > >> > ...on every compiler and OS combination that QEMU builds for? > > Yes. All the way back to EGCS and before. > > GCC has been like this for many many years. > > I would still prefer to have QEMU_PACKED (or __attribute((__packed__)) > in the kernel). > > >>> > > And if there was a bug and there was some padding, we still > >>> > > can't fix it with PACKED because this structure > >>> > > is used to interact with the guest code which does not > >>> > > have the packed attribute. > >> > > >> > The guest code has to use a set of structure offsets and > >> > sizes which is fixed by the virtio ABI -- how it implements > >> > this is up to the guest (and if it misimplements it that is > >> > a guest bug and not our problem). > > On the other hand, encouraging both userspace and guest to reuse a > single set of headers means that the bad offset becomes a spec bug more > than a guest bug. Heh > > Deviating from driver in random ways is an endless source > > of hard to debug issues, and it's a very practical > > consideration because virtio spec is constantly > > being extended (unlike hardware which is mostly fixed). > > Agreed---but the driver should use __attribute__((__packed__)) too. > > Paolo For these specific structures I don't mind, we never dereference them anyway. If you do dereference a structure, using packed is generally a mistake. In particular because GCC generates code that is much slower. You can also get nasty surprises e.g. struct foo { char a; int b; } __attribute__((packed)); struct foo foo; int *a = &foo.a; Last line compiles fine but isn't legal C. So I think there are two cases with packed: 1. it does not change logical behaviour but results in bad code generated 2. it does change logical behaviour and leads to unsafe code There aren't many good reasons to use packed: if you must treat unaligned data, best to use constants. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto: > >> If you really want to use offsetof like this you're > >> going to need to decorate the structs with QEMU_PACKED. >> > >>> > > Nope. >>> > > These structs are carefully designed not to have any padding. >> > >> > ...on every compiler and OS combination that QEMU builds for? > Yes. All the way back to EGCS and before. > GCC has been like this for many many years. I would still prefer to have QEMU_PACKED (or __attribute((__packed__)) in the kernel). >>> > > And if there was a bug and there was some padding, we still >>> > > can't fix it with PACKED because this structure >>> > > is used to interact with the guest code which does not >>> > > have the packed attribute. >> > >> > The guest code has to use a set of structure offsets and >> > sizes which is fixed by the virtio ABI -- how it implements >> > this is up to the guest (and if it misimplements it that is >> > a guest bug and not our problem). On the other hand, encouraging both userspace and guest to reuse a single set of headers means that the bad offset becomes a spec bug more than a guest bug. > Deviating from driver in random ways is an endless source > of hard to debug issues, and it's a very practical > consideration because virtio spec is constantly > being extended (unlike hardware which is mostly fixed). Agreed---but the driver should use __attribute__((__packed__)) too. Paolo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 11:53:17AM +0100, Peter Maydell wrote: > On 29 May 2013 11:08, Michael S. Tsirkin wrote: > > On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote: > >> Asserting is definitely the wrong thing here, since the > >> guest can trigger it. > > > > So? > > So "guest should not be able to crash QEMU" is a standard > rule: assert() is for QEMU bugs, not guest bugs. It's quite likely a QEMU bug - guests normally do fixed-size accesses so it's hard for them to access a field with a wrong size. It is guest triggerable but this does not contradict the fact that it never happens in practice. > Virtio isn't any different to any other device model. > > > It's a driver bug. It can reset or crash guest with the same effect, > > and it likely will if we let it continue. > > Letting a misbehaving guest crash itself is fine. Killing > QEMU isn't. *Why* isn't it? It has the advantage of making sure the misbehaving system is stopped before it does any damage. Why keep QEMU running even though we know there's a memory corruption somewhere? > > assert makes sure we don't let it escalate into some > > hard to debug security problem. > > If you want to make guest bugs easier to spot and debug > this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for. We really want something that would also stop guest and stop device model as well - we don't know where the bug is. > >> If you really want to use offsetof like this you're > >> going to need to decorate the structs with QEMU_PACKED. > > > Nope. > > These structs are carefully designed not to have any padding. > > ...on every compiler and OS combination that QEMU builds for? Yes. All the way back to EGCS and before. GCC has been like this for many many years. > > And if there was a bug and there was some padding, we still > > can't fix it with PACKED because this structure > > is used to interact with the guest code which does not > > have the packed attribute. > > The guest code has to use a set of structure offsets and > sizes which is fixed by the virtio ABI -- how it implements > this is up to the guest (and if it misimplements it that is > a guest bug and not our problem). Note also that the ABI > is not defined by a set of C source struct definitions > (except in as far as they are accompanied by a set of > restrictions on alignment, padding, etc that completely > determine the numerical alignments and offsets). In practical terms, we should all talk and agree on what's best for driver *and* QEMU, not have QEMU just ignore driver and do it's own thing. In practical terms, virtio in QEMU should use exactly same code to define layout as virtio in guest. Preferably same header file, we'll get there too, once we comple the discussion over the bikeshed color. Deviating from driver in random ways is an endless source of hard to debug issues, and it's a very practical consideration because virtio spec is constantly being extended (unlike hardware which is mostly fixed). > How QEMU implements the set of offsets and sizes specified > by the ABI is definitely our problem, and is exactly what > this discussion is about. The simplest and safest way to > get the offsets right on all platforms is just to use a set > of #defines, which is why that's what almost all of our > device models do. Maybe you don't feel safe when you see offsetof. I review a struct and see fields are aligned properly at a glance and I feel safe. But I don't feel safe when I see headers in guest and qemu have different code. > Where we do define a struct matching > guest memory layout, we tag it with QEMU_PACKED as our > best means of ensuring consistency on all hosts. > > thanks > -- PMM Further, packed structures produce terrible code in GCC, in all versions that QEMU cares about :). -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 29 May 2013 11:08, Michael S. Tsirkin wrote: > On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote: >> Asserting is definitely the wrong thing here, since the >> guest can trigger it. > > So? So "guest should not be able to crash QEMU" is a standard rule: assert() is for QEMU bugs, not guest bugs. Virtio isn't any different to any other device model. > It's a driver bug. It can reset or crash guest with the same effect, > and it likely will if we let it continue. Letting a misbehaving guest crash itself is fine. Killing QEMU isn't. > assert makes sure we don't let it escalate into some > hard to debug security problem. If you want to make guest bugs easier to spot and debug this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for. >> If you really want to use offsetof like this you're >> going to need to decorate the structs with QEMU_PACKED. > Nope. > These structs are carefully designed not to have any padding. ...on every compiler and OS combination that QEMU builds for? > And if there was a bug and there was some padding, we still > can't fix it with PACKED because this structure > is used to interact with the guest code which does not > have the packed attribute. The guest code has to use a set of structure offsets and sizes which is fixed by the virtio ABI -- how it implements this is up to the guest (and if it misimplements it that is a guest bug and not our problem). Note also that the ABI is not defined by a set of C source struct definitions (except in as far as they are accompanied by a set of restrictions on alignment, padding, etc that completely determine the numerical alignments and offsets). How QEMU implements the set of offsets and sizes specified by the ABI is definitely our problem, and is exactly what this discussion is about. The simplest and safest way to get the offsets right on all platforms is just to use a set of #defines, which is why that's what almost all of our device models do. Where we do define a struct matching guest memory layout, we tag it with QEMU_PACKED as our best means of ensuring consistency on all hosts. thanks -- PMM ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote: > On 29 May 2013 09:24, Michael S. Tsirkin wrote: > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > > index f4db224..fd09ea7 100644 > > --- a/hw/virtio/virtio-pci.c > > +++ b/hw/virtio/virtio-pci.c > > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void > > *opaque, hwaddr addr, > > { > > VirtIOPCIProxy *proxy = opaque; > > VirtIODevice *vdev = proxy->vdev; > > +struct virtio_pci_common_cfg cfg; > > > > uint64_t low = 0xull; > > > > switch (addr) { > > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > > +assert(size == sizeof cfg.device_feature_select); > > return proxy->device_feature_select; > > Asserting is definitely the wrong thing here, since the > guest can trigger it. So? It's a driver bug. It can reset or crash guest with the same effect, and it likely will if we let it continue. assert makes sure we don't let it escalate into some hard to debug security problem. > If you really want to use offsetof like this you're > going to need to decorate the structs with QEMU_PACKED. > > thanks > -- PMM Nope. These structs are carefully designed not to have any padding. And if there was a bug and there was some padding, we still can't fix it with PACKED because this structure is used to interact with the guest code which does not have the packed attribute. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 05/29/13 09:27, Paolo Bonzini wrote: > Il 29/05/2013 06:33, Rusty Russell ha scritto: >> Paolo Bonzini writes: >>> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: + +switch (addr) { +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? >>> >>> I'm not sure it's portable to use it in case labels. IIRC, the >>> definition ((int)&(((T *)0)->field)) is not a valid C integer constant >>> expression. Laszlo? >> >> It's defined to yield an integer constant expression in the ISO standard >> (and I think ANSI too, though that's not at hand): > > It's not in C89. It is, see 7.1.6 Common definitions . > The oldest compiler QEMU cares about is GCC 4.2. I > don't know if it has a builtin offsetof, probably it does. Talking about nothing else but the ISO C standard(s), it doesn't matter how the C implementation deals with offsetof() internally. "C implementation" in this sense includes both compiler and standard library. If the standard library header ( or something internal included by it) expands the offsetof() macro to replacement text that does pointer black magic, magic that would be otherwise nonportable or undefined, it is alright as long as the accompanying compiler guarantees that the replacement text will work. That is, offsetof() gives the C implementor leeway to implement it in wherever "distribution" between compiler and standard library; the main thing is that the C program use the one offsetof() provided by the C implementation. Of course in the FLOSS world OS distros might mix and match gcc and glibc statistically randomly; normally some documentation should enable the user to put his/her system into ISO C or even SUSv[1234] mode. ( An interesting example for, say, "non-unified implementation" is "getconf" vs. "c99". I'll pick SUSv3 for this example. http://pubs.opengroup.org/onlinepubs/95399/utilities/getconf.html http://pubs.opengroup.org/onlinepubs/95399/utilities/c99.html In a nutshell one can interrogate getconf for CFLAGS, LDFLAGS and LIBS so that c99 builds a program in ILP32_OFF32 / ILP32_OFFBIG / LP64_OFF64 / LPBIG_OFFBIG mode ("programming environment"). On my x86_64 RHEL-6.4 laptop, "getconf" is part of glibc (glibc-common-2.12-1.107.el6.x86_64), while c99 is part of gcc (gcc-4.4.7-3.el6.x86_64). Supposing I'd like to build a 32-bit program with 64-bit off_t: getconf _POSIX_V6_ILP32_OFFBIG 1 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_CFLAGS -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LDFLAGS -m32 getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LIBS [empty string] However if I try to actually build a program using these flags: #define _XOPEN_SOURCE 600 int main(void) { return 0; } c99 \ -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \ -o mytest \ -m32 \ mytest.c I get /usr/bin/ld: crt1.o: No such file: No such file or directory collect2: ld returned 1 exit status unless I install "glibc-devel.i686". This problem is rooted in "getconf" (a glibc utility) being unaware of RHEL packaging choices: if the system can't guarantee the final c99 invocation to succeed, then the very first "getconf" invocation should write "-1\n" or "undefined\n" to stdout. (I'm aware that RHEL-6 is not certified UNIX 03; this is just an example for problems caused by various parts of a standard's (quasi-)implementation coming from separate projects. In that sense, caring about the offsetof() internals may have merit.) ) Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 29 May 2013 09:24, Michael S. Tsirkin wrote: > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index f4db224..fd09ea7 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void > *opaque, hwaddr addr, > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = proxy->vdev; > +struct virtio_pci_common_cfg cfg; > > uint64_t low = 0xull; > > switch (addr) { > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > +assert(size == sizeof cfg.device_feature_select); > return proxy->device_feature_select; Asserting is definitely the wrong thing here, since the guest can trigger it. If you really want to use offsetof like this you're going to need to decorate the structs with QEMU_PACKED. thanks -- PMM ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 10:24, Michael S. Tsirkin ha scritto: > On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote: >> Anthony Liguori writes: >>> "Michael S. Tsirkin" writes: +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy->device_feature_select; >>> >>> Oh dear no... Please use defines like the rest of QEMU. >> >> It is pretty ugly. >> >> Yet the structure definitions are descriptive, capturing layout, size >> and endianness in natural a format readable by any C programmer. >> >> So AFAICT the question is, do we put the required >> >> #define VIRTIO_PCI_CFG_FEATURE_SEL \ >> (offsetof(struct virtio_pci_common_cfg, device_feature_select)) >> >> etc. in the kernel headers or qemu? > > > I forgot that we need to validate size (different fields > have different sizes so memory core does not validate > for us). > > And that is much cleaner if we use offsetof directly: > You can see that you are checking the correct > size matching the offset, at a glance. I wonder if we can use and possibly extend Peter Crosthwaite's "register API" to support this better. Paolo > ---> > > virtio: new layout: add offset validation > > Signed-off-by: Michael S. Tsirkin > > --- > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index f4db224..fd09ea7 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void > *opaque, hwaddr addr, > { > VirtIOPCIProxy *proxy = opaque; > VirtIODevice *vdev = proxy->vdev; > +struct virtio_pci_common_cfg cfg; > > uint64_t low = 0xull; > > switch (addr) { > case offsetof(struct virtio_pci_common_cfg, device_feature_select): > +assert(size == sizeof cfg.device_feature_select); > return proxy->device_feature_select; > case offsetof(struct virtio_pci_common_cfg, device_feature): > +assert(size == sizeof cfg.device_feature); > /* TODO: 64-bit features */ > return proxy->device_feature_select ? 0 : proxy->host_features; > case offsetof(struct virtio_pci_common_cfg, guest_feature_select): > +assert(size == sizeof cfg.guest_feature_select); > return proxy->guest_feature_select; > case offsetof(struct virtio_pci_common_cfg, guest_feature): > +assert(size == sizeof cfg.guest_feature); > /* TODO: 64-bit features */ > return proxy->guest_feature_select ? 0 : vdev->guest_features; > case offsetof(struct virtio_pci_common_cfg, msix_config): > +assert(size == sizeof cfg.msix_config); > return vdev->config_vector; > case offsetof(struct virtio_pci_common_cfg, num_queues): > +assert(size == sizeof cfg.num_queues); > /* TODO: more exact limit? */ > return VIRTIO_PCI_QUEUE_MAX; > case offsetof(struct virtio_pci_common_cfg, device_status): > +assert(size == sizeof cfg.device_status); > return vdev->status; > > /* About a specific virtqueue. */ > case offsetof(struct virtio_pci_common_cfg, queue_select): > +assert(size == sizeof cfg.queue_select); > return vdev->queue_sel; > case offsetof(struct virtio_pci_common_cfg, queue_size): > +assert(size == sizeof cfg.queue_size); > return virtio_queue_get_num(vdev, vdev->queue_sel); > case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): > +assert(size == sizeof cfg.queue_msix_vector); > return virtio_queue_vector(vdev, vdev->queue_sel); > case offsetof(struct virtio_pci_common_cfg, queue_enable): > +assert(size == sizeof cfg.queue_enable); > /* TODO */ > return 0; > case offsetof(struct virtio_pci_common_cfg, queue_notify_off): > +assert(size == sizeof cfg.queue_notify_off); > return vdev->queue_sel; > case offsetof(struct virtio_pci_common_cfg, queue_desc): > +assert(size == 4); > return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: > +assert(size == 4); > return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32; > case offsetof(struct virtio_pci_common_cfg, queue_avail): > +assert(size == 4); > return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: > +assert(size == 4); > return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32; > case offsetof(struct virtio_pci_common_cfg, queue_used): > +assert(size == 4); > return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; > case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: > +assert(size == 4); > return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32; > default: > return 0; > @@ -523,76 +542,
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote: > Anthony Liguori writes: > > "Michael S. Tsirkin" writes: > >> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >> +return proxy->device_feature_select; > > > > Oh dear no... Please use defines like the rest of QEMU. > > It is pretty ugly. > > Yet the structure definitions are descriptive, capturing layout, size > and endianness in natural a format readable by any C programmer. > > So AFAICT the question is, do we put the required > > #define VIRTIO_PCI_CFG_FEATURE_SEL \ > (offsetof(struct virtio_pci_common_cfg, device_feature_select)) > > etc. in the kernel headers or qemu? I forgot that we need to validate size (different fields have different sizes so memory core does not validate for us). And that is much cleaner if we use offsetof directly: You can see that you are checking the correct size matching the offset, at a glance. ---> virtio: new layout: add offset validation Signed-off-by: Michael S. Tsirkin --- diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index f4db224..fd09ea7 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, { VirtIOPCIProxy *proxy = opaque; VirtIODevice *vdev = proxy->vdev; +struct virtio_pci_common_cfg cfg; uint64_t low = 0xull; switch (addr) { case offsetof(struct virtio_pci_common_cfg, device_feature_select): +assert(size == sizeof cfg.device_feature_select); return proxy->device_feature_select; case offsetof(struct virtio_pci_common_cfg, device_feature): +assert(size == sizeof cfg.device_feature); /* TODO: 64-bit features */ return proxy->device_feature_select ? 0 : proxy->host_features; case offsetof(struct virtio_pci_common_cfg, guest_feature_select): +assert(size == sizeof cfg.guest_feature_select); return proxy->guest_feature_select; case offsetof(struct virtio_pci_common_cfg, guest_feature): +assert(size == sizeof cfg.guest_feature); /* TODO: 64-bit features */ return proxy->guest_feature_select ? 0 : vdev->guest_features; case offsetof(struct virtio_pci_common_cfg, msix_config): +assert(size == sizeof cfg.msix_config); return vdev->config_vector; case offsetof(struct virtio_pci_common_cfg, num_queues): +assert(size == sizeof cfg.num_queues); /* TODO: more exact limit? */ return VIRTIO_PCI_QUEUE_MAX; case offsetof(struct virtio_pci_common_cfg, device_status): +assert(size == sizeof cfg.device_status); return vdev->status; /* About a specific virtqueue. */ case offsetof(struct virtio_pci_common_cfg, queue_select): +assert(size == sizeof cfg.queue_select); return vdev->queue_sel; case offsetof(struct virtio_pci_common_cfg, queue_size): +assert(size == sizeof cfg.queue_size); return virtio_queue_get_num(vdev, vdev->queue_sel); case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): +assert(size == sizeof cfg.queue_msix_vector); return virtio_queue_vector(vdev, vdev->queue_sel); case offsetof(struct virtio_pci_common_cfg, queue_enable): +assert(size == sizeof cfg.queue_enable); /* TODO */ return 0; case offsetof(struct virtio_pci_common_cfg, queue_notify_off): +assert(size == sizeof cfg.queue_notify_off); return vdev->queue_sel; case offsetof(struct virtio_pci_common_cfg, queue_desc): +assert(size == 4); return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: +assert(size == 4); return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32; case offsetof(struct virtio_pci_common_cfg, queue_avail): +assert(size == 4); return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: +assert(size == 4); return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32; case offsetof(struct virtio_pci_common_cfg, queue_used): +assert(size == 4); return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; case offsetof(struct virtio_pci_common_cfg, queue_used) + 4: +assert(size == 4); return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32; default: return 0; @@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque, hwaddr addr, { VirtIOPCIProxy *proxy = opaque; VirtIODevice *vdev = proxy->vdev; +struct virtio_pci_common_cfg cfg; uint64_t low = 0xull; uint64_t high = ~low; switch (addr) { case offsetof(struct virtio_pci_common_cfg, device_feature_
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Wed, May 29, 2013 at 09:27:02AM +0200, Paolo Bonzini wrote: > Il 29/05/2013 06:33, Rusty Russell ha scritto: > > Paolo Bonzini writes: > >> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: > >>> + > >>> +switch (addr) { > >>> +case offsetof(struct virtio_pci_common_cfg, > >>> device_feature_select): > >>> +return proxy->device_feature_select; > > > > Oh dear no... Please use defines like the rest of QEMU. > >>> Any good reason not to use offsetof? > >> > >> I'm not sure it's portable to use it in case labels. IIRC, the > >> definition ((int)&(((T *)0)->field)) is not a valid C integer constant > >> expression. Laszlo? > > > > It's defined to yield an integer constant expression in the ISO standard > > (and I think ANSI too, though that's not at hand): > > It's not in C89. The oldest compiler QEMU cares about is GCC 4.2. I > don't know if it has a builtin offsetof, probably it does. Yes, I think __builtin_offsetof was added in 4.0: http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Offsetof.html > But I'm not sure about other users of virtio headers. > > Paolo > > > > 7.19, para 3: > > > > ...offsetof(type, member-designator) > > which expands to an integer constant expression that has type > > size_t, ... > > > > The real question is whether compilers qemu cares about meet the > > standard (there's some evidence that older compilers fail this). If > > not, we'll have to define them as raw offsets... So I think the question is whether we care about GCC 3. We used to when mingw in debian stable was GCC 3, but not anymore ... > > > > Cheers, > > Rusty. > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 29/05/2013 06:33, Rusty Russell ha scritto: > Paolo Bonzini writes: >> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: >>> + >>> +switch (addr) { >>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>> +return proxy->device_feature_select; > > Oh dear no... Please use defines like the rest of QEMU. >>> Any good reason not to use offsetof? >> >> I'm not sure it's portable to use it in case labels. IIRC, the >> definition ((int)&(((T *)0)->field)) is not a valid C integer constant >> expression. Laszlo? > > It's defined to yield an integer constant expression in the ISO standard > (and I think ANSI too, though that's not at hand): It's not in C89. The oldest compiler QEMU cares about is GCC 4.2. I don't know if it has a builtin offsetof, probably it does. But I'm not sure about other users of virtio headers. Paolo > > 7.19, para 3: > > ...offsetof(type, member-designator) > which expands to an integer constant expression that has type > size_t, ... > > The real question is whether compilers qemu cares about meet the > standard (there's some evidence that older compilers fail this). If > not, we'll have to define them as raw offsets... > > Cheers, > Rusty. > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Anthony Liguori writes: > "Michael S. Tsirkin" writes: >> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >> +return proxy->device_feature_select; > > Oh dear no... Please use defines like the rest of QEMU. It is pretty ugly. Yet the structure definitions are descriptive, capturing layout, size and endianness in natural a format readable by any C programmer. So AFAICT the question is, do we put the required #define VIRTIO_PCI_CFG_FEATURE_SEL \ (offsetof(struct virtio_pci_common_cfg, device_feature_select)) etc. in the kernel headers or qemu? > Haven't looked at the proposed new ring layout yet. No change, but there's an open question on whether we should nail it to little endian (or define the endian by the transport). Of course, I can't rule out that the 1.0 standard *may* decide to frob the ring layout somehow, but I'd think it would require a compelling reason. I suggest that's 2.0 material... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Paolo Bonzini writes: > Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: > > + > > +switch (addr) { > > +case offsetof(struct virtio_pci_common_cfg, > > device_feature_select): > > +return proxy->device_feature_select; >>> > >>> > Oh dear no... Please use defines like the rest of QEMU. >> Any good reason not to use offsetof? > > I'm not sure it's portable to use it in case labels. IIRC, the > definition ((int)&(((T *)0)->field)) is not a valid C integer constant > expression. Laszlo? It's defined to yield an integer constant expression in the ISO standard (and I think ANSI too, though that's not at hand): 7.19, para 3: ...offsetof(type, member-designator) which expands to an integer constant expression that has type size_t, ... The real question is whether compilers qemu cares about meet the standard (there's some evidence that older compilers fail this). If not, we'll have to define them as raw offsets... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On 05/28/13 19:43, Paolo Bonzini wrote: > Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: >> + >> +switch (addr) { >> +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >> +return proxy->device_feature_select; Oh dear no... Please use defines like the rest of QEMU. >> Any good reason not to use offsetof? > > I'm not sure it's portable to use it in case labels. IIRC, the > definition ((int)&(((T *)0)->field)) is not a valid C integer constant > expression. Laszlo? As long as we use the offsetof() that comes with the C implementation (ie. we don't hack it up ourselves), we should be safe; ISO C99 elegantly defines offsetof() in "7.17 Common definitions " p3 as The macros are [...] offsetof(type, member-designator) which expands to an integer constant expression that has type *size_t*, the value of which is the offset in bytes, to the structure member (designated by /member-designator/), from the beginning of its structure (designated by /type/). The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. (If the specified member is a bit-field, the behavior is undefined.) ("Address constant" is described in 6.6p9 but quoting even that here doesn't seem necesary.) >From 6.8.4.2p3, "The expression of each case label shall be an integer constant expression [...]". So, (a) if we use the standard offsetof(), (b) and you can also write, for example, static struct virtio_pci_common_cfg t; static void *p = &t.device_feature_select; then the construct seems valid. (Consistently with the above mention of UB if the specified member is a bit-field: the address-of operator's constraints also forbid a bit-field object as operand.) Laszlo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, May 28, 2013 at 01:53:35PM -0500, Anthony Liguori wrote: > "Michael S. Tsirkin" writes: > > > On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote: > >> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, > >> > hwaddr addr, > >> > } > >> > } > >> > > >> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > >> > + unsigned size) > >> > +{ > >> > +VirtIOPCIProxy *proxy = opaque; > >> > +VirtIODevice *vdev = proxy->vdev; > >> > + > >> > +uint64_t low = 0xull; > >> > + > >> > +switch (addr) { > >> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > >> > +return proxy->device_feature_select; > >> > >> Oh dear no... Please use defines like the rest of QEMU. > > > > Any good reason not to use offsetof? > > I see about 138 examples in qemu. > > There are exactly zero: > > $ find . -name "*.c" -exec grep -l "case offset" {} \; > $ > > > Anyway, that's the way Rusty wrote it in the kernel header - > > I started with defines. > > If you convince Rusty to switch I can switch too, > > We have 300+ devices in QEMU that use #defines. We're not using this > kind of thing just because you want to copy code from the kernel. Rusty, let's switch to defines? > >> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 > >> > >> And: > >> > >> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb > >> > >> Which lets virtio-pci be subclassable and then remaps the config space to > >> BAR2. > > > > > > Interesting. Have the spec anywhere? > > Not yet, but working on that. > > > You are saying this is going to conflict because > > of BAR2 usage, yes? > > No, this whole thing is flexible. I had to use BAR2 because BAR0 has to > be the vram mapping. It also had to be an MMIO bar. > > The new layout might make it easier to implement a device like this. Absolutely, you can put thing anywhere you like. > I > shared it mainly because I wanted to show the subclassing idea vs. just > tacking an option onto the existing virtio-pci code in QEMU. > > Regards, > > Anthony Liguori I don't think a subclass will work for the new layout. This should be completely transparent to users, just have more devices work with more flexibility. > > So let's only do this virtio-fb only for new layout, so we don't need > > to maintain compatibility. In particular, we are working > > on making memory BAR access fast for virtio devices > > in a generic way. At the moment they are measureably slower > > than PIO on x86. > > > > > >> Haven't looked at the proposed new ring layout yet. > >> > >> Regards, > > > > No new ring layout. It's new config layout. > > > > > > -- > > MST > > -- > > To unsubscribe from this list: send the line "unsubscribe kvm" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote: >> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, >> > hwaddr addr, >> > } >> > } >> > >> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, >> > + unsigned size) >> > +{ >> > +VirtIOPCIProxy *proxy = opaque; >> > +VirtIODevice *vdev = proxy->vdev; >> > + >> > +uint64_t low = 0xull; >> > + >> > +switch (addr) { >> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >> > +return proxy->device_feature_select; >> >> Oh dear no... Please use defines like the rest of QEMU. > > Any good reason not to use offsetof? > I see about 138 examples in qemu. There are exactly zero: $ find . -name "*.c" -exec grep -l "case offset" {} \; $ > Anyway, that's the way Rusty wrote it in the kernel header - > I started with defines. > If you convince Rusty to switch I can switch too, We have 300+ devices in QEMU that use #defines. We're not using this kind of thing just because you want to copy code from the kernel. >> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 >> >> And: >> >> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb >> >> Which lets virtio-pci be subclassable and then remaps the config space to >> BAR2. > > > Interesting. Have the spec anywhere? Not yet, but working on that. > You are saying this is going to conflict because > of BAR2 usage, yes? No, this whole thing is flexible. I had to use BAR2 because BAR0 has to be the vram mapping. It also had to be an MMIO bar. The new layout might make it easier to implement a device like this. I shared it mainly because I wanted to show the subclassing idea vs. just tacking an option onto the existing virtio-pci code in QEMU. Regards, Anthony Liguori > So let's only do this virtio-fb only for new layout, so we don't need > to maintain compatibility. In particular, we are working > on making memory BAR access fast for virtio devices > in a generic way. At the moment they are measureably slower > than PIO on x86. > > >> Haven't looked at the proposed new ring layout yet. >> >> Regards, > > No new ring layout. It's new config layout. > > > -- > MST > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto: >>> > > + >>> > > +switch (addr) { >>> > > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): >>> > > +return proxy->device_feature_select; >> > >> > Oh dear no... Please use defines like the rest of QEMU. > Any good reason not to use offsetof? I'm not sure it's portable to use it in case labels. IIRC, the definition ((int)&(((T *)0)->field)) is not a valid C integer constant expression. Laszlo? > I see about 138 examples in qemu. Almost all of them are about fields in the host, not the guest, and none of them are in case statements. Paolo > > Anyway, that's the way Rusty wrote it in the kernel header - > I started with defines. > If you convince Rusty to switch I can switch too, > but I prefer reusing same structures as linux even if > for now I've given up on reusing same headers. > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote: > > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, > > hwaddr addr, > > } > > } > > > > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > > + unsigned size) > > +{ > > +VirtIOPCIProxy *proxy = opaque; > > +VirtIODevice *vdev = proxy->vdev; > > + > > +uint64_t low = 0xull; > > + > > +switch (addr) { > > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > > +return proxy->device_feature_select; > > Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? I see about 138 examples in qemu. Anyway, that's the way Rusty wrote it in the kernel header - I started with defines. If you convince Rusty to switch I can switch too, but I prefer reusing same structures as linux even if for now I've given up on reusing same headers. > >From a QEMU pov, take a look at: > > https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 > > And: > > https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb > > Which lets virtio-pci be subclassable and then remaps the config space to > BAR2. Interesting. Have the spec anywhere? You are saying this is going to conflict because of BAR2 usage, yes? So let's only do this virtio-fb only for new layout, so we don't need to maintain compatibility. In particular, we are working on making memory BAR access fast for virtio devices in a generic way. At the moment they are measureably slower than PIO on x86. > Haven't looked at the proposed new ring layout yet. > > Regards, No new ring layout. It's new config layout. -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
"Michael S. Tsirkin" writes: > This adds support for new config, and is designed to work with > the new layout code in Rusty's new layout branch. > > At the moment all fields are in the same memory BAR (bar 2). > This will be used to test performance and compare > memory, io and hypercall latency. > > Compiles but does not work yet. > Migration isn't handled yet. > > It's not clear what do queue_enable/queue_disable > fields do, not yet implemented. > > Gateway for config access with config cycles > not yet implemented. > > Sending out for early review/flames. > > Signed-off-by: Michael S. Tsirkin > --- > hw/virtio/virtio-pci.c | 393 > +++-- > hw/virtio/virtio-pci.h | 55 +++ > hw/virtio/virtio.c | 20 +++ > include/hw/virtio/virtio.h | 4 + > 4 files changed, 458 insertions(+), 14 deletions(-) > > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c > index 752991a..f4db224 100644 > --- a/hw/virtio/virtio-pci.c > +++ b/hw/virtio/virtio-pci.c > @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy > *proxy) > proxy->ioeventfd_started = false; > } > > +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val) > +{ > +VirtIODevice *vdev = proxy->vdev; > + > +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > +virtio_pci_stop_ioeventfd(proxy); > +} > + > +virtio_set_status(vdev, val & 0xFF); > + > +if (val & VIRTIO_CONFIG_S_DRIVER_OK) { > +virtio_pci_start_ioeventfd(proxy); > +} > + > +if (vdev->status == 0) { > +virtio_reset(proxy->vdev); > +msix_unuse_all_vectors(&proxy->pci_dev); > +} > +} > + > static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) > { > VirtIOPCIProxy *proxy = opaque; > @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t > addr, uint32_t val) > } > break; > case VIRTIO_PCI_STATUS: > -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { > -virtio_pci_stop_ioeventfd(proxy); > -} > - > -virtio_set_status(vdev, val & 0xFF); > - > -if (val & VIRTIO_CONFIG_S_DRIVER_OK) { > -virtio_pci_start_ioeventfd(proxy); > -} > - > -if (vdev->status == 0) { > -virtio_reset(proxy->vdev); > -msix_unuse_all_vectors(&proxy->pci_dev); > -} > +virtio_pci_set_status(proxy, val); > > /* Linux before 2.6.34 sets the device as OK without enabling > the PCI device bus master bit. In this case we need to disable > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, > hwaddr addr, > } > } > > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, > + unsigned size) > +{ > +VirtIOPCIProxy *proxy = opaque; > +VirtIODevice *vdev = proxy->vdev; > + > +uint64_t low = 0xull; > + > +switch (addr) { > +case offsetof(struct virtio_pci_common_cfg, device_feature_select): > +return proxy->device_feature_select; Oh dear no... Please use defines like the rest of QEMU. >From a QEMU pov, take a look at: https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 And: https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb Which lets virtio-pci be subclassable and then remaps the config space to BAR2. Haven't looked at the proposed new ring layout yet. Regards, Anthony Liguori > +case offsetof(struct virtio_pci_common_cfg, device_feature): > +/* TODO: 64-bit features */ > + return proxy->device_feature_select ? 0 : proxy->host_features; > +case offsetof(struct virtio_pci_common_cfg, guest_feature_select): > +return proxy->guest_feature_select; > +case offsetof(struct virtio_pci_common_cfg, guest_feature): > +/* TODO: 64-bit features */ > + return proxy->guest_feature_select ? 0 : vdev->guest_features; > +case offsetof(struct virtio_pci_common_cfg, msix_config): > + return vdev->config_vector; > +case offsetof(struct virtio_pci_common_cfg, num_queues): > +/* TODO: more exact limit? */ > + return VIRTIO_PCI_QUEUE_MAX; > +case offsetof(struct virtio_pci_common_cfg, device_status): > +return vdev->status; > + > + /* About a specific virtqueue. */ > +case offsetof(struct virtio_pci_common_cfg, queue_select): > +return vdev->queue_sel; > +case offsetof(struct virtio_pci_common_cfg, queue_size): > +return virtio_queue_get_num(vdev, vdev->queue_sel); > +case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): > + return virtio_queue_vector(vdev, vdev->queue_sel); > +case offsetof(struct virtio_pci_common_cfg, queue_enable): > +/* TODO */ > + return 0; > +case offsetof(struct virtio_pci_common_cfg, queue_notify_off): > +return vdev->queue_sel;
[PATCH RFC] virtio-pci: new config layout: using memory BAR
This adds support for new config, and is designed to work with the new layout code in Rusty's new layout branch. At the moment all fields are in the same memory BAR (bar 2). This will be used to test performance and compare memory, io and hypercall latency. Compiles but does not work yet. Migration isn't handled yet. It's not clear what do queue_enable/queue_disable fields do, not yet implemented. Gateway for config access with config cycles not yet implemented. Sending out for early review/flames. Signed-off-by: Michael S. Tsirkin --- hw/virtio/virtio-pci.c | 393 +++-- hw/virtio/virtio-pci.h | 55 +++ hw/virtio/virtio.c | 20 +++ include/hw/virtio/virtio.h | 4 + 4 files changed, 458 insertions(+), 14 deletions(-) diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c index 752991a..f4db224 100644 --- a/hw/virtio/virtio-pci.c +++ b/hw/virtio/virtio-pci.c @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy) proxy->ioeventfd_started = false; } +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val) +{ +VirtIODevice *vdev = proxy->vdev; + +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { +virtio_pci_stop_ioeventfd(proxy); +} + +virtio_set_status(vdev, val & 0xFF); + +if (val & VIRTIO_CONFIG_S_DRIVER_OK) { +virtio_pci_start_ioeventfd(proxy); +} + +if (vdev->status == 0) { +virtio_reset(proxy->vdev); +msix_unuse_all_vectors(&proxy->pci_dev); +} +} + static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) { VirtIOPCIProxy *proxy = opaque; @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val) } break; case VIRTIO_PCI_STATUS: -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) { -virtio_pci_stop_ioeventfd(proxy); -} - -virtio_set_status(vdev, val & 0xFF); - -if (val & VIRTIO_CONFIG_S_DRIVER_OK) { -virtio_pci_start_ioeventfd(proxy); -} - -if (vdev->status == 0) { -virtio_reset(proxy->vdev); -msix_unuse_all_vectors(&proxy->pci_dev); -} +virtio_pci_set_status(proxy, val); /* Linux before 2.6.34 sets the device as OK without enabling the PCI device bus master bit. In this case we need to disable @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr, } } +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr, + unsigned size) +{ +VirtIOPCIProxy *proxy = opaque; +VirtIODevice *vdev = proxy->vdev; + +uint64_t low = 0xull; + +switch (addr) { +case offsetof(struct virtio_pci_common_cfg, device_feature_select): +return proxy->device_feature_select; +case offsetof(struct virtio_pci_common_cfg, device_feature): +/* TODO: 64-bit features */ + return proxy->device_feature_select ? 0 : proxy->host_features; +case offsetof(struct virtio_pci_common_cfg, guest_feature_select): +return proxy->guest_feature_select; +case offsetof(struct virtio_pci_common_cfg, guest_feature): +/* TODO: 64-bit features */ + return proxy->guest_feature_select ? 0 : vdev->guest_features; +case offsetof(struct virtio_pci_common_cfg, msix_config): + return vdev->config_vector; +case offsetof(struct virtio_pci_common_cfg, num_queues): +/* TODO: more exact limit? */ + return VIRTIO_PCI_QUEUE_MAX; +case offsetof(struct virtio_pci_common_cfg, device_status): +return vdev->status; + + /* About a specific virtqueue. */ +case offsetof(struct virtio_pci_common_cfg, queue_select): +return vdev->queue_sel; +case offsetof(struct virtio_pci_common_cfg, queue_size): +return virtio_queue_get_num(vdev, vdev->queue_sel); +case offsetof(struct virtio_pci_common_cfg, queue_msix_vector): + return virtio_queue_vector(vdev, vdev->queue_sel); +case offsetof(struct virtio_pci_common_cfg, queue_enable): +/* TODO */ + return 0; +case offsetof(struct virtio_pci_common_cfg, queue_notify_off): +return vdev->queue_sel; +case offsetof(struct virtio_pci_common_cfg, queue_desc): +return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low; +case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4: +return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32; +case offsetof(struct virtio_pci_common_cfg, queue_avail): +return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low; +case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4: +return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32; +case offsetof(struct virtio_pci_common_cfg, queue_used): +return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low; +