Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-07-07 Thread Kevin O'Connor
I only recently saw this email.

On Thu, Jun 06, 2013 at 06:10:12PM +0300, Gleb Natapov wrote:
> On Thu, Jun 06, 2013 at 05:06:32PM +0200, Gerd Hoffmann wrote:
> > For seabios itself this isn't a big issue, see pci_{readl,writel} in
> > src/pci.c.  When called in 16bit mode it goes into 32bit mode
> > temporarily, just for accessing the mmio register.  ahci driver uses it,
> > xhci driver (wip atm) will use that too, and virtio-{blk,scsi} drivers
> > in seabios can do the same.
> > 
> Isn't this approach broken? How can SeaBIOS be sure it restores real
> mode registers to exactly same state they were before entering 32bit
> mode?

You are correct - SeaBIOS can't fully restore the "hidden" segment
registers.  So, in a way it is broken.

In practice, it seems to work on modern bootloaders (eg, ntldr, grub,
lilo).  It definitely doesn't work with EMM386 (old DOS stuff), but
does seem to work okay with FreeDOS as long as one doesn't run EMM386.

The AHCI code uses this 32bit/16bit trampoline because it would not be
possible to support AHCI otherwise.  I haven't seen any complaints of
failures with the AHCI code - probably because people using AHCI are
using modern guests.

I explored this a bit some time back and the only way I could think of
to reliably restore the 16bit registers would be via SMM.
Unfortunately, using SMM introduces a whole host of complexity and
problems.

-Kevin
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-03 Thread Michael S. Tsirkin
On Tue, Jun 04, 2013 at 03:01:50PM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
> >> "Michael S. Tsirkin"  writes:
> >> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
> >> >> Rusty Russell  writes:
> >> >> 
> >> >> > Anthony Liguori  writes:
> >> >> >> Forcing a guest driver change is a really big
> >> >> >> deal and I see no reason to do that unless there's a compelling 
> >> >> >> reason
> >> >> >> to.
> >> >> >>
> >> >> >> So we're stuck with the 1.0 config layout for a very long time.
> >> >> >
> >> >> > We definitely must not force a guest change.  The explicit aim of the
> >> >> > standard is that "legacy" and 1.0 be backward compatible.  One
> >> >> > deliverable is a document detailing how this is done (effectively a
> >> >> > summary of changes between what we have and 1.0).
> >> >> 
> >> >> If 2.0 is fully backwards compatible, great.  It seems like such a
> >> >> difference that that would be impossible but I need to investigate
> >> >> further.
> >> >> 
> >> >> Regards,
> >> >> 
> >> >> Anthony Liguori
> >> >
> >> > If you look at my patches you'll see how it works.
> >> > Basically old guests use BAR0 new ones don't, so
> >> > it's easy: BAR0 access means legacy guest.
> >> > Only started testing but things seem to work
> >> > fine with old guests so far.
> >> >
> >> > I think we need a spec, not just driver code.
> >> >
> >> > Rusty what's the plan? Want me to write it?
> >> 
> >> We need both, of course, but the spec work will happen in the OASIS WG.
> >> A draft is good, but let's not commit anything to upstream QEMU until we
> >> get the spec finalized.  And that is proposed to be late this year.
> >
> > Well that would be quite sad really.
> > 
> > This means we can't make virtio a spec compliant pci express device,
> > and we can't add any more feature bits, so no
> > flexible buffer optimizations for virtio net.
> >
> > There are probably more projects that will be blocked.
> >
> > So how about we keep extending legacy layout for a bit longer:
> > - add a way to access device with MMIO
> > - use feature bit 31 to signal 64 bit features
> >   (and shift device config accordingly)
> 
> By my count, net still has 7 feature bits left, so I don't think the
> feature bits are likely to be a limitation in the next 6 months?

Yes but you wanted a generic transport feature bit
for flexible SG layout.
Are you happy with VIRTIO_NET_F_ANY_HEADER_SG then?

> MMIO is a bigger problem.  Linux guests are happy with it: does it break
> the Windows drivers?
> 
> Thanks,
> Rusty.

You mean make BAR0 an MMIO BAR?
Yes, it would break current windows guests.
Further, as long as we use same address to notify all queues,
we would also need to decode the instruction on x86 and that's
measureably slower than PIO.
We could go back to discussing hypercall use for notifications,
but that has its own set of issues...

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-03 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
>> "Michael S. Tsirkin"  writes:
>> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
>> >> Rusty Russell  writes:
>> >> 
>> >> > Anthony Liguori  writes:
>> >> >> Forcing a guest driver change is a really big
>> >> >> deal and I see no reason to do that unless there's a compelling reason
>> >> >> to.
>> >> >>
>> >> >> So we're stuck with the 1.0 config layout for a very long time.
>> >> >
>> >> > We definitely must not force a guest change.  The explicit aim of the
>> >> > standard is that "legacy" and 1.0 be backward compatible.  One
>> >> > deliverable is a document detailing how this is done (effectively a
>> >> > summary of changes between what we have and 1.0).
>> >> 
>> >> If 2.0 is fully backwards compatible, great.  It seems like such a
>> >> difference that that would be impossible but I need to investigate
>> >> further.
>> >> 
>> >> Regards,
>> >> 
>> >> Anthony Liguori
>> >
>> > If you look at my patches you'll see how it works.
>> > Basically old guests use BAR0 new ones don't, so
>> > it's easy: BAR0 access means legacy guest.
>> > Only started testing but things seem to work
>> > fine with old guests so far.
>> >
>> > I think we need a spec, not just driver code.
>> >
>> > Rusty what's the plan? Want me to write it?
>> 
>> We need both, of course, but the spec work will happen in the OASIS WG.
>> A draft is good, but let's not commit anything to upstream QEMU until we
>> get the spec finalized.  And that is proposed to be late this year.
>
> Well that would be quite sad really.
> 
> This means we can't make virtio a spec compliant pci express device,
> and we can't add any more feature bits, so no
> flexible buffer optimizations for virtio net.
>
> There are probably more projects that will be blocked.
>
> So how about we keep extending legacy layout for a bit longer:
> - add a way to access device with MMIO
> - use feature bit 31 to signal 64 bit features
>   (and shift device config accordingly)

By my count, net still has 7 feature bits left, so I don't think the
feature bits are likely to be a limitation in the next 6 months?

MMIO is a bigger problem.  Linux guests are happy with it: does it break
the Windows drivers?

Thanks,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-03 Thread Michael S. Tsirkin
On Mon, Jun 03, 2013 at 09:56:15AM +0930, Rusty Russell wrote:
> "Michael S. Tsirkin"  writes:
> > On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
> >> Rusty Russell  writes:
> >> 
> >> > Anthony Liguori  writes:
> >> >> Forcing a guest driver change is a really big
> >> >> deal and I see no reason to do that unless there's a compelling reason
> >> >> to.
> >> >>
> >> >> So we're stuck with the 1.0 config layout for a very long time.
> >> >
> >> > We definitely must not force a guest change.  The explicit aim of the
> >> > standard is that "legacy" and 1.0 be backward compatible.  One
> >> > deliverable is a document detailing how this is done (effectively a
> >> > summary of changes between what we have and 1.0).
> >> 
> >> If 2.0 is fully backwards compatible, great.  It seems like such a
> >> difference that that would be impossible but I need to investigate
> >> further.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > If you look at my patches you'll see how it works.
> > Basically old guests use BAR0 new ones don't, so
> > it's easy: BAR0 access means legacy guest.
> > Only started testing but things seem to work
> > fine with old guests so far.
> >
> > I think we need a spec, not just driver code.
> >
> > Rusty what's the plan? Want me to write it?
> 
> We need both, of course, but the spec work will happen in the OASIS WG.
> A draft is good, but let's not commit anything to upstream QEMU until we
> get the spec finalized.  And that is proposed to be late this year.

Well that would be quite sad really.

This means we can't make virtio a spec compliant pci express device,
and we can't add any more feature bits, so no
flexible buffer optimizations for virtio net.

There are probably more projects that will be blocked.

So how about we keep extending legacy layout for a bit longer:
- add a way to access device with MMIO
- use feature bit 31 to signal 64 bit features
  (and shift device config accordingly)

No endian-ness rework, no per queue enable etc.


Then when we start discussions we will have a working
express and working 64 bit feature support,
and it will be that much easier to make it pretty it.


> Since I'm going to have to reformat the spec and adapt it into OASIS
> style anyway, perhaps you should prepare a description as a standalone
> text document.  Easier to email and work with...
> 
> Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we
> call them; I used the term "legacy" for existing implementations in the
> OASIS WG proposal), you add capabilities and don't point them into (the
> start of?) BAR0.  Old drivers use BAR0 as now.
> 
> One trick to note: while drivers shouldn't use both old and new style on
> the same device, you need to allow it for kexec, particularly reset via
> BAR0.
> 
> Cheers,
> Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-02 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
>> Rusty Russell  writes:
>> 
>> > Anthony Liguori  writes:
>> >> Forcing a guest driver change is a really big
>> >> deal and I see no reason to do that unless there's a compelling reason
>> >> to.
>> >>
>> >> So we're stuck with the 1.0 config layout for a very long time.
>> >
>> > We definitely must not force a guest change.  The explicit aim of the
>> > standard is that "legacy" and 1.0 be backward compatible.  One
>> > deliverable is a document detailing how this is done (effectively a
>> > summary of changes between what we have and 1.0).
>> 
>> If 2.0 is fully backwards compatible, great.  It seems like such a
>> difference that that would be impossible but I need to investigate
>> further.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> If you look at my patches you'll see how it works.
> Basically old guests use BAR0 new ones don't, so
> it's easy: BAR0 access means legacy guest.
> Only started testing but things seem to work
> fine with old guests so far.
>
> I think we need a spec, not just driver code.
>
> Rusty what's the plan? Want me to write it?

We need both, of course, but the spec work will happen in the OASIS WG.
A draft is good, but let's not commit anything to upstream QEMU until we
get the spec finalized.  And that is proposed to be late this year.

Since I'm going to have to reformat the spec and adapt it into OASIS
style anyway, perhaps you should prepare a description as a standalone
text document.  Easier to email and work with...

Now, the idea is that if you want to support 0.9 and 1.0 (or whatever we
call them; I used the term "legacy" for existing implementations in the
OASIS WG proposal), you add capabilities and don't point them into (the
start of?) BAR0.  Old drivers use BAR0 as now.

One trick to note: while drivers shouldn't use both old and new style on
the same device, you need to allow it for kexec, particularly reset via
BAR0.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-06-02 Thread Rusty Russell
"Michael S. Tsirkin"  writes:
> On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
>> >> Yet the structure definitions are descriptive, capturing layout, size
>> >> and endianness in natural a format readable by any C programmer.
>> >
>> >>From an API design point of view, here are the problems I see:
>> >
>> > 1) C makes no guarantees about structure layout beyond the first
>> >member.  Yes, if it's naturally aligned or has a packed attribute,
>> >GCC does the right thing.  But this isn't kernel land anymore,
>> >portability matters and there are more compilers than GCC.
>> 
>> [ I argued in detail here, then deleted.  Damn bikeshedding... ]
>> 
>> I think the best thing is to add the decoded integer versions as macros,
>> and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
>> match.
>
> Hmm you want to have both in the header?
>
> That would confuse users for sure :)
>
> I guess we could put in a big comment explaning
> why do we have both, and which version should
> userspace use. I tried to write it:
>
> /*
>  * On all known C compilers, you can use the
>  * offsetof macro to find the correct offset of each field -
>  * if your compiler doesn't implement this correctly, use the
>  * integer versions below, instead.
>  * In that case please don't use the structures above, to avoid confusion.
>  */

My version was a little less verbose :)

Subject: virtio_pci: macros for PCI layout offsets.

QEMU wants it, so why not?  Trust, but verify.

Signed-off-by: Rusty Russell 

diff --git a/drivers/virtio/virtio_pci_modern.c 
b/drivers/virtio/virtio_pci_modern.c
index 24bcd9b..6510009 100644
--- a/drivers/virtio/virtio_pci_modern.c
+++ b/drivers/virtio/virtio_pci_modern.c
@@ -453,6 +453,64 @@ static void __iomem *map_capability(struct pci_dev *dev, 
int off, size_t minlen,
return p;
 }
 
+/* This is part of the ABI.  Don't screw with it. */
+static inline void check_offsets(void)
+{
+   /* Note: disk space was harmed in compilation of this function. */
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_VNDR !=
+offsetof(struct virtio_pci_cap, cap_vndr));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_NEXT !=
+offsetof(struct virtio_pci_cap, cap_next));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_LEN !=
+offsetof(struct virtio_pci_cap, cap_len));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_TYPE_AND_BAR !=
+offsetof(struct virtio_pci_cap, type_and_bar));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_OFFSET !=
+offsetof(struct virtio_pci_cap, offset));
+   BUILD_BUG_ON(VIRTIO_PCI_CAP_LENGTH !=
+offsetof(struct virtio_pci_cap, length));
+   BUILD_BUG_ON(VIRTIO_PCI_NOTIFY_CAP_MULT !=
+offsetof(struct virtio_pci_notify_cap,
+ notify_off_multiplier));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_DFSELECT != 
+offsetof(struct virtio_pci_common_cfg,
+ device_feature_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_DF != 
+offsetof(struct virtio_pci_common_cfg, device_feature));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_GFSELECT != 
+offsetof(struct virtio_pci_common_cfg,
+ guest_feature_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_GF != 
+offsetof(struct virtio_pci_common_cfg, guest_feature));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_MSIX != 
+offsetof(struct virtio_pci_common_cfg, msix_config));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_NUMQ != 
+offsetof(struct virtio_pci_common_cfg, num_queues));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_STATUS != 
+offsetof(struct virtio_pci_common_cfg, device_status));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SELECT != 
+offsetof(struct virtio_pci_common_cfg, queue_select));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_SIZE != 
+offsetof(struct virtio_pci_common_cfg, queue_size));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_MSIX != 
+offsetof(struct virtio_pci_common_cfg, queue_msix_vector));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_ENABLE != 
+offsetof(struct virtio_pci_common_cfg, queue_enable));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_NOFF != 
+offsetof(struct virtio_pci_common_cfg, queue_notify_off));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCLO != 
+offsetof(struct virtio_pci_common_cfg, queue_desc_lo));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_DESCHI != 
+offsetof(struct virtio_pci_common_cfg, queue_desc_hi));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILLO != 
+offsetof(struct virtio_pci_common_cfg, queue_avail_lo));
+   BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_AVAILHI != 
+offsetof(struct virtio_pci_common_cfg, queue_avail_hi));
+   BUILD_BUG_O

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 08:53:45AM -0500, Anthony Liguori wrote:
> Rusty Russell  writes:
> 
> > Anthony Liguori  writes:
> >> Forcing a guest driver change is a really big
> >> deal and I see no reason to do that unless there's a compelling reason
> >> to.
> >>
> >> So we're stuck with the 1.0 config layout for a very long time.
> >
> > We definitely must not force a guest change.  The explicit aim of the
> > standard is that "legacy" and 1.0 be backward compatible.  One
> > deliverable is a document detailing how this is done (effectively a
> > summary of changes between what we have and 1.0).
> 
> If 2.0 is fully backwards compatible, great.  It seems like such a
> difference that that would be impossible but I need to investigate
> further.
> 
> Regards,
> 
> Anthony Liguori

If you look at my patches you'll see how it works.
Basically old guests use BAR0 new ones don't, so
it's easy: BAR0 access means legacy guest.
Only started testing but things seem to work
fine with old guests so far.

I think we need a spec, not just driver code.

Rusty what's the plan? Want me to write it?

> >
> > It's a delicate balancing act.  My plan is to accompany any changes in
> > the standard with a qemu implementation, so we can see how painful those
> > changes are.  And if there are performance implications, measure them.
> >
> > Cheers,
> > Rusty.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> Forcing a guest driver change is a really big
>> deal and I see no reason to do that unless there's a compelling reason
>> to.
>>
>> So we're stuck with the 1.0 config layout for a very long time.
>
> We definitely must not force a guest change.  The explicit aim of the
> standard is that "legacy" and 1.0 be backward compatible.  One
> deliverable is a document detailing how this is done (effectively a
> summary of changes between what we have and 1.0).

If 2.0 is fully backwards compatible, great.  It seems like such a
difference that that would be impossible but I need to investigate
further.

Regards,

Anthony Liguori

>
> It's a delicate balancing act.  My plan is to accompany any changes in
> the standard with a qemu implementation, so we can see how painful those
> changes are.  And if there are performance implications, measure them.
>
> Cheers,
> Rusty.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
> >> Yet the structure definitions are descriptive, capturing layout, size
> >> and endianness in natural a format readable by any C programmer.
> >
> >>From an API design point of view, here are the problems I see:
> >
> > 1) C makes no guarantees about structure layout beyond the first
> >member.  Yes, if it's naturally aligned or has a packed attribute,
> >GCC does the right thing.  But this isn't kernel land anymore,
> >portability matters and there are more compilers than GCC.
> 
> [ I argued in detail here, then deleted.  Damn bikeshedding... ]
> 
> I think the best thing is to add the decoded integer versions as macros,
> and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
> match.

Hmm you want to have both in the header?

That would confuse users for sure :)

I guess we could put in a big comment explaning
why do we have both, and which version should
userspace use. I tried to write it:

/*
 * On all known C compilers, you can use the
 * offsetof macro to find the correct offset of each field -
 * if your compiler doesn't implement this correctly, use the
 * integer versions below, instead.
 * In that case please don't use the structures above, to avoid confusion.
 */


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-30 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote:
> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
> > You expect a compiler to pad this structure:
> > 
> > struct foo {
> > uint8_t a;
> > uint8_t b;
> > uint16_t c;
> > uint32_t d;
> > };
> > 
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.
> 
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
> 
> Paolo

BTW we don't even do this consistently everywhere in QEMU.

It would have been better to have a rule to avoid packed
as much as possible, then we'd have found the misaligned field bug
in balloon before it's too late.

That would be a good rule to adopt I think:
any pack if something is misaligned, and document
the reason.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Thu, May 30, 2013 at 01:28:26PM +0930, Rusty Russell wrote:
> Anthony Liguori  writes:
> > Rusty Russell  writes:
> >
> >> Anthony Liguori  writes:
> >>> "Michael S. Tsirkin"  writes:
>  +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>  +return proxy->device_feature_select;
> >>>
> >>> Oh dear no...  Please use defines like the rest of QEMU.
> >>
> >> It is pretty ugly.
> >
> > I think beauty is in the eye of the beholder here...
> >
> > Pretty much every device we have has a switch statement like this.
> > Consistency wins when it comes to qualitative arguments like this.
> 
> I was agreeing with you here, actually.
> 
> >> Yet the structure definitions are descriptive, capturing layout, size
> >> and endianness in natural a format readable by any C programmer.
> >
> >>From an API design point of view, here are the problems I see:
> >
> > 1) C makes no guarantees about structure layout beyond the first
> >member.  Yes, if it's naturally aligned or has a packed attribute,
> >GCC does the right thing.  But this isn't kernel land anymore,
> >portability matters and there are more compilers than GCC.
> 
> [ I argued in detail here, then deleted.  Damn bikeshedding... ]
> 
> I think the best thing is to add the decoded integer versions as macros,
> and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
> match.

On the qemu side, fine with me, all I want is ability to stay
close to kernel headers.
Let's have XXX_SIZE macros as well, so access size can be
easily validated?

On the kernel side, for 'struct virtio_pci_cap',
since we only ever do offsetof on this struct,
it might be easier to just use the numeric constants
directly.


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Rusty Russell
Anthony Liguori  writes:
> Rusty Russell  writes:
>
>> Anthony Liguori  writes:
>>> "Michael S. Tsirkin"  writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;
>>>
>>> Oh dear no...  Please use defines like the rest of QEMU.
>>
>> It is pretty ugly.
>
> I think beauty is in the eye of the beholder here...
>
> Pretty much every device we have has a switch statement like this.
> Consistency wins when it comes to qualitative arguments like this.

I was agreeing with you here, actually.

>> Yet the structure definitions are descriptive, capturing layout, size
>> and endianness in natural a format readable by any C programmer.
>
>>From an API design point of view, here are the problems I see:
>
> 1) C makes no guarantees about structure layout beyond the first
>member.  Yes, if it's naturally aligned or has a packed attribute,
>GCC does the right thing.  But this isn't kernel land anymore,
>portability matters and there are more compilers than GCC.

[ I argued in detail here, then deleted.  Damn bikeshedding... ]

I think the best thing is to add the decoded integer versions as macros,
and have a heap of BUILD_BUG_ON() in the kernel source to make sure they
match.

> 3) It suspect it's harder to review because a subtle change could more
>easily have broad impact.  If someone changed the type of a field
>from u32 to u16, it changes the offset of every other field.  That's
>not terribly obvious in the patch itself unless you understand how
>the structure is used elsewhere.

MST's patch just did this, so point taken.  (MST: I'm going to combine
the cfg_type and bar bytes to fix this, patch coming).

>> No change, but there's an open question on whether we should nail it to
>> little endian (or define the endian by the transport).
>>
>> Of course, I can't rule out that the 1.0 standard *may* decide to frob
>> the ring layout somehow,
>
> Well, given that virtio is widely deployed today, I would think the 1.0
> standard should strictly reflect what's deployed today, no?

That will be up to the committee.  I think we want to fix some obvious
pain points, though qemu will not benefit from them in the next 5 years.

> Any new config layout would be 2.0 material, right?
>
> Re: the new config layout, I don't think we would want to use it for
> anything but new devices.

There are enough concrete reasons that I think we want it for existing
devices:

1) Negotiated ring size/alignment.  Coreboot wants smaller, others want
   larger.
2) Remove assertion that it has to be an I/O bar.  PowerPC wants this.
3) Notification location flexibility.  MST wanted this for performance.
4) More feature bits.

> Forcing a guest driver change is a really big
> deal and I see no reason to do that unless there's a compelling reason
> to.
>
> So we're stuck with the 1.0 config layout for a very long time.

We definitely must not force a guest change.  The explicit aim of the
standard is that "legacy" and 1.0 be backward compatible.  One
deliverable is a document detailing how this is done (effectively a
summary of changes between what we have and 1.0).

It's a delicate balancing act.  My plan is to accompany any changes in
the standard with a qemu implementation, so we can see how painful those
changes are.  And if there are performance implications, measure them.

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote:
> > Not we. The BIOS can disable IO BAR: it can do this already
> > but the device won't be functional.
> 
> But the only way to expose the device over PCI express is to disable the
> IO BAR, right?

I think this is the source of the misunderstanding:

1.3.2.2.  PCI Express Endpoint Rules

...

PCI Express Endpoint must not depend on operating system
allocation of I/O resources claimed through BAR(s).

The real meaning here is *not* that an Endpoint should not
have an I/O BAR. An Endpoint can have an I/O BAR,
and PCI Express spec supports I/O transactions.
The meaning is that an Endpoint should work
even if *the OS* decides to disable the I/O BAR.
Note: it's up to the guest. We support I/O
as a device with full compatibility for old guests,
but must be prepared to handle a guest which does not
enable I/O.

This likely means that as time goes on,
future OSes will start disabling I/O BARs,
relying on this not hurting functionality.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:55:55AM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
> >> "Michael S. Tsirkin"  writes:
> >> > I'm guessing any compiler that decides to waste memory in this way
> >> > will quickly get dropped by users and then we won't worry
> >> > about building QEMU with it.
> >> 
> >> There are other responses in the thread here and I don't really care to
> >> bikeshed on this issue.
> >
> > Great. Let's make the bikeshed blue then?
> 
> It's fun to argue about stuff like this and I certainly have an opinion,
> but I honestly don't care all that much about the offsetof thing.
> However...
> 
> 
> >
> >> >> Well, given that virtio is widely deployed today, I would think the 1.0
> >> >> standard should strictly reflect what's deployed today, no?
> >> >> Any new config layout would be 2.0 material, right?
> >> >
> >> > Not as it's currently planned. Devices can choose
> >> > to support a legacy layout in addition to the new one,
> >> > and if you look at the patch you will see that that
> >> > is exactly what it does.
> >> 
> >> Adding a new BAR most certainly requires bumping the revision ID or
> >> changing the device ID, no?
> >
> > No, why would it?
> 
> If we change the programming interface for a device in a way that is
> incompatible, we are required to change the revision ID and/or device
> ID.

It's compatible.


> > If a device dropped BAR0, that would be a good reason
> > to bump revision ID.
> > We don't do this yet.
> 
> But we have to drop BAR0 to put it behind a PCI express bus, right?

No, no.

The PCIe spec states that failure to allocate an I/O BAR should still
allow the device to function.

That's *guest* allocating an I/O BAR.


> If that's the case, then device that's exposed on the PCI express bus
> must use a different device ID and/or revision ID.
> 
> That means a new driver is needed in the guest.
> 
> >> Didn't we run into this problem with the virtio-win drivers with just
> >> the BAR size changing? 
> >
> > Because they had a bug: they validated BAR0 size. AFAIK they don't care
> > what happens with other bars.
> 
> I think there's a grey area with respect to the assumptions a device can
> make about the programming interface.
> 
> But very concretely, we cannot expose virtio-pci-net via PCI express
> with BAR0 disabled because that will result in existing virtio-pci Linux
> drivers breaking.
> 
> > Not we. The BIOS can disable IO BAR: it can do this already
> > but the device won't be functional.
> 
> But the only way to expose the device over PCI express is to disable the
> IO BAR, right?
> 
> Regards,
> 
> Anthony Liguori

No :)
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin"  writes:
>> > I'm guessing any compiler that decides to waste memory in this way
>> > will quickly get dropped by users and then we won't worry
>> > about building QEMU with it.
>> 
>> There are other responses in the thread here and I don't really care to
>> bikeshed on this issue.
>
> Great. Let's make the bikeshed blue then?

It's fun to argue about stuff like this and I certainly have an opinion,
but I honestly don't care all that much about the offsetof thing.
However...


>
>> >> Well, given that virtio is widely deployed today, I would think the 1.0
>> >> standard should strictly reflect what's deployed today, no?
>> >> Any new config layout would be 2.0 material, right?
>> >
>> > Not as it's currently planned. Devices can choose
>> > to support a legacy layout in addition to the new one,
>> > and if you look at the patch you will see that that
>> > is exactly what it does.
>> 
>> Adding a new BAR most certainly requires bumping the revision ID or
>> changing the device ID, no?
>
> No, why would it?

If we change the programming interface for a device in a way that is
incompatible, we are required to change the revision ID and/or device
ID.

> If a device dropped BAR0, that would be a good reason
> to bump revision ID.
> We don't do this yet.

But we have to drop BAR0 to put it behind a PCI express bus, right?

If that's the case, then device that's exposed on the PCI express bus
must use a different device ID and/or revision ID.

That means a new driver is needed in the guest.

>> Didn't we run into this problem with the virtio-win drivers with just
>> the BAR size changing? 
>
> Because they had a bug: they validated BAR0 size. AFAIK they don't care
> what happens with other bars.

I think there's a grey area with respect to the assumptions a device can
make about the programming interface.

But very concretely, we cannot expose virtio-pci-net via PCI express
with BAR0 disabled because that will result in existing virtio-pci Linux
drivers breaking.

> Not we. The BIOS can disable IO BAR: it can do this already
> but the device won't be functional.

But the only way to expose the device over PCI express is to disable the
IO BAR, right?

Regards,

Anthony Liguori

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 04:32:31PM +0200, Paolo Bonzini wrote:
> Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto:
> >> > There are other responses in the thread here and I don't really care to
> >> > bikeshed on this issue.
> > Great. Let's make the bikeshed blue then?
> 
> Yes, let's make it blue, but please promise to check into Peter's
> register API so we can remove the case offsetof.

I think it's overkill here.  virtio was designed to be easy to implement
with a simple switch statement.

> I checked that it works on RHEL5, which is 4.1 and probably the oldest
> compiler we care about (any other 4.1 lacks the __sync builtins;
> upstream added them in 4.2).
> 
> Paolo

Are you sure? Documentation says 4.0 ...

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 16:30, Michael S. Tsirkin ha scritto:
>> > There are other responses in the thread here and I don't really care to
>> > bikeshed on this issue.
> Great. Let's make the bikeshed blue then?

Yes, let's make it blue, but please promise to check into Peter's
register API so we can remove the case offsetof.

I checked that it works on RHEL5, which is 4.1 and probably the oldest
compiler we care about (any other 4.1 lacks the __sync builtins;
upstream added them in 4.2).

Paolo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:16:39AM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
> >> 1) C makes no guarantees about structure layout beyond the first
> >>member.  Yes, if it's naturally aligned or has a packed attribute,
> >>GCC does the right thing.  But this isn't kernel land anymore,
> >>portability matters and there are more compilers than GCC.
> >
> > You expect a compiler to pad this structure:
> >
> > struct foo {
> > uint8_t a;
> > uint8_t b;
> > uint16_t c;
> > uint32_t d;
> > };
> >
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> There are other responses in the thread here and I don't really care to
> bikeshed on this issue.

Great. Let's make the bikeshed blue then?

> >> Well, given that virtio is widely deployed today, I would think the 1.0
> >> standard should strictly reflect what's deployed today, no?
> >> Any new config layout would be 2.0 material, right?
> >
> > Not as it's currently planned. Devices can choose
> > to support a legacy layout in addition to the new one,
> > and if you look at the patch you will see that that
> > is exactly what it does.
> 
> Adding a new BAR most certainly requires bumping the revision ID or
> changing the device ID, no?

No, why would it?

If a device dropped BAR0, that would be a good reason
to bump revision ID.
We don't do this yet.

> Didn't we run into this problem with the virtio-win drivers with just
> the BAR size changing? 

Because they had a bug: they validated BAR0 size. AFAIK they don't care
what happens with other bars.

> >> Re: the new config layout, I don't think we would want to use it for
> >> anything but new devices.  Forcing a guest driver change
> >
> > There's no forcing.
> > If you look at the patches closely, you will see that
> > we still support the old layout on BAR0.
> >
> >
> >> is a really big
> >> deal and I see no reason to do that unless there's a compelling reason
> >> to.
> >
> > There are many a compelling reasons, and they are well known
> > limitations of virtio PCI:
> >
> > - PCI spec compliance (madates device operation with IO memory
> > disabled).
> 
> PCI express spec.  We are fully compliant with the PCI spec.  And what's
> the user visible advantage of pointing an emulated virtio device behind
> a PCI-e bus verses a legacy PCI bus?

Native hotplug support.

> This is a very good example because if we have to disable BAR0, then
> it's an ABI breaker plan and simple.

Not we. The BIOS can disable IO BAR: it can do this already
but the device won't be functional.

> > - support 64 bit addressing
> 
> We currently support 44-bit addressing for the ring.  While I agree we
> need to bump it, there's no immediate problem with 44-bit addressing.

I heard developers (though not users) complaining.

> > - add more than 32 feature bits.
> > - individually disable queues.
> > - sanely support cross-endian systems.
> > - support very small (<1 PAGE) for virtio rings.
> > - support a separate page for each vq kick.
> > - make each device place config at flexible offset.
> 
> None of these things are holding us back today.

All of them do, to bigger or lesser degree.


> I'm not saying we shouldn't introduce a new device.  But adoption of
> that device will be slow and realistically will be limited to new
> devices only.
> 
> We'll be supporting both devices for a very, very long time.

This is true for any new feature. What are you trying to say here?
We won't add new features to old config: for once, we have
run out of feature bits.

> Compatibility is the fundamental value that we provide.  We need to go
> out of our way to make sure that existing guests work and work as well
> as possible.

What are you trying to say?
There's nothing here that breaks compatibility.
Have you looked at the patch?

I'm wasting my time arguing on the mailing list, but
once I tear myself away from this occupation,
I intend to verify that I can run an old guest on
qemu with this patch without issues.

> Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
> a compelling reason to break existing guests.
> 
> Regards,
> 
> Anthony Liguori

That's why my patch does not break existing guests.


> 
> > Addressing any one of these would cause us to add a substantially new
> > way to operate virtio devices.
> >
> > And since it's a guest change anyway, it seemed like a
> > good time to do the new layout and fix everything in one go.
> >
> > And they are needed like yesterday.
> >
> >
> >> So we're stuck with the 1.0 config layout for a very long time.
> >> 
> >> Regards,
> >> 
> >> Anthony Liguori
> >
> > Absolutely. This patch let us support both which will allow for
> > a gradual transition over the next 10 years or so.
> >
> >> > reason.  I suggest that's 2.0 material...
> >> >
> >> > Cheer

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Paolo Bonzini  writes:

> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
>> You expect a compiler to pad this structure:
>> 
>> struct foo {
>>  uint8_t a;
>>  uint8_t b;
>>  uint16_t c;
>>  uint32_t d;
>> };
>> 
>> I'm guessing any compiler that decides to waste memory in this way
>> will quickly get dropped by users and then we won't worry
>> about building QEMU with it.
>
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.

Not that these structures are actually used for something.

We store the config in these structures so they are actually used for
something.

The proposed structures only serve as a way to express offsets.  You
would never actually have a variable of this type.

Regards,

Anthony Liguori

>
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
>
> Paolo
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>member.  Yes, if it's naturally aligned or has a packed attribute,
>>GCC does the right thing.  But this isn't kernel land anymore,
>>portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
> struct foo {
>   uint8_t a;
>   uint8_t b;
>   uint16_t c;
>   uint32_t d;
> };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.

There are other responses in the thread here and I don't really care to
bikeshed on this issue.

>> Well, given that virtio is widely deployed today, I would think the 1.0
>> standard should strictly reflect what's deployed today, no?
>> Any new config layout would be 2.0 material, right?
>
> Not as it's currently planned. Devices can choose
> to support a legacy layout in addition to the new one,
> and if you look at the patch you will see that that
> is exactly what it does.

Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no?

Didn't we run into this problem with the virtio-win drivers with just
the BAR size changing? 

>> Re: the new config layout, I don't think we would want to use it for
>> anything but new devices.  Forcing a guest driver change
>
> There's no forcing.
> If you look at the patches closely, you will see that
> we still support the old layout on BAR0.
>
>
>> is a really big
>> deal and I see no reason to do that unless there's a compelling reason
>> to.
>
> There are many a compelling reasons, and they are well known
> limitations of virtio PCI:
>
> - PCI spec compliance (madates device operation with IO memory
> disabled).

PCI express spec.  We are fully compliant with the PCI spec.  And what's
the user visible advantage of pointing an emulated virtio device behind
a PCI-e bus verses a legacy PCI bus?

This is a very good example because if we have to disable BAR0, then
it's an ABI breaker plan and simple.

> - support 64 bit addressing

We currently support 44-bit addressing for the ring.  While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.

> - add more than 32 feature bits.
> - individually disable queues.
> - sanely support cross-endian systems.
> - support very small (<1 PAGE) for virtio rings.
> - support a separate page for each vq kick.
> - make each device place config at flexible offset.

None of these things are holding us back today.

I'm not saying we shouldn't introduce a new device.  But adoption of
that device will be slow and realistically will be limited to new
devices only.

We'll be supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide.  We need to go
out of our way to make sure that existing guests work and work as well
as possible.

Sticking virtio devices behind a PCI-e bus just for the hell of it isn't
a compelling reason to break existing guests.

Regards,

Anthony Liguori


> Addressing any one of these would cause us to add a substantially new
> way to operate virtio devices.
>
> And since it's a guest change anyway, it seemed like a
> good time to do the new layout and fix everything in one go.
>
> And they are needed like yesterday.
>
>
>> So we're stuck with the 1.0 config layout for a very long time.
>> 
>> Regards,
>> 
>> Anthony Liguori
>
> Absolutely. This patch let us support both which will allow for
> a gradual transition over the next 10 years or so.
>
>> > reason.  I suggest that's 2.0 material...
>> >
>> > Cheers,
>> > Rusty.
>> >
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > the body of a message to majord...@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 03:41:09PM +0200, Paolo Bonzini wrote:
> Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
> > You expect a compiler to pad this structure:
> > 
> > struct foo {
> > uint8_t a;
> > uint8_t b;
> > uint16_t c;
> > uint32_t d;
> > };
> > 
> > I'm guessing any compiler that decides to waste memory in this way
> > will quickly get dropped by users and then we won't worry
> > about building QEMU with it.
> 
> You know the virtio-pci config structures are padded, but not all of
> them are.  For example, virtio_balloon_stat is not padded and indeed has
> an __attribute__((__packed__)) in the spec.
> 
> For this reason I prefer to have the attribute everywhere.  So people
> don't have to wonder why it's here and not there.
> 
> Paolo

FWIW I think it was a mistake to lay the balloon structure out like that.
We should have padded it manually.
__attribute__((__packed__)) is really easy to misuse.
If you get into such a situation, just use offset enums ...

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 15:24, Michael S. Tsirkin ha scritto:
> You expect a compiler to pad this structure:
> 
> struct foo {
>   uint8_t a;
>   uint8_t b;
>   uint16_t c;
>   uint32_t d;
> };
> 
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.

You know the virtio-pci config structures are padded, but not all of
them are.  For example, virtio_balloon_stat is not padded and indeed has
an __attribute__((__packed__)) in the spec.

For this reason I prefer to have the attribute everywhere.  So people
don't have to wonder why it's here and not there.

Paolo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 14:24, Michael S. Tsirkin  wrote:
> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>member.  Yes, if it's naturally aligned or has a packed attribute,
>>GCC does the right thing.  But this isn't kernel land anymore,
>>portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
> struct foo {
> uint8_t a;
> uint8_t b;
> uint16_t c;
> uint32_t d;
> };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry
> about building QEMU with it.

Structure alignment is a platform ABI choice. Oddly enough people
choose operating systems on more criteria than the amount of
padding their ABI mandates in structures.

In any case it's really tedious to read a structure and wade
through working out whether it's going to have all its members
naturally aligned, especially once it's more than a handful
of members in size. And getting it wrong is "silent portability
problem".

thanks
-- PMM
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
> Rusty Russell  writes:
> 
> > Anthony Liguori  writes:
> >> "Michael S. Tsirkin"  writes:
> >>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> >>> +return proxy->device_feature_select;
> >>
> >> Oh dear no...  Please use defines like the rest of QEMU.
> >
> > It is pretty ugly.
> 
> I think beauty is in the eye of the beholder here...
> 
> Pretty much every device we have has a switch statement like this.
> Consistency wins when it comes to qualitative arguments like this.
> 
> > Yet the structure definitions are descriptive, capturing layout, size
> > and endianness in natural a format readable by any C programmer.
> 
> >From an API design point of view, here are the problems I see:
> 
> 1) C makes no guarantees about structure layout beyond the first
>member.  Yes, if it's naturally aligned or has a packed attribute,
>GCC does the right thing.  But this isn't kernel land anymore,
>portability matters and there are more compilers than GCC.

You expect a compiler to pad this structure:

struct foo {
uint8_t a;
uint8_t b;
uint16_t c;
uint32_t d;
};

I'm guessing any compiler that decides to waste memory in this way
will quickly get dropped by users and then we won't worry
about building QEMU with it.

> 2) If we every introduce anything like latching, this doesn't work out
>so well anymore because it's hard to express in a single C structure
>the register layout at that point.  Perhaps a union could be used but
>padding may make it a bit challenging.

Then linux won't use it either.

> 3) It suspect it's harder to review because a subtle change could more
>easily have broad impact.  If someone changed the type of a field
>from u32 to u16, it changes the offset of every other field.  That's
>not terribly obvious in the patch itself unless you understand how
>the structure is used elsewhere.
> 
>This may not be a problem for virtio because we all understand that
>the structures are part of an ABI, but if we used this pattern more
>in QEMU, it would be a lot less obvious.

So let's not use it more in QEMU.

> > So AFAICT the question is, do we put the required
> >
> > #define VIRTIO_PCI_CFG_FEATURE_SEL \
> >  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
> >
> > etc. in the kernel headers or qemu?
> 
> I'm pretty sure we would end up just having our own integer defines.  We
> carry our own virtio headers today because we can't easily import the
> kernel headers.

Yes we can easily import them.
And at least we copy headers verbatim.


> >> Haven't looked at the proposed new ring layout yet.
> >
> > No change, but there's an open question on whether we should nail it to
> > little endian (or define the endian by the transport).
> >
> > Of course, I can't rule out that the 1.0 standard *may* decide to frob
> > the ring layout somehow,
> 
> Well, given that virtio is widely deployed today, I would think the 1.0
> standard should strictly reflect what's deployed today, no?
> Any new config layout would be 2.0 material, right?

Not as it's currently planned. Devices can choose
to support a legacy layout in addition to the new one,
and if you look at the patch you will see that that
is exactly what it does.

> Re: the new config layout, I don't think we would want to use it for
> anything but new devices.  Forcing a guest driver change

There's no forcing.
If you look at the patches closely, you will see that
we still support the old layout on BAR0.


> is a really big
> deal and I see no reason to do that unless there's a compelling reason
> to.

There are many a compelling reasons, and they are well known
limitations of virtio PCI:

- PCI spec compliance (madates device operation with IO memory disabled).
- support 64 bit addressing
- add more than 32 feature bits.
- individually disable queues.
- sanely support cross-endian systems.
- support very small (<1 PAGE) for virtio rings.
- support a separate page for each vq kick.
- make each device place config at flexible offset.

Addressing any one of these would cause us to add a substantially new
way to operate virtio devices.

And since it's a guest change anyway, it seemed like a
good time to do the new layout and fix everything in one go.

And they are needed like yesterday.


> So we're stuck with the 1.0 config layout for a very long time.
> 
> Regards,
> 
> Anthony Liguori

Absolutely. This patch let us support both which will allow for
a gradual transition over the next 10 years or so.

> > reason.  I suggest that's 2.0 material...
> >
> > Cheers,
> > Rusty.
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Virtualization mailing list
Virtualization

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Anthony Liguori
Rusty Russell  writes:

> Anthony Liguori  writes:
>> "Michael S. Tsirkin"  writes:
>>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>>> +return proxy->device_feature_select;
>>
>> Oh dear no...  Please use defines like the rest of QEMU.
>
> It is pretty ugly.

I think beauty is in the eye of the beholder here...

Pretty much every device we have has a switch statement like this.
Consistency wins when it comes to qualitative arguments like this.

> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in natural a format readable by any C programmer.

>From an API design point of view, here are the problems I see:

1) C makes no guarantees about structure layout beyond the first
   member.  Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing.  But this isn't kernel land anymore,
   portability matters and there are more compilers than GCC.

2) If we every introduce anything like latching, this doesn't work out
   so well anymore because it's hard to express in a single C structure
   the register layout at that point.  Perhaps a union could be used but
   padding may make it a bit challenging.

3) It suspect it's harder to review because a subtle change could more
   easily have broad impact.  If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field.  That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere.

   This may not be a problem for virtio because we all understand that
   the structures are part of an ABI, but if we used this pattern more
   in QEMU, it would be a lot less obvious.

> So AFAICT the question is, do we put the required
>
> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
>
> etc. in the kernel headers or qemu?

I'm pretty sure we would end up just having our own integer defines.  We
carry our own virtio headers today because we can't easily import the
kernel headers.

>> Haven't looked at the proposed new ring layout yet.
>
> No change, but there's an open question on whether we should nail it to
> little endian (or define the endian by the transport).
>
> Of course, I can't rule out that the 1.0 standard *may* decide to frob
> the ring layout somehow,

Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no?

Any new config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices.  Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to.

So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori

> but I'd think it would require a compelling
> reason.  I suggest that's 2.0 material...
>
> Cheers,
> Rusty.
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 02:28:32PM +0200, Paolo Bonzini wrote:
> Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto:
>  > >> If you really want to use offsetof like this you're
>  > >> going to need to decorate the structs with QEMU_PACKED.
> >> > 
> >>> > > Nope.
> >>> > > These structs are carefully designed not to have any padding.
> >> > 
> >> > ...on every compiler and OS combination that QEMU builds for?
> > Yes. All the way back to EGCS and before.
> > GCC has been like this for many many years.
> 
> I would still prefer to have QEMU_PACKED (or __attribute((__packed__))
> in the kernel).
> 
> >>> > > And if there was a bug and there was some padding, we still
> >>> > > can't fix it with PACKED because this structure
> >>> > > is used to interact with the guest code which does not
> >>> > > have the packed attribute.
> >> > 
> >> > The guest code has to use a set of structure offsets and
> >> > sizes which is fixed by the virtio ABI -- how it implements
> >> > this is up to the guest (and if it misimplements it that is
> >> > a guest bug and not our problem).
> 
> On the other hand, encouraging both userspace and guest to reuse a
> single set of headers means that the bad offset becomes a spec bug more
> than a guest bug.

Heh

> > Deviating from driver in random ways is an endless source
> > of hard to debug issues, and it's a very practical
> > consideration because virtio spec is constantly
> > being extended (unlike hardware which is mostly fixed).
> 
> Agreed---but the driver should use __attribute__((__packed__)) too.
> 
> Paolo

For these specific structures I don't mind, we never dereference them
anyway. If you do dereference a structure, using packed is generally
a mistake. In particular because GCC generates code that is
much slower.

You can also get nasty surprises e.g.

struct foo {
char a;
int b;
} __attribute__((packed));

struct foo foo;
int *a = &foo.a;

Last line compiles fine but isn't legal C.

So I think there are two cases with packed:
1. it does not change logical behaviour but results in bad code
   generated
2. it does change logical behaviour and leads to unsafe code

There aren't many good reasons to use packed: if you
must treat unaligned data, best to use constants.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 14:16, Michael S. Tsirkin ha scritto:
 > >> If you really want to use offsetof like this you're
 > >> going to need to decorate the structs with QEMU_PACKED.
>> > 
>>> > > Nope.
>>> > > These structs are carefully designed not to have any padding.
>> > 
>> > ...on every compiler and OS combination that QEMU builds for?
> Yes. All the way back to EGCS and before.
> GCC has been like this for many many years.

I would still prefer to have QEMU_PACKED (or __attribute((__packed__))
in the kernel).

>>> > > And if there was a bug and there was some padding, we still
>>> > > can't fix it with PACKED because this structure
>>> > > is used to interact with the guest code which does not
>>> > > have the packed attribute.
>> > 
>> > The guest code has to use a set of structure offsets and
>> > sizes which is fixed by the virtio ABI -- how it implements
>> > this is up to the guest (and if it misimplements it that is
>> > a guest bug and not our problem).

On the other hand, encouraging both userspace and guest to reuse a
single set of headers means that the bad offset becomes a spec bug more
than a guest bug.

> Deviating from driver in random ways is an endless source
> of hard to debug issues, and it's a very practical
> consideration because virtio spec is constantly
> being extended (unlike hardware which is mostly fixed).

Agreed---but the driver should use __attribute__((__packed__)) too.

Paolo

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 11:53:17AM +0100, Peter Maydell wrote:
> On 29 May 2013 11:08, Michael S. Tsirkin  wrote:
> > On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
> >> Asserting is definitely the wrong thing here, since the
> >> guest can trigger it.
> >
> > So?
> 
> So "guest should not be able to crash QEMU" is a standard
> rule: assert() is for QEMU bugs, not guest bugs.

It's quite likely a QEMU bug -
guests normally do fixed-size accesses so it's hard
for them to access a field with a wrong size.

It is guest triggerable but this does not
contradict the fact that it never happens in practice.

> Virtio isn't any different to any other device model.
> 
> > It's a driver bug. It can reset or crash guest with the same effect,
> > and it likely will if we let it continue.
> 
> Letting a misbehaving guest crash itself is fine. Killing
> QEMU isn't.

*Why* isn't it?
It has the advantage of making sure the misbehaving system
is stopped before it does any damage.
Why keep QEMU running even though we know
there's a memory corruption somewhere?

> > assert makes sure we don't let it escalate into some
> > hard to debug security problem.
> 
> If you want to make guest bugs easier to spot and debug
> this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for.

We really want something that would also stop guest
and stop device model as well - we don't know where the bug is.

> >> If you really want to use offsetof like this you're
> >> going to need to decorate the structs with QEMU_PACKED.
> 
> > Nope.
> > These structs are carefully designed not to have any padding.
> 
> ...on every compiler and OS combination that QEMU builds for?

Yes. All the way back to EGCS and before.
GCC has been like this for many many years.

> > And if there was a bug and there was some padding, we still
> > can't fix it with PACKED because this structure
> > is used to interact with the guest code which does not
> > have the packed attribute.
> 
> The guest code has to use a set of structure offsets and
> sizes which is fixed by the virtio ABI -- how it implements
> this is up to the guest (and if it misimplements it that is
> a guest bug and not our problem). Note also that the ABI
> is not defined by a set of C source struct definitions
> (except in as far as they are accompanied by a set of
> restrictions on alignment, padding, etc that completely
> determine the numerical alignments and offsets).

In practical terms, we should all talk and agree on what's
best for driver *and* QEMU, not have QEMU just
ignore driver and do it's own thing.

In practical terms, virtio in QEMU should use
exactly same code to define layout as virtio in guest.
Preferably same header file, we'll get there too,
once we comple the discussion over the bikeshed color.

Deviating from driver in random ways is an endless source
of hard to debug issues, and it's a very practical
consideration because virtio spec is constantly
being extended (unlike hardware which is mostly fixed).


> How QEMU implements the set of offsets and sizes specified
> by the ABI is definitely our problem, and is exactly what
> this discussion is about. The simplest and safest way to
> get the offsets right on all platforms is just to use a set
> of #defines, which is why that's what almost all of our
> device models do.

Maybe you don't feel safe when you see offsetof.
I review a struct and see fields are aligned
properly at a glance and I feel safe.

But I don't feel safe when I see headers in guest and qemu have
different code.

> Where we do define a struct matching
> guest memory layout, we tag it with QEMU_PACKED as our
> best means of ensuring consistency on all hosts.
> 
> thanks
> -- PMM

Further, packed structures produce terrible
code in GCC, in all versions that QEMU cares about :).

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 11:08, Michael S. Tsirkin  wrote:
> On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
>> Asserting is definitely the wrong thing here, since the
>> guest can trigger it.
>
> So?

So "guest should not be able to crash QEMU" is a standard
rule: assert() is for QEMU bugs, not guest bugs.
Virtio isn't any different to any other device model.

> It's a driver bug. It can reset or crash guest with the same effect,
> and it likely will if we let it continue.

Letting a misbehaving guest crash itself is fine. Killing
QEMU isn't.

> assert makes sure we don't let it escalate into some
> hard to debug security problem.

If you want to make guest bugs easier to spot and debug
this is what qemu_log_mask(LOG_GUEST_ERROR,...) is for.

>> If you really want to use offsetof like this you're
>> going to need to decorate the structs with QEMU_PACKED.

> Nope.
> These structs are carefully designed not to have any padding.

...on every compiler and OS combination that QEMU builds for?

> And if there was a bug and there was some padding, we still
> can't fix it with PACKED because this structure
> is used to interact with the guest code which does not
> have the packed attribute.

The guest code has to use a set of structure offsets and
sizes which is fixed by the virtio ABI -- how it implements
this is up to the guest (and if it misimplements it that is
a guest bug and not our problem). Note also that the ABI
is not defined by a set of C source struct definitions
(except in as far as they are accompanied by a set of
restrictions on alignment, padding, etc that completely
determine the numerical alignments and offsets).
How QEMU implements the set of offsets and sizes specified
by the ABI is definitely our problem, and is exactly what
this discussion is about. The simplest and safest way to
get the offsets right on all platforms is just to use a set
of #defines, which is why that's what almost all of our
device models do. Where we do define a struct matching
guest memory layout, we tag it with QEMU_PACKED as our
best means of ensuring consistency on all hosts.

thanks
-- PMM
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 10:00:33AM +0100, Peter Maydell wrote:
> On 29 May 2013 09:24, Michael S. Tsirkin  wrote:
> > diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> > index f4db224..fd09ea7 100644
> > --- a/hw/virtio/virtio-pci.c
> > +++ b/hw/virtio/virtio-pci.c
> > @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
> > *opaque, hwaddr addr,
> >  {
> >  VirtIOPCIProxy *proxy = opaque;
> >  VirtIODevice *vdev = proxy->vdev;
> > +struct virtio_pci_common_cfg cfg;
> >
> >  uint64_t low = 0xull;
> >
> >  switch (addr) {
> >  case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> > +assert(size == sizeof cfg.device_feature_select);
> >  return proxy->device_feature_select;
> 
> Asserting is definitely the wrong thing here, since the
> guest can trigger it.

So?

It's a driver bug. It can reset or crash guest with the same effect,
and it likely will if we let it continue.
assert makes sure we don't let it escalate into some
hard to debug security problem.


> If you really want to use offsetof like this you're
> going to need to decorate the structs with QEMU_PACKED.
> 
> thanks
> -- PMM

Nope.
These structs are carefully designed not to have any padding.

And if there was a bug and there was some padding, we still
can't fix it with PACKED because this structure
is used to interact with the guest code which does not
have the packed attribute.

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Laszlo Ersek
On 05/29/13 09:27, Paolo Bonzini wrote:
> Il 29/05/2013 06:33, Rusty Russell ha scritto:
>> Paolo Bonzini  writes:
>>> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 +
 +switch (addr) {
 +case offsetof(struct virtio_pci_common_cfg, 
 device_feature_select):
 +return proxy->device_feature_select;
>>
>> Oh dear no...  Please use defines like the rest of QEMU.
 Any good reason not to use offsetof?
>>>
>>> I'm not sure it's portable to use it in case labels.  IIRC, the
>>> definition ((int)&(((T *)0)->field)) is not a valid C integer constant
>>> expression.  Laszlo?
>>
>> It's defined to yield an integer constant expression in the ISO standard
>> (and I think ANSI too, though that's not at hand):
> 
> It's not in C89.

It is, see 7.1.6 Common definitions .


> The oldest compiler QEMU cares about is GCC 4.2.  I
> don't know if it has a builtin offsetof, probably it does.

Talking about nothing else but the ISO C standard(s), it doesn't matter
how the C implementation deals with offsetof() internally. "C
implementation" in this sense includes both compiler and standard library.

If the standard library header ( or something internal
included by it) expands the offsetof() macro to replacement text that
does pointer black magic, magic that would be otherwise nonportable or
undefined, it is alright as long as the accompanying compiler guarantees
that the replacement text will work.

That is, offsetof() gives the C implementor leeway to implement it in
wherever "distribution" between compiler and standard library; the main
thing is that the C program use the one offsetof() provided by the C
implementation.

Of course in the FLOSS world OS distros might mix and match gcc and
glibc statistically randomly; normally some documentation should enable
the user to put his/her system into ISO C or even SUSv[1234] mode.

(

An interesting example for, say, "non-unified implementation" is
"getconf" vs. "c99". I'll pick SUSv3 for this example.

http://pubs.opengroup.org/onlinepubs/95399/utilities/getconf.html
http://pubs.opengroup.org/onlinepubs/95399/utilities/c99.html

In a nutshell one can interrogate getconf for CFLAGS, LDFLAGS and LIBS
so that c99 builds a program in ILP32_OFF32 / ILP32_OFFBIG / LP64_OFF64
/ LPBIG_OFFBIG mode ("programming environment").

On my x86_64 RHEL-6.4 laptop, "getconf" is part of glibc
(glibc-common-2.12-1.107.el6.x86_64), while c99 is part of gcc
(gcc-4.4.7-3.el6.x86_64).

Supposing I'd like to build a 32-bit program with 64-bit off_t:

  getconf _POSIX_V6_ILP32_OFFBIG
  1

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_CFLAGS
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LDFLAGS
  -m32

  getconf -v POSIX_V6_ILP32_OFFBIG POSIX_V6_ILP32_OFFBIG_LIBS
  [empty string]

However if I try to actually build a program using these flags:

  #define _XOPEN_SOURCE 600
  int main(void) { return 0; }

  c99 \
  -m32 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 \
  -o mytest \
  -m32 \
  mytest.c

I get

  /usr/bin/ld: crt1.o: No such file: No such file or directory
  collect2: ld returned 1 exit status

unless I install "glibc-devel.i686".

This problem is rooted in "getconf" (a glibc utility) being unaware of
RHEL packaging choices: if the system can't guarantee the final c99
invocation to succeed, then the very first "getconf" invocation should
write "-1\n" or "undefined\n" to stdout.

(I'm aware that RHEL-6 is not certified UNIX 03; this is just an example
for problems caused by various parts of a standard's
(quasi-)implementation coming from separate projects. In that sense,
caring about the offsetof() internals may have merit.)

)

Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Peter Maydell
On 29 May 2013 09:24, Michael S. Tsirkin  wrote:
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index f4db224..fd09ea7 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
> *opaque, hwaddr addr,
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = proxy->vdev;
> +struct virtio_pci_common_cfg cfg;
>
>  uint64_t low = 0xull;
>
>  switch (addr) {
>  case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +assert(size == sizeof cfg.device_feature_select);
>  return proxy->device_feature_select;

Asserting is definitely the wrong thing here, since the
guest can trigger it.

If you really want to use offsetof like this you're
going to need to decorate the structs with QEMU_PACKED.

thanks
-- PMM
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 10:24, Michael S. Tsirkin ha scritto:
> On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:
>> Anthony Liguori  writes:
>>> "Michael S. Tsirkin"  writes:
 +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
 +return proxy->device_feature_select;
>>>
>>> Oh dear no...  Please use defines like the rest of QEMU.
>>
>> It is pretty ugly.
>>
>> Yet the structure definitions are descriptive, capturing layout, size
>> and endianness in natural a format readable by any C programmer.
>>
>> So AFAICT the question is, do we put the required
>>
>> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>>  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
>>
>> etc. in the kernel headers or qemu?
> 
> 
> I forgot that we need to validate size (different fields
> have different sizes so memory core does not validate
> for us).
> 
> And that is much cleaner if we use offsetof directly:
> You can see that you are checking the correct
> size matching the offset, at a glance.

I wonder if we can use and possibly extend Peter Crosthwaite's "register
API" to support this better.

Paolo

> --->
> 
> virtio: new layout: add offset validation
> 
> Signed-off-by: Michael S. Tsirkin 
> 
> ---
> 
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index f4db224..fd09ea7 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
> *opaque, hwaddr addr,
>  {
>  VirtIOPCIProxy *proxy = opaque;
>  VirtIODevice *vdev = proxy->vdev;
> +struct virtio_pci_common_cfg cfg;
>  
>  uint64_t low = 0xull;
>  
>  switch (addr) {
>  case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +assert(size == sizeof cfg.device_feature_select);
>  return proxy->device_feature_select;
>  case offsetof(struct virtio_pci_common_cfg, device_feature):
> +assert(size == sizeof cfg.device_feature);
>  /* TODO: 64-bit features */
>   return proxy->device_feature_select ? 0 : proxy->host_features;
>  case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
> +assert(size == sizeof cfg.guest_feature_select);
>  return proxy->guest_feature_select;
>  case offsetof(struct virtio_pci_common_cfg, guest_feature):
> +assert(size == sizeof cfg.guest_feature);
>  /* TODO: 64-bit features */
>   return proxy->guest_feature_select ? 0 : vdev->guest_features;
>  case offsetof(struct virtio_pci_common_cfg, msix_config):
> +assert(size == sizeof cfg.msix_config);
>   return vdev->config_vector;
>  case offsetof(struct virtio_pci_common_cfg, num_queues):
> +assert(size == sizeof cfg.num_queues);
>  /* TODO: more exact limit? */
>   return VIRTIO_PCI_QUEUE_MAX;
>  case offsetof(struct virtio_pci_common_cfg, device_status):
> +assert(size == sizeof cfg.device_status);
>  return vdev->status;
>  
>   /* About a specific virtqueue. */
>  case offsetof(struct virtio_pci_common_cfg, queue_select):
> +assert(size == sizeof cfg.queue_select);
>  return  vdev->queue_sel;
>  case offsetof(struct virtio_pci_common_cfg, queue_size):
> +assert(size == sizeof cfg.queue_size);
>  return virtio_queue_get_num(vdev, vdev->queue_sel);
>  case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
> +assert(size == sizeof cfg.queue_msix_vector);
>   return virtio_queue_vector(vdev, vdev->queue_sel);
>  case offsetof(struct virtio_pci_common_cfg, queue_enable):
> +assert(size == sizeof cfg.queue_enable);
>  /* TODO */
>   return 0;
>  case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
> +assert(size == sizeof cfg.queue_notify_off);
>  return vdev->queue_sel;
>  case offsetof(struct virtio_pci_common_cfg, queue_desc):
> +assert(size == 4);
>  return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
>  case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
> +assert(size == 4);
>  return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32;
>  case offsetof(struct virtio_pci_common_cfg, queue_avail):
> +assert(size == 4);
>  return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
>  case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
> +assert(size == 4);
>  return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32;
>  case offsetof(struct virtio_pci_common_cfg, queue_used):
> +assert(size == 4);
>  return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
>  case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
> +assert(size == 4);
>  return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32;
>  default:
>  return 0;
> @@ -523,76 +542,

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 02:01:18PM +0930, Rusty Russell wrote:
> Anthony Liguori  writes:
> > "Michael S. Tsirkin"  writes:
> >> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> >> +return proxy->device_feature_select;
> >
> > Oh dear no...  Please use defines like the rest of QEMU.
> 
> It is pretty ugly.
> 
> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in natural a format readable by any C programmer.
> 
> So AFAICT the question is, do we put the required
> 
> #define VIRTIO_PCI_CFG_FEATURE_SEL \
>  (offsetof(struct virtio_pci_common_cfg, device_feature_select))
> 
> etc. in the kernel headers or qemu?


I forgot that we need to validate size (different fields
have different sizes so memory core does not validate
for us).

And that is much cleaner if we use offsetof directly:
You can see that you are checking the correct
size matching the offset, at a glance.

--->

virtio: new layout: add offset validation

Signed-off-by: Michael S. Tsirkin 

---

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index f4db224..fd09ea7 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -467,51 +467,70 @@ static uint64_t virtio_pci_config_common_read(void 
*opaque, hwaddr addr,
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = proxy->vdev;
+struct virtio_pci_common_cfg cfg;
 
 uint64_t low = 0xull;
 
 switch (addr) {
 case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+assert(size == sizeof cfg.device_feature_select);
 return proxy->device_feature_select;
 case offsetof(struct virtio_pci_common_cfg, device_feature):
+assert(size == sizeof cfg.device_feature);
 /* TODO: 64-bit features */
return proxy->device_feature_select ? 0 : proxy->host_features;
 case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+assert(size == sizeof cfg.guest_feature_select);
 return proxy->guest_feature_select;
 case offsetof(struct virtio_pci_common_cfg, guest_feature):
+assert(size == sizeof cfg.guest_feature);
 /* TODO: 64-bit features */
return proxy->guest_feature_select ? 0 : vdev->guest_features;
 case offsetof(struct virtio_pci_common_cfg, msix_config):
+assert(size == sizeof cfg.msix_config);
return vdev->config_vector;
 case offsetof(struct virtio_pci_common_cfg, num_queues):
+assert(size == sizeof cfg.num_queues);
 /* TODO: more exact limit? */
return VIRTIO_PCI_QUEUE_MAX;
 case offsetof(struct virtio_pci_common_cfg, device_status):
+assert(size == sizeof cfg.device_status);
 return vdev->status;
 
/* About a specific virtqueue. */
 case offsetof(struct virtio_pci_common_cfg, queue_select):
+assert(size == sizeof cfg.queue_select);
 return  vdev->queue_sel;
 case offsetof(struct virtio_pci_common_cfg, queue_size):
+assert(size == sizeof cfg.queue_size);
 return virtio_queue_get_num(vdev, vdev->queue_sel);
 case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+assert(size == sizeof cfg.queue_msix_vector);
return virtio_queue_vector(vdev, vdev->queue_sel);
 case offsetof(struct virtio_pci_common_cfg, queue_enable):
+assert(size == sizeof cfg.queue_enable);
 /* TODO */
return 0;
 case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+assert(size == sizeof cfg.queue_notify_off);
 return vdev->queue_sel;
 case offsetof(struct virtio_pci_common_cfg, queue_desc):
+assert(size == 4);
 return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
 case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+assert(size == 4);
 return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32;
 case offsetof(struct virtio_pci_common_cfg, queue_avail):
+assert(size == 4);
 return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
 case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+assert(size == 4);
 return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32;
 case offsetof(struct virtio_pci_common_cfg, queue_used):
+assert(size == 4);
 return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
 case offsetof(struct virtio_pci_common_cfg, queue_used) + 4:
+assert(size == 4);
 return virtio_queue_get_used_addr(vdev, vdev->queue_sel) >> 32;
 default:
 return 0;
@@ -523,76 +542,95 @@ static void virtio_pci_config_common_write(void *opaque, 
hwaddr addr,
 {
 VirtIOPCIProxy *proxy = opaque;
 VirtIODevice *vdev = proxy->vdev;
+struct virtio_pci_common_cfg cfg;
 
 uint64_t low = 0xull;
 uint64_t high = ~low;
 
 switch (addr) {
 case offsetof(struct virtio_pci_common_cfg, device_feature_

Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Michael S. Tsirkin
On Wed, May 29, 2013 at 09:27:02AM +0200, Paolo Bonzini wrote:
> Il 29/05/2013 06:33, Rusty Russell ha scritto:
> > Paolo Bonzini  writes:
> >> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
> >>> +
> >>> +switch (addr) {
> >>> +case offsetof(struct virtio_pci_common_cfg, 
> >>> device_feature_select):
> >>> +return proxy->device_feature_select;
> >
> > Oh dear no...  Please use defines like the rest of QEMU.
> >>> Any good reason not to use offsetof?
> >>
> >> I'm not sure it's portable to use it in case labels.  IIRC, the
> >> definition ((int)&(((T *)0)->field)) is not a valid C integer constant
> >> expression.  Laszlo?
> > 
> > It's defined to yield an integer constant expression in the ISO standard
> > (and I think ANSI too, though that's not at hand):
> 
> It's not in C89.  The oldest compiler QEMU cares about is GCC 4.2.  I
> don't know if it has a builtin offsetof, probably it does.

Yes, I think __builtin_offsetof was added in 4.0:
http://gcc.gnu.org/onlinedocs/gcc-4.0.0/gcc/Offsetof.html


> But I'm not sure about other users of virtio headers.
> 
> Paolo
> > 
> > 7.19, para 3:
> > 
> > ...offsetof(type, member-designator)
> > which expands to an integer constant expression that has type
> > size_t, ...
> > 
> > The real question is whether compilers qemu cares about meet the
> > standard (there's some evidence that older compilers fail this).  If
> > not, we'll have to define them as raw offsets...

So I think the question is whether we care about GCC 3.  We used to when
mingw in debian stable was GCC 3, but not anymore ...

> > 
> > Cheers,
> > Rusty.
> > 
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-29 Thread Paolo Bonzini
Il 29/05/2013 06:33, Rusty Russell ha scritto:
> Paolo Bonzini  writes:
>> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
>>> +
>>> +switch (addr) {
>>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>>> +return proxy->device_feature_select;
>
> Oh dear no...  Please use defines like the rest of QEMU.
>>> Any good reason not to use offsetof?
>>
>> I'm not sure it's portable to use it in case labels.  IIRC, the
>> definition ((int)&(((T *)0)->field)) is not a valid C integer constant
>> expression.  Laszlo?
> 
> It's defined to yield an integer constant expression in the ISO standard
> (and I think ANSI too, though that's not at hand):

It's not in C89.  The oldest compiler QEMU cares about is GCC 4.2.  I
don't know if it has a builtin offsetof, probably it does.  But I'm not
sure about other users of virtio headers.

Paolo

> 
> 7.19, para 3:
> 
> ...offsetof(type, member-designator)
> which expands to an integer constant expression that has type
> size_t, ...
> 
> The real question is whether compilers qemu cares about meet the
> standard (there's some evidence that older compilers fail this).  If
> not, we'll have to define them as raw offsets...
> 
> Cheers,
> Rusty.
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Rusty Russell
Anthony Liguori  writes:
> "Michael S. Tsirkin"  writes:
>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>> +return proxy->device_feature_select;
>
> Oh dear no...  Please use defines like the rest of QEMU.

It is pretty ugly.

Yet the structure definitions are descriptive, capturing layout, size
and endianness in natural a format readable by any C programmer.

So AFAICT the question is, do we put the required

#define VIRTIO_PCI_CFG_FEATURE_SEL \
 (offsetof(struct virtio_pci_common_cfg, device_feature_select))

etc. in the kernel headers or qemu?

> Haven't looked at the proposed new ring layout yet.

No change, but there's an open question on whether we should nail it to
little endian (or define the endian by the transport).

Of course, I can't rule out that the 1.0 standard *may* decide to frob
the ring layout somehow, but I'd think it would require a compelling
reason.  I suggest that's 2.0 material...

Cheers,
Rusty.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Rusty Russell
Paolo Bonzini  writes:
> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
 > > +
 > > +switch (addr) {
 > > +case offsetof(struct virtio_pci_common_cfg, 
 > > device_feature_select):
 > > +return proxy->device_feature_select;
>>> > 
>>> > Oh dear no...  Please use defines like the rest of QEMU.
>> Any good reason not to use offsetof?
>
> I'm not sure it's portable to use it in case labels.  IIRC, the
> definition ((int)&(((T *)0)->field)) is not a valid C integer constant
> expression.  Laszlo?

It's defined to yield an integer constant expression in the ISO standard
(and I think ANSI too, though that's not at hand):

7.19, para 3:

...offsetof(type, member-designator)
which expands to an integer constant expression that has type
size_t, ...

The real question is whether compilers qemu cares about meet the
standard (there's some evidence that older compilers fail this).  If
not, we'll have to define them as raw offsets...

Cheers,
Rusty.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Laszlo Ersek
On 05/28/13 19:43, Paolo Bonzini wrote:
> Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
>> +
>> +switch (addr) {
>> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>> +return proxy->device_feature_select;

 Oh dear no...  Please use defines like the rest of QEMU.
>> Any good reason not to use offsetof?
> 
> I'm not sure it's portable to use it in case labels.  IIRC, the
> definition ((int)&(((T *)0)->field)) is not a valid C integer constant
> expression.  Laszlo?

As long as we use the offsetof() that comes with the C implementation
(ie. we don't hack it up ourselves), we should be safe; ISO C99
elegantly defines offsetof() in "7.17 Common definitions " p3
as

  The macros are [...]

offsetof(type, member-designator)

  which expands to an integer constant expression that has type
  *size_t*, the value of which is the offset in bytes, to the structure
  member (designated by /member-designator/), from the beginning of its
  structure (designated by /type/). The type and member designator shall
  be such that given

static type t;

  then the expression &(t.member-designator) evaluates to an address
  constant. (If the specified member is a bit-field, the behavior is
  undefined.)

("Address constant" is described in 6.6p9 but quoting even that here
doesn't seem necesary.)

>From 6.8.4.2p3, "The expression of each case label shall be an integer
constant expression [...]".

So,
(a) if we use the standard offsetof(),
(b) and you can also write, for example,

  static struct virtio_pci_common_cfg t;
  static void *p = &t.device_feature_select;

then the construct seems valid.

(Consistently with the above mention of UB if the specified member is a
bit-field: the address-of operator's constraints also forbid a bit-field
object as operand.)

Laszlo
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
On Tue, May 28, 2013 at 01:53:35PM -0500, Anthony Liguori wrote:
> "Michael S. Tsirkin"  writes:
> 
> > On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
> >> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
> >> > hwaddr addr,
> >> >  }
> >> >  }
> >> >  
> >> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
> >> > +  unsigned size)
> >> > +{
> >> > +VirtIOPCIProxy *proxy = opaque;
> >> > +VirtIODevice *vdev = proxy->vdev;
> >> > +
> >> > +uint64_t low = 0xull;
> >> > +
> >> > +switch (addr) {
> >> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> >> > +return proxy->device_feature_select;
> >> 
> >> Oh dear no...  Please use defines like the rest of QEMU.
> >
> > Any good reason not to use offsetof?
> > I see about 138 examples in qemu.
> 
> There are exactly zero:
> 
> $ find . -name "*.c" -exec grep -l "case offset" {} \;
> $
> 
> > Anyway, that's the way Rusty wrote it in the kernel header -
> > I started with defines.
> > If you convince Rusty to switch I can switch too,
> 
> We have 300+ devices in QEMU that use #defines.  We're not using this
> kind of thing just because you want to copy code from the kernel.

Rusty, let's switch to defines?

> >> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
> >> 
> >> And:
> >> 
> >> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
> >> 
> >> Which lets virtio-pci be subclassable and then remaps the config space to
> >> BAR2.
> >
> >
> > Interesting. Have the spec anywhere?
> 
> Not yet, but working on that.
> 
> > You are saying this is going to conflict because
> > of BAR2 usage, yes?
> 
> No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
> be the vram mapping.  It also had to be an MMIO bar.
> 
> The new layout might make it easier to implement a device like this.

Absolutely, you can put thing anywhere you like.

>  I
> shared it mainly because I wanted to show the subclassing idea vs. just
> tacking an option onto the existing virtio-pci code in QEMU.
> 
> Regards,
> 
> Anthony Liguori

I don't think a subclass will work for the new layout. This should be
completely transparent to users, just have more devices
work with more flexibility.

> > So let's only do this virtio-fb only for new layout, so we don't need
> > to maintain compatibility. In particular, we are working
> > on making memory BAR access fast for virtio devices
> > in a generic way. At the moment they are measureably slower
> > than PIO on x86.
> >
> >
> >> Haven't looked at the proposed new ring layout yet.
> >> 
> >> Regards,
> >
> > No new ring layout. It's new config layout.
> >
> >
> > -- 
> > MST
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
>> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
>> > hwaddr addr,
>> >  }
>> >  }
>> >  
>> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
>> > +  unsigned size)
>> > +{
>> > +VirtIOPCIProxy *proxy = opaque;
>> > +VirtIODevice *vdev = proxy->vdev;
>> > +
>> > +uint64_t low = 0xull;
>> > +
>> > +switch (addr) {
>> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>> > +return proxy->device_feature_select;
>> 
>> Oh dear no...  Please use defines like the rest of QEMU.
>
> Any good reason not to use offsetof?
> I see about 138 examples in qemu.

There are exactly zero:

$ find . -name "*.c" -exec grep -l "case offset" {} \;
$

> Anyway, that's the way Rusty wrote it in the kernel header -
> I started with defines.
> If you convince Rusty to switch I can switch too,

We have 300+ devices in QEMU that use #defines.  We're not using this
kind of thing just because you want to copy code from the kernel.

>> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
>> 
>> And:
>> 
>> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
>> 
>> Which lets virtio-pci be subclassable and then remaps the config space to
>> BAR2.
>
>
> Interesting. Have the spec anywhere?

Not yet, but working on that.

> You are saying this is going to conflict because
> of BAR2 usage, yes?

No, this whole thing is flexible.  I had to use BAR2 because BAR0 has to
be the vram mapping.  It also had to be an MMIO bar.

The new layout might make it easier to implement a device like this.  I
shared it mainly because I wanted to show the subclassing idea vs. just
tacking an option onto the existing virtio-pci code in QEMU.

Regards,

Anthony Liguori

> So let's only do this virtio-fb only for new layout, so we don't need
> to maintain compatibility. In particular, we are working
> on making memory BAR access fast for virtio devices
> in a generic way. At the moment they are measureably slower
> than PIO on x86.
>
>
>> Haven't looked at the proposed new ring layout yet.
>> 
>> Regards,
>
> No new ring layout. It's new config layout.
>
>
> -- 
> MST
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Paolo Bonzini
Il 28/05/2013 19:32, Michael S. Tsirkin ha scritto:
>>> > > +
>>> > > +switch (addr) {
>>> > > +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>>> > > +return proxy->device_feature_select;
>> > 
>> > Oh dear no...  Please use defines like the rest of QEMU.
> Any good reason not to use offsetof?

I'm not sure it's portable to use it in case labels.  IIRC, the
definition ((int)&(((T *)0)->field)) is not a valid C integer constant
expression.  Laszlo?

> I see about 138 examples in qemu.

Almost all of them are about fields in the host, not the guest, and none
of them are in case statements.

Paolo

> 
> Anyway, that's the way Rusty wrote it in the kernel header -
> I started with defines.
> If you convince Rusty to switch I can switch too,
> but I prefer reusing same structures as linux even if
> for now I've given up on reusing same headers.
> 
> 

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:
> > @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
> > hwaddr addr,
> >  }
> >  }
> >  
> > +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
> > +  unsigned size)
> > +{
> > +VirtIOPCIProxy *proxy = opaque;
> > +VirtIODevice *vdev = proxy->vdev;
> > +
> > +uint64_t low = 0xull;
> > +
> > +switch (addr) {
> > +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> > +return proxy->device_feature_select;
> 
> Oh dear no...  Please use defines like the rest of QEMU.

Any good reason not to use offsetof?
I see about 138 examples in qemu.

Anyway, that's the way Rusty wrote it in the kernel header -
I started with defines.
If you convince Rusty to switch I can switch too,
but I prefer reusing same structures as linux even if
for now I've given up on reusing same headers.


> >From a QEMU pov, take a look at:
> 
> https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659
> 
> And:
> 
> https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb
> 
> Which lets virtio-pci be subclassable and then remaps the config space to
> BAR2.


Interesting. Have the spec anywhere?
You are saying this is going to conflict because
of BAR2 usage, yes?

So let's only do this virtio-fb only for new layout, so we don't need
to maintain compatibility. In particular, we are working
on making memory BAR access fast for virtio devices
in a generic way. At the moment they are measureably slower
than PIO on x86.


> Haven't looked at the proposed new ring layout yet.
> 
> Regards,

No new ring layout. It's new config layout.


-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Anthony Liguori
"Michael S. Tsirkin"  writes:

> This adds support for new config, and is designed to work with
> the new layout code in Rusty's new layout branch.
>
> At the moment all fields are in the same memory BAR (bar 2).
> This will be used to test performance and compare
> memory, io and hypercall latency.
>
> Compiles but does not work yet.
> Migration isn't handled yet.
>
> It's not clear what do queue_enable/queue_disable
> fields do, not yet implemented.
>
> Gateway for config access with config cycles
> not yet implemented.
>
> Sending out for early review/flames.
>
> Signed-off-by: Michael S. Tsirkin 
> ---
>  hw/virtio/virtio-pci.c | 393 
> +++--
>  hw/virtio/virtio-pci.h |  55 +++
>  hw/virtio/virtio.c |  20 +++
>  include/hw/virtio/virtio.h |   4 +
>  4 files changed, 458 insertions(+), 14 deletions(-)
>
> diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
> index 752991a..f4db224 100644
> --- a/hw/virtio/virtio-pci.c
> +++ b/hw/virtio/virtio-pci.c
> @@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy 
> *proxy)
>  proxy->ioeventfd_started = false;
>  }
>  
> +static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
> +{
> +VirtIODevice *vdev = proxy->vdev;
> +
> +if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> +virtio_pci_stop_ioeventfd(proxy);
> +}
> +
> +virtio_set_status(vdev, val & 0xFF);
> +
> +if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> +virtio_pci_start_ioeventfd(proxy);
> +}
> +
> +if (vdev->status == 0) {
> +virtio_reset(proxy->vdev);
> +msix_unuse_all_vectors(&proxy->pci_dev);
> +}
> +}
> +
>  static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>  {
>  VirtIOPCIProxy *proxy = opaque;
> @@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
> addr, uint32_t val)
>  }
>  break;
>  case VIRTIO_PCI_STATUS:
> -if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
> -virtio_pci_stop_ioeventfd(proxy);
> -}
> -
> -virtio_set_status(vdev, val & 0xFF);
> -
> -if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
> -virtio_pci_start_ioeventfd(proxy);
> -}
> -
> -if (vdev->status == 0) {
> -virtio_reset(proxy->vdev);
> -msix_unuse_all_vectors(&proxy->pci_dev);
> -}
> +virtio_pci_set_status(proxy, val);
>  
>  /* Linux before 2.6.34 sets the device as OK without enabling
> the PCI device bus master bit. In this case we need to disable
> @@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, 
> hwaddr addr,
>  }
>  }
>  
> +static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
> +  unsigned size)
> +{
> +VirtIOPCIProxy *proxy = opaque;
> +VirtIODevice *vdev = proxy->vdev;
> +
> +uint64_t low = 0xull;
> +
> +switch (addr) {
> +case offsetof(struct virtio_pci_common_cfg, device_feature_select):
> +return proxy->device_feature_select;

Oh dear no...  Please use defines like the rest of QEMU.

>From a QEMU pov, take a look at:

https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659

And:

https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb

Which lets virtio-pci be subclassable and then remaps the config space to
BAR2.

Haven't looked at the proposed new ring layout yet.

Regards,

Anthony Liguori

> +case offsetof(struct virtio_pci_common_cfg, device_feature):
> +/* TODO: 64-bit features */
> + return proxy->device_feature_select ? 0 : proxy->host_features;
> +case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
> +return proxy->guest_feature_select;
> +case offsetof(struct virtio_pci_common_cfg, guest_feature):
> +/* TODO: 64-bit features */
> + return proxy->guest_feature_select ? 0 : vdev->guest_features;
> +case offsetof(struct virtio_pci_common_cfg, msix_config):
> + return vdev->config_vector;
> +case offsetof(struct virtio_pci_common_cfg, num_queues):
> +/* TODO: more exact limit? */
> + return VIRTIO_PCI_QUEUE_MAX;
> +case offsetof(struct virtio_pci_common_cfg, device_status):
> +return vdev->status;
> +
> + /* About a specific virtqueue. */
> +case offsetof(struct virtio_pci_common_cfg, queue_select):
> +return  vdev->queue_sel;
> +case offsetof(struct virtio_pci_common_cfg, queue_size):
> +return virtio_queue_get_num(vdev, vdev->queue_sel);
> +case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
> + return virtio_queue_vector(vdev, vdev->queue_sel);
> +case offsetof(struct virtio_pci_common_cfg, queue_enable):
> +/* TODO */
> + return 0;
> +case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
> +return vdev->queue_sel;

[PATCH RFC] virtio-pci: new config layout: using memory BAR

2013-05-28 Thread Michael S. Tsirkin
This adds support for new config, and is designed to work with
the new layout code in Rusty's new layout branch.

At the moment all fields are in the same memory BAR (bar 2).
This will be used to test performance and compare
memory, io and hypercall latency.

Compiles but does not work yet.
Migration isn't handled yet.

It's not clear what do queue_enable/queue_disable
fields do, not yet implemented.

Gateway for config access with config cycles
not yet implemented.

Sending out for early review/flames.

Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-pci.c | 393 +++--
 hw/virtio/virtio-pci.h |  55 +++
 hw/virtio/virtio.c |  20 +++
 include/hw/virtio/virtio.h |   4 +
 4 files changed, 458 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 752991a..f4db224 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy 
*proxy)
 proxy->ioeventfd_started = false;
 }
 
+static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
+{
+VirtIODevice *vdev = proxy->vdev;
+
+if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+virtio_pci_stop_ioeventfd(proxy);
+}
+
+virtio_set_status(vdev, val & 0xFF);
+
+if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+virtio_pci_start_ioeventfd(proxy);
+}
+
+if (vdev->status == 0) {
+virtio_reset(proxy->vdev);
+msix_unuse_all_vectors(&proxy->pci_dev);
+}
+}
+
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
 VirtIOPCIProxy *proxy = opaque;
@@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t 
addr, uint32_t val)
 }
 break;
 case VIRTIO_PCI_STATUS:
-if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
-virtio_pci_stop_ioeventfd(proxy);
-}
-
-virtio_set_status(vdev, val & 0xFF);
-
-if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
-virtio_pci_start_ioeventfd(proxy);
-}
-
-if (vdev->status == 0) {
-virtio_reset(proxy->vdev);
-msix_unuse_all_vectors(&proxy->pci_dev);
-}
+virtio_pci_set_status(proxy, val);
 
 /* Linux before 2.6.34 sets the device as OK without enabling
the PCI device bus master bit. In this case we need to disable
@@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr 
addr,
 }
 }
 
+static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
+  unsigned size)
+{
+VirtIOPCIProxy *proxy = opaque;
+VirtIODevice *vdev = proxy->vdev;
+
+uint64_t low = 0xull;
+
+switch (addr) {
+case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+return proxy->device_feature_select;
+case offsetof(struct virtio_pci_common_cfg, device_feature):
+/* TODO: 64-bit features */
+   return proxy->device_feature_select ? 0 : proxy->host_features;
+case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+return proxy->guest_feature_select;
+case offsetof(struct virtio_pci_common_cfg, guest_feature):
+/* TODO: 64-bit features */
+   return proxy->guest_feature_select ? 0 : vdev->guest_features;
+case offsetof(struct virtio_pci_common_cfg, msix_config):
+   return vdev->config_vector;
+case offsetof(struct virtio_pci_common_cfg, num_queues):
+/* TODO: more exact limit? */
+   return VIRTIO_PCI_QUEUE_MAX;
+case offsetof(struct virtio_pci_common_cfg, device_status):
+return vdev->status;
+
+   /* About a specific virtqueue. */
+case offsetof(struct virtio_pci_common_cfg, queue_select):
+return  vdev->queue_sel;
+case offsetof(struct virtio_pci_common_cfg, queue_size):
+return virtio_queue_get_num(vdev, vdev->queue_sel);
+case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+   return virtio_queue_vector(vdev, vdev->queue_sel);
+case offsetof(struct virtio_pci_common_cfg, queue_enable):
+/* TODO */
+   return 0;
+case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+return vdev->queue_sel;
+case offsetof(struct virtio_pci_common_cfg, queue_desc):
+return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) & low;
+case offsetof(struct virtio_pci_common_cfg, queue_desc) + 4:
+return virtio_queue_get_desc_addr(vdev, vdev->queue_sel) >> 32;
+case offsetof(struct virtio_pci_common_cfg, queue_avail):
+return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) & low;
+case offsetof(struct virtio_pci_common_cfg, queue_avail) + 4:
+return virtio_queue_get_avail_addr(vdev, vdev->queue_sel) >> 32;
+case offsetof(struct virtio_pci_common_cfg, queue_used):
+return virtio_queue_get_used_addr(vdev, vdev->queue_sel) & low;
+