Re: [PATCH] virtio_pci: properly clean up MSI-X state when initialization fails
Hi Michael,

On Mon, Sep 15, 2014 at 1:32 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Sun, Sep 14, 2014 at 08:23:26PM -0700, Anthony Liguori wrote:
>> From: Anthony Liguori <aligu...@amazon.com>
>>
>> If MSI-X initialization fails after setting msix_enabled = 1, then the
>> device is left in an inconsistent state. This would normally only
>> happen if there was a bug in the device emulation, but it should still
>> be handled correctly.
>
> This might happen if the host runs out of resources when trying to map
> VQs to vectors, so it doesn't have to be a bug.
>
> But I don't see what the problem is: msix_used_vectors reflects the
> number of used vectors and msix_enabled is set, thus vp_free_vectors
> will free all IRQs and then disable MSI-X. Where is the inconsistency
> you speak about?

I missed the fact that vp_free_vectors() conditionally sets
msix_enabled = 0. It seems a bit kludgy, especially since it is called
both before and after setting msix_enabled = 1.

I ran into a number of weird problems due to read/write reordering that
was causing this code path to fail. The impact was non-deterministic.
I'll go back and try to better understand what code path is causing it.
Cc: Matt Wilson <m...@amazon.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Michael Tsirkin <m...@redhat.com>
Signed-off-by: Anthony Liguori <aligu...@amazon.com>
---
 drivers/virtio/virtio_pci.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 9cbac33..3d2c2a5 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -357,7 +357,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
 	if (v == VIRTIO_MSI_NO_VECTOR) {
 		err = -EBUSY;
-		goto error;
+		goto error_msix_used;
 	}

 	if (!per_vq_vectors) {
@@ -369,11 +369,15 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
 				  vp_dev);
 		if (err)
-			goto error;
+			goto error_msix_used;
 		++vp_dev->msix_used_vectors;
 	}
 	return 0;
+error_msix_used:
+	v = --vp_dev->msix_used_vectors;
+	free_irq(vp_dev->msix_entries[v].vector, vp_dev);
 error:
+	vp_dev->msix_enabled = 0;

> As far as I can see, if you do this, the guest will not call
> pci_disable_msix, thus leaving the device with MSI-X enabled.

I don't understand this comment. How is the work done in this path any
different from what's done in vp_free_vectors()?

Regards,

Anthony Liguori

> I'm not sure this won't break drivers if they then try to use the
> device without MSI-X, and it definitely seems less elegant than
> reverting the device to the original state.

 	vp_free_vectors(vdev);
 	return err;
 }
--
1.7.9.5
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH] virtio-pci: also bind to Amazon PCI vendor ID
Hi Michael,

On Mon, Sep 15, 2014 at 1:20 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> On Sun, Sep 14, 2014 at 08:29:33PM -0700, Anthony Liguori wrote:
>> From: Anthony Liguori <aligu...@amazon.com>
>>
>> See https://issues.oasis-open.org/browse/VIRTIO-16 although it was
>> prematurely closed.
>
> The reason it was closed is described in the comments, and I quote:
>
>   I think anyone can use a different subsystem vendor id and whql the
>   driver. virtio-102 will make this even easier, by making the
>   subsystem id flexible. Let's close this and re-open if someone tries
>   to do this and runs into a problem.
>
> Look here for example:
> https://github.com/YanVugenfirer/kvm-guest-drivers-windows/blob/master/NetKVM/NDIS5/wxp/netkvm.inf
>
> Replace SUBSYS_00011AF4 with SUBSYS_00011D0F, and you will get a
> virtio-net driver that (I think) you should be able to WHQL.

The string you are referencing is the device description, which is part
of what forms the device instance ID. You can read more at
http://msdn.microsoft.com/en-us/library/windows/hardware/ff541327%28v=vs.85%29.aspx.
Including a SUBSYS entry is mandatory in modern Windows drivers, but
this is independent of WHQL certification.

My understanding is that Microsoft will only allow the owner of the PCI
vendor ID to WHQL drivers. As best as I know, this is not a publicly
documented process. Do you have any examples of anyone else
successfully WHQL'ing drivers by just changing the subsystem ID?

Microsoft has specific rules about the usage of the subsystem ID. See
http://msdn.microsoft.com/en-us/library/windows/hardware/dn653567%28v=vs.85%29.aspx.
I don't think it is intended to enable a totally different
implementation of the driver based on subsystem ID. Certainly, this is
not expected with typical PCI devices.

> On the host side, you will need a QEMU patch to allow libvirt control
> of the subsystem vendor ID. All this while all Linux guests will keep
> working without changes, which seems like a better approach.
> Looking on the web, I found:
> http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/pciidspec-11.doc
>
>   Test will read and properly concatenate PCI IDs and verify uniqueness
>
> This is likely what you are running into: IDs must be unique, so if
> you want to put your driver in Microsoft's database, it must match a
> different set of IDs. But you should not need to change the vendor ID
> to make them unique; changing the subsystem vendor ID will do. Did you
> try this?

>> You cannot submit a modified kvm-guest-drivers.git for WHQL
>> certification as the licensing is constructed in such a way as to
>> prevent that. Red Hat has non-redistributable Windows drivers and
>> Microsoft will not allow anyone else to WHQL certify drivers using
>> that vendor ID.
>
> Don't see what Red Hat's Windows drivers have to do with Linux,
> really. Amazon.com can do whatever it wants with its vendor ID, and if
> there is a hypervisor with a different vendor ID that can use the
> virtio drivers, this patch is required. The following would be a
> reasonable commit log in that case:
>
>   Amazon.com uses PV devices with vendor ID 0x1d0f that are otherwise
>   compatible with Linux virtio drivers. Add the 0x1d0f ID to the list
>   to make Linux work with these devices.
>
> Feel free to use :) But I'd like to note that by doing this on the
> hypervisor side, you lose the ability to run older Linux guests, and
> create work for all distros who now need to update their kernels to
> work with your ID, apparently for no good reason. So if this isn't out
> in the field yet, I would suggest examining the alternative listed
> above.

I am very happy to use any alternative mechanism that allows for virtio
to be used with a wide variety of guests. Many other companies have
also struggled with this and AFAIK no one has had much success.

> OTOH if it *is* decided this is going to be out there in the field,
> please add the new devices to the PCI IDs list.
> http://pci-ids.ucw.cz/
>
> Otherwise there's no way to be sure someone won't try to use these IDs
> for something else.

PCI-SIG assigns vendor IDs, and 0x1d0f is assigned to Amazon. See
https://www.pcisig.com/membership/vid_search/ Vendors self-manage
device IDs, and we have allocated 0x1000-0x103f to virtio devices.

>> That makes it impossible to use virtio drivers with a Windows guest
>> without changing the vendor ID.
>
> Hardly impossible: virtio drivers are available from a variety of
> sources.

Examples?

Regards,

Anthony Liguori

> But this is IMO beside the point.
>
>> Cc: Matt Wilson <m...@amazon.com>
>> Cc: Rusty Russell <ru...@rustcorp.com.au>
>> Cc: Michael Tsirkin <m...@redhat.com>
>> Signed-off-by: Anthony Liguori <aligu...@amazon.com>
>
> I'd like to see the response/confirmation of the above, and/or the
> commit log replaced, before this patch is applied. Thanks!
>
>> ---
>>  drivers/virtio/virtio_pci.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
>> index 101db3f..9cbac33 100644
>> --- a/drivers/virtio
[PATCH] virtio_pci: properly clean up MSI-X state when initialization fails
From: Anthony Liguori <aligu...@amazon.com>

If MSI-X initialization fails after setting msix_enabled = 1, then the
device is left in an inconsistent state. This would normally only
happen if there was a bug in the device emulation but it still should
be handled correctly.

Cc: Matt Wilson <m...@amazon.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Michael Tsirkin <m...@redhat.com>
Signed-off-by: Anthony Liguori <aligu...@amazon.com>
---
 drivers/virtio/virtio_pci.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 9cbac33..3d2c2a5 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -357,7 +357,7 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 	v = ioread16(vp_dev->ioaddr + VIRTIO_MSI_CONFIG_VECTOR);
 	if (v == VIRTIO_MSI_NO_VECTOR) {
 		err = -EBUSY;
-		goto error;
+		goto error_msix_used;
 	}

 	if (!per_vq_vectors) {
@@ -369,11 +369,15 @@ static int vp_request_msix_vectors(struct virtio_device *vdev, int nvectors,
 				  vp_vring_interrupt, 0, vp_dev->msix_names[v],
 				  vp_dev);
 		if (err)
-			goto error;
+			goto error_msix_used;
 		++vp_dev->msix_used_vectors;
 	}
 	return 0;
+error_msix_used:
+	v = --vp_dev->msix_used_vectors;
+	free_irq(vp_dev->msix_entries[v].vector, vp_dev);
 error:
+	vp_dev->msix_enabled = 0;
 	vp_free_vectors(vdev);
 	return err;
 }
--
1.7.9.5
[PATCH] virtio-pci: also bind to Amazon PCI vendor ID
From: Anthony Liguori <aligu...@amazon.com>

See https://issues.oasis-open.org/browse/VIRTIO-16 although it was
prematurely closed.

Red Hat has non-redistributable Windows drivers and Microsoft will not
allow anyone else to WHQL certify drivers using that vendor ID. That
makes it impossible to use virtio drivers with a Windows guest without
changing the vendor ID.

Cc: Matt Wilson <m...@amazon.com>
Cc: Rusty Russell <ru...@rustcorp.com.au>
Cc: Michael Tsirkin <m...@redhat.com>
Signed-off-by: Anthony Liguori <aligu...@amazon.com>
---
 drivers/virtio/virtio_pci.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index 101db3f..9cbac33 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -93,6 +93,8 @@ struct virtio_pci_vq_info
 /* Qumranet donated their vendor ID for devices 0x1000 thru 0x10FF. */
 static DEFINE_PCI_DEVICE_TABLE(virtio_pci_id_table) = {
 	{ PCI_DEVICE(0x1af4, PCI_ANY_ID) },
+	/* Amazon.com vendor ID */
+	{ PCI_DEVICE(0x1d0f, PCI_ANY_ID) },
 	{ 0 }
 };
--
1.7.9.5
KVM Forum 2013 Call for Participation - Extended to August 4th
We have received numerous requests to extend the CFP deadline, and so
we are happy to announce that the CFP deadline has been moved by two
weeks to August 4th.

=========================================================
KVM Forum 2013: Call For Participation
October 21-23, 2013 - Edinburgh International Conference Centre -
Edinburgh, UK

(All submissions must be received before midnight August 4, 2013)
=========================================================

KVM is an industry-leading open source hypervisor that provides an
ideal platform for datacenter virtualization, virtual desktop
infrastructure, and cloud computing. Once again, it's time to bring
together the community of developers and users that define the KVM
ecosystem for our annual technical conference. We will discuss the
current state of affairs and plan for the future of KVM, its
surrounding infrastructure, and management tools. The oVirt Workshop
will run in parallel with the KVM Forum again, bringing in a community
focused on enterprise datacenter virtualization management built on
KVM. For topics which overlap we will have shared sessions.

So mark your calendar and join us in advancing KVM.
http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon
Europe. KVM Forum attendees will be able to attend oVirt Workshop
sessions and are eligible to attend LinuxCon Europe for a discounted
rate.
http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking
proposal for KVM Forum 2013.
http://events.linuxfoundation.org/cfp

Suggested topics:

KVM/Kernel
- Scaling and performance
- Nested virtualization
- I/O improvements
- VFIO, device assignment, SR-IOV
- Driver domains
- Time keeping
- Resource management (cpu, memory, i/o)
- Memory management (page sharing, swapping, huge pages, etc)
- Network virtualization
- Security
- Architecture ports

QEMU
- Device model improvements
- New devices and chipsets
- Scaling and performance
- Desktop virtualization
- Spice
- Increasing robustness and hardening
- Security model
- Management interfaces
- QMP protocol and implementation
- Image formats
- Firmware (SeaBIOS, OVMF, UEFI, etc)
- Live migration
- Live snapshots and merging
- Fault tolerance, high availability, continuous backup
- Real-time guest support

Virtio
- Speeding up existing devices
- Alternatives
- Virtio on non-Linux or non-virtualized

Management infrastructure
- oVirt (shared track w/ oVirt Workshop)
- Libvirt
- KVM autotest
- OpenStack
- Network virtualization management
- Enterprise storage management

Cloud computing
- Scalable storage
- Virtual networking
- Security
- Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: August 4, 2013
Notification: August 1, 2013

Please submit a short abstract (~150 words) describing your
presentation proposal. In your submission please note how long your
talk will take. Slots vary in length up to 45 minutes. Also include in
your proposal the proposal type -- one of:

- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:
http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation
proposal was accepted by Aug 1st.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how
people actually use our software. We will reserve a few slots for end
users talking about their deployment challenges and achievements. If
you are using KVM in production you are encouraged to submit a speaking
proposal.
Simply mark it as an end-user collaboration proposal. As an end user,
this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference
tracks for birds of a feather (BOF) sessions. These sessions will be
less formal than presentation tracks and targeted at people who would
like to discuss specific issues with other developers and/or users. If
you are interested in getting developers and/or users together to
discuss a specific problem, please submit a BOF proposal.

HOTEL / TRAVEL

The KVM Forum 2013 will be held in Edinburgh, UK at the Edinburgh
International Conference Centre.
http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM. We're looking forward to your
submissions and seeing you at the KVM Forum 2013 in October!

Thanks,
- your KVM Forum 2013 Program Committee

Please contact us with any questions or comments:
kvm-forum-2013...@redhat.com
Re: updated: kvm networking todo wiki
Rusty Russell <ru...@rustcorp.com.au> writes:
> Anthony Liguori <anth...@codemonkey.ws> writes:
>> Rusty Russell <ru...@rustcorp.com.au> writes:
>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>>> FWIW, I think what's more interesting is using vhost-net as a
>>>> networking backend with virtio-net in QEMU being what's guest
>>>> facing. In theory, this gives you the best of both worlds: QEMU
>>>> acts as a first line of defense against a malicious guest while
>>>> still getting the performance advantages of vhost-net (zero-copy).
>>>
>>> It would be an interesting idea if we didn't already have the vhost
>>> model where we don't need the userspace bounce.
>>
>> The model is very interesting for QEMU because then we can use vhost
>> as a backend for other types of network adapters (like vmxnet3 or
>> even e1000). It also helps for things like fault tolerance where we
>> need to be able to control packet flow within QEMU.
>
> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3
> thoughts.)
>
> Then I'm really confused as to what this would look like. A zero copy
> sendmsg? We should be able to implement that today.

The only trouble with sendmsg would be doing batch submission and
asynchronous completion. A thread pool could certainly be used for
this, I guess.

Regards,

Anthony Liguori

> On the receive side, what can we do better than readv? If we need to
> return to userspace to tell the guest that we've got a new packet, we
> don't win on latency. We might reduce syscall overhead with a
> multi-dimensional readv to read multiple packets at once?
>
> Confused,
> Rusty.
Re: updated: kvm networking todo wiki
Stefan Hajnoczi <stefa...@gmail.com> writes:
> On Thu, May 30, 2013 at 7:23 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
>> Anthony Liguori <anth...@codemonkey.ws> writes:
>>> Rusty Russell <ru...@rustcorp.com.au> writes:
>>>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>>>> FWIW, I think what's more interesting is using vhost-net as a
>>>>> networking backend with virtio-net in QEMU being what's guest
>>>>> facing. In theory, this gives you the best of both worlds: QEMU
>>>>> acts as a first line of defense against a malicious guest while
>>>>> still getting the performance advantages of vhost-net (zero-copy).
>>>>
>>>> It would be an interesting idea if we didn't already have the vhost
>>>> model where we don't need the userspace bounce.
>>>
>>> The model is very interesting for QEMU because then we can use vhost
>>> as a backend for other types of network adapters (like vmxnet3 or
>>> even e1000). It also helps for things like fault tolerance where we
>>> need to be able to control packet flow within QEMU.
>>
>> (CC's reduced, context added, Dmitry Fleytman added for vmxnet3
>> thoughts.)
>>
>> Then I'm really confused as to what this would look like. A zero copy
>> sendmsg? We should be able to implement that today.
>>
>> On the receive side, what can we do better than readv? If we need to
>> return to userspace to tell the guest that we've got a new packet, we
>> don't win on latency. We might reduce syscall overhead with a
>> multi-dimensional readv to read multiple packets at once?
>
> Sounds like recvmmsg(2).

Could we map this to mergeable rx buffers, though?

Regards,

Anthony Liguori

> Stefan
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell <ru...@rustcorp.com.au> writes:
> Anthony Liguori <aligu...@us.ibm.com> writes:
>> Forcing a guest driver change is a really big deal and I see no
>> reason to do that unless there's a compelling reason to. So we're
>> stuck with the 1.0 config layout for a very long time.
>
> We definitely must not force a guest change. The explicit aim of the
> standard is that legacy and 1.0 be backward compatible. One
> deliverable is a document detailing how this is done (effectively a
> summary of changes between what we have and 1.0).

If 2.0 is fully backwards compatible, great. It seems like such a large
difference that that would be impossible, but I need to investigate
further.

Regards,

Anthony Liguori

> It's a delicate balancing act. My plan is to accompany any changes in
> the standard with a qemu implementation, so we can see how painful
> those changes are. And if there are performance implications, measure
> them.
>
> Cheers,
> Rusty.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: updated: kvm networking todo wiki
Michael S. Tsirkin <m...@redhat.com> writes:
> On Thu, May 30, 2013 at 08:40:47AM -0500, Anthony Liguori wrote:
>> Stefan Hajnoczi <stefa...@gmail.com> writes:
>>> On Thu, May 30, 2013 at 7:23 AM, Rusty Russell <ru...@rustcorp.com.au> wrote:
>>>> [earlier vhost-net backend context snipped]
>>>>
>>>> On the receive side, what can we do better than readv? If we need
>>>> to return to userspace to tell the guest that we've got a new
>>>> packet, we don't win on latency. We might reduce syscall overhead
>>>> with a multi-dimensional readv to read multiple packets at once?
>>>
>>> Sounds like recvmmsg(2).
>>
>> Could we map this to mergeable rx buffers, though?
>
> Yes, because we don't have to complete buffers in order.

What I meant, though, was for GRO: we don't know how large the received
packet is going to be. Mergeable rx buffers let us allocate a pool of
data for all incoming packets instead of allocating max packet size *
max packets. recvmmsg expects an array of msghdrs, and I presume each
needs to be given a fixed size. So this seems incompatible with
mergeable rx buffers.

Regards,

Anthony Liguori

>>> Stefan
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Rusty Russell <ru...@rustcorp.com.au> writes:
> Anthony Liguori <aligu...@us.ibm.com> writes:
>> Michael S. Tsirkin <m...@redhat.com> writes:
>>> +    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
>>> +        return proxy->device_feature_select;
>>
>> Oh dear no... Please use defines like the rest of QEMU.
>
> It is pretty ugly.

I think beauty is in the eye of the beholder here... Pretty much every
device we have has a switch statement like this. Consistency wins when
it comes to qualitative arguments like this.

> Yet the structure definitions are descriptive, capturing layout, size
> and endianness in a natural format readable by any C programmer.

From an API design point of view, here are the problems I see:

1) C makes no guarantees about structure layout beyond the first
   member. Yes, if it's naturally aligned or has a packed attribute,
   GCC does the right thing. But this isn't kernel land anymore;
   portability matters and there are more compilers than GCC.

2) If we ever introduce anything like latching, this doesn't work out
   so well anymore, because it's hard to express in a single C
   structure the register layout at that point. Perhaps a union could
   be used, but padding may make it a bit challenging.

3) I suspect it's harder to review because a subtle change could more
   easily have broad impact. If someone changed the type of a field
   from u32 to u16, it changes the offset of every other field. That's
   not terribly obvious in the patch itself unless you understand how
   the structure is used elsewhere. This may not be a problem for
   virtio because we all understand that the structures are part of an
   ABI, but if we used this pattern more in QEMU, it would be a lot
   less obvious.

> So AFAICT the question is, do we put the required
>
>   #define VIRTIO_PCI_CFG_FEATURE_SEL \
>       (offsetof(struct virtio_pci_common_cfg, device_feature_select))
>
> etc. in the kernel headers or qemu?

I'm pretty sure we would end up just having our own integer defines. We
carry our own virtio headers today because we can't easily import the
kernel headers.

>> Haven't looked at the proposed new ring layout yet.
>
> No change, but there's an open question on whether we should nail it
> to little endian (or define the endian by the transport). Of course, I
> can't rule out that the 1.0 standard *may* decide to frob the ring
> layout somehow,

Well, given that virtio is widely deployed today, I would think the 1.0
standard should strictly reflect what's deployed today, no? Any new
config layout would be 2.0 material, right?

Re: the new config layout, I don't think we would want to use it for
anything but new devices. Forcing a guest driver change is a really big
deal and I see no reason to do that unless there's a compelling reason
to. So we're stuck with the 1.0 config layout for a very long time.

Regards,

Anthony Liguori

> but I'd think it would require a compelling reason. I suggest that's
> 2.0 material...
>
> Cheers,
> Rusty.
Re: updated: kvm networking todo wiki
Rusty Russell <ru...@rustcorp.com.au> writes:
> Michael S. Tsirkin <m...@redhat.com> writes:
>> On Fri, May 24, 2013 at 08:47:58AM -0500, Anthony Liguori wrote:
>>> Michael S. Tsirkin <m...@redhat.com> writes:
>>>> On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote:
>>>>> On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote:
>>>>>> Hey guys,
>>>>>> I've updated the kvm networking todo wiki with current projects.
>>>>>> Will try to keep it up to date more often. Original announcement
>>>>>> below.
>>>>>
>>>>> Thanks a lot. I've added the tasks I'm currently working on to the
>>>>> wiki. Btw, I notice the virtio-net data plane was missed in the
>>>>> wiki. Is the project still being considered?
>>>>
>>>> It might have been interesting several years ago, but now that
>>>> Linux has vhost-net in kernel, the only point seems to be to speed
>>>> up networking on non-Linux hosts.
>>>
>>> Data plane just means having a dedicated thread for virtqueue
>>> processing that doesn't hold qemu_mutex. Of course we're going to do
>>> this in QEMU. It's a no-brainer. But not as a separate device, just
>>> as an improvement to the existing userspace virtio-net.
>>
>> Since non-Linux does not have kvm, I doubt virtio is a bottleneck.
>>
>>> FWIW, I think what's more interesting is using vhost-net as a
>>> networking backend with virtio-net in QEMU being what's guest
>>> facing. In theory, this gives you the best of both worlds: QEMU acts
>>> as a first line of defense against a malicious guest while still
>>> getting the performance advantages of vhost-net (zero-copy).
>>
>> Great idea, that sounds very interesting. I'll add it to the wiki. In
>> fact a bit of complexity in vhost was put there in the vague hope to
>> support something like this: virtio rings are not translated through
>> regular memory tables; instead, vhost gets a pointer to the ring
>> address. This allows qemu to act as a man in the middle, verifying
>> the descriptors but not touching the
>>
>> Anyone interested in working on such a project?
>
> It would be an interesting idea if we didn't already have the vhost
> model where we don't need the userspace bounce.

The model is very interesting for QEMU because then we can use vhost as
a backend for other types of network adapters (like vmxnet3 or even
e1000). It also helps for things like fault tolerance where we need to
be able to control packet flow within QEMU.

Regards,

Anthony Liguori

> We already have two sets of host-side ring code in the kernel (vhost
> and vringh, though they're being unified). All an accelerator can
> offer on the tx side is zero copy and direct update of the used ring.
> On rx, userspace could register the buffers and the accelerator could
> fill them and update the used ring. It still needs to deal with merged
> buffers, for example.
>
> You avoid the address translation in the kernel, but I'm not convinced
> that's a key problem.
>
> Cheers,
> Rusty.
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Rusty Russell <ru...@rustcorp.com.au> writes:
> Anthony Liguori <anth...@codemonkey.ws> writes:
>> The headers say they are BSD licensed... but they include a GPLv2+
>> header. Doesn't make a lot of sense, does it?
>
> It makes perfect sense: you're overthinking it. It just means that
> copying the BSD headers outside Linux is encouraged.

Copying by hand and modifying. This patch series attempts to do it
automatically within QEMU.

> And it's clearly nonsensical to claim the GPL on kernel headers means
> that userspace needs to be GPL. So please ignore this licensing
> red-herring.

You're missing context, I suspect. This series is trying to
automatically copy the headers from Linux. We currently have a manually
copied version.

The headers are not currently well suited for automatic copying because
(1) they use kernel types and (2) they pull in dependencies from
throughout the kernel.

This discussion comes down to whether we want to make it easier to
automatically copy the headers, or do we maintain the status quo and
require manual munging to pull in the headers.

Regards,

Anthony Liguori

> And we'll bikeshed the headers in the standard when we have to :) They
> certainly don't need to be cut & paste into the kernel sources.
>
> Cheers,
> Rusty.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin <m...@redhat.com> writes:
> On Wed, May 29, 2013 at 07:52:37AM -0500, Anthony Liguori wrote:
>> 1) C makes no guarantees about structure layout beyond the first
>>    member. Yes, if it's naturally aligned or has a packed attribute,
>>    GCC does the right thing. But this isn't kernel land anymore;
>>    portability matters and there are more compilers than GCC.
>
> You expect a compiler to pad this structure:
>
>   struct foo {
>       uint8_t a;
>       uint8_t b;
>       uint16_t c;
>       uint32_t d;
>   };
>
> I'm guessing any compiler that decides to waste memory in this way
> will quickly get dropped by users and then we won't worry about
> building QEMU with it.

There are other responses in the thread here and I don't really care to
bikeshed on this issue.

>> Well, given that virtio is widely deployed today, I would think the
>> 1.0 standard should strictly reflect what's deployed today, no? Any
>> new config layout would be 2.0 material, right?
>
> Not as it's currently planned. Devices can choose to support a legacy
> layout in addition to the new one, and if you look at the patch you
> will see that that is exactly what it does.

Adding a new BAR most certainly requires bumping the revision ID or
changing the device ID, no? Didn't we run into this problem with the
virtio-win drivers with just the BAR size changing?

>> Re: the new config layout, I don't think we would want to use it for
>> anything but new devices. Forcing a guest driver change
>
> There's no forcing. If you look at the patches closely, you will see
> that we still support the old layout on BAR0.
>
>> is a really big deal and I see no reason to do that unless there's a
>> compelling reason to.
>
> There are many compelling reasons, and they are well-known limitations
> of virtio PCI:
>
> - PCI spec compliance (mandates device operation with I/O memory
>   disabled).

PCI Express spec. We are fully compliant with the PCI spec. And what's
the user-visible advantage of putting an emulated virtio device behind
a PCI-e bus versus a legacy PCI bus?

This is a very good example, because if we have to disable BAR0, then
it's an ABI breaker, plain and simple.

> - support 64 bit addressing

We currently support 44-bit addressing for the ring. While I agree we
need to bump it, there's no immediate problem with 44-bit addressing.

> - add more than 32 feature bits.
> - individually disable queues.
> - sanely support cross-endian systems.
> - support very small (1 PAGE) virtio rings.
> - support a separate page for each vq kick.
> - make each device place config at a flexible offset.

None of these things are holding us back today. I'm not saying we
shouldn't introduce a new device, but adoption of that device will be
slow and realistically will be limited to new devices only. We'll be
supporting both devices for a very, very long time.

Compatibility is the fundamental value that we provide. We need to go
out of our way to make sure that existing guests work and work as well
as possible. Sticking virtio devices behind a PCI-e bus just for the
hell of it isn't a compelling reason to break existing guests.

Regards,

Anthony Liguori

> Addressing any one of these would cause us to add a substantially new
> way to operate virtio devices. And since it's a guest change anyway,
> it seemed like a good time to do the new layout and fix everything in
> one go. And they are needed like yesterday.
>
>> So we're stuck with the 1.0 config layout for a very long time.
>
> Absolutely. This patch lets us support both, which will allow for a
> gradual transition over the next 10 years or so.
>
>>> reason. I suggest that's 2.0 material...
>>>
>>> Cheers,
>>> Rusty.
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Michael S. Tsirkin m...@redhat.com writes: On Wed, May 29, 2013 at 08:05:29AM -0500, Anthony Liguori wrote: Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori anth...@codemonkey.ws writes: The headers say they are BSD licensed... but they include a GPLv2+ header. Doesn't make a lot of sense, does it? It makes perfect sense: you're overthinking it. It just means that copying the BSD headers outside Linux is encouraged. Copying by hand and modifying. This patch series attempts to do it automatically within QEMU. And it's clearly nonsensical to claim the GPL on kernel headers means that userspace needs to be GPL. So please ignore this licensing red-herring. You're missing context, I suspect. This series is trying to automatically copy the headers from Linux. We currently have a manually copied version. The headers are not currently well suited for automatic copying because (1) they use kernel types (2) they pull in dependencies from throughout the kernel. This discussion comes down to whether we want to make it easier to automatically copy the headers or do we maintain the status quo and require manual munging to pull in the headers. Regards, Anthony Liguori Okay, what if I 1. add a stub if_ether.h with a couple of macros we want 2. replace the types during copying Would this address all concerns? If Rusty doesn't like the idea of making the headers usable directly, then I don't object to having stubs/scripts to sanitize them in QEMU. Regards, Anthony Liguori And we'll bikeshed the headers in the standard when we have to :) They certainly don't need to be cutpaste into the kernel sources. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: This adds support for the new config layout, and is designed to work with the new layout code in Rusty's new-layout branch. At the moment all fields are in the same memory BAR (BAR 2). This will be used to test performance and compare memory, I/O and hypercall latency. Compiles but does not work yet. Migration isn't handled yet. It's not clear what the queue_enable/queue_disable fields do; not yet implemented. Gateway for config access with config cycles not yet implemented. Sending out for early review/flames.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 hw/virtio/virtio-pci.c     | 393 +++--
 hw/virtio/virtio-pci.h     |  55 +++
 hw/virtio/virtio.c         |  20 +++
 include/hw/virtio/virtio.h |   4 +
 4 files changed, 458 insertions(+), 14 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 752991a..f4db224 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -259,6 +259,26 @@ static void virtio_pci_stop_ioeventfd(VirtIOPCIProxy *proxy)
     proxy->ioeventfd_started = false;
 }

+static void virtio_pci_set_status(VirtIOPCIProxy *proxy, uint8_t val)
+{
+    VirtIODevice *vdev = proxy->vdev;
+
+    if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
+        virtio_pci_stop_ioeventfd(proxy);
+    }
+
+    virtio_set_status(vdev, val & 0xFF);
+
+    if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
+        virtio_pci_start_ioeventfd(proxy);
+    }
+
+    if (vdev->status == 0) {
+        virtio_reset(proxy->vdev);
+        msix_unuse_all_vectors(&proxy->pci_dev);
+    }
+}
+
 static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
 {
     VirtIOPCIProxy *proxy = opaque;
@@ -293,20 +313,7 @@ static void virtio_ioport_write(void *opaque, uint32_t addr, uint32_t val)
         }
         break;
     case VIRTIO_PCI_STATUS:
-        if (!(val & VIRTIO_CONFIG_S_DRIVER_OK)) {
-            virtio_pci_stop_ioeventfd(proxy);
-        }
-
-        virtio_set_status(vdev, val & 0xFF);
-
-        if (val & VIRTIO_CONFIG_S_DRIVER_OK) {
-            virtio_pci_start_ioeventfd(proxy);
-        }
-
-        if (vdev->status == 0) {
-            virtio_reset(proxy->vdev);
-            msix_unuse_all_vectors(&proxy->pci_dev);
-        }
+        virtio_pci_set_status(proxy, val);
         /* Linux before 2.6.34 sets the device as OK without enabling
            the PCI device bus master bit. In this case we need to disable
@@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr,
     }
 }

+static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
+                                              unsigned size)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    VirtIODevice *vdev = proxy->vdev;
+
+    uint64_t low = 0xffffffffull;
+
+    switch (addr) {
+    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        return proxy->device_feature_select;

Oh dear no... Please use defines like the rest of QEMU. From a QEMU pov, take a look at: https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 And: https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb Which lets virtio-pci be subclassable and then remaps the config space to BAR2. Haven't looked at the proposed new ring layout yet. Regards, Anthony Liguori

+    case offsetof(struct virtio_pci_common_cfg, device_feature):
+        /* TODO: 64-bit features */
+        return proxy->device_feature_select ? 0 : proxy->host_features;
+    case offsetof(struct virtio_pci_common_cfg, guest_feature_select):
+        return proxy->guest_feature_select;
+    case offsetof(struct virtio_pci_common_cfg, guest_feature):
+        /* TODO: 64-bit features */
+        return proxy->guest_feature_select ? 0 : vdev->guest_features;
+    case offsetof(struct virtio_pci_common_cfg, msix_config):
+        return vdev->config_vector;
+    case offsetof(struct virtio_pci_common_cfg, num_queues):
+        /* TODO: more exact limit? */
+        return VIRTIO_PCI_QUEUE_MAX;
+    case offsetof(struct virtio_pci_common_cfg, device_status):
+        return vdev->status;
+
+    /* About a specific virtqueue.
     */
+    case offsetof(struct virtio_pci_common_cfg, queue_select):
+        return vdev->queue_sel;
+    case offsetof(struct virtio_pci_common_cfg, queue_size):
+        return virtio_queue_get_num(vdev, vdev->queue_sel);
+    case offsetof(struct virtio_pci_common_cfg, queue_msix_vector):
+        return virtio_queue_vector(vdev, vdev->queue_sel);
+    case offsetof(struct virtio_pci_common_cfg, queue_enable):
+        /* TODO */
+        return 0;
+    case offsetof(struct virtio_pci_common_cfg, queue_notify_off):
+        return vdev->queue_sel;
+    case offsetof(struct virtio_pci_common_cfg, queue_desc):
+        return virtio_queue_get_desc_addr(vdev, vdev->queue_sel
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Michael S. Tsirkin m...@redhat.com writes: On Tue, May 28, 2013 at 12:15:16PM -0500, Anthony Liguori wrote:

@@ -455,6 +462,226 @@ static void virtio_pci_config_write(void *opaque, hwaddr addr,
     }
 }

+static uint64_t virtio_pci_config_common_read(void *opaque, hwaddr addr,
+                                              unsigned size)
+{
+    VirtIOPCIProxy *proxy = opaque;
+    VirtIODevice *vdev = proxy->vdev;
+
+    uint64_t low = 0xffffffffull;
+
+    switch (addr) {
+    case offsetof(struct virtio_pci_common_cfg, device_feature_select):
+        return proxy->device_feature_select;

Oh dear no... Please use defines like the rest of QEMU. Any good reason not to use offsetof? I see about 138 examples in qemu. There are exactly zero: $ find . -name *.c -exec grep -l case offset {} \; $ Anyway, that's the way Rusty wrote it in the kernel header - I started with defines. If you convince Rusty to switch I can switch too, We have 300+ devices in QEMU that use #defines. We're not using this kind of thing just because you want to copy code from the kernel. https://github.com/aliguori/qemu/commit/587c35c1a3fe90f6af0f97927047ef4d3182a659 And: https://github.com/aliguori/qemu/commit/01ba80a23cf2eb1e15056f82b44b94ec381565cb Which lets virtio-pci be subclassable and then remaps the config space to BAR2. Interesting. Have you got the spec anywhere? Not yet, but working on that. You are saying this is going to conflict because of BAR2 usage, yes? No, this whole thing is flexible. I had to use BAR2 because BAR0 has to be the vram mapping. It also had to be an MMIO bar. The new layout might make it easier to implement a device like this. I shared it mainly because I wanted to show the subclassing idea vs. just tacking an option onto the existing virtio-pci code in QEMU. Regards, Anthony Liguori So let's do this virtio-fb only for the new layout, so we don't need to maintain compatibility. In particular, we are working on making memory BAR access fast for virtio devices in a generic way. At the moment they are measurably slower than PIO on x86.
Haven't looked at the proposed new ring layout yet. Regards, No new ring layout. It's new config layout. -- MST
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Michael S. Tsirkin m...@redhat.com writes: On Sun, May 26, 2013 at 07:55:25PM -0500, Anthony Liguori wrote: Michael S. Tsirkin m...@redhat.com writes: On Sun, May 26, 2013 at 03:49:53PM -0500, Anthony Liguori wrote: Paolo Bonzini pbonz...@redhat.com writes: Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: My fault. I should have looked at linux/types.h (actually asm-generic/). Not really, __uX appear in the headers that were posted. Which is a problem because this is a reserved namespace in C99. What I'm saying is - a chance of a conflict is very remote, if it happens it's a build failure so easy to debug. I'm sure that others will complain, :) but you can go ahead. I think we should clean up the virtio headers in the kernel first. Seems like a good thing to do given the standardization effort too. There are lots of headers in uapi that use the C99 int types I found 4: $ grep -l uint include/uapi/linux/*h include/uapi/linux/dm-log-userspace.h include/uapi/linux/fuse.h include/uapi/linux/jffs2.h include/uapi/linux/pps.h include/uapi/linux/rds.h include/uapi/linux/sctp.h That's not really lots. so there doesn't seem to be a reason they couldn't be used in virtio. fuse.h even has a: #ifdef __KERNEL__ #include linux/types.h #else #include stdint.h #endif Which seems like a reasonable thing to do. In kernel, we really want to use things like endian-ness checks (and, we really should have them in userspace too!). So we want __le32 and other kernel goodies like that. stdint.h won't cut it. With the spec being standardized, I think having a stand alone set of headers is a good thing. Sure, that's possible. We'll have to find some way to preserve the endian-ness annotations, I think. And then import them into kernel/qemu with some script, converting to kernel/qemu style and annotations? See below. Licensing is problematic here too. 
If virtio headers depend on other Linux headers, then it really doesn't matter if they are BSD licensed if you need a GPL header (like linux/if_ether.h). Now, we can certainly debate the copyrightability of these defines and what have you but if the headers are meant to be 1) consumed outside the kernel 2) licensed under a different license than the general kernel then depending on kernel goodies is the wrong strategy. Well specifically if_ether.h says GPLv2+ so it's OK for QEMU. Do you mean for some other non GPL app? Ignore QEMU for the moment. The headers say they are BSD licensed... but they include a GPLv2+ header. Doesn't make a lot of sense, does it? Again, I think it's something pretty basic here. Either (1) these are kernel/userspace headers that are meant to be consumed through libc or whatever (2) these are kernel-private headers not to be consumed outside of the kernel (3) these are headers meant to be copied widely and used as a reference implementation of virtio. If (1) or (2), copying them into QEMU via a magic sanitizing script is wrong. We should just keep doing what we're doing. If (3), then we should clean up the virtio headers to be license clean and usable outside of the kernel. We shouldn't try to solve this problem in QEMU (via scripts) if we can just solve it in the kernel. Regards, Anthony Liguori The only other kernel dependency is linux/if_ether.h to define ETH_ALEN. But it seems completely reasonable to define a VIRTIO_NET_MAC_ALEN or something like that. Ugh. Not really reasonable IMO. We also use ETH_P_IP in code, would like to get rid of redefining that too. But we can have our own linux/if_ether.h for non-linux hosts, just with a couple of macros like that, it's not a big deal. See above. Regards, Anthony Liguori This would make the virtio headers completely stand alone and includable in userspace (without violating C99). Perhaps it's even worth moving the headers from uapi/linux to uapi/virtio. Rusty, what do you think? 
Regards, Anthony Liguori Paolo
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori anth...@codemonkey.ws writes: Paolo Bonzini pbonz...@redhat.com writes: Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: My fault. I should have looked at linux/types.h (actually asm-generic/). Not really, __uX appear in the headers that were posted. Which is a problem because this is a reserved namespace in C99. Personally, I find it hard to care. What matters is not what the standard has carved out, but whether we have clashes, reserved namespace or no. And that won't happen for these. If someone wants to convert all the kernel headers, I won't NAK it though. virtio headers are special. Linux headers are normally only consumed in the kernel or in a userspace application running on Linux. virtio headers may be used either in a userspace application running on !Linux (we need to support QEMU on Windows) or even in a foreign kernel. linux/types.h is unusable outside of Linux because it pulls in a bunch of other headers. If you look at Michael's patch, he adds his own version of types.h. It's unfortunate for every user of the headers to do this. Regards, Anthony Liguori Perhaps it's even worth moving the headers from uapi/linux to uapi/virtio. Rusty, what do you think? Hmm, #include virtio/virtio_net.h etc would be worthwhile if that also worked on FreeBSD. Bryan CC'd... Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Paolo Bonzini pbonz...@redhat.com writes: Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: My fault. I should have looked at linux/types.h (actually asm-generic/). Not really, __uX appear in the headers that were posted. Which is a problem because this is a reserved namespace in C99. What I'm saying is - the chance of a conflict is very remote; if it happens it's a build failure, so easy to debug. I'm sure that others will complain, :) but you can go ahead. I think we should clean up the virtio headers in the kernel first. Seems like a good thing to do given the standardization effort too. There are lots of headers in uapi that use the C99 int types, so there doesn't seem to be a reason they couldn't be used in virtio. fuse.h even has a: #ifdef __KERNEL__ #include linux/types.h #else #include stdint.h #endif Which seems like a reasonable thing to do. The only other kernel dependency is linux/if_ether.h to define ETH_ALEN. But it seems completely reasonable to define a VIRTIO_NET_MAC_ALEN or something like that. This would make the virtio headers completely stand-alone and includable in userspace (without violating C99). Perhaps it's even worth moving the headers from uapi/linux to uapi/virtio. Rusty, what do you think? Regards, Anthony Liguori Paolo
Re: [Qemu-devel] [PATCH v2 00/11] qemu: use virtio linux headers in portable code
Michael S. Tsirkin m...@redhat.com writes: On Sun, May 26, 2013 at 03:49:53PM -0500, Anthony Liguori wrote: Paolo Bonzini pbonz...@redhat.com writes: Il 26/05/2013 22:02, Michael S. Tsirkin ha scritto: My fault. I should have looked at linux/types.h (actually asm-generic/). Not really, __uX appear in the headers that were posted. Which is a problem because this is a reserved namespace in C99. What I'm saying is - a chance of a conflict is very remote, if it happens it's a build failure so easy to debug. I'm sure that others will complain, :) but you can go ahead. I think we should clean up the virtio headers in the kernel first. Seems like a good thing to do given the standardization effort too. There are lots of headers in uapi that use the C99 int types I found 4: $ grep -l uint include/uapi/linux/*h include/uapi/linux/dm-log-userspace.h include/uapi/linux/fuse.h include/uapi/linux/jffs2.h include/uapi/linux/pps.h include/uapi/linux/rds.h include/uapi/linux/sctp.h That's not really lots. so there doesn't seem to be a reason they couldn't be used in virtio. fuse.h even has a: #ifdef __KERNEL__ #include linux/types.h #else #include stdint.h #endif Which seems like a reasonable thing to do. In kernel, we really want to use things like endian-ness checks (and, we really should have them in userspace too!). So we want __le32 and other kernel goodies like that. stdint.h won't cut it. With the spec being standardized, I think having a stand alone set of headers is a good thing. Licensing is problematic here too. If virtio headers depend on other Linux headers, then it really doesn't matter if they are BSD licensed if you need a GPL header (like linux/if_ether.h). Now, we can certainly debate the copyrightability of these defines and what have you but if the headers are meant to be 1) consumed outside the kernel 2) licensed under a different license than the general kernel then depending on kernel goodies is the wrong strategy. 
The only other kernel dependency is linux/if_ether.h to define ETH_ALEN. But it seems completely reasonable to define a VIRTIO_NET_MAC_ALEN or something like that. Ugh. Not really reasonable IMO. We also use ETH_P_IP in code, and would like to get rid of redefining that too. But we can have our own linux/if_ether.h for non-Linux hosts, just with a couple of macros like that; it's not a big deal. See above. Regards, Anthony Liguori This would make the virtio headers completely stand-alone and includable in userspace (without violating C99). Perhaps it's even worth moving the headers from uapi/linux to uapi/virtio. Rusty, what do you think? Regards, Anthony Liguori Paolo
Re: updated: kvm networking todo wiki
Michael S. Tsirkin m...@redhat.com writes: On Fri, May 24, 2013 at 05:41:11PM +0800, Jason Wang wrote: On 05/23/2013 04:50 PM, Michael S. Tsirkin wrote: Hey guys, I've updated the kvm networking todo wiki with current projects. Will try to keep it up to date more often. Original announcement below. Thanks a lot. I've added the tasks I'm currently working on to the wiki. btw. I notice the virtio-net data plane were missed in the wiki. Is the project still being considered? It might have been interesting several years ago, but now that linux has vhost-net in kernel, the only point seems to be to speed up networking on non-linux hosts. Data plane just means having a dedicated thread for virtqueue processing that doesn't hold qemu_mutex. Of course we're going to do this in QEMU. It's a no brainer. But not as a separate device, just as an improvement to the existing userspace virtio-net. Since non-linux does not have kvm, I doubt virtio is a bottleneck. FWIW, I think what's more interesting is using vhost-net as a networking backend with virtio-net in QEMU being what's guest facing. In theory, this gives you the best of both worlds: QEMU acts as a first line of defense against a malicious guest while still getting the performance advantages of vhost-net (zero-copy). IMO yet another networking backend is a distraction, and confusing to users. In any case, I'd like to see virtio-blk dataplane replace non dataplane first. We don't want two copies of virtio-net in qemu. 100% agreed. Regards, Anthony Liguori I've put up a wiki page with a kvm networking todo list, mainly to avoid effort duplication, but also in the hope to draw attention to what I think we should try addressing in KVM: http://www.linux-kvm.org/page/NetworkingTodo This page could cover all networking related activity in KVM, currently most info is related to virtio-net. 
Note: if there's no developer listed for an item, this just means I don't know of anyone actively working on an issue at the moment, not that no one intends to. I would appreciate it if others working on one of the items on this list would add their names so we can communicate better. If others like this wiki page, please go ahead and add stuff you are working on if any. It would be especially nice to add autotest projects: there is just a short test matrix and a catch-all 'Cover test matrix with autotest', currently. Currently there are some links to Red Hat bugzilla entries, feel free to add links to other bugzillas. Thanks! ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH-v2 1/2] virtio-scsi: create VirtIOSCSICommon
Nicholas A. Bellinger n...@linux-iscsi.org writes: From: Paolo Bonzini pbonz...@redhat.com This patch refactors existing virtio-scsi code into VirtIOSCSICommon in order to allow virtio_scsi_init_common() to be used by both internal virtio_scsi_init() and external vhost-scsi-pci code. Changes in Patch-v2: - Move -get_features() assignment to virtio_scsi_init() instead of virtio_scsi_init_common() Any reason we're not doing this as a QOM base class? Similiar to how the in-kernel PIT/PIC work using a common base class... Regards, Anthony Liguori Signed-off-by: Paolo Bonzini pbonz...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com Cc: Asias He as...@redhat.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org --- hw/virtio-scsi.c | 192 +- hw/virtio-scsi.h | 130 -- include/qemu/osdep.h |4 + 3 files changed, 178 insertions(+), 148 deletions(-) diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c index 8620712..c59e9c6 100644 --- a/hw/virtio-scsi.c +++ b/hw/virtio-scsi.c @@ -18,118 +18,6 @@ #include hw/scsi.h #include hw/scsi-defs.h -#define VIRTIO_SCSI_VQ_SIZE 128 -#define VIRTIO_SCSI_CDB_SIZE32 -#define VIRTIO_SCSI_SENSE_SIZE 96 -#define VIRTIO_SCSI_MAX_CHANNEL 0 -#define VIRTIO_SCSI_MAX_TARGET 255 -#define VIRTIO_SCSI_MAX_LUN 16383 - -/* Response codes */ -#define VIRTIO_SCSI_S_OK 0 -#define VIRTIO_SCSI_S_OVERRUN 1 -#define VIRTIO_SCSI_S_ABORTED 2 -#define VIRTIO_SCSI_S_BAD_TARGET 3 -#define VIRTIO_SCSI_S_RESET4 -#define VIRTIO_SCSI_S_BUSY 5 -#define VIRTIO_SCSI_S_TRANSPORT_FAILURE6 -#define VIRTIO_SCSI_S_TARGET_FAILURE 7 -#define VIRTIO_SCSI_S_NEXUS_FAILURE8 -#define VIRTIO_SCSI_S_FAILURE 9 -#define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10 -#define VIRTIO_SCSI_S_FUNCTION_REJECTED11 -#define VIRTIO_SCSI_S_INCORRECT_LUN12 - -/* Controlq type codes. */ -#define VIRTIO_SCSI_T_TMF 0 -#define VIRTIO_SCSI_T_AN_QUERY 1 -#define VIRTIO_SCSI_T_AN_SUBSCRIBE 2 - -/* Valid TMF subtypes. 
*/ -#define VIRTIO_SCSI_T_TMF_ABORT_TASK 0 -#define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1 -#define VIRTIO_SCSI_T_TMF_CLEAR_ACA2 -#define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3 -#define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4 -#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5 -#define VIRTIO_SCSI_T_TMF_QUERY_TASK 6 -#define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7 - -/* Events. */ -#define VIRTIO_SCSI_T_EVENTS_MISSED0x8000 -#define VIRTIO_SCSI_T_NO_EVENT 0 -#define VIRTIO_SCSI_T_TRANSPORT_RESET 1 -#define VIRTIO_SCSI_T_ASYNC_NOTIFY 2 -#define VIRTIO_SCSI_T_PARAM_CHANGE 3 - -/* Reasons for transport reset event */ -#define VIRTIO_SCSI_EVT_RESET_HARD 0 -#define VIRTIO_SCSI_EVT_RESET_RESCAN 1 -#define VIRTIO_SCSI_EVT_RESET_REMOVED 2 - -/* SCSI command request, followed by data-out */ -typedef struct { -uint8_t lun[8]; /* Logical Unit Number */ -uint64_t tag;/* Command identifier */ -uint8_t task_attr; /* Task attribute */ -uint8_t prio; -uint8_t crn; -uint8_t cdb[]; -} QEMU_PACKED VirtIOSCSICmdReq; - -/* Response, followed by sense data and data-in */ -typedef struct { -uint32_t sense_len; /* Sense data length */ -uint32_t resid; /* Residual bytes in data buffer */ -uint16_t status_qualifier; /* Status qualifier */ -uint8_t status; /* Command completion status */ -uint8_t response;/* Response values */ -uint8_t sense[]; -} QEMU_PACKED VirtIOSCSICmdResp; - -/* Task Management Request */ -typedef struct { -uint32_t type; -uint32_t subtype; -uint8_t lun[8]; -uint64_t tag; -} QEMU_PACKED VirtIOSCSICtrlTMFReq; - -typedef struct { -uint8_t response; -} QEMU_PACKED VirtIOSCSICtrlTMFResp; - -/* Asynchronous notification query/subscription */ -typedef struct { -uint32_t type; -uint8_t lun[8]; -uint32_t event_requested; -} QEMU_PACKED VirtIOSCSICtrlANReq; - -typedef struct { -uint32_t event_actual; -uint8_t response; -} QEMU_PACKED VirtIOSCSICtrlANResp; - -typedef struct { -uint32_t event; -uint8_t lun[8]; -uint32_t reason; -} QEMU_PACKED VirtIOSCSIEvent; - -typedef struct { -uint32_t num_queues; 
-uint32_t seg_max; -uint32_t max_sectors; -uint32_t cmd_per_lun; -uint32_t event_info_size; -uint32_t sense_size; -uint32_t cdb_size; -uint16_t max_channel; -uint16_t max_target; -uint32_t max_lun
Re: [PATCH 0/5] vhost-scsi: Add support for host virtualized target
Nicholas A. Bellinger n...@linux-iscsi.org writes: Hi MST & Co, On Thu, 2013-01-17 at 18:43 +0200, Michael S. Tsirkin wrote: On Fri, Sep 07, 2012 at 06:48:14AM +, Nicholas A. Bellinger wrote: From: Nicholas Bellinger n...@linux-iscsi.org Hello Anthony & Co, This is the fourth installment to add host virtualized target support for the mainline tcm_vhost fabric driver using Linux v3.6-rc into QEMU 1.3.0-rc. The series is available directly from the following git branch: git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git vhost-scsi-for-1.3 Note the code is cut against yesterday's QEMU head, and despite the name of the tree is based upon mainline qemu.org git code + has thus far been running overnight with 100K IOPs small block 4k workloads using v3.6-rc2+ based target code with RAMDISK_DR backstores. Other than some minor fuzz between jumping from QEMU 1.2.0 - 1.2.50, this series is functionally identical to what's been posted for vhost-scsi RFC-v3 to qemu-devel. Please consider applying these patches for an initial vhost-scsi merge into QEMU 1.3.0-rc code, or let us know what else you'd like to see addressed for this series in order to merge. Thank you! --nab OK what's the status here? We missed 1.3 but let's try not to miss 1.4? Unfortunately, I've not been able to get back to the conversion requested by Paolo for a standalone vhost-scsi PCI device. Is your git repo above up to date? Perhaps I can find someone to help out.. At this point my hands are still full with iSER-target for-3.9 kernel code over the next weeks. What's the v1.4 feature cut-off looking like at this point..? Hard freeze is on February 1st but 1.5 opens up again on the 15th. So the release windows shouldn't have a major impact on merging...
Regards, Anthony Liguori --nab
Re: [Pv-drivers] [PATCH 0/6] VSOCK for Linux upstreaming
On 11/07/2012 12:58 AM, Gerd Hoffmann wrote: On 11/05/12 19:19, Andy King wrote: Hi David, The big and only question is whether anyone can actually use any of this stuff without your proprietary bits? Do you mean the VMCI calls? The VMCI driver is in the process of being upstreamed into the drivers/misc tree. Greg (cc'd on these patches) is actively reviewing that code and we are addressing feedback. Also, there was some interest from RedHat into using vSockets as a unified interface, routed over a hypervisor-specific transport (virtio or otherwise, although for now VMCI is the only one implemented). Can you outline how this can be done? From a quick look over the code it seems like vsock has a hard dependency on vmci, is that correct? When making vsock a generic, reusable kernel service it should be the other way around: vsock should provide the core implementation and an interface where hypervisor-specific transports (vmci, virtio, xenbus, ...) can register themself. This was already done in a hypervisor neutral way using virtio: http://lists.openwall.net/netdev/2008/12/14/8 The concept was Nacked and that led to the abomination of virtio-serial. If an address family for virtualization is on the table, we should reconsider AF_VMCHANNEL. I'd be thrilled to get rid of virtio-serial... Regards, Anthony Liguori cheers, Gerd ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell ru...@rustcorp.com.au writes: Gerd Hoffmann kra...@redhat.com writes: So how about this: (1) Add a vendor-specific PCI capability for new-style virtio. Specifies the PCI BAR used for new-style virtio registers. Guests can use it to figure out whether new-style virtio is supported and to map the correct BAR (which will probably be BAR 1 in most cases). This was closer to the original proposal[1], which I really liked (you can lay out BARs however you want). Anthony thought that vendor capabilities were a PCI-e feature, but it seems they're blessed in PCI 2.3. 2.3 was standardized in 2002. Are we confident that vendor extensions play nice with pre-2.3 OSes like Win2k, WinXP, etc? I still think it's a bad idea to rely on something so new in something as fundamental as virtio-pci unless we have to. Regards, Anthony Liguori So let's return to that proposal, giving something like this:

/* IDs for different capabilities. Must all exist. */
/* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
/* Common configuration */
#define VIRTIO_PCI_CAP_COMMON_CFG 1
/* Notifications */
#define VIRTIO_PCI_CAP_NOTIFY_CFG 2
/* ISR access */
#define VIRTIO_PCI_CAP_ISR_CFG    3
/* Device specific configuration */
#define VIRTIO_PCI_CAP_DEVICE_CFG 4

/* This is the PCI capability header: */
struct virtio_pci_cap {
    u8 cap_vndr;    /* Generic PCI field: PCI_CAP_ID_VNDR */
    u8 cap_next;    /* Generic PCI field: next ptr. */
    u8 cap_len;     /* Generic PCI field: sizeof(struct virtio_pci_cap). */
    u8 cfg_type;    /* One of the VIRTIO_PCI_CAP_*_CFG. */
    u8 bar;         /* Where to find it. */
    u8 unused;
    __le16 offset;  /* Offset within bar. */
    __le32 length;  /* Length. */
};

This means qemu can point the isr_cfg into the legacy area if it wants. In fact, it can put everything in BAR0 if it wants. Thoughts? Rusty.
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: We'll never remove legacy so we shouldn't plan on it. There are literally hundreds of thousands of VMs out there with the current virtio drivers installed in them. We'll be supporting them for a very, very long time :-) You will be supporting this for qemu on x86, sure. And PPC. As I think we're still in the growth phase for virtio, I prioritize future spec cleanliness pretty high. But I think you'll be surprised how fast this is deprecated: 1) Bigger queues for block devices (guest-specified ringsize) 2) Smaller rings for openbios (guest-specified alignment) 3) All-mmio mode (powerpc) 4) Whatever network features get numbers > 31. We can do all of these things with incremental change to the existing layout. That's the only way what I'm suggesting is different. You want to reorder all of the fields and have a driver flag day. But I strongly suspect we'll decide we need to do the same exercise again in 4 years when we now need to figure out how to take advantage of transactional memory or some other whiz-bang hardware feature. There are a finite number of BARs but each BAR has an almost infinite size. So extending BARs instead of introducing new ones seems like the conservative approach moving forward. I don't think we gain a lot by moving the ISR into a separate BAR. Splitting up registers like that seems weird to me too. Confused. I proposed the same split as you have, just ISR by itself. I disagree with moving the ISR into a separate BAR. That's what seems weird to me. It's very normal to have a mirrored set of registers that are PIO in one bar and MMIO in a different BAR. If we added an additional constraint that BAR1 was mirrored except for the config space and the MSI section was always there, I think the end result would be nice. IOW: But it won't be the same, because we want all that extra stuff, like more feature bits and queue size alignment.
(Admittedly queues past 16TB aren't a killer feature). To make it concrete: Current:

	struct {
		__le32 host_features;	/* read-only */
		__le32 guest_features;	/* read/write */
		__le32 queue_pfn;	/* read/write */
		__le16 queue_size;	/* read-only */
		__le16 queue_sel;	/* read/write */
		__le16 queue_notify;	/* read/write */
		u8 status;		/* read/write */
		u8 isr;			/* read-only, clear on read */
		/* Optional */
		__le16 msi_config_vector;	/* read/write */
		__le16 msi_queue_vector;	/* read/write */
		/* ... device features */
	};

Proposed:

	struct virtio_pci_cfg {
		/* About the whole device. */
		__le32 device_feature_select;	/* read-write */
		__le32 device_feature;		/* read-only */
		__le32 guest_feature_select;	/* read-write */
		__le32 guest_feature;		/* read-only */
		__le16 msix_config;		/* read-write */
		__u8 device_status;		/* read-write */
		__u8 unused;

		/* About a specific virtqueue. */
		__le16 queue_select;	/* read-write */
		__le16 queue_align;	/* read-write, power of 2. */
		__le16 queue_size;	/* read-write, power of 2. */
		__le16 queue_msix_vector;	/* read-write */
		__le64 queue_address;	/* read-write: 0x == DNE. */
	};

	struct virtio_pci_isr {
		__u8 isr;	/* read-only, clear on read */
	};

What I'm suggesting is:

	struct {
		__le32 host_features;	/* read-only */
		__le32 guest_features;	/* read/write */
		__le32 queue_pfn;	/* read/write */
		__le16 queue_size;	/* read-only */
		__le16 queue_sel;	/* read/write */
		__le16 queue_notify;	/* read/write */
		u8 status;		/* read/write */
		u8 isr;			/* read-only, clear on read */
		__le16 msi_config_vector;	/* read/write */
		__le16 msi_queue_vector;	/* read/write */
		__le32 host_feature_select;	/* read/write */
		__le32 guest_feature_select;	/* read/write */
		__le32 queue_pfn_hi;	/* read/write */
	};

With the additional semantic that the virtio-config space is overlayed on top of the register set in BAR0 unless the VIRTIO_PCI_F_SEPARATE_CONFIG feature is acknowledged. This feature acts as a latch and when set, removes the config space overlay.
If the config space overlays the registers, the offset in BAR0 of the overlay depends on whether MSI is enabled or not in the PCI device. BAR1 is an MMIO mirror of BAR0 except that the config space is never overlayed in BAR1 regardless of VIRTIO_PCI_F_SEPARATE_CONFIG. BAR2 contains the config space. A guest can look at BAR1 and BAR2 to determine whether they exist. We could also enforce LE in the per-device config space.
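To illustrate the select/window idiom used by the host_feature_select and guest_feature_select registers in the layouts above, here is a small simulation of reading a 64-bit feature set through a 32-bit window. The struct is a test double standing in for real register access (the field names follow the proposal, but the code itself is not from the thread):

```c
#include <assert.h>
#include <stdint.h>

/* Test double for the select/window pair: writing host_feature_select
 * picks which 32-bit word of the device's feature set the
 * host_features register reads back. */
struct vp_regs {
    uint64_t host_features64;       /* device-internal feature set */
    uint32_t host_feature_select;   /* which 32-bit word to expose */
};

static uint32_t read_host_features_word(const struct vp_regs *r)
{
    return (uint32_t)(r->host_features64 >> (32 * r->host_feature_select));
}

/* Read all 64 feature bits: select word 0, read, select word 1, read. */
static uint64_t read_host_features(struct vp_regs *r)
{
    r->host_feature_select = 0;
    uint64_t f = read_host_features_word(r);
    r->host_feature_select = 1;
    f |= (uint64_t)read_host_features_word(r) << 32;
    return f;
}
```

This is how "infinite feature bits" stay reachable through a fixed-width register: only the select register grows semantics, not the layout.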
Re: [Qemu-devel] Using PCI config space to indicate config location
Avi Kivity a...@redhat.com writes: On 10/09/2012 05:16 AM, Rusty Russell wrote: Anthony Liguori aligu...@us.ibm.com writes: We'll never remove legacy so we shouldn't plan on it. There are literally hundreds of thousands of VMs out there with the current virtio drivers installed in them. We'll be supporting them for a very, very long time :-) You will be supporting this for qemu on x86, sure. As I think we're still in the growth phase for virtio, I prioritize future spec cleanliness pretty high. If a pure ppc hypervisor were on the table, this might have been worthwhile. As it is the codebase is shared, and the Linux drivers are shared, so cleaning up the spec doesn't help the code. Note that distros have been (perhaps unknowingly) shipping virtio-pci for PPC for some time now. So even though there wasn't a hypervisor that supported virtio-pci, the guests already support it and are out there in the wild. There's a lot of value in maintaining legacy support even for PPC. But I think you'll be surprised how fast this is deprecated: 1) Bigger queues for block devices (guest-specified ringsize) 2) Smaller rings for openbios (guest-specified alignment) 3) All-mmio mode (powerpc) 4) Whatever network features get numbers > 31. I don't think we gain a lot by moving the ISR into a separate BAR. Splitting up registers like that seems weird to me too. Confused. I proposed the same split as you have, just ISR by itself. I believe Anthony objects to having the ISR by itself. What is the motivation for that? Right, BARs are a precious resource not to be spent lightly. Having an entire BAR dedicated to a 1-byte register seems like a waste to me.
Regards, Anthony Liguori -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell ru...@rustcorp.com.au writes: (Topic updated, cc's trimmed). Anthony Liguori aligu...@us.ibm.com writes: Rusty Russell ru...@rustcorp.com.au writes: 4) The only significant change to the spec is that we use PCI capabilities, so we can have infinite feature bits. (see http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html) We discussed this on IRC last night. I don't think PCI capabilities are a good mechanism to use... PCI capabilities are there to organize how the PCI config space is allocated to allow vendor extensions to co-exist with future PCI extensions. But we've never used the PCI config space within virtio-pci. We do everything in BAR0. I don't think there's any real advantage of using the config space vs. a BAR for virtio-pci. Note before anyone gets confused; we were talking about using the PCI config space to indicate what BAR(s) the virtio stuff is in. An alternative would be to simply specify a new layout format in BAR1. The arguments for a more flexible format that I know of: 1) virtio-pci has already extended the pci-specific part of the configuration once (for MSI-X), so I don't want to assume it won't happen again. configuration is the wrong word here. The virtio-pci BAR0 layout is:

	0..19	virtio-pci registers
	20+	virtio configuration space

MSI-X needed to add additional virtio-pci registers, so now we have:

	0..19	virtio-pci registers
	if MSI-X:
		20..23	virtio-pci MSI-X registers
		24+	virtio configuration space
	else:
		20+	virtio configuration space

I agree, this stinks. But I think we could solve this in a different way. I think we could just move the virtio configuration space to BAR1 by using a transport feature bit. That then frees up the entire BAR0 for use as virtio-pci registers. We can then always include the virtio-pci MSI-X register space and introduce all new virtio-pci registers as simply being appended. This new feature bit then becomes essentially a virtio configuration latch.
When unacked, virtio configuration hides new registers; when acked, those new registers are exposed. Another option is to simply put new registers after the virtio configuration blob. 2) ISTR an argument about mapping the ISR register separately, for performance, but I can't find a reference to it. I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. The argument to move the remaining registers to MMIO is to allow 64-bit accesses to registers which isn't possible with PIO. This maps really nicely to non-PCI transports too. This isn't right. No one else can use the PCI layout. While parts are common, other parts are pci-specific (MSI-X and ISR for example), and yet other parts are specified by PCI elsewhere (eg interrupt numbers). But extending the PCI config space (especially dealing with capability allocation) is pretty gnarly and there isn't an obvious equivalent outside of PCI. That's OK, because general changes should be done with feature bits, and the others all have an infinite number. Being the first, virtio-pci has some unique limitations we'd like to fix. There are very devices that we emulate today that make use of extended PCI device registers outside the platform devices (that have no BARs). This sentence confused me? There is a missing few. There are very few devices... Extending the PCI configuration space is unusual for PCI devices. That was the point. Regards, Anthony Liguori Thanks, Rusty.
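The MSI-X-dependent layout described above is what the legacy Linux driver encodes as the VIRTIO_PCI_CONFIG() offset: the device configuration space starts at BAR0 offset 20 without MSI-X and at 24 with it. A minimal sketch of that calculation (the function name here is illustrative):

```c
#include <assert.h>
#include <stdbool.h>

/* Legacy BAR0 layout from the thread: bytes 0..19 are virtio-pci
 * registers; when MSI-X is enabled, bytes 20..23 hold the two MSI-X
 * vector registers and the device configuration moves to offset 24. */
#define VIRTIO_MSI_CONFIG_VECTOR 20
#define VIRTIO_MSI_QUEUE_VECTOR  22

static unsigned vp_config_offset(bool msix_enabled)
{
    return msix_enabled ? 24 : 20;
}
```

This conditional offset is exactly the wart the transport feature bit / BAR1 proposal is trying to eliminate.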
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann kra...@redhat.com writes: Hi, But I think we could solve this in a different way. I think we could just move the virtio configuration space to BAR1 by using a transport feature bit. Why hard-code stuff? I think it makes a lot of sense to have a capability similar to msi-x which simply specifies bar and offset of the register sets:

	[root@fedora ~]# lspci -vvs4
	00:04.0 SCSI storage controller: Red Hat, Inc Virtio block device
	[ ... ]
		Region 0: I/O ports at c000 [size=64]
		Region 1: Memory at fc029000 (32-bit) [size=4K]
		Capabilities: [40] MSI-X: Enable+ Count=2 Masked-
			Vector table: BAR=1 offset=
			PBA: BAR=1 offset=0800

MSI-X capability is a standard PCI capability which is why lspci can parse it. So we could have for virtio something like this:

		Capabilities: [??] virtio-regs:
			legacy: BAR=0 offset=0
			virtio-pci: BAR=1 offset=1000
			virtio-cfg: BAR=1 offset=1800

This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it. You could just as well teach lspci to parse BAR0 to figure out what features are supported. That then frees up the entire BAR0 for use as virtio-pci registers. We can then always include the virtio-pci MSI-X register space and introduce all new virtio-pci registers as simply being appended. BAR0 needs to stay as-is for compatibility reasons. New devices which don't have to care about old guests don't need to provide a 'legacy' register region. A latch feature bit would allow the format to change without impacting compatibility at all. 2) ISTR an argument about mapping the ISR register separately, for performance, but I can't find a reference to it. I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. Is this still a pressing issue? With MSI-X enabled ISR isn't needed, correct?
Which would imply that pretty much only old guests without MSI-X support need this, and we don't need to worry that much when designing something new ... It wasn't that long ago that MSI-X wasn't supported. I think we should continue to keep ISR as PIO as it is a fast path. Regards, Anthony Liguori cheers, Gerd
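For context on why that one byte is a fast path: the legacy ISR register is a single read-to-clear status byte (bit 0 signals a virtqueue interrupt, bit 1 a configuration change), so the interrupt handler needs exactly one cheap PIO read and no separate acknowledge. A toy model of that behaviour (the in-memory stand-in for the register and the macro names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_ISR_QUEUE  0x1  /* bit 0: a virtqueue wants service */
#define VIRTIO_ISR_CONFIG 0x2  /* bit 1: device configuration changed */

/* Stand-in for the one-byte read at BAR0+19: reading the ISR returns
 * the pending bits and clears them as a side effect. */
static uint8_t read_isr(uint8_t *isr_reg)
{
    uint8_t v = *isr_reg;
    *isr_reg = 0;   /* clear on read */
    return v;
}
```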
Re: [Qemu-devel] Using PCI config space to indicate config location
Gerd Hoffmann kra...@redhat.com writes: Hi, So we could have for virtio something like this: Capabilities: [??] virtio-regs: legacy: BAR=0 offset=0 virtio-pci: BAR=1 offset=1000 virtio-cfg: BAR=1 offset=1800 This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it. Sure, would need a patch to actually parse+print the cap, /me was just trying to make my point clear in a simple way. 2) ISTR an argument about mapping the ISR register separately, for performance, but I can't find a reference to it. I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. Is this still a pressing issue? With MSI-X enabled ISR isn't needed, correct? Which would imply that pretty much only old guests without MSI-X support need this, and we don't need to worry that much when designing something new ... It wasn't that long ago that MSI-X wasn't supported. I think we should continue to keep ISR as PIO as it is a fast path. No problem if we allow to have both legacy layout and new layout at the same time. Guests can continue to use ISR @ BAR0 in PIO space for existing virtio devices, even in case they want to use mmio for other registers - all fine. New virtio devices can support MSI-X from day one and decide to not expose a legacy layout PIO bar. I think having BAR1 be an MMIO mirror of the registers + a BAR2 for virtio configuration space is probably not that bad of a solution. Regards, Anthony Liguori cheers, Gerd
Re: [Qemu-devel] Using PCI config space to indicate config location
Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori aligu...@us.ibm.com writes: Gerd Hoffmann kra...@redhat.com writes: Hi, So we could have for virtio something like this: Capabilities: [??] virtio-regs: legacy: BAR=0 offset=0 virtio-pci: BAR=1 offset=1000 virtio-cfg: BAR=1 offset=1800 This would be a vendor specific PCI capability so lspci wouldn't automatically know how to parse it. Sure, would need a patch to actually parse+print the cap, /me was just trying to make my point clear in a simple way. 2) ISTR an argument about mapping the ISR register separately, for performance, but I can't find a reference to it. I think the rationale is that ISR really needs to be PIO but everything else doesn't. PIO is much faster on x86 because it doesn't require walking page tables or instruction emulation to handle the exit. Is this still a pressing issue? With MSI-X enabled ISR isn't needed, correct? Which would imply that pretty much only old guests without MSI-X support need this, and we don't need to worry that much when designing something new ... It wasn't that long ago that MSI-X wasn't supported. I think we should continue to keep ISR as PIO as it is a fast path. No problem if we allow to have both legacy layout and new layout at the same time. Guests can continue to use ISR @ BAR0 in PIO space for existing virtio devices, even in case they want to use mmio for other registers - all fine. New virtio devices can support MSI-X from day one and decide to not expose a legacy layout PIO bar. I think having BAR1 be an MMIO mirror of the registers + a BAR2 for virtio configuration space is probably not that bad of a solution. Well, we also want to clean up the registers, so how about: BAR0: legacy, as is. If you access this, don't use the others. BAR1: new format virtio-pci layout. If you use this, don't use BAR0. BAR2: virtio-cfg. If you use this, don't use BAR0. BAR3: ISR. If you use this, don't use BAR0. I prefer the cases exclusive (ie.
use one or the other) as a clear path to remove the legacy layout; and leaving the ISR in BAR0 leaves us with an ugly corner case in future (ISR is BAR0 + 19? WTF?). We'll never remove legacy so we shouldn't plan on it. There are literally hundreds of thousands of VMs out there with the current virtio drivers installed in them. We'll be supporting them for a very, very long time :-) I don't think we gain a lot by moving the ISR into a separate BAR. Splitting up registers like that seems weird to me too. It's very normal to have a mirrored set of registers that are PIO in one bar and MMIO in a different BAR. If we added an additional constraints that BAR1 was mirrored except for the config space and the MSI section was always there, I think the end result would be nice. IOW: BAR0[pio]: virtio-pci registers + optional MSI section + virtio-config BAR1[mmio]: virtio-pci registers + MSI section + future extensions BAR2[mmio]: virtio-config We can continue to do ISR access via BAR0 for performance reasons. As to MMIO vs PIO, the BARs are self-describing, so we should explicitly endorse that and leave it to the devices. The detection is simple: if BAR1 has non-zero length, it's new-style, otherwise legacy. I agree that this is the best way to extend, but I think we should still use a transport feature bit. We want to be able to detect within QEMU whether a guest is using these new features because we need to adjust migration state accordingly. Otherwise we would have to detect reads/writes to the new BARs to maintain whether the extended register state needs to be saved. This gets nasty dealing with things like reset. A feature bit simplifies this all pretty well. Regards, Anthony Liguori Thoughts? Rusty. 
Re: [Qemu-devel] Proposal for virtio standardization.
Rusty Russell ru...@rustcorp.com.au writes: Hi all, I've had several requests for a more formal approach to the virtio draft spec, and (after some soul-searching) I'd like to try that. The proposal is to use OASIS as the standards body, as it's fairly light-weight as these things go. For me this means paperwork and setting up a Working Group and getting the right people involved as Voting members starting with the current contributors; for most of you it just means a new mailing list, though I'll be cross-posting any drafts and major changes here anyway. I believe that a documented standard (aka virtio 1.0) will increase visibility and adoption in areas outside our normal linux/kvm universe. There's been some of that already, but this is the clearest path to accelerate it. Not the easiest path, but I believe that a solid I/O standard is a Good Thing for everyone. Yet I also want to decouple new and experimental development from the standards effort; running code comes first. New feature bits and new device numbers should be reservable without requiring a full spec change. So the essence of my proposal is: 1) I start a Working Group within OASIS where we can aim for virtio spec 1.0. 2) The current spec is textually reordered so the core is clearly bus-independent, with PCI, mmio, etc appendices. 3) Various clarifications, formalizations and cleanups to the spec text, and possibly elimination of old deprecated features. 4) The only significant change to the spec is that we use PCI capabilities, so we can have infinite feature bits. (see http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html) We discussed this on IRC last night. I don't think PCI capabilites are a good mechanism to use... PCI capabilities are there to organize how the PCI config space is allocated to allow vendor extensions to co-exist with future PCI extensions. But we've never used the PCI config space within virtio-pci. We do everything in BAR0. 
I don't think there's any real advantage of using the config space vs. a BAR for virtio-pci. The nice thing about using a BAR is that the region is MMIO or PIO. This maps really nicely to non-PCI transports too. But extending the PCI config space (especially dealing with capability allocation) is pretty gnarly and there isn't an obvious equivalent outside of PCI. There are very devices that we emulate today that make use of extended PCI device registers outside the platform devices (that have no BARs). Regards, Anthony Liguori 5) Changes to the ring layout and other such things are deferred to a future virtio version; whether this is done within OASIS or externally depends on how well this works for the 1.0 release. Thoughts? Rusty.
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes: Michael S. Tsirkin m...@redhat.com writes: Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why would this have any downsides. I've been wanting to do this for the longest time... but... Even though with my recent patches qemu no longer requires header to be the first s/g element, we need a new feature bit to detect this. A trivial qemu patch will be sent separately. There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device. This is a bug in the specification. The QEMU implementation pre-dates the specification. All of the actual implementations of virtio relied on the semantics of s/g elements and still do. What's in the specification really doesn't matter when it doesn't agree with all of the existing implementations. Users use implementations, not specifications. The specification really ought to be changed here. Regards, Anthony Liguori
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes: Michael S. Tsirkin m...@redhat.com writes: There's a reason I haven't done this. I really, really dislike "my implementation isn't broken" feature bits. We could have an infinite number of them, for each bug in each device. So my plan was to tie this assumption to the new PCI layout. And have a stress-testing patch like the one below in the kernel (see my virtio-wip branch for stuff like this). Turn it on at boot with virtio_ring.torture on the kernel commandline. BTW, I've fixed lguest, but my kvm here (Ubuntu precise, kvm-qemu 1.0) is too old. Building the latest git now... Cheers, Rusty. Subject: virtio: CONFIG_VIRTIO_DEVICE_TORTURE Virtio devices are not supposed to depend on the framing of the scatter-gather lists, but various implementations did. Safeguard this in future by adding an option to deliberately create perverse descriptors. Signed-off-by: Rusty Russell ru...@rustcorp.com.au "Ignore framing" is really a bad idea. You want backends to enforce reasonable framing because guests shouldn't do silly things with framing. For instance, with virtio-blk, if you want decent performance, you absolutely want to avoid bouncing the data. If you're using O_DIRECT in the host to submit I/O requests, then it's critical that all of the s/g elements are aligned to a sector boundary and sized to a sector boundary. Yes, QEMU can handle it if that's not the case, but it would be insanely stupid for a guest not to do this. This is the sort of thing that ought to be enforced in the specification because a guest cannot perform well if it doesn't follow these rules. A spec isn't terribly useful if the result is guest drivers that are slow. There's very little to gain by not enforcing rules around framing and there's a lot to lose if a guest frames incorrectly. In the rare case where we want to make a framing change, we should use feature bits like Michael is proposing.
In this case, we should simply say that with the feature bit, the vnet header can be in the same element as the data but not allow the header to be spread across multiple elements. Regards, Anthony Liguori

diff --git a/drivers/virtio/Kconfig b/drivers/virtio/Kconfig
index 8d5bddb..930a4ea 100644
--- a/drivers/virtio/Kconfig
+++ b/drivers/virtio/Kconfig
@@ -5,6 +5,15 @@ config VIRTIO
 	  bus, such as CONFIG_VIRTIO_PCI, CONFIG_VIRTIO_MMIO, CONFIG_LGUEST,
 	  CONFIG_RPMSG or CONFIG_S390_GUEST.
 
+config VIRTIO_DEVICE_TORTURE
+	bool "Virtio device torture tests"
+	depends on VIRTIO && DEBUG_KERNEL
+	help
+	  This makes the virtio_ring implementation creatively change
+	  the format of requests to make sure that devices are
+	  properly implemented.  This will make your virtual machine
+	  slow *and* unreliable!  Say N.
+
 menu "Virtio drivers"
 
 config VIRTIO_PCI
diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
index e639584..8893753 100644
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -124,6 +124,149 @@ struct vring_virtqueue
 
 #define to_vvq(_vq) container_of(_vq, struct vring_virtqueue, vq)
 
+#ifdef CONFIG_VIRTIO_DEVICE_TORTURE
+static bool torture;
+module_param(torture, bool, 0644);
+
+struct torture {
+	unsigned int orig_out, orig_in;
+	void *orig_data;
+	struct scatterlist sg[4];
+	struct scatterlist orig_sg[];
+};
+
+static size_t tot_len(struct scatterlist sg[], unsigned num)
+{
+	size_t len, i;
+
+	for (len = 0, i = 0; i < num; i++)
+		len += sg[i].length;
+
+	return len;
+}
+
+static void copy_sg_data(const struct scatterlist *dst, unsigned dnum,
+			 const struct scatterlist *src, unsigned snum)
+{
+	unsigned len;
+	struct scatterlist s, d;
+
+	s = *src;
+	d = *dst;
+
+	while (snum && dnum) {
+		len = min(s.length, d.length);
+		memcpy(sg_virt(&d), sg_virt(&s), len);
+		d.offset += len;
+		d.length -= len;
+		s.offset += len;
+		s.length -= len;
+		if (!s.length) {
+			BUG_ON(snum == 0);
+			src++;
+			snum--;
+			s = *src;
+		}
+		if (!d.length) {
+			BUG_ON(dnum == 0);
+			dst++;
+			dnum--;
+			d = *dst;
+		}
+	}
+}
+
+static bool torture_replace(struct scatterlist **sg,
+			    unsigned int *out,
+			    unsigned int *in,
+			    void **data,
+			    gfp_t gfp)
+{
+	static size_t seed;
+	struct torture *t;
+	size_t outlen, inlen, ourseed, len1;
+	void *buf;
+
+	if (!torture)
+		return true;
+
+	outlen = tot_len(*sg, *out);
+	inlen = tot_len(*sg
Re: [PATCH 0/3] virtio-net: inline header support
Rusty Russell ru...@rustcorp.com.au writes: Anthony Liguori anth...@codemonkey.ws writes: Rusty Russell ru...@rustcorp.com.au writes: Michael S. Tsirkin m...@redhat.com writes: Thinking about Sasha's patches, we can reduce ring usage for virtio net small packets dramatically if we put virtio net header inline with the data. This can be done for free in case guest net stack allocated extra head room for the packet, and I don't see why would this have any downsides. I've been wanting to do this for the longest time... but... Even though with my recent patches qemu no longer requires header to be the first s/g element, we need a new feature bit to detect this. A trivial qemu patch will be sent separately. There's a reason I haven't done this. I really, really dislike my implemention isn't broken feature bits. We could have an infinite number of them, for each bug in each device. This is a bug in the specification. The QEMU implementation pre-dates the specification. All of the actual implementations of virtio relied on the semantics of s/g elements and still do. lguest fix is pending in my queue. lkvm and qemu are broken; lkvm isn't ever going to be merged, so I'm not sure what its status is? But I'm determined to fix qemu, and hence my torture patch to make sure this doesn't creep in again. There are even more implementations out there and I'd wager they all rely on framing. What's in the specification really doesn't matter when it doesn't agree with all of the existing implementations. Users use implementations, not specifications. The specification really ought to be changed here. I'm sorely tempted, except that we're losing a real optimization because of this :( What optimizations? What Michael is proposing is still achievable with a device feature. Are there other optimizations that can be achieved by changing framing that we can't achieve with feature bits? As I mentioned in another note, bad framing decisions can cause performance issues too... 
The specification has long contained the footnote: "The current qemu device implementations mistakenly insist that the first descriptor cover the header in these cases exactly, so a cautious driver should arrange it so." I seem to recall this being a compromise between you and me. I think I objected strongly to this back when you first wrote the spec and you added this to appease me ;-) Regards, Anthony Liguori I'd like to tie this caveat to the PCI capability change, so this note will move to the appendix with the old PCI layout. Cheers, Rusty.
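To make the framing question concrete, here is a sketch (illustrative types and helper names, not code from the thread) of building the transmit scatter-gather list both ways: the legacy arrangement with the virtio-net header as its own first element, and the proposed inline arrangement where header and payload share one element, which only works when the driver allocated headroom so the two are contiguous in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Field layout matches the virtio-net header described in the spec. */
struct virtio_net_hdr {
    uint8_t flags, gso_type;
    uint16_t hdr_len, gso_size, csum_start, csum_offset;
};

/* Minimal stand-in for a scatterlist entry. */
struct sg { void *addr; size_t len; };

/* Build the out s/g list for one packet; returns the number of entries
 * used.  With the inline-header feature negotiated, header and payload
 * are described by one entry (this assumes hdr immediately precedes
 * data in one contiguous buffer); otherwise the header is its own
 * first element, as the qemu footnote in the spec requires. */
static int build_xmit_sg(struct sg *sg, struct virtio_net_hdr *hdr,
                         void *data, size_t len, int inline_hdr)
{
    if (inline_hdr) {
        sg[0].addr = hdr;
        sg[0].len  = sizeof(*hdr) + len;
        return 1;
    }
    sg[0].addr = hdr;  sg[0].len = sizeof(*hdr);
    sg[1].addr = data; sg[1].len = len;
    return 2;
}
```

Halving the descriptor count per small packet is the ring-usage saving Michael's patches are after.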
Re: [PATCH 4/5] virtio-scsi: Add start/stop functionality for vhost-scsi
On 09/10/2012 01:24 AM, Michael S. Tsirkin wrote: On Mon, Sep 10, 2012 at 08:16:54AM +0200, Paolo Bonzini wrote: Il 09/09/2012 00:40, Michael S. Tsirkin ha scritto: On Fri, Sep 07, 2012 at 06:00:50PM +0200, Paolo Bonzini wrote: Il 07/09/2012 08:48, Nicholas A. Bellinger ha scritto: Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Cc: Zhi Yong Wu wu...@linux.vnet.ibm.com Cc: Michael S. Tsirkin m...@redhat.com Cc: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Nicholas Bellinger n...@linux-iscsi.org --- hw/virtio-pci.c | 2 ++ hw/virtio-scsi.c | 49 + hw/virtio-scsi.h | 1 + 3 files changed, 52 insertions(+), 0 deletions(-) Please create a completely separate device vhost-scsi-pci instead (or virtio-scsi-tcm-pci, or something like that). It is used completely differently from virtio-scsi-pci, it does not make sense to conflate the two. Ideally the name would say how it is different, not what backend it uses. Any good suggestions? I chose the backend name because, ideally, there would be no other difference. QEMU _could_ implement all the goodies in vhost-scsi (such as reservations or ALUA), it just doesn't do that yet. Paolo Then why do you say "It is used completely differently from virtio-scsi-pci"? Isn't it just a different backend? If yes then it should be a backend option, like it is for virtio-net. I don't mean to bike shed here so don't take this as a nack on making it a backend option, but in retrospect, the way we did vhost-net was a mistake even though I strongly advocated for it to be a backend option. The code to do it is really, really ugly. I think it would have made a lot more sense to just make it a device and then have it not use a netdev backend or any other kind of backend split. For instance: qemu -device vhost-net-pci,tapfd=X I know this breaks the model of separate backends and frontends but since vhost-net absolutely requires a tap fd, I think it's better in the long run to not abuse the netdev backend to prevent user confusion.
Having a dedicated backend type that only has one possible option and can only be used by one device is a bit silly too. So I would be in favor of dropping/squashing 3/5 and radically simplifying how this was exposed to the user. I would just take qemu_vhost_scsi_opts and make them device properties. Regards, Anthony Liguori ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
Michael S. Tsirkin <m...@redhat.com> writes: On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote: Of course, the million dollar question is why would using AIO in the kernel be faster than using AIO in userspace? Actually for me a more important question is how does it compare with virtio-blk dataplane? I'm not even asking for a benchmark comparison. It's the same API being called from a kernel thread vs. a userspace thread. Why would there be a 60% performance difference between the two? That doesn't make any sense. There's got to be a better justification for putting this in the kernel than just that we can. I completely understand why Christoph's suggestion of submitting BIOs directly would be faster. There's no way to do that in userspace. Regards, Anthony Liguori -- MST
Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
= vhost_blk_set_status(blk, req->status, status);
+		if (!ret)
+			vhost_add_used_and_signal(&blk->dev, vq, head, ret);
+		break;
+	case VIRTIO_BLK_T_GET_ID:
+		/* TODO: need a real ID string */
+		ret = snprintf(vq->iov[BLK_HDR + 1].iov_base,
+			       VIRTIO_BLK_ID_BYTES, "VHOST-BLK-DISK");
+		status = ret < 0 ? VIRTIO_BLK_S_IOERR : VIRTIO_BLK_S_OK;
+		ret = vhost_blk_set_status(blk, req->status, status);
+		if (!ret)
+			vhost_add_used_and_signal(&blk->dev, vq, head,
+						  VIRTIO_BLK_ID_BYTES);
+		break;
+	default:
+		pr_warn("Unsupported request type %d\n", hdr->type);
+		vhost_discard_vq_desc(vq, 1);
+		ret = -EFAULT;
+		break;
+	}
There doesn't appear to be any error handling in the event that vhost_blk_io_submit fails. It would appear that you leak the ring queue entry since you never push anything onto the used queue. I think you need to handle io_submit() failing too with EAGAIN. Relying on min nr_events == queue_size seems like a dangerous assumption to me, particularly since queue_size tends to be very large and max_nr_events is a fixed-size global pool. To properly handle EAGAIN, you effectively need to implement flow control and back off reading the virtqueue until you can submit requests again. Of course, the million dollar question is why would using AIO in the kernel be faster than using AIO in userspace? When using eventfd, there is no heavyweight exit on the notify path. It should just be the difference between scheduling a kernel thread vs. scheduling a userspace thread. There's simply no way that that's a 60% difference in performance. Regards, Anthony Liguori
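The back-off policy being asked for here can be sketched in plain userspace C. This is a toy model, not vhost-blk code: `drain_queue`, `queued`, and `capacity` are invented names, and `capacity` stands in for the free slots in the fixed aio event pool that make the real io_submit() return -EAGAIN.

```c
/* Toy model of EAGAIN flow control: hand at most 'capacity' of the
 * 'queued' requests to the (simulated) aio layer and report how many
 * were actually submitted. The caller leaves the remainder on the
 * virtqueue and retries after completions free up event slots, rather
 * than discarding descriptors or busy-looping on io_submit(). */
static int drain_queue(int queued, int capacity)
{
    int submitted = 0;

    while (submitted < queued) {
        if (capacity == 0)  /* io_submit() would return -EAGAIN here */
            break;          /* back off; stop consuming the virtqueue */
        capacity--;
        submitted++;
    }
    return submitted;
}
```

The point of the sketch is only the early break: on EAGAIN the loop stops pulling descriptors instead of dropping them, which is the flow control the review asks for.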
Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6
On 07/17/2012 04:50 PM, Nicholas A. Bellinger wrote: On Tue, 2012-07-17 at 13:55 -0500, Anthony Liguori wrote: On 07/17/2012 10:05 AM, Michael S. Tsirkin wrote: On Wed, Jul 11, 2012 at 09:15:00PM +, Nicholas A. Bellinger wrote: SNIP It still seems not 100% clear whether this driver will have major userspace using it. And if not, it would be very hard to support a driver when recent userspace does not use it in the end. I don't think this is a good reason to exclude something from the kernel. However, there are good reasons why this doesn't make sense for something like QEMU--specifically because we have a large number of features in our block layer that tcm_vhost would bypass. I can definitely appreciate your concern here as the QEMU maintainer. But perhaps it makes sense for something like native kvm tool. And if it did go into the kernel, we would certainly support it in QEMU. ... But I do think the kernel should carefully consider whether it wants to support an interface like this. This an extremely complicated ABI with a lot of subtle details around state and compatibility. Are you absolutely confident that you can support a userspace application that expects to get exactly the same response from all possible commands in 20 kernel versions from now? Virtualization requires absolutely precise compatibility in terms of bugs and features. This is probably not something the TCM stack has had to consider yet. We most certainly have thought about long term userspace compatibility with TCM. Our userspace code (that's now available in all major distros) is completely forward-compatible with new fabric modules such as tcm_vhost. No update required. I'm not sure we're talking about the same thing when we say compatibility. I'm not talking about the API. I'm talking about the behavior of the commands that tcm_vhost supports. If you add support for a new command, you need to provide userspace a way to disable this command. 
If you change what gets reported for VPD, you need to provide userspace a way to make VPD look like what it did in a previous version. Basically, you need to be able to make a TCM device behave 100% the same as it did in an older version of the kernel. This is unique to virtualization due to live migration. If you migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel because the guest that is interacting with it does not realize that live migration happened. Yes, you can add knobs via configfs to control this behavior, but I think the question is, what's the plan for this? BTW, I think this is a good thing to cover in Documentation/vhost/tcm_vhost.txt. I think that's probably the only change that's needed here. Regards, Anthony Liguori Also, by virtue of the fact that we are using configfs + rtslib (python object library) on top, it's very easy to keep any type of compatibility logic around in python code. With rtslib, we are able to hide configfs ABI changes from higher level apps. So far we've had a track record of 100% userspace ABI compatibility in mainline since .38, and I don't intend to merge a patch that breaks this any time soon. But if that ever happens, apps using rtslib are not going to be effected. I think a good idea for 3.6 would be to make it depend on CONFIG_STAGING. Then we don't commit to an ABI. I think this is a good idea. Even if it goes in, a really clear policy would be needed wrt the userspace ABI. While tcm_vhost is probably more useful than vhost_blk, it's a much more complex ABI to maintain. As far as I am concerned, the kernel API (eg: configfs directory layout) as it is now in sys/kernel/config/target/vhost/ is not going to change. It's based on the same drivers/target/target_core_fabric_configfs.c generic layout that we've had since .38. 
The basic functional fabric layout in configfs is identical (with fabric dependent WWPN naming of course) regardless of fabric driver, and by virtue of being generic it means we can add things like fabric dependent attributes + parameters in the future for existing fabrics without breaking userspace. So while I agree the ABI is more complex than vhost-blk, the logic in target_core_fabric_configfs.c is a basic ABI fabric definition that we are enforcing across all fabric modules in mainline for long term compatibility. --nab
Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6
On 07/18/2012 10:53 AM, Christoph Hellwig wrote: On Wed, Jul 18, 2012 at 08:42:21AM -0500, Anthony Liguori wrote: If you add support for a new command, you need to provide userspace a way to disable this command. If you change what gets reported for VPD, you need to provide userspace a way to make VPD look like what it did in a previous version. Basically, you need to be able to make a TCM device behave 100% the same as it did in an older version of the kernel. This is unique to virtualization due to live migration. If you migrate from a 3.6 kernel to a 3.8 kernel, you need to make sure that the 3.8 kernel's TCM device behaves exactly like the 3.6 kernel because the guest that is interacting with it does not realize that live migration happened. I don't think these strict live migration rules apply to SCSI targets. Real life storage systems get new features and different behaviour with firmware upgrades all the time, and SCSI initiators deal with that just fine. I don't see any reason to be more picky just because we're virtualized. But would this happen while a system is running live? I agree that in general, SCSI targets don't need this, but I'm pretty sure that if a guest probes for a command, you migrate to an old version, and that command is no longer there, badness will ensue. It's different when you're talking about a reboot happening or a disconnect/reconnect due to firmware upgrade. The OS would naturally be reprobing in this case. Regards, Anthony Liguori
Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6
On 07/18/2012 11:47 AM, James Bottomley wrote: On Wed, 2012-07-18 at 11:00 -0500, Anthony Liguori wrote: Of course: Think about the consequences: you want to upgrade one array on your SAN. You definitely don't want to shut down your entire data centre to achieve it. In place upgrades on running SANs have been common in enterprise environments for a while. Would firmware upgrades ever result in major OS visible changes though? Maybe OSes are more robust with SCSI than with other types of buses, but I don't think it's safe to completely ignore the problem. I agree that in general, SCSI targets don't need this, but I'm pretty sure that if a guest probes for a command, you migrate to an old version, and that command is no longer there, badness will ensue. What command are we talking about? Operation of initiators is usually just READ and WRITE. So perhaps we might have inline UNMAP ... but the world wouldn't come to an end even if the latter stopped working. Is that true for all OSes? Linux may handle things gracefully if UNMAP starts throwing errors but that doesn't mean that Windows will. There are other cases where this creates problems. Windows (and some other OSes) fingerprint the hardware profile in order to do license enforcement. If the hardware changes beyond a certain amount, then they refuse to boot. Windows does this with a points system and I do believe that INQUIRY responses from any local disks are included in this tally. Most of the complex SCSI stuff is done at start of day; it's actually only then we'd notice things like changes in INQUIRY strings or mode pages. Failover, which is what you're talking about, requires reinstatement of all the operating parameters of the source/target system, but that's not wholly the responsibility of the storage system ... It's the responsibility of the hypervisor when dealing with live migration. 
Regards, Anthony Liguori James It's different when you're talking about a reboot happening or a disconnect/reconnect due to firmware upgrade. The OS would naturally be reprobing in this case.
Re: [RFC-v2 0/4] tcm_vhost+cmwq fabric driver code for-3.6
On 07/17/2012 10:05 AM, Michael S. Tsirkin wrote: On Wed, Jul 11, 2012 at 09:15:00PM +0000, Nicholas A. Bellinger wrote: From: Nicholas Bellinger <n...@linux-iscsi.org> Hi folks, The following is a RFC-v2 series of tcm_vhost target fabric driver code currently in-flight for-3.6 mainline code. After last week's developments along with the help of some new folks, the changelog v1 -> v2 so far looks like: *) Fix drivers/vhost/test.c to use VHOST_NET_FEATURES in patch #1 (Asias He) *) Fix tv_cmd completion -> release SGL memory leak (nab) *) Fix sparse warnings for static variable usage (Fengguang Wu) *) Fix sparse warnings for min() typing + printk format specs (Fengguang Wu) *) Convert to cmwq submission for I/O dispatch (nab + hch) Also following Paolo's request, a patch for hw/virtio-scsi.c that sets scsi_host->max_target=0 that removes the need for virtio-scsi LLD to hardcode VirtIOSCSIConfig->max_id=1 in order to function with tcm_vhost. Note this series has been pushed into target-pending.git/for-next-merge, and should be getting picked up for tomorrow's linux-next build. Please let us know if you have any concerns and/or additional review feedback. Thank you! It still seems not 100% clear whether this driver will have major userspace using it. And if not, it would be very hard to support a driver when recent userspace does not use it in the end. I don't think this is a good reason to exclude something from the kernel. However, there are good reasons why this doesn't make sense for something like QEMU--specifically because we have a large number of features in our block layer that tcm_vhost would bypass. But perhaps it makes sense for something like native kvm tool. And if it did go into the kernel, we would certainly support it in QEMU. But I do think the kernel should carefully consider whether it wants to support an interface like this. This is an extremely complicated ABI with a lot of subtle details around state and compatibility.
Are you absolutely confident that you can support a userspace application that expects to get exactly the same response from all possible commands in 20 kernel versions from now? Virtualization requires absolutely precise compatibility in terms of bugs and features. This is probably not something the TCM stack has had to consider yet. I think a good idea for 3.6 would be to make it depend on CONFIG_STAGING. Then we don't commit to an ABI. I think this is a good idea. Even if it goes in, a really clear policy would be needed wrt the userspace ABI. While tcm_vhost is probably more useful than vhost_blk, it's a much more complex ABI to maintain. Regards, Anthony Liguori For this, you can add a separate Kconfig and source it from drivers/staging/Kconfig. Maybe it needs to be in a separate directory drivers/vhost/staging/Kconfig. Nicholas Bellinger (2): vhost: Add vhost_scsi specific defines tcm_vhost: Initial merge for vhost level target fabric driver Stefan Hajnoczi (2): vhost: Separate vhost-net features from vhost features vhost: make vhost work queue visible drivers/vhost/Kconfig | 6 + drivers/vhost/Makefile | 1 + drivers/vhost/net.c | 4 +- drivers/vhost/tcm_vhost.c | 1609 + drivers/vhost/tcm_vhost.h | 74 ++ drivers/vhost/test.c | 4 +- drivers/vhost/vhost.c | 5 +- drivers/vhost/vhost.h | 6 +- include/linux/vhost.h | 9 + 9 files changed, 1710 insertions(+), 8 deletions(-) create mode 100644 drivers/vhost/tcm_vhost.c create mode 100644 drivers/vhost/tcm_vhost.h -- 1.7.2.5
Re: [PATCH 0/6] tcm_vhost/virtio-scsi WIP code for-3.6
On 07/04/2012 10:05 AM, Michael S. Tsirkin wrote: On Wed, Jul 04, 2012 at 04:52:00PM +0200, Paolo Bonzini wrote: Il 04/07/2012 16:02, Michael S. Tsirkin ha scritto: On Wed, Jul 04, 2012 at 04:24:00AM +, Nicholas A. Bellinger wrote: From: Nicholas Bellingern...@linux-iscsi.org Hi folks, This series contains patches required to update tcm_vhost- virtio-scsi connected hosts- guests to run on v3.5-rc2 mainline code. This series is available on top of target-pending/auto-next here: git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending.git tcm_vhost This includes the necessary vhost changes from Stefan to to get tcm_vhost functioning, along a virtio-scsi LUN scanning change to address a client bug with tcm_vhost I ran into.. Also, tcm_vhost driver has been merged into a single source + header file that is now living under /drivers/vhost/, along with latest tcm_vhost changes from Zhi's tcm_vhost tree. Here are a couple of screenshots of the code in action using raw IBLOCK backends provided by FusionIO ioDrive Duo: http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-3.png http://linux-iscsi.org/images/Virtio-scsi-tcm-vhost-3.5-rc2-4.png So the next steps on my end will be converting tcm_vhost to submit backend I/O from cmwq context, along with fio benchmark numbers between tcm_vhost/virtio-scsi and virtio-scsi-raw using raw IBLOCK iomemory_vsl flash. OK so this is an RFC, not for merge yet? Patch 6 definitely looks RFCish, but patch 5 should go in anyway. Paolo I was talking about 4/6 first of all. Anyway, it's best to split, not to mix RFCs and fixes. What is the use-case that we're targeting for this? I certainly think it's fine to merge this into the kernel... maybe something will use it. But I'm pretty opposed to taking support for this into QEMU. It's going to create more problems than it solves specifically because I have no idea what problem it actually solves. We cannot avoid doing better SCSI emulation in QEMU. 
That cannot be a long term strategy on our part and vhost-scsi isn't going to solve that problem for us. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 10:56 AM, Michael S. Tsirkin wrote: Currently virtio-pci is specified so that configuration of the device is done through a PCI IO space (via BAR 0 of the virtual PCI device). However, Linux guests happen to use ioread/iowrite/iomap primitives for access, and these work uniformly across memory/io BARs. While PCI IO accesses are faster than MMIO on x86 kvm, MMIO might be helpful on other systems which don't implement PIO or where PIO is slower than MMIO. Add a property to make it possible to tweak the BAR type. Signed-off-by: Michael S. Tsirkin <m...@redhat.com> This is harmless by default but causes segfaults in memory.c when enabled. Thus an RFC until I figure out what's wrong. Doesn't this violate the virtio-pci spec? Making the same vendor/device ID have different semantics depending on a magic flag in QEMU seems like a pretty bad idea to me. Regards, Anthony Liguori --- hw/virtio-pci.c | 16 ++-- hw/virtio-pci.h | 4 2 files changed, 18 insertions(+), 2 deletions(-)
diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 28498ec..6f338d2 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -655,6 +655,7 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
 {
     uint8_t *config;
     uint32_t size;
+    uint8_t bar0_type;
 
     proxy->vdev = vdev;
@@ -684,8 +685,14 @@ void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev)
     memory_region_init_io(&proxy->bar, &virtio_pci_config_ops, proxy,
                           "virtio-pci", size);
-    pci_register_bar(&proxy->pci_dev, 0, PCI_BASE_ADDRESS_SPACE_IO,
-                     &proxy->bar);
+
+    if (proxy->flags & VIRTIO_PCI_FLAG_USE_MMIO) {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_MEMORY;
+    } else {
+        bar0_type = PCI_BASE_ADDRESS_SPACE_IO;
+    }
+
+    pci_register_bar(&proxy->pci_dev, 0, bar0_type, &proxy->bar);
 
     if (!kvm_has_many_ioeventfds()) {
         proxy->flags &= ~VIRTIO_PCI_FLAG_USE_IOEVENTFD;
@@ -823,6 +830,7 @@ static Property virtio_blk_properties[] = {
     DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_BLK_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -856,6 +864,7 @@ static Property virtio_net_properties[] = {
     DEFINE_PROP_UINT32("x-txtimer", VirtIOPCIProxy, net.txtimer, TX_TIMER_INTERVAL),
     DEFINE_PROP_INT32("x-txburst", VirtIOPCIProxy, net.txburst, TX_BURST),
     DEFINE_PROP_STRING("tx", VirtIOPCIProxy, net.tx),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -888,6 +897,7 @@ static Property virtio_serial_properties[] = {
     DEFINE_PROP_HEX32("class", VirtIOPCIProxy, class_code, 0),
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
     DEFINE_PROP_UINT32("max_ports", VirtIOPCIProxy, serial.max_virtserial_ports, 31),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -915,6 +925,7 @@ static TypeInfo virtio_serial_info = {
 static Property virtio_balloon_properties[] = {
     DEFINE_VIRTIO_COMMON_FEATURES(VirtIOPCIProxy, host_features),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
@@ -969,6 +980,7 @@ static int virtio_scsi_exit_pci(PCIDevice *pci_dev)
 static Property virtio_scsi_properties[] = {
     DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 2),
     DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
+    DEFINE_PROP_BIT("mmio", VirtIOPCIProxy, flags, VIRTIO_PCI_FLAG_USE_MMIO_BIT, false),
     DEFINE_PROP_END_OF_LIST(),
 };
diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index e560428..e6a8861 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -24,6 +24,10 @@
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT 1
 #define VIRTIO_PCI_FLAG_USE_IOEVENTFD (1 << VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT)
 
+/* Some guests don't support port IO. Use MMIO instead. */
+#define VIRTIO_PCI_FLAG_USE_MMIO_BIT 2
+#define VIRTIO_PCI_FLAG_USE_MMIO (1 << VIRTIO_PCI_FLAG_USE_MMIO_BIT)
+
 typedef struct {
     PCIDevice pci_dev;
     VirtIODevice *vdev;
Re: [Qemu-devel] [PATCH RFC] virtio-pci: add MMIO property
On 03/19/2012 06:52 PM, Rusty Russell wrote: On Mon, 19 Mar 2012 17:13:06 -0500, Anthony Liguori <anth...@codemonkey.ws> wrote: Maybe just make this a hidden option like x-mmio? x-violate-the-virtio-spec-to-trick-old-linux-drivers-into-working-on-power? To configure the device, we use the first I/O region of the PCI device. Meh, it does sound a little like we are specifying that it's a PCI I/O bar. Let's resurrect the PCI-v2 idea, which is ready to implement now, and a nice cleanup? Detach it from the change-of-ring-format idea which is turning out to be a tarpit. I think that's a sensible approach. Regards, Anthony Liguori Thanks, Rusty.
Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On 01/10/2012 06:25 PM, Rusty Russell wrote: On Tue, 10 Jan 2012 19:03:36 +0200, Michael S. Tsirkinm...@redhat.com wrote: On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote: Yes. The idea that we can alter fields in the device-specific config area is flawed. There may be cases where it doesn't matter, but as an idea it was holed to begin with. We can reduce probability by doing a double read to check, but there are still cases where it will fail. Okay - want me to propose an interface for that? Had a brief chat with BenH (CC'd). I think we should deprecate writing to the config space. Only balloon does it AFAICT, and I can't quite figure out *why* it has an 'active' field. This solves half the problem, of sync guest writes. For the other half, I suggest a generation counter; odd means inconsistent. The guest can poll. BenH also convinced me we should finally make the config space LE if we're going to change things. Since PCI is the most common transport, guest-endian confuses people. And it sucks for really weird machines I think the more important thing to do is require accesses to integers in the config space to always be aligned and to use the appropriate accessor. Non-integer fields should be restricted to byte access. That limits config space entries to 32-bit but also means that there is no need for a generation counter. It's also easier to deal with endian conversion that way. But it means the backend code ends up being much simpler to write (because it behaves more like a normal PCI device). If we're already making the change, the endianness ought to be a feature bit. We should also change the ring (to a single ring, I think). Ack. Descriptors to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags). We might be able to squeeze it into 20 bytes but that means packing. We should support inline, chained or indirect. Let the other side ack by setting flag, cookie and len (if written). 
Moreover, I think we should make all these changes at once (at least, in the spec). That makes it a big change, and it'll take longer to develop, but makes it easy in the long run to differentiate legacy and modern virtio. Ack. Long live virtio2! :-) Regards, Anthony Liguori Thoughts? Rusty.
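Rusty's 24-byte descriptor proposal above can be written down directly. Only the field sizes come from the thread; the struct name, field names, and field order below are guesses for illustration:

```c
#include <stdint.h>

/* Sketch of the proposed virtio2 descriptor: 8-byte cookie, 8-byte
 * address, 4-byte length, 4-byte flags = 24 bytes with natural
 * alignment and no packing. Names are invented; the thread only
 * fixes the sizes. */
struct virtio2_desc {
    uint64_t cookie; /* opaque driver token, echoed back in the ack */
    uint64_t addr;   /* guest-physical buffer address */
    uint32_t len;    /* buffer length; device may overwrite on ack */
    uint32_t flags;  /* inline / chained / indirect, ack bit, etc. */
};
```

The 20-byte variant mentioned in the thread would mean shrinking a field or packing the struct, which is why 24 bytes is the natural layout.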
Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On 01/11/2012 09:12 AM, Michael S. Tsirkin wrote: On Wed, Jan 11, 2012 at 07:30:34AM -0600, Anthony Liguori wrote: On 01/10/2012 06:25 PM, Rusty Russell wrote: On Tue, 10 Jan 2012 19:03:36 +0200, Michael S. Tsirkinm...@redhat.com wrote: On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote: Yes. The idea that we can alter fields in the device-specific config area is flawed. There may be cases where it doesn't matter, but as an idea it was holed to begin with. We can reduce probability by doing a double read to check, but there are still cases where it will fail. Okay - want me to propose an interface for that? Had a brief chat with BenH (CC'd). I think we should deprecate writing to the config space. Only balloon does it AFAICT, and I can't quite figure out *why* it has an 'active' field. This solves half the problem, of sync guest writes. For the other half, I suggest a generation counter; odd means inconsistent. The guest can poll. BenH also convinced me we should finally make the config space LE if we're going to change things. Since PCI is the most common transport, guest-endian confuses people. And it sucks for really weird machines I think the more important thing to do is require accesses to integers in the config space to always be aligned and to use the appropriate accessor. Non-integer fields should be restricted to byte access. That limits config space entries to 32-bit but also means that there is no need for a generation counter. It's also easier to deal with endian conversion that way. This is similar to what we have now. But it's still buggy: e.g. if guest updates MAC byte by byte, we have no way to know when it's done doing so. This is no different than a normal network card. You have to use a secondary register to trigger an update. Regards, Anthony Liguori But it means the backend code ends up being much simpler to write (because it behaves more like a normal PCI device). 
If we're already making the change, the endianness ought to be a feature bit. We should also change the ring (to a single ring, I think). Ack. Descriptors to 24 bytes long (8 byte cookie, 8 byte addr, 4 byte len, 4 byte flags). We might be able to squeeze it into 20 bytes but that means packing. We should support inline, chained or indirect. Let the other side ack by setting flag, cookie and len (if written). Moreover, I think we should make all these changes at once (at least, in the spec). That makes it a big change, and it'll take longer to develop, but makes it easy in the long run to differentiate legacy and modern virtio. Ack. Long live virtio2! :-) Regards, Anthony Liguori Thoughts? Rusty.
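The generation-counter scheme debated in this thread (odd while the device is rewriting config, even when stable, guest polls) can be modeled in a few lines of userspace C. Everything here is illustrative: the struct, the 6-byte mac field, and all names are invented, and a real implementation would need proper memory barriers around the counter reads.

```c
#include <stdint.h>
#include <string.h>

/* Toy config space with a generation counter in front of it. */
struct toy_config {
    uint32_t generation; /* odd = device mid-update */
    uint8_t mac[6];
};

/* Poll until a full read of the mac is bracketed by the same even
 * generation value, i.e. the snapshot is consistent. */
static int read_mac_stable(const struct toy_config *cfg, uint8_t out[6])
{
    int tries;

    for (tries = 0; tries < 1000; tries++) {
        uint32_t before = cfg->generation;

        if (before & 1)
            continue;           /* odd: device mid-update, poll again */
        memcpy(out, cfg->mac, 6);
        if (cfg->generation == before)
            return 0;           /* counter unchanged: consistent snapshot */
    }
    return -1;                  /* still unstable; caller retries later */
}

/* Minimal self-check: a stable (even-generation) config reads back intact. */
static int toy_selfcheck(void)
{
    struct toy_config cfg = { 2, { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 } };
    uint8_t mac[6];

    return read_mac_stable(&cfg, mac) == 0 && memcmp(mac, cfg.mac, 6) == 0;
}
```

This is the polling half of Rusty's suggestion; the word-access alternative argued for above avoids the counter entirely by making each field read atomic.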
Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On 01/11/2012 09:21 AM, Michael S. Tsirkin wrote: On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote: This is similar to what we have now. But it's still buggy: e.g. if guest updates MAC byte by byte, we have no way to know when it's done doing so. This is no different than a normal network card. You have to use a secondary register to trigger an update. Regards, Anthony Liguori Possible but doesn't let us layer nicely to allow unchanged drivers that work with all transports (new pci, old pci, non pci). If we declare config space LE, then we have to touch all drivers. There's no way around it because the virtio API is byte-based, not word based. This is why I'm suggesting making the virtio API (and then the virtio-pci ABI) word based. It gives us the flexibility to make endianness a property of the transport and not a property of the devices. Regards, Anthony Liguori
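Word-based access is what lets the endian conversion live entirely in the transport: one little-endian accessor per word size covers every device field, with no per-driver changes. A minimal sketch (names invented):

```c
#include <stdint.h>

/* Decode an aligned 32-bit little-endian config word independent of
 * host endianness. With byte-granular config access the transport
 * cannot do this on the driver's behalf, because it has no idea where
 * the multi-byte field boundaries are. */
static uint32_t cfg_read_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] |
           ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) |
           ((uint32_t)b[3] << 24);
}

/* Helper for the self-check below: four LE wire bytes for 0x12345678. */
static uint32_t demo_le32(void)
{
    const uint8_t wire[4] = { 0x78, 0x56, 0x34, 0x12 };
    return cfg_read_le32(wire);
}
```

Because the shift-and-or decode never depends on host byte order, the same helper works on big-endian guests, which is the point being made about moving endianness into the transport.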
Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On 01/11/2012 09:45 AM, Michael S. Tsirkin wrote: On Wed, Jan 11, 2012 at 09:28:27AM -0600, Anthony Liguori wrote: On 01/11/2012 09:21 AM, Michael S. Tsirkin wrote: On Wed, Jan 11, 2012 at 09:15:49AM -0600, Anthony Liguori wrote: This is similar to what we have now. But it's still buggy: e.g. if guest updates MAC byte by byte, we have no way to know when it's done doing so. This is no different than a normal network card. You have to use a secondary register to trigger an update. Regards, Anthony Liguori Possible but doesn't let us layer nicely to allow unchanged drivers that work with all transports (new pci, old pci, non pci). If we declare config space LE, then we have to touch all drivers. There's no way around it because the virtio API is byte-based, not word based. Fine but don't we want to be compatible with old hypervisors? We can modify the drivers to work either with a virtio1 or virtio2 transport. If the only difference is that we move to word access instead of byte access for the config space, it's a nop because drivers don't rely on sub-word access today. This is why I'm suggesting making the virtio API (and then the virtio-pci ABI) word based. It gives us the flexibility to make endianness a property of the transport and not a property of the devices. Regards, Anthony Liguori Some fields are 64 bit, this is still tricky to do atomically. What's the objection to using a config VQ? Then we move very far away from something that looks like a PCI device. The problem we're having here is specifically where we've deviated from what a normal PCI device would do. Fixing that by deviating even further seems counter intuitive to me. Regards, Anthony Liguori
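For the 64-bit fields Michael raises, the usual answer under word-sized access is a hi/lo/hi read: re-read the high word until it is stable across the low-word read. This is a sketch against plain memory rather than a real BAR, with invented names, and it assumes the device updates the two halves high-word-last:

```c
#include <stdint.h>

/* Read a 64-bit value exposed as two 32-bit words without returning a
 * torn result: if the high half changed while we read the low half,
 * the mismatch forces a retry. */
static uint64_t read64_hi_lo_hi(const volatile uint32_t *lo,
                                const volatile uint32_t *hi)
{
    uint32_t h0, l, h1;

    do {
        h0 = *hi;
        l = *lo;
        h1 = *hi;
    } while (h0 != h1); /* high word moved: low read may straddle a carry */
    return ((uint64_t)h1 << 32) | l;
}

/* Helper for the self-check below: wraps fixed halves in volatile slots. */
static uint64_t demo_read64(uint32_t lo_val, uint32_t hi_val)
{
    volatile uint32_t lo = lo_val, hi = hi_val;
    return read64_hi_lo_hi(&lo, &hi);
}
```

It trades a possible retry loop for atomicity, which is the "tricky" part being weighed against a config VQ in the exchange above.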
Re: [RFC 7/11] virtio_pci: new, capability-aware driver.
On 01/11/2012 11:08 AM, Michael S. Tsirkin wrote: Not sure what you mean. Using VQ is DMA which is pretty common for PCI. Do you know of a network device that obtains its mac address via a DMA transaction? Regards, Anthony Liguori
Re: [PATCH] virtio-blk: Add stats VQ to collect information about devices
On 08/18/2011 11:29 AM, Sasha Levin wrote: On Thu, 2011-08-18 at 08:10 -0700, Avi Kivity wrote: On 08/17/2011 09:38 PM, Sasha Levin wrote: On Wed, 2011-08-17 at 16:00 -0700, Avi Kivity wrote: On 08/16/2011 12:47 PM, Sasha Levin wrote: This patch adds support for an optional stats vq that works similary to the stats vq provided by virtio-balloon. The purpose of this change is to allow collection of statistics about working virtio-blk devices to easily analyze performance without having to tap into the guest. Why can't you get the same info from the host? i.e. read sectors? Some of the stats you can collect from the host, but some you can't. The ones you can't include all the timing statistics and the internal queue statistics (read/write merges). Surely you can time the actual amount of time the I/O takes? It doesn't account for the virtio round-trip, but does it matter? Why is the merge count important for the host? I assumed that the time the request spends in the virtio layer is (somewhat) significant, specially since that this is something that adds up over time. Merge count can be useful for several testing scenarios (I'll describe the reasoning behind this patch below). The idea behind providing all of the stats on the stats vq (which is basically what you see in '/dev/block/[device]/stats') is to give a consistent snapshot of the state of the device. What can you do with it? I was actually planning on submitting another patch that would add something similar into virtio-net. My plan was to enable collecting statistics regarding memory, network and disk usage in a simple manner without accessing guests. Why not just add an interface that lets you read files from a guest either via a guest agent (like qemu-ga) or a purpose built PV device? That would let you access the guest's full sysfs which seems to be quite a lot more useful long term than adding a bunch of specific interfaces. 
Regards, Anthony Liguori
Re: [PATCH] virtio-pci: add softlinks between virtio and pci
On 01/05/2011 01:17 PM, Michael S. Tsirkin wrote: We sometimes need to map between the virtio device and the given pci device. One such use is an OS installer that gets the boot pci device from the BIOS and needs to find the relevant block device. Since it can't, installation fails. I have no objection to this patch but I'm a tad confused by the description. I assume you mean the installer is querying the boot device via int13 get driver parameters such that it returns the pci address of the device? Or is it querying geometry information and then trying to find the best match block device? If it's the former, I don't really understand the need for a backlink since the PCI address gives you a link to the block device. OTOH, if it's the latter, it would make sense but then your description doesn't really make much sense. At any rate, a better commit message would be helpful in explaining the need for this. Regards, Anthony Liguori Supply softlinks between these to make it possible. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Gleb, could you please ack that this patch below will be enough to fix the installer issue that you see?
 drivers/virtio/virtio_pci.c | 18 +-
 1 files changed, 17 insertions(+), 1 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index ef8d9d5..06eb2f8 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -25,6 +25,7 @@
 #include <linux/virtio_pci.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
+#include <linux/sysfs.h>

 MODULE_AUTHOR("Anthony Liguori <aligu...@us.ibm.com>");
 MODULE_DESCRIPTION("virtio-pci");
@@ -667,8 +668,21 @@ static int __devinit virtio_pci_probe(struct pci_dev *pci_dev,
 	if (err)
 		goto out_set_drvdata;

-	return 0;
+	err = sysfs_create_link(&pci_dev->dev.kobj, &vp_dev->vdev.dev.kobj,
+				"virtio_device");
+	if (err)
+		goto out_register_device;
+
+	err = sysfs_create_link(&vp_dev->vdev.dev.kobj, &pci_dev->dev.kobj,
+				"bus_device");
+	if (err)
+		goto out_create_link;
+
+	return 0;
+out_create_link:
+	sysfs_remove_link(&pci_dev->dev.kobj, "virtio_device");
+out_register_device:
+	unregister_virtio_device(&vp_dev->vdev);
 out_set_drvdata:
 	pci_set_drvdata(pci_dev, NULL);
 	pci_iounmap(pci_dev, vp_dev->ioaddr);
@@ -685,6 +699,8 @@ static void __devexit virtio_pci_remove(struct pci_dev *pci_dev)
 {
 	struct virtio_pci_device *vp_dev = pci_get_drvdata(pci_dev);

+	sysfs_remove_link(&vp_dev->vdev.dev.kobj, "bus_device");
+	sysfs_remove_link(&pci_dev->dev.kobj, "virtio_device");
 	unregister_virtio_device(&vp_dev->vdev);
 }
Re: [PATCH] virtio-pci: add softlinks between virtio and pci
On 01/05/2011 01:38 PM, Michael S. Tsirkin wrote: On Wed, Jan 05, 2011 at 01:28:34PM -0600, Anthony Liguori wrote: On 01/05/2011 01:17 PM, Michael S. Tsirkin wrote: We sometimes need to map between the virtio device and the given pci device. One such use is an OS installer that gets the boot pci device from the BIOS and needs to find the relevant block device. Since it can't, installation fails. I have no objection to this patch but I'm a tad confused by the description. I assume you mean the installer is querying the boot device via int13 get driver parameters such that it returns the pci address of the device? Or is it querying geometry information and then trying to find the best match block device? I think it's the former. If it's the former, I don't really understand the need for a backlink since the PCI address gives you a link to the block device. It does? How does it? Okay, after some more discussion, I think I better understand what's going on here. The installer needs to figure out the boot device. It does this by querying the BIOS and the BIOS can only give it back a PCI address. Right now, if you follow the PCI sysfs path, you cannot reach the actual virtio device because the PCI device only refers to the bus. This is because virtio has a separate struct device from the PCI struct device. These explicit links establish the relationship. A lot of this would be cleaner if virtio didn't force an independent struct device and could simply make use of an already existing struct device. Regards, Anthony Liguori OTOH, if it's the latter, it would make sense but then your description doesn't really make much sense. At any rate, a better commit message would be helpful in explaining the need for this. Regards, Anthony Liguori Supply softlinks between these to make it possible. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- Gleb, could you please ack that this patch below will be enough to fix the installer issue that you see?
Re: [PATCH] virtio-pci: add softlinks between virtio and pci
On 01/05/2011 02:05 PM, Michael S. Tsirkin wrote: On Wed, Jan 05, 2011 at 01:28:34PM -0600, Anthony Liguori wrote: On 01/05/2011 01:17 PM, Michael S. Tsirkin wrote: We sometimes need to map between the virtio device and the given pci device. One such use is an OS installer that gets the boot pci device from the BIOS and needs to find the relevant block device. Since it can't, installation fails. I have no objection to this patch but I'm a tad confused by the description. I assume you mean the installer is querying the boot device via int13 get driver parameters such that it returns the pci address of the device? Or is it querying geometry information and then trying to find the best match block device? If it's the former, I don't really understand the need for a backlink since the PCI address gives you a link to the block device. OTOH, if it's the latter, it would make sense but then your description doesn't really make much sense. At any rate, a better commit message would be helpful in explaining the need for this. Regards, Anthony Liguori OK just to clarify: we get the pci address from the BIOS and need the virtio device to get at the linux device (e.g. block) in the end. Thus the link from pci to virtio. I also added a backlink since I thought it's handy. Does this answer the questions? Rusty rewrites my commit logs anyway, he has better style :) It helps. The real reason this is needed is because in a normal device, there is only one struct device whereas with virtio-pci, the virtio-pci device has a struct device and then the actual virtio device has another one. There's probably a better way to handle this in sysfs making virtio-pci a proper bus with only a single device as a child or something like that. But the links are probably an easier solution. Regards, Anthony Liguori Supply softlinks between these to make it possible. Signed-off-by: Michael S.
Tsirkin m...@redhat.com --- Gleb, could you please ack that this patch below will be enough to fix the installer issue that you see?
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/12/2010 06:14 AM, Ian Molton wrote: On 10/11/10 17:47, Anthony Liguori wrote: On 11/10/2010 11:22 AM, Ian Molton wrote: Ping ? I think the best way forward is to post patches. I posted links to the git trees. I can post patches, but they are *large*. Do you really want me to post them? Yes, and they have to be split up into something reviewable. To summarize what I was trying to express in the thread, I think this is not the right long term architecture but am not opposed to it as a short term solution. I think having a new virtio device is a bad design choice but am not totally opposed to it. Ok! (I agree that this should be a short term solution :) ) If you want to go for the path of integration, you're going to have to fix all of the coding style issues and make the code fit into QEMU. Dropping a bunch of junk into target-i386/ is not making the code fit into QEMU. I agree. How about hw/gl for the renderer and hw/ for the virtio module? That would be fine. If you post just what you have now in patch form, I can try to provide more concrete advice ignoring the coding style problems. I can post patches, although I don't think LKML would appreciate the volume! I can post them to the qemu list if you do. Yes, qemu is where I was suggesting you post them. Regards, Anthony Liguori -Ian
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/10/2010 11:22 AM, Ian Molton wrote: Ping ? I think the best way forward is to post patches. To summarize what I was trying to express in the thread, I think this is not the right long term architecture but am not opposed to it as a short term solution. I think having a new virtio device is a bad design choice but am not totally opposed to it. My advice is that using virtio-serial + an external tool is probably the least amount of work to get something working and usable with QEMU. If you want to go for the path of integration, you're going to have to fix all of the coding style issues and make the code fit into QEMU. Dropping a bunch of junk into target-i386/ is not making the code fit into QEMU. If you post just what you have now in patch form, I can try to provide more concrete advice ignoring the coding style problems. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/03/2010 01:03 PM, Ian Molton wrote: The virtio driver enforces the PID field and understands the packet format used. It's better than using serial. It's also just one driver, which doesn't have any special interdependencies and can be extended or got rid of in future if and when better things come along. Why is it better than using virtio-serial? In the very, very short term, I think an external backend to QEMU also makes a lot of sense because that's something that Just Works today. Who's written that? The 2007 patch I've been working on and updating simply fails to work altogether without huge alterations on current qemu. My current patch touches a tiny part of the qemu sources. It works today. But it's not at all mergeable in the current form. If you want to do the work of getting it into a mergeable state (cleaning up the coding style, moving it to hw/, etc.) then I'm willing to consider it. But I don't think a custom virtio transport is the right thing to do here. However, if you want something that Just Works with the least amount of code possible, just split it into a separate process and we can stick it in a contrib/ directory or something. I think we can consider integrating it into QEMU (or at least simplifying the execution of the backend) but integrating into QEMU is going to require an awful lot of the existing code to be rewritten. Keeping it separate has the advantage of allowing something to Just Work as an interim solution as we wait for proper support in Spice. I don't know why you think integrating it into qemu is hard? I've already done it. Adding a file that happens to compile as part of qemu even though it doesn't actually integrate with qemu in any meaningful way is not integrating. That's just build system manipulation. Regards, Anthony Liguori I added one virtio driver and a separate offscreen renderer. It touches the qemu code in *one* place. There should be no need to rewrite anything.
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/01/2010 05:42 AM, Avi Kivity wrote: On 10/28/2010 03:52 PM, Ian Molton wrote: On 28/10/10 15:24, Avi Kivity wrote: The caller is intended to block as the host must perform GL rendering before allowing the guest's process to continue. Why is that? Can't we pipeline the process? No, not really. The guest may call for the scene to be rendered at any time and we have to wait for that to happen before we can return the data to it. Waiting for a response is fine, but can't the guest issue a second batch while waiting for the first? In a threaded application, I think you mean, but all RPCs are dispatched holding a global lock, so even within a threaded application only a single GL call will be executed at a time. The other scenario would be multiple applications trying to use GL but AFAICT, this is not supported in the current model. Regards, Anthony Liguori
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 11/01/2010 06:53 AM, Alon Levy wrote: The alternative proposal is Spice, which so far no one has mentioned. Right now, Spice seems to be taking the right approach to guest 3d support. While we (speaking as part of the SPICE developers) want to have the same support in our virtual GPU for 3d as we have for 2d, we just don't at this point of time. Yes, but I think the point is that there are two general approaches to supporting 3d that are being proposed. One approach is to add an RPC layer at the OpenGL level which essentially passes through the host OpenGL stack. That's what virtio-gl is. This has existed for quite a while and there are multiple transports for it. It supports serial ports, TCP sockets, a custom ISA extension for x86, and now a custom virtio transport. Another approach would be to have a virtual GPU and to implement GPU-level commands for 3d. I have been repeatedly told that much of the complexity of Spice is absolutely needed for 3d and that that's a major part of the design. GPU-level support for 3d operations has a number of advantages, mainly that it's more reasonably portable to things like Windows since the 3d commands can be a superset of both OpenGL and Direct3d. Also, Spice has an abstraction layer that doesn't simply pass through graphics commands, but translates/sanitizes them first. That's another advantage over OpenGL passthrough. Without a layer to sanitize commands, a guest can do funky things with the host or other guests. I think a Spice-like approach is the best thing long term. In the short term, I think doing the GL marshalling over virtio-serial makes a ton of sense since the kernel driver is already present upstream. It exists exactly for things like this. In the very, very short term, I think an external backend to QEMU also makes a lot of sense because that's something that Just Works today.
I think we can consider integrating it into QEMU (or at least simplifying the execution of the backend) but integrating into QEMU is going to require an awful lot of the existing code to be rewritten. Keeping it separate has the advantage of allowing something to Just Work as an interim solution as we wait for proper support in Spice. Regards, Anthony Liguori I understand others are skeptical, but this seems simple and if it works and you're happy to maintain it I'm happy to let you do it :) Cheers, Rusty.
Re: [Qemu-devel] Re: [PATCH] Implement a virtio GPU transport
On 10/28/2010 09:24 AM, Avi Kivity wrote: On 10/28/2010 01:54 PM, Ian Molton wrote: Well, I like to review an implementation against a spec. True, but then all that would prove is that I can write a spec to match the code. It would also allow us to check that the spec matches the requirements. Those two steps are easier than checking that the code matches the requirements. I'm extremely sceptical of any GL passthrough proposal. There have literally been half a dozen over the years and they never seem to leave proof-of-concept phase. My (limited) understanding is that it's a fundamentally hard problem that no one has adequately solved yet. A specification matters an awful lot less than an explanation of how the problem is being solved in a robust fashion such that it can be reviewed by people with a deeper understanding of the problem space. Regards, Anthony Liguori The code is proof of concept. The kernel bit is pretty simple, but I'd like to get some idea of whether the rest of the code will be accepted given that there's not much point in having any one (or two) of these components exist without the other. I guess some graphics people need to be involved. Better, but still unsatisfying. If the server is busy, the caller would block. I guess it's expected since it's called from ->fsync(). I'm not sure whether that's the best interface, perhaps aio_writev is better. The caller is intended to block as the host must perform GL rendering before allowing the guest's process to continue. Why is that? Can't we pipeline the process? The only real bottleneck is that processes will block trying to submit data if another process is performing rendering, but that will only be solved when the renderer is made multithreaded. The same would happen on a real GPU if it had only one queue too. If you look at the host code, you can see that the data is already buffered per-process, in a pretty sensible way.
If the renderer itself were made a separate thread, then this problem magically disappears (the queuing code on the host is pretty fast). Well, this is out of my area of expertise. I don't like it, but if it's acceptable to the gpu people, okay. In testing, the overhead of this was pretty small anyway. Running a few dozen glxgears and a copy of ioquake3 simultaneously on an intel video card managed the same framerate with the same CPU utilisation, both with the old code and the version I just posted. Contention during rendering just isn't much of an issue.
Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
On 06/29/2010 08:04 AM, Michael S. Tsirkin wrote: On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote: From: Michael S. Tsirkin m...@redhat.com Date: Mon, 28 Jun 2010 13:08:07 +0300 Userspace virtio server has the following hack so guests rely on it, and we have to replicate it, too: Use port number to detect incoming IPv4 DHCP response packets, and fill in the checksum for these. The issue we are solving is that on linux guests, some apps that use recvmsg with AF_PACKET sockets, don't know how to handle CHECKSUM_PARTIAL; The interface to return the relevant information was added in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, and older userspace does not use it. One important user of recvmsg with AF_PACKET is dhclient, so we add a work-around just for DHCP. Don't bother applying the hack to IPv6 as userspace virtio does not have a work-around for that - let's hope guests will do the right thing wrt IPv6. Signed-off-by: Michael S. Tsirkin m...@redhat.com Yikes, this is awful too. Nothing in the kernel should be mucking around with procotol packets like this by default. In particular, what the heck does port 67 mean? Locally I can use it for whatever I want for my own purposes, I don't have to follow the conventions for service ports as specified by the IETF. But I can't have the packet checksum state be left alone for port 67 traffic on a box using virtio because you have this hack there. And yes it's broken on machines using the qemu thing, but at least the hack there is restricted to userspace. Yes, and I think it was a mistake to add the hack there. This is what prevented applications from using the new interface in the 3 years since it was first introduced. It's far more complicated than that. dhclient is part of ISC's DHCP package. They do not have a public SCM and instead require you to join their Software Guild to get access to it.
This problem was identified in one distribution and the patch was pushed upstream but because they did not have a public SCM, most other distributions did not see the fix until it appeared in a release. ISC has a pretty long release cycle historically. ISC's had the fix for a long time but there was a 3-year gap in their releases and since their SCM isn't public, users are stuck with the last release. This hack makes sense in QEMU as we have a few hacks like this to fix broken guests. A primary use of virtualization is to run old applications so it makes sense for us to do that. I don't think it makes sense for vhost to do this. These guests are so old that they don't have the requisite features to achieve really high performance anyway. I've always thought making vhost totally transparent was a bad idea and this is one of the reasons. We can do a lot of ugly things in userspace that we shouldn't be doing in the kernel. Regards, Anthony Liguori
Re: [PATCHv2] vhost-net: add dhclient work-around from userspace
On 06/30/2010 05:31 PM, Michael S. Tsirkin wrote: On Wed, Jun 30, 2010 at 05:08:11PM -0500, Anthony Liguori wrote: On 06/29/2010 08:04 AM, Michael S. Tsirkin wrote: On Tue, Jun 29, 2010 at 12:36:47AM -0700, David Miller wrote: From: Michael S. Tsirkinm...@redhat.com Date: Mon, 28 Jun 2010 13:08:07 +0300 Userspace virtio server has the following hack so guests rely on it, and we have to replicate it, too: Use port number to detect incoming IPv4 DHCP response packets, and fill in the checksum for these. The issue we are solving is that on linux guests, some apps that use recvmsg with AF_PACKET sockets, don't know how to handle CHECKSUM_PARTIAL; The interface to return the relevant information was added in 8dc4194474159660d7f37c495e3fc3f10d0db8cc, and older userspace does not use it. One important user of recvmsg with AF_PACKET is dhclient, so we add a work-around just for DHCP. Don't bother applying the hack to IPv6 as userspace virtio does not have a work-around for that - let's hope guests will do the right thing wrt IPv6. Signed-off-by: Michael S. Tsirkinm...@redhat.com Yikes, this is awful too. Nothing in the kernel should be mucking around with procotol packets like this by default. In particular, what the heck does port 67 mean? Locally I can use it for whatever I want for my own purposes, I don't have to follow the conventions for service ports as specified by the IETF. But I can't have the packet checksum state be left alone for port 67 traffic on a box using virtio because you have this hack there. And yes it's broken on machines using the qemu thing, but at least the hack there is restricted to userspace. Yes, and I think it was a mistake to add the hack there. This is what prevented applications from using the new interface in the 3 years since it was first introduced. It's far more complicated than that. dhclient is part of ISC's DHCP package. They do not have a public SCM and instead require you to join their Software Guild to get access to it. 
This problem was identified in one distribution and the patch was pushed upstream but because they did not have a public SCM, most other distributions did not see the fix until it appeared in a release. ISC has a pretty long release cycle historically. ISC's had the fix for a long time but there was a 3-year gap in their releases and since their SCM isn't public, users are stuck with the last release. This hack makes sense in QEMU as we have a few hacks like this to fix broken guests. A primary use of virtualization is to run old applications so it makes sense for us to do that. IMO it was wrong to put it in qemu: originally, if a distro shipped a broken virtio/dhclient combo, it was its own bug to fix. But now that qemu has shipped the work-around for so long, broken guests seemed to work. The guests were broken before qemu implemented this. virtio-net had checksum offload long before it was ever implemented in qemu. Not even lguest implemented it because the interfaces weren't available in tun/tap. I'm not sure how Rusty ever tested it. We only discovered this bug after checksum offload was implemented in tun/tap and we were able to enable it in QEMU. At that point, the guests had shipped and were in the wild. The real problem was that we implemented a feature in a guest driver without having a backend implementation and then shipped the code. Shipping untested code is a recipe for failure. If we had implemented the front-end feature only when a backend implementation was available, we would have caught this, fixed it in the guests, and not be in the situation because there wouldn't be these broken guests. So we *still* see the bug re-surface in new guests. Which guests? Newer versions of dhclient should work as expected. And since they are fairly new, it is interesting to get decent performance from them now. I don't think it makes sense for vhost to do this.
These guests are so old that they don't have the requisite features to achieve really high performance anyway. I've always thought making vhost totally transparent was a bad idea and this is one of the reasons. It does not have to be fully transparent. You can insert your own ring in the middle, and copy descriptors around. And we stop on errors and let userspace handle it. This will come in handy if we get e.g. a virtio bug that we need to work around. I mean from a UI perspective. IOW, if users have to explicitly choose to use vhost-net, then it's okay to force them to use newer guests that support vhost-net. However, if we make it transparent, then it has to support everything that QEMU virtio has ever supported, which is problematic for exactly the reasons you are now encountering. We can do a lot of ugly things in userspace that we shouldn't be doing in the kernel. Regards, Anthony
Re: Hypervisor detection from within a Linux VM
On 06/29/2010 04:25 PM, Chetan Loke wrote: Hello, Requirement: I have the need to support my apps (running on a Linux VM) on different *nix hypervisors (ESX/Xen etc). I need to know on which hypervisor my app is running. I read the CPUID usage thread - http://thread.gmane.org/gmane.comp.emulators.kvm.devel/22643 but to be honest in the end I looked at http://lxr.linux.no/#linux+v2.6.34/arch/x86/kernel/cpu/vmware.c#L88 The vmware_platform() detection code is straightforward. Current hack: As a quick hack we just grep lspci for VMware's pci-ids. Solution: I can write a bare minimal driver, check the cpu-id as VMware's balloon driver does and then emit a proc/sysfs node. The setup packages and the apps can then check for this node-string. I'm currently working on ESX and I am hoping that this thin driver will work. It can be done entirely in userspace. Take a look at virt-what: http://people.redhat.com/~rjones/virt-what/ Question: Q1) Is it possible to get this functionality as part of the stock kernel or is that a bad idea? I suspect there could be other users/apps who would need to know what *nix hypervisor (or a non-virtualized environment) they are running on? Q2) If this is not the right approach then can someone please suggest another approach? It might be reasonable to list the hypervisor signature as a field in /proc/cpuinfo. There's also a /sys/hypervisor where such information could go. Regards, Anthony Liguori Regards Chetan Loke
Re: [RFC] virtio: Support releasing lock during kick
On 06/23/2010 04:24 PM, Stefan Hajnoczi wrote: The virtio block device holds a lock during I/O request processing. Kicking the virtqueue while the lock is held results in long lock hold times and increases contention for the lock. This patch modifies virtqueue_kick() to optionally release a lock while notifying the host. Virtio block is modified to pass in its lock. This allows other vcpus to queue I/O requests during the time spent servicing the virtqueue notify in the host. The virtqueue_kick() function is modified to know about locking because it changes the state of the virtqueue and should execute with the lock held (it would not be correct for virtio block to release the lock before calling virtqueue_kick()). Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- I am not yet 100% happy with this patch which aims to reduce guest CPU consumption related to vblk->lock contention. Although this patch reduces wait/hold times it does not affect I/O throughput or guest CPU utilization. More investigation is required to get to the bottom of why guest CPU utilization does not decrease when a lock bottleneck has been removed.

Performance figures:
Host: 2.6.34 upstream kernel, qemu-kvm-0.12.4, if=virtio,cache=none
Guest: 2.6.35-rc3-kvm.git upstream kernel
Storage: 12 disks as striped LVM volume
Benchmark: 4 concurrent dd bs=4k iflag=direct

Lockstat data for &vblk->lock:

name        con-bounces  contentions  waittime-min  waittime-max  waittime-total
unmodified  7097         7108         0.31          956.09        161165.4
patched     11484        11550        0.30          411.80        50245.83

The maximum wait time went down by 544.29 us (-57%) and the total wait time decreased by 69%. This shows that the virtqueue kick is indeed hogging the lock. The patched version actually has higher contention than the unmodified version. I think the reason for this is that each virtqueue kick now includes a short release and reacquire.
This short release gives other vcpus a chance to acquire the lock and progress, hence more contention but overall better wait time numbers. name acq-bounces acquisitions holdtime-min holdtime-max holdtime-total unmodified 10771 5038346 0.00 3271.81 59016905.47 patched 31594 5857813 0.00 219.76 24104915.55 Here we see the full impact of this patch: hold time reduced to 219.76 us (-93%). Again the acquisitions have increased since we're now doing an extra unlock+lock per virtqueue kick. Testing, ideas, and comments appreciated. drivers/block/virtio_blk.c | 2 +- drivers/char/hw_random/virtio-rng.c | 2 +- drivers/char/virtio_console.c | 6 +++--- drivers/net/virtio_net.c | 6 +++--- drivers/virtio/virtio_balloon.c | 6 +++--- drivers/virtio/virtio_ring.c | 13 +++-- include/linux/virtio.h | 3 ++- net/9p/trans_virtio.c | 2 +- 8 files changed, 25 insertions(+), 15 deletions(-) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 258bc2a..de033bf 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -187,7 +187,7 @@ static void do_virtblk_request(struct request_queue *q) } if (issued) - virtqueue_kick(vblk->vq); + virtqueue_kick(vblk->vq, vblk->lock); } Shouldn't it be possible to just drop the lock before invoking virtqueue_kick() and reacquire it afterwards? There's nothing in that virtqueue_kick() path that the lock is protecting AFAICT. Regards, Anthony Liguori
Re: [Qemu-devel] Guest bridge setup variations
Arnd Bergmann wrote: 3) given an fd, treat a vhost-style interface This could mean two things, not sure which one you mean. Either the file descriptor could be the vhost file descriptor, or the socket or tap file descriptor from above, with qemu operating on the vhost interface itself. Either option has its advantages, but I guess we should only implement one of the two to keep it simple. I was thinking the socket/tap descriptor. I believe we should continue supporting the mechanisms we support today. However, for people that invoke qemu directly from the command line, I believe we should provide a mechanism like the tap helper that can be used to call out to a separate program to create these initial file descriptors. We'll have to think about how we can make this integrate well so that the syntax isn't clumsy. Right. I wonder if this helper should integrate with netcf as an abstraction, or if we should rather do something generic. It may also be a good idea to let the helper decide which of the three options you listed to use and pass that back to qemu unless the user overrides it. The decision probably needs to be host specific, depending e.g. on the availability and version of tools (brctl, iproute, maybe tunctl, ...), the respective kernel modules (vhost, macvlan, bridge, tun, ...) and policy (VEPA, vlan, ebtables). Ideally the approach should be generic enough to work on other platforms (BSD, Solaris, Windows, ...). For helpers, I think I'd like to stick with what we currently support, and then allow for a robust way for there to be independent projects that implement their own helpers. For instance, I would love it if netcf had a qemu network plugin helper. There's just too much in the networking space all wrapped up in layers of policy decisions. I think it's time to move it out of qemu. 
One thing I realized the last time we discussed the helper approach is that qemu should not need to know or care about the arguments passed to the helper, otherwise you get all the complexity back in qemu that you're trying to avoid. Maybe for 0.13 we can convert -net socket and -net tap to just pass all their options to the helper and move that code out of qemu, along with introducing the new syntax. Yes, I was thinking the same thing. New syntax may need exploring. Another unrelated issue that I think needs to be addressed in a network code cleanup is adding better support for multi-queue transmit and receive. I've prepared macvtap for that by letting you open the chardev multiple times to get one queue per guest CPU, but that needs to be supported by qemu and virtio-net as well to actually parallelize network operation. Ideally, two guest CPUs should be able to transmit and receive on separate queues of the adapter without ever having to access any shared resources. Multiqueue adds another dimension but I think your approach is pretty much right on the money. Have multiple fds for each queue and we would support a mechanism with helpers to receive multiple fds as appropriate. Regards, Anthony Liguori
Re: [Qemu-devel] Guest bridge setup variations
Arnd Bergmann wrote: As promised, here is my small writeup on which setups I feel are important in the long run for server-type guests. This does not cover -net user, which is really for desktop kinds of applications where you do not want to connect into the guest from another IP address. I can see four separate setups that we may or may not want to support, the main difference being how the forwarding between guests happens: 1. The current setup, with a bridge and tun/tap devices on ports of the bridge. This is what Gerhard's work on access controls is focused on and the only option where the hypervisor actually is in full control of the traffic between guests. CPU utilization should be highest this way, and network management can be a burden, because the controls are done through a Linux, libvirt and/or Director specific interface. Typical bridging. 2. Using macvlan as a bridging mechanism, replacing the bridge and tun/tap entirely. This should offer the best performance on inter-guest communication, both in terms of throughput and CPU utilization, but offer no access control for this traffic at all. Performance of guest-external traffic should be slightly better than bridge/tap. Optimization to typical bridge (no traffic control). 3. Doing the bridging in the NIC using macvlan in passthrough mode. This lowers the CPU utilization further compared to 2, at the expense of limiting throughput by the performance of the PCIe interconnect to the adapter. Whether or not this is a win is workload dependent. Access controls now happen in the NIC. Currently, this is not supported yet, due to lack of device drivers, but it will be an important scenario in the future according to some people. Optimization to typical bridge (hardware accelerated). 4. Using macvlan for actual VEPA on the outbound interface. This is mostly interesting because it makes the network access controls visible in an external switch that is already managed. 
CPU utilization and guest-external throughput should be identical to 3, but inter-guest latency can only be worse because all frames go through the external switch. VEPA. While we go over all of these things one thing is becoming clear to me. We need to get qemu out of the network configuration business. There's too much going on here. What I'd like to see is the following interfaces supported: 1) given an fd, make socket calls to send packets. Could be used with a raw socket, a multicast or tcp socket. 2) given an fd, use tap-style read/write calls to send packets* 3) given an fd, treat a vhost-style interface * need to make all tun ioctls optional based on passed in flags Every backend we have today could be implemented in terms of one of the above three. They really come down to how the fd is created and setup. I believe we should continue supporting the mechanisms we support today. However, for people that invoke qemu directly from the command line, I believe we should provide a mechanism like the tap helper that can be used to call out to a separate program to create these initial file descriptors. We'll have to think about how we can make this integrate well so that the syntax isn't clumsy. Regards, Anthony Liguori
Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver (V2)
Rusty Russell wrote: On Wed, 18 Nov 2009 07:06:29 am Adam Litke wrote: virtio: Add memory statistics reporting to the balloon driver (V2) Changes since V1: - Use a virtqueue instead of the device config space Hi Adam, If Anthony's happy, I'm happy with this approach. Couple of minor points: +static inline void update_stat(struct virtio_balloon *vb, int idx, + unsigned int tag, unsigned long val) +{ + BUG_ON(idx >= VIRTIO_BALLOON_S_NR); + vb->stats[idx].tag = tag; + vb->stats[idx].val = cpu_to_le32(val); +} The little-endian conversion of the balloon driver is a historical mistake (no other driver does this). Let's not extend it to the stats. I think the mistake is that the other drivers don't do that. We cheat in qemu and assume that the guest is always in a fixed endianness but this is not always the case for all architectures. That said, since we make this mistake everywhere, I guess I understand the argument to have consistency and to just admit that we're broken here. But this is where the endianness bits come from. Here you've done one and not the other, which is even worse. (Sparse would have found this, I assume). Yup, that's definitely wrong. +struct virtio_balloon_stat +{ + __le16 tag; + __le32 val; +}; Is 32 bits sufficient? A big machine might get over 4bn faults, and certainly 4 TB of memory isn't that far away. I think making the interface u64 and byte based would be the best solution. Making assumptions about page size across guest and host is another thing we should try to avoid. Regards, Anthony Liguori
Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver
Rusty Russell wrote: On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote: A simpler approach is to collect memory statistics in the virtio balloon driver and communicate them to the host via the device config space. There are two issues I see with this. First, there's an atomicity problem since you can't tell when the stats are consistent. Actually, config writes always require notification from the guest to the host. This means the host knows when the config space is changed so atomicity isn't a problem. In fact, if it were a problem, then the balloon driver would be fundamentally broken because target and actual are stored in the config space. If you recall, we had this discussion originally wrt the balloon driver :-) Second, polling is ugly. As opposed to? The guest could set a timer and update the values periodically but that's even uglier because then the host cannot determine the update granularity. A stats vq might solve this more cleanly? actual and target are both really just stats. Had we implemented those with a vq, I'd be inclined to agree with you but since they're implemented in the config space, it seems natural to extend the config space with other stats. Regards, Anthony Liguori
Re: [Qemu-devel] Re: virtio: Add memory statistics reporting to the balloon driver
Avi Kivity wrote: On 11/10/2009 04:36 PM, Anthony Liguori wrote: A stats vq might solve this more cleanly? actual and target are both really just stats. Had we implemented those with a vq, I'd be inclined to agree with you but since they're implemented in the config space, it seems natural to extend the config space with other stats. There is in fact a difference; actual and target are very rarely updated, while the stats are updated very often. Using a vq means a constant number of exits per batch instead of one exit per statistic. If the vq is host-driven, it also allows the host to control the update frequency dynamically (i.e. stop polling when there is no memory pressure). I'm not terribly opposed to using a vq for this. I would expect the stat update interval to be rather long (10s probably) but a vq works just as well. -- Regards, Anthony Liguori
Re: virtio: Add memory statistics reporting to the balloon driver
Rusty Russell wrote: On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote: A simpler approach is to collect memory statistics in the virtio balloon driver and communicate them to the host via the device config space. There are two issues I see with this. First, there's an atomicity problem since you can't tell when the stats are consistent. Second, polling is ugly. A stats vq might solve this more cleanly? This turns out to not work so nicely. You really need bidirectional communication. You need to request that stats be collected and then you need to tell the hypervisor about the stats that were collected. You don't need any real correlation between requests and stat reports either. This really models how target/actual work and I think it suggests that we want to reuse that mechanism for the stats too. Rusty. -- Regards, Anthony Liguori
Re: virtio: Add memory statistics reporting to the balloon driver
Rusty Russell wrote: On Wed, 11 Nov 2009 08:22:42 am Anthony Liguori wrote: Rusty Russell wrote: On Tue, 10 Nov 2009 03:02:06 am Adam Litke wrote: A simpler approach is to collect memory statistics in the virtio balloon driver and communicate them to the host via the device config space. There are two issues I see with this. First, there's an atomicity problem since you can't tell when the stats are consistent. Second, polling is ugly. A stats vq might solve this more cleanly? This turns out to not work so nicely. You really need bidirectional communication. You need to request that stats be collected and then you need to tell the hypervisor about the stats that were collected. You don't need any real correlation between requests and stat reports either. You register an outbuf at initialization time. The host hands it back when it wants you to refill it with stats. That's strangely backwards. Guest sends a stat buffer that's filled out, host acks it when it wants another. That doesn't seem bizarre to you? This really models how target/actual work and I think it suggests that we want to reuse that mechanism for the stats too. Sure, I want to. You want to. It's simple. But the universe is remarkably indifferent to what we want. Is it actually sufficient or are we going to regret our laziness? It's not laziness, it's consistency. How is actual different than free memory or any other stat? Cheers, Rusty. -- Regards, Anthony Liguori
Re: [PATCHv4 1/6] qemu/virtio: move features to an inline function
Michael S. Tsirkin wrote: devices should have the final say over which virtio features they support. E.g. indirect entries may or may not make sense in the context of virtio-console. In particular, for vhost, we do not want to report to guest bits not supported by kernel backend. Move the common bits from virtio-pci to an inline function and let each device call it. No functional changes. This is a layering violation. There are transport specific features and device specific features. The virtio-net device should have no knowledge or nack'ing ability for transport features. If you need to change transport features, it suggests you're modeling things incorrectly and should be supplying an alternative transport implementation. Regards, Anthony Liguori
Re: [PATCHv4 0/6] qemu-kvm: vhost net support
Hi Michael, I'll reserve individual patch review until they're in a mergable state, but I do have some comments about the overall integration architecture. Generally speaking, I think the integration is unnecessarily invasive. It adds things to the virtio infrastructure that shouldn't be there like the irqfd/queuefd bindings. It also sneaks in things like raw backend support which really isn't needed. I think we can do better. Here's what I suggest: The long term goal should be to have a NetDevice interface that looks very much like virtio-net but as an API, not an ABI. Roughly, it would look something like: struct NetDevice { int add_xmit(NetDevice *dev, struct iovec *iov, int iovcnt, void *token); int add_recv(NetDevice *dev, struct iovec *iov, int iovcnt, void *token); void *get_xmit(NetDevice *dev); void *get_recv(NetDevice *dev); void kick(NetDevice *dev); ... }; That gives us a better API for use with virtio-net, e1000, etc. Assuming we had this interface, I think a natural extension would be: int add_ring(NetDevice *dev, void *address); int add_kickfd(NetDevice *dev, int fd); For slot management, it really should happen outside of the NetDevice structure. We'll need a slot notifier mechanism such that we can keep this up to date as things change. vhost-net becomes a NetDevice. It can support things like the e1000 by doing ring translation behind the scenes. virtio-net can be fast pathed in the case that we're using KVM but otherwise, it would also rely on the ring translation. N.B. in the case vhost-net is fast pathed, it requires a different device in QEMU that uses a separate virtio transport. We should reuse as much code as possible obviously. It doesn't make sense to have all of the virtio-pci code and virtio-net code in place when we aren't using it. All this said, I'm *not* suggesting you have to implement all of this to get vhost-net merged.
Rather, I'm suggesting that we should try to structure the current vhost-net implementation to complement this architecture assuming we all agree this is the sane thing to do. That means I would make the following changes to your series: - move vhost-net support to a VLANClientState backend. - do not introduce a raw socket backend - if for some reason you want to back to tap and raw, those should be options to the vhost-net backend. - when fast pathing with vhost-net, we should introduce interfaces to VLANClientState similar to add_ring and add_kickfd. They'll be very specific to vhost-net for now, but that's okay. - sort out the layering of vhost-net within the virtio infrastructure. vhost-net should really be its own qdev device. I don't see very much code reuse happening right now so I don't understand why it's not that way currently. Regards, Anthony Liguori
Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation
Christoph Hellwig wrote: On Thu, Oct 29, 2009 at 10:14:19AM -0500, Anthony Liguori wrote: Which patches are those? http://repo.or.cz/w/qemu/kraxel.git?a=commitdiff;h=1ee5ee08e4427c3db7e1322d30cc0e58e5ca48b9 and http://repo.or.cz/w/qemu/kraxel.git?a=commitdiff;h=a6e6178185786c582141f993272e00521d3f125a Doesn't look like they've gone on the list yet. When they do, poke me and I'll make sure to process them quickly. One of my short term goals is to get better at handling easy patches more quickly. Regards, Anthony Liguori
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
H. Peter Anvin wrote: On 09/18/2009 10:55 AM, Anthony Liguori wrote: I fail to see how this is at all relevant. This is a virtual machine, we're presenting virtual hardware that behaves like a serial device. Where web servers fit in is completely beyond me. s/virtio_console/virtio_serial/ There is a fairly noticeable difference between a console device and a serial device. However, something that can be extended and exported to a physical serial port is definitely the latter. Indeed. I think part of the confusion here is that virtio_console started as just a console and hence it used hvc. As part of the current reworking, I think it makes sense to rename the driver virtio_serial. Regards, Anthony Liguori -hpa
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Alan Cox wrote: This device is very much a serial port. I don't see any reason not to treat it like one. Here are a few - You don't need POSIX multi-open semantics, hangup and the like We do actually want hangup and a few other of the tty specific ops. The only thing we really don't want is a baud rate. - Seek makes sense on some kinds of fixed attributes I don't think we're dealing with fixed attributes. These are streams. Fundamentally, this is a paravirtual uart. The improvement over a standard uart is that there can be a larger number of ports, ports can have some identification associated with them, and we are not constrained to the emulated hardware interface which doesn't exist on certain platforms (like s390). - TTY has a relatively large memory overhead per device - Sysfs is what everything else uses - Sysfs has some rather complete lifetime management you'll need to redo by hand sysfs doesn't model streaming data which is what this driver provides. - You don't need idiotic games with numbering spaces Abusing tty for this is ridiculous. If the argument is that tty is an awkward interface that should only be used for legacy purposes, then sure, we should just implement a new userspace interface for this. In fact, this is probably supported by the very existence of hvc. On the other hand, this is fundamentally a paravirtual serial device. Since serial devices are exposed via the tty subsystem, it seems like a logical choice. In some ways putting much of it in kernel is ridiculous too as you can do it with a FUSE fs or simply export the info guest-guest using SNMP. This device cannot be implemented as-is in userspace because it depends on DMA which precludes the use of something like uio_pci. We could modify the device to avoid dma if the feeling was that there was no interest in putting this in the kernel. 
Regards, Anthony Liguori
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Amit Shah wrote: Hey Greg, Can you tell me how this could work out -- each console port could have a role string associated with it (obtainable from the invoking qemu process in case of qemu/kvm). Something that I have in mind currently is: $ qemu-kvm ... -virtioconsole role=org/qemu/clipboard and then the guest kernel sees the string, and puts the org/qemu/clipboard in some file in sysfs. Guest userspace should then be able to open and read/write to /dev/virtio_console/org/qemu/clipboard That's probably not what we want. I imagine what we want is: /dev/ttyV0 /dev/ttyV1 /dev/ttyVN And then we want: /sys/class/virtio-console/ttyV0/name -> org.qemu.clipboard Userspace can detect when new virtio-consoles appear via udev events. When it sees a new ttyVN, it can then look in sysfs to discover its name. Regards, Anthony Liguori
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Amit Shah wrote: On (Tue) Sep 15 2009 [07:57:10], Anthony Liguori wrote: Amit Shah wrote: Hey Greg, Can you tell me how this could work out -- each console port could have a role string associated with it (obtainable from the invoking qemu process in case of qemu/kvm). Something that I have in mind currently is: $ qemu-kvm ... -virtioconsole role=org/qemu/clipboard and then the guest kernel sees the string, and puts the org/qemu/clipboard in some file in sysfs. Guest userspace should then be able to open and read/write to /dev/virtio_console/org/qemu/clipboard That's probably not what we want. I imagine what we want is: /dev/ttyV0 /dev/ttyV1 /dev/ttyVN And then we want: /sys/class/virtio-console/ttyV0/name -> org.qemu.clipboard Userspace can detect when new virtio-consoles appear via udev events. When it sees a new ttyVN, it can then look in sysfs to discover its name. OK; but that's kind of roundabout isn't it? An application, instead of watching for the console port it's interested in, has to instead monitor all the ports. If you wanted to use /dev/virtio/org/qemu/clipboard you still have the same problem. You have to use udev or inotify to listen for a new file in a directory. The /dev/ path may look nicer from a high level, but the code ends up being roughly the same for either approach. What I propose has the advantage of looking like existing subsystems. It also avoids encoding device information in the device name. So in effect there has to be one app monitoring for new ports and then that app exec'ing the corresponding app meant for that port. I think if you think through both models, they end up looking the same. Regards, Anthony Liguori
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Gerd Hoffmann wrote: Userspace can detect when new virtio-consoles appear via udev events. When it sees a new ttyVN, it can then look in sysfs to discover its name. No. udev can create symlinks for you, so apps don't have to dig into sysfs. You'll need a rule along the lines (untested): SUBSYSTEM=="virtio", \ DRIVERS=="virtio-console", \ SYMLINK+="vcon/$attr{name}" then udev will create a /dev/vcon/org.qemu.clipboard symlink pointing to /dev/ttyV0. Apps can just open /dev/vcon/$name then. That works equally well. I don't necessarily think that naming is more useful. Regards, Anthony Liguori
Re: pci: is reset incomplete?
Michael S. Tsirkin wrote: On Mon, Sep 14, 2009 at 12:15:29PM -0500, Anthony Liguori wrote: Michael S. Tsirkin wrote: Hi! pci bus reset does not seem to clear pci config registers, such as BAR registers, or memory space enable, of the attached devices: it only clears the interrupt state. This seems wrong, but easy to fix. I don't think most pci devices reset their config space in their reset callbacks. For things like BAR registers, they really must. BARs should be registered via pci_register_bar so you should be able to centralize their reset. class codes are read only registers. Your proposal might be correct for some of these. But PCI registers that are reset, change as a result of guest activity, and reset values are typically specified by guest spec. So I don't think we should let users tweak these. Well, I guess my general point was that it would be good to add more structure to how config space is initialized. I think a natural consequence of that is that it becomes easier to automatically fix the values on reset. Regards, Anthony Liguori
Re: [PATCH] virtio_console: Add support for multiple ports for generic guest and host communication
Amit Shah wrote: Right; there was some discussion about this. A few alternatives were suggested like - udev scripts to create symlinks from ports to function, like: /dev/vcon3 -> /dev/virtio-console/clipboard - Some fqdn-like hierarchy, like /dev/virtio-console/com/redhat/clipboard which again can be created by udev scripts And I dislike all of them. What I'd rather have is these devices exposed as tty's with a sysfs attribute that exposes the name of the device. This is not all that different to how usb serial devices work. Regards, Anthony Liguori
Re: [PATCH] SCSI driver for VMware's virtual HBA - V4.
specifying the LUN. + * - _pad should be 0. + */ + +typedef struct PVSCSICmdDescAbortCmd { + u64 context; + u32 target; + u32 _pad; +} __packed PVSCSICmdDescAbortCmd; + +/* + * Command descriptor for PVSCSI_CMD_SETUP_RINGS -- + * + * Notes: + * - reqRingNumPages and cmpRingNumPages need to be power of two. + * - reqRingNumPages and cmpRingNumPages need to be different from 0, + * - reqRingNumPages and cmpRingNumPages need to be inferior to + * PVSCSI_SETUP_RINGS_MAX_NUM_PAGES. + */ + +#define PVSCSI_SETUP_RINGS_MAX_NUM_PAGES 32 +typedef struct PVSCSICmdDescSetupRings { + u32 reqRingNumPages; + u32 cmpRingNumPages; + u64 ringsStatePPN; + u64 reqRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES]; + u64 cmpRingPPNs[PVSCSI_SETUP_RINGS_MAX_NUM_PAGES]; +} __packed PVSCSICmdDescSetupRings; + +/* + * Command descriptor for PVSCSI_CMD_SETUP_MSG_RING -- + * + * Notes: + * - this command was not supported in the initial revision of the h/w + * interface. Before using it, you need to check that it is supported by + * writing PVSCSI_CMD_SETUP_MSG_RING to the 'command' register, then + * immediately after read the 'command status' register: + * * a value of -1 means that the cmd is NOT supported, + * * a value != -1 means that the cmd IS supported. + * If it's supported the 'command status' register should return: + * sizeof(PVSCSICmdDescSetupMsgRing) / sizeof(u32). + * - this command should be issued _after_ the usual SETUP_RINGS so that the + * RingsState page is already setup. If not, the command is a nop. + * - numPages needs to be a power of two, + * - numPages needs to be different from 0, + * - _pad should be zero. + */ + +#define PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES 16 + +typedef struct PVSCSICmdDescSetupMsgRing { + u32 numPages; + u32 _pad; + u64 ringPPNs[PVSCSI_SETUP_MSG_RING_MAX_NUM_PAGES]; +} __packed PVSCSICmdDescSetupMsgRing; + +enum PVSCSIMsgType { + PVSCSI_MSG_DEV_ADDED = 0, + PVSCSI_MSG_DEV_REMOVED = 1, + PVSCSI_MSG_LAST = 2, +}; + +/* + * Msg descriptor.
+ * + * sizeof(struct PVSCSIRingMsgDesc) == 128. + * + * - type is of type enum PVSCSIMsgType. + * - the content of args depend on the type of event being delivered. + */ + +typedef struct PVSCSIRingMsgDesc { + u32 type; + u32 args[31]; +} __packed PVSCSIRingMsgDesc; + +typedef struct PVSCSIMsgDescDevStatusChanged { + u32 type; /* PVSCSI_MSG_DEV _ADDED / _REMOVED */ + u32 bus; + u32 target; + u8 lun[8]; + u32 pad[27]; +} __packed PVSCSIMsgDescDevStatusChanged; + +/* + * Rings state. + * + * - the fields: + *. msgProdIdx, + *. msgConsIdx, + *. msgNumEntriesLog2, + * .. are only used once the SETUP_MSG_RING cmd has been issued. + * - '_pad' helps to ensure that the msg related fields are on their own + * cache-line. + */ + +typedef struct PVSCSIRingsState { + u32 reqProdIdx; + u32 reqConsIdx; + u32 reqNumEntriesLog2; + + u32 cmpProdIdx; + u32 cmpConsIdx; + u32 cmpNumEntriesLog2; + + u8 _pad[104]; + + u32 msgProdIdx; + u32 msgConsIdx; + u32 msgNumEntriesLog2; +} __packed PVSCSIRingsState; All of this can be hidden behind a struct virtqueue. You could then introduce a virtio-vmwring that implemented this ABI. You could then separate out the actual scsi logic into a separate virtio-scsi.c driver. There are numerous advantages to this layering. You get to remain ABI compatible with yourself. You can potentially reuse the ring logic in other drivers. Other VMMs can use different ring transports without introducing new drivers. For instance, in KVM, we would use virtio-pci + virtio-scsi instead of using virtio-vmwring + virtio-scsi. The virtio infrastructure has been backported to various old kernels too so there really is no reason not to use it. We have the interfaces in virtio to do notification suppression and MSI so I don't think there's any real limitation that it would impose on you. Regards, Anthony Liguori
Re: [PATCH] SCSI driver for VMware's virtual HBA - V4.
Alok Kataria wrote:

I see your point, but the ring logic or the ABI that we use to communicate between the hypervisor and guest is not shared between our storage and network drivers. As a result, I don't see any benefit of separating out this ring handling mechanism; on the contrary, it might just add some overhead of translating between various layers for our SCSI driver.

But if you separate out the ring logic, it allows the SCSI logic to be shared by other paravirtual device drivers. This is significant and important from a Linux point of view. There is almost nothing VMware-specific about the vast majority of your code. If you split out the SCSI logic, then you will receive help debugging, adding future features, and improving performance from other folks interested in this. In the long term, it will make your life much, much easier by making the driver relevant to a wider audience :-)

Having said that, I would like to add that if, in some future iteration of our paravirtualized drivers, we decide to share this ring mechanism among our various device drivers, this might be an interesting proposition.

I am certainly not the block subsystem maintainer, but I'd hate to see this merged without any attempt at making the code more reusable. If adding the virtio layering is going to somehow hurt performance, break your ABI, or in some way limit you, then that's certainly something to discuss, and those would be valid concerns. That said, I don't think it's a huge change to your current patch and I don't see any obvious problems it would cause.

Regards,

Anthony Liguori
Re: [PATCH] SCSI driver for VMware's virtual HBA - V4.
Christoph Hellwig wrote:

On Wed, Sep 09, 2009 at 05:12:26PM -0500, Anthony Liguori wrote:

Alok Kataria wrote: I see your point, but the ring logic or the ABI that we use to communicate between the hypervisor and guest is not shared between our storage and network drivers. As a result, I don't see any benefit of separating out this ring handling mechanism; on the contrary, it might just add some overhead of translating between various layers for our SCSI driver.

But if you separate out the ring logic, it allows the SCSI logic to be shared by other paravirtual device drivers. This is significant and important from a Linux point of view.

As someone who has been hacking on a virtio-scsi prototype, I don't think it's a good idea. The vmware driver is a horrible design and I don't think it should be merged.

What are the issues with the design compared to how you're approaching virtio-scsi?

Besides being an ugly driver and ABI, we really should not support this kind of closed protocol development.

I don't see how a VMM that doesn't share the source code for its backends or doesn't implement standard ABIs is any different from the hundreds of hardware vendors that behave exactly the same way. We haven't even been successful in getting the Xen folks to present their work on lkml before shipping it to their users. Why would we expect more from VMware if we're willing to merge the Xen stuff?

Regards,

Anthony Liguori
Re: [PATCH] SCSI driver for VMware's virtual HBA - V4.
Jeremy Fitzhardinge wrote:

On 09/09/09 16:34, Anthony Liguori wrote: We haven't even been successful in getting the Xen folks to present their work on lkml before shipping it to their users. Why would we expect more from VMware if we're willing to merge the Xen stuff?

The Xen code may be out of tree, but it has always been readily available from a well-known place. I don't think it's comparable.

Once an ABI is set in stone, there's very little that can be done on lkml to review the ABI in any serious way. VMware has repeatedly done this in the past: ship a product with their own drivers, then submit something to lkml saying that the ABI cannot be changed because it's present in a shipping product. I'll admit, Xen hasn't been as bad as VMware in this regard, but it's certainly not perfect either.

Regards,

Anthony Liguori
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

No flags, assume it's a streaming protocol and don't assume anything about message sizes. IOW, when you send clipboard data, send size and then the data. QEMU consumes bytes until it reaches size.

Same intent but a different method: I'll have to specify that particular data is the size, and that the data after this special data is the actual data stream.

Sounds like every stream protocol in existence :-)

- A lock has to be introduced to fetch one unused buffer from the list and pass it on to the host. And this lock has to be a spinlock, just because writes can be called from irq context.

I don't see a problem here.

You mean you don't see a problem in using a spinlock vs. not using one?

Right. This isn't a fast path.

Userspace will typically send the entire buffer to be transmitted in one system call. If it's large, the system call will have to be broken into several. This results in multiple guest system calls, each one handled with a spinlock held. Compare this with the entire write handled in one system call in the current method.

Does it matter? This isn't a fast path.

Regards,

Anthony Liguori
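The length-prefixed stream Anthony describes ("send size and then the data; QEMU consumes bytes until it reaches size") can be sketched as follows. All names here are hypothetical; this is a user-space illustration of the framing idea, not code from either driver, and the 4-byte size header is assumed to arrive within a single chunk.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sender side: one message = 4-byte size, then the payload.
 * Returns total bytes written, or -1 if the output buffer is too small.
 * Native endianness is used for brevity. */
static int clip_encode(uint8_t *out, size_t cap, const uint8_t *data, uint32_t len)
{
	if (cap < sizeof(uint32_t) + len)
		return -1;
	memcpy(out, &len, sizeof(len));
	memcpy(out + sizeof(len), data, len);
	return (int)(sizeof(len) + len);
}

/* Receiver side: consume bytes until 'need' payload bytes have arrived,
 * no matter how many chunks the stream delivers them in. */
struct clip_decoder {
	uint32_t need;		/* expected payload size; 0 = waiting for header */
	uint32_t got;		/* payload bytes received so far */
	uint8_t  buf[4096];
};

/* Feed a chunk of stream bytes. Returns 1 when one complete message is
 * ready in d->buf, 0 when more bytes are needed. */
static int clip_feed(struct clip_decoder *d, const uint8_t *p, size_t n)
{
	size_t i = 0;

	while (i < n) {
		if (d->need == 0) {
			/* Read the size header (assumed contiguous here). */
			memcpy(&d->need, p + i, sizeof(uint32_t));
			i += sizeof(uint32_t);
			d->got = 0;
		} else {
			size_t take = n - i;

			if (take > d->need - d->got)
				take = d->need - d->got;
			memcpy(d->buf + d->got, p + i, take);
			d->got += take;
			i += take;
			if (d->got == d->need)
				return 1;
		}
	}
	return 0;
}
```

With this framing the guest needs no per-buffer "more data pending" flag: the receiver simply accumulates until it has `size` bytes, which is exactly the behavior a single-flush VNC clipboard consumer wants.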
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

Can you please explain your rationale for being so rigid about merging the two drivers?

Because they do the same thing. I'm not going to constantly rehash this. It's been explained multiple times.

If there are implementation issues within the Linux drivers because of peculiarities of hvc, then hvc needs to be fixed. It has nothing to do with the driver ABI, which is what qemu cares about.

Regards,

Anthony Liguori
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

On (Mon) Aug 31 2009 [09:21:13], Anthony Liguori wrote: Amit Shah wrote: Can you please explain your rationale for being so rigid about merging the two drivers? Because they do the same thing. I'm not going to constantly rehash this. It's been explained multiple times.

It hardly looks like the same thing each passing day.

That's BS. The very first time you posted, you received the same feedback from both Paul and me. See http://article.gmane.org/gmane.comp.emulators.qemu/44778. That was back in June. You've consistently received the same feedback both on the ML and in private.

We're ending up having to compromise on the performance or functionality or simplicity of the devices just because of this restriction.

This is _not_ a high performance device, and there has so far been no functionality impact.

I don't understand why you keep dragging your feet about this.

It's very simple: if you post a functional set of patches for a converged virtio-console driver, we'll merge it. If you keep arguing about having a separate virtio-serial driver, it's not going to get merged. I don't know how to be more clear than this.

If there are implementation issues within the Linux drivers because of peculiarities of hvc, then hvc needs to be fixed. It has nothing to do with the driver ABI, which is what qemu cares about.

I'd welcome that effort as well. But we all know that's not going to happen anytime soon.

That is not a justification to add a new device in QEMU. If we add a new device every time we encounter a less than ideal interface within a guest, we're going to end up having hundreds of devices.

Regards,

Anthony Liguori
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

We're ending up having to compromise on the performance or functionality or simplicity of the devices just because of this restriction. This is _not_ a high performance device, and there has so far been no functionality impact. I don't understand why you keep dragging your feet about this. It's very simple: if you post a functional set of patches for a converged virtio-console driver, we'll merge it. If you

I have already posted them and have received no feedback about the patches since. Let me add another request here for you to review them.

But the guest drivers do not have proper locking. Have you posted a new series with that fixed?

keep arguing about having a separate virtio-serial driver, it's not going to get merged. I don't know how to be more clear than this.

I'm not at all arguing for a separate virtio-serial driver. Please note the difference in what I'm asking for: I'm just asking for a good justification for merging the two, since it makes both drivers less simple and also introduces dependencies on code outside our control.

Functionally speaking, both virtio-console and virtio-serial do the same thing. In fact, virtio-console is just a subset of virtio-serial. If there are problems converging the two drivers in Linux, then I suggest you have two separate driver modules in Linux. That would obviously be rejected for Linux, though, because you cannot have two drivers for the same device. Why should qemu have a different policy?

That is not a justification to add a new device in QEMU. If we add a new device every time we encounter a less than ideal interface within a guest, we're going to end up having hundreds of devices.

I just find this argument funny.

I'm finding this discussion not so productive.

Regards,

Anthony Liguori
Re: [PATCHv5 3/3] vhost_net: a kernel-level virtio server
Avi Kivity wrote:

On 08/31/2009 02:42 PM, Xin, Xiaohui wrote: Hi Michael, that's a great job. We are now working on supporting VMDq on KVM, and since the VMDq hardware presents L2 sorting based on MAC addresses and VLAN tags, our target is to implement a zero copy solution using VMDq. We started from the virtio-net architecture. What we want to propose is to use AIO combined with direct I/O:

1) Modify the virtio-net backend service in Qemu to submit aio requests composed from the virtqueue.
2) Modify the TUN/TAP device to support aio operations and map the user space buffer directly into the host kernel.
3) Let a TUN/TAP device bind to a single rx/tx queue from the NIC.
4) Modify the net_dev and skb structures to permit an allocated skb to use a user space directly mapped payload buffer address rather than a kernel allocated one.

As zero copy is also your goal, we are interested in what's on your mind, and would like to collaborate with you if possible.

One way to share the effort is to make vmdq queues available as normal kernel interfaces.

It may be possible to make vmdq appear like an sr-iov capable device from userspace. sr-iov provides the userspace interfaces to allocate interfaces and assign mac addresses. To make it useful, you would have to handle tx multiplexing in the driver, but that would be much easier to consume for kvm.

Regards,

Anthony Liguori
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

I did think about that as well, but there are problems:

- vnc clients (at least tigervnc) want to receive the entire clipboard in a single flush command. So in the pre-allocated buffers scenario we could run short of the available buffers in some cases. So there will have to be a flag with each buffer that says 'there's more data pending for this particular write', which will have to be passed on to qemu, and qemu will then flush it once it receives all the data.

No flags, assume it's a streaming protocol and don't assume anything about message sizes. IOW, when you send clipboard data, send size and then the data. QEMU consumes bytes until it reaches size.

- A lock has to be introduced to fetch one unused buffer from the list and pass it on to the host. And this lock has to be a spinlock, just because writes can be called from irq context.

I don't see a problem here.

Regards,

Anthony Liguori
Re: Extending virtio_console to support multiple ports
Amit Shah wrote:

On (Tue) Aug 25 2009 [11:47:20], Amit Shah wrote: Hello all, here is a new iteration of the patch series that implements a transport for guest and host communications. The code has been updated to reuse the virtio-console device instead of creating a new virtio-serial device.

And the problem now is that hvc calls the put_chars function with spinlocks held, and we now allocate pages in send_buf(), called from put_chars.

Don't allocate pages in send_buf. There's a fixed number of possible entries on the ring. Preallocate them up front and then you don't need to sleep.

A few solutions:

- Keep things as they are, virtio_console.c remains as it is and virtio_serial.c gets added

Not an option from a QEMU perspective.

Regards,

Anthony Liguori
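The "preallocate them up front" suggestion can be sketched as a fixed pool with an O(1) free list, so a caller in atomic context never has to allocate or sleep. Names and sizes here are illustrative, not from the actual virtio_console patches; a real driver would guard the list with a spinlock and size it to the number of ring entries.

```c
#include <stddef.h>

/* One buffer per possible ring entry, carved out at init time and recycled
 * as the host acks them, instead of allocating a page per put_chars call. */
#define RING_ENTRIES 8
#define BUF_SIZE     256

struct sbuf {
	struct sbuf *next;
	char data[BUF_SIZE];
};

static struct sbuf pool[RING_ENTRIES];
static struct sbuf *free_list;

static void pool_init(void)
{
	free_list = NULL;
	for (int i = 0; i < RING_ENTRIES; i++) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

/* O(1), no allocation: safe to call with spinlocks held in a real driver.
 * Returns NULL when every ring entry is in flight. */
static struct sbuf *buf_get(void)
{
	struct sbuf *b = free_list;

	if (b)
		free_list = b->next;
	return b;
}

/* Called when the host acks a buffer: return it to the free list. */
static void buf_put(struct sbuf *b)
{
	b->next = free_list;
	free_list = b;
}
```

The point of the design is that exhaustion becomes an explicit, non-sleeping condition (a NULL return) rather than an allocation that might sleep under a spinlock.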
Re: vhost net: performance with ping benchmark
Avi Kivity wrote:

I think this is likely going to be needed regardless. I also think the tap compatibility suggestion would simplify the consumption of this in userspace.

What about veth pairs?

Does veth support GSO and checksum offload?

I'd like some time to look at get_state/set_state ioctl()s along with dirty tracking support. It's a much better model for live migration IMHO.

My preference is ring proxying.

Note we'll need ring proxying (or at least event proxying) for non-MSI guests.

I avoided suggesting ring proxying because I didn't want to suggest that merging should be contingent on it.

Regards,

Anthony Liguori
Re: vhost net: performance with ping benchmark
Michael S. Tsirkin wrote:

My preference is ring proxying.

Note we'll need ring proxying (or at least event proxying) for non-MSI guests.

Exactly, that's what I meant earlier. That's enough, isn't it, Anthony?

It is if we have a working implementation that demonstrates the userspace interface is sufficient. Once it goes into the upstream kernel, we need to have backwards compatibility code in QEMU forever to support that kernel version.

Regards,

Anthony Liguori
Re: vhost net: performance with ping benchmark
Michael S. Tsirkin wrote:

On Mon, Aug 24, 2009 at 11:12:41AM +0300, Michael S. Tsirkin wrote: At Rusty's suggestion, I tested vhost base performance with ping. Results below, and they seem to be what you'd expect.

Rusty, any chance you could look at the code? Is it in reasonable shape? I think it makes sense to merge it through you. What do you think?

One comment on file placement: I put the files under a separate vhost directory to avoid confusion with virtio-net, which runs in the guest. Does this sound sane?

Also, can a minimal version (without TSO, tap or any other features) be merged upstream first so that features can be added later? Or do we have to wait until it's more full featured? Finally, can it reasonably make 2.6.32, or do you think it needs more time out of tree?

I think 2.6.32 is pushing it. I think some time is needed to flush out the userspace interface. In particular, I don't think Mark's comments have been adequately addressed.

If a version were merged without GSO support, some mechanism to do feature detection would be needed in the userspace API. I think this is likely going to be needed regardless. I also think the tap compatibility suggestion would simplify the consumption of this in userspace.

I'd like some time to look at get_state/set_state ioctl()s along with dirty tracking support. It's a much better model for live migration IMHO.

I think some more thorough benchmarking would be good too. In particular, netperf/iperf runs would be nice.

Regards,

Anthony Liguori
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Amit Shah wrote:

On (Mon) Aug 10 2009 [11:59:31], Anthony Liguori wrote: However, as I've mentioned repeatedly, the reason I won't merge virtio-serial is that it duplicates functionality with virtio-console. If the two are converged, I'm happy to merge it. I'm not opposed to having more functionality.

The guest code sort of ends up looking like this after merging virtio_console into virtio_serial. The diff is against virtio_serial in my git tree, but that should be pretty close to the last submission I made at http://patchwork.kernel.org/patch/39335/ or the tree at git://git.kernel.org/pub/scm/linux/kernel/git/amit/vs-kernel.git

I've merged bits from virtio_console.c into virtio_serial.c. If needed, virtio_serial can be renamed to virtio_console. The VIRTIO_ID also needs to change to that of virtio_console's. Similar changes are needed for userspace.

Since there's support for only one console as of now, I've assigned port #2 as the console port so that we hook into hvc when a port is found at that location.

One issue that crops up for put_chars: a copy of the buffer to be sent has to be made, as we don't wait for the host to ack the buffer before we move on.

The biggest loss so far is Rusty's excellent comments: they will have to be reworked and added for the whole of the new file.

Is this approach acceptable?

I think we want to keep virtio_console.c and definitely continue using the virtio_console id. It looks like you are still creating character devices as opposed to tty devices. Is this just an incremental step, or are you choosing not to do tty devices for a reason (and if so, what's that reason)?

Regards,

Anthony Liguori
Re: [Qemu-devel] Re: virtio-serial: An interface for host-guest communication
Gerd Hoffmann wrote:

On 08/14/09 10:15, Amit Shah wrote: The guest code sort of ends up looking like this after merging virtio_console into virtio_serial.

I think it should better go the other way around: add multichannel support to virtio-console, probably guarded by a feature flag, so old host + new guest and new host + old guest combinations continue to work.

Since there's support for only one console as of now, I've assigned port #2 as the console port so that we hook into hvc when a port is found at that location.

Doesn't sound like this is going to be backward compatible ...

Also, I still think passing a 'protocol' string for each port is a good idea, so you can stick that into a sysfs file for guests to use.

Or drop ports altogether and just use protocol strings...

Regards,

Anthony Liguori
Re: [Qemu-devel] Re: [PATCH] qemu/virtio: move features to an inline function
Michael S. Tsirkin wrote:

Why bother switching to userspace for migration? Can't you just have get/set ioctls for the state?

I have these. But live migration requires dirty page logging. I do not want to implement it in kernel.

Is it really that difficult? I think it would be better to just do that. I wonder, though, if mmu notifiers could be used to make it transparent...

Regards,

Anthony Liguori
Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Michael S. Tsirkin wrote:

We discussed this before, and I still think this could be directly derived from struct virtqueue, in the same way that vring_virtqueue is derived from struct virtqueue.

I prefer keeping it simple. Much of the abstraction in virtio is due to the fact that it needs to work on top of different hardware emulations: lguest, kvm, possibly others in the future. vhost is always working on real hardware, using eventfd as the interface, so it does not need that.

Actually, vhost may not always be limited to real hardware. We may one day use vhost as the basis of a driver domain. There's quite a lot of interest in this for networking.

At any rate, I'd like to see performance results before we consider trying to reuse virtio code.

Regards,

Anthony Liguori
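The derivation Michael refers to, vring_virtqueue embedding a struct virtqueue and recovering itself via container_of, looks roughly like this in user-space form. The structs are simplified for illustration, not the real kernel definitions:

```c
#include <stddef.h>

/* Same trick as the kernel macro of this name: recover the enclosing
 * struct from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Generic handle that common code passes around. */
struct virtqueue {
	const char *name;
};

/* Transport-specific type derived from it: the generic part is embedded
 * as the first member, and private ring state hangs off the rest. */
struct vring_virtqueue {
	struct virtqueue vq;	/* handed to transport-agnostic code */
	unsigned int num_free;	/* private: free descriptors in the ring */
};

/* Downcast used inside the transport implementation. */
static struct vring_virtqueue *to_vvq(struct virtqueue *vq)
{
	return container_of(vq, struct vring_virtqueue, vq);
}
```

This is what lets one body of SCSI (or net) logic run over multiple ring transports: callers only ever see `struct virtqueue *`, while each transport keeps its own state in the enclosing structure.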
Re: [PATCH 2/2] vhost_net: a kernel-level virtio server
Arnd Bergmann wrote:

As I pointed out earlier, most code in virtio net is asymmetrical: the guest provides buffers, the host consumes them. Possibly, one could use virtio rings in a symmetrical way, but support of existing guest virtio net means there's almost no shared code.

The trick is to swap the virtqueues instead. virtio-net is actually mostly symmetric, in just the same way that the physical wires on twisted-pair Ethernet are symmetric (I like how that analogy fits). It's already been done between two guests. See http://article.gmane.org/gmane.linux.kernel.virtualization/5423

Regards,

Anthony Liguori