Re: Using block device instead of character device for virtio-serial

2014-02-17 Thread Stefan Hajnoczi
On Sat, Feb 15, 2014 at 10:30:13AM +0530, Jobin Raju George wrote:
> On Fri, Feb 14, 2014 at 4:05 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> > On Sun, Feb 09, 2014 at 11:39:19PM +0530, Jobin Raju George wrote:
> > > On Sun, Feb 9, 2014 at 2:42 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> > > > On Thu, Feb 06, 2014 at 12:22:36PM +0530, Jobin Raju George wrote:
> > > > > I am trying to establish a communication mechanism between the guest
> > > > > and its host using virtio-serial. For this I am using the following to
> > > > > boot the VM:
> > > > >
> > > > > qemu-system-x86_64 -m 1024 \
> > > > > -name ubuntu_vm \
> > > > > -hda ubuntu \
> > > > > -device virtio-serial \
> > > > > -chardev socket,path=/tmp/virt_socket,server,nowait,id=virt_socket \
> > > > > -device virtconsole,name=v_soc,chardev=virt_socket,name=ubuntu_vm_soc
> > > > >
> > > > > This creates a character device on the guest machine and a UNIX socket
> > > > > on the host machine.
> > > > >
> > > > > 1) Is there a way I can create sockets on the host as well as the
> > > > > guest?
> > > >
> > > > The syntax is documented on the QEMU man page.  Try:
> > > >
> > > > -chardev socket,port=1234,server,nowait,id=virt_socket
> > >
> > > I did not try this out, but would this create a socket instead of a
> > > character device (/dev/hvc0) on the guest?
> >
> > Things should be unchanged inside the guest.  This just creates a TCP
> > socket on the host.
>
> My main concern is creating a socket in the guest.

The virtio-serial Linux guest driver is a character device, not a
socket.  So if a socket is really necessary you'd need a small program
to forward data between a socket and the character device (like
netcat/socat).
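
A minimal sketch of the copying step such a forwarder would need, in C. It is not taken from netcat or socat; relay_once() is a hypothetical helper name. A real forwarder in the guest would open the character device (for example /dev/hvc0, as mentioned earlier in the thread) and a socket, then call this in both directions from a poll() loop.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Copy one chunk from file descriptor `from` to `to`.
 * Returns the number of bytes forwarded, 0 on end-of-file, -1 on error.
 * (relay_once is a hypothetical helper, not an existing API.)
 */
ssize_t relay_once(int from, int to)
{
    char buf[4096];
    ssize_t n;

    /* Retry the read if it is interrupted by a signal. */
    do {
        n = read(from, buf, sizeof(buf));
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return n;

    /* write() may be partial, so loop until the whole chunk is out. */
    for (ssize_t off = 0; off < n; ) {
        ssize_t w = write(to, buf + off, (size_t)(n - off));
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        off += w;
    }
    return n;
}
```

Because it only uses read() and write(), the same helper works whether the descriptors refer to a socket, a pipe or a character device.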

Stefan
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 69361] Host call trace and guest hang after create guest.

2014-02-17 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=69361

robert...@intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |robert...@intel.com

--- Comment #8 from robert...@intel.com ---
Is this patch upstream now? Any update?



Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends

2014-02-17 Thread David Vrabel
On 15/02/14 02:59, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 This v2 series changes the approach from my original virtualization
 multicast patch series [0] by abandoning completely the multicast
 issues and instead generalizing an approach for virtualization
 backends. There are two things in common with virtualization
 backends:
 
   0) they should not become the root bridge
   1) they don't need ipv4 / ipv6 interfaces

Why?  There's no real difference between a backend network device and a
physical device (from the point of view of the backend domain).  I do
not think these are intrinsic properties of backend devices.

I can see these being useful knobs for administrators (or management
toolstacks) to turn on, on a per-device basis.

David


Re: [Xen-devel] [RFC v2 3/4] xen-netback: use a random MAC address

2014-02-17 Thread David Vrabel
On 15/02/14 02:59, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF
 was to prevent our backend interfaces from being used by the
 bridge and nominating our interface as a root bridge. This was
 possible given that the bridge code will use the lowest MAC
 address for a port once a new interface gets added to the bridge.
 The bridge code has a generic feature now to allow interfaces
 to opt out from root bridge nominations, use that instead.
[...]
 --- a/drivers/net/xen-netback/interface.c
 +++ b/drivers/net/xen-netback/interface.c
 @@ -42,6 +42,8 @@
  #define XENVIF_QUEUE_LENGTH 32
  #define XENVIF_NAPI_WEIGHT  64
  
 +static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };

You shouldn't use a vendor prefix with a random MAC address.  You should
set the locally administered bit and clear the multicast/unicast bit and
randomize the remaining 46 bits.

(If existing VIF scripts are doing something similar, they also need to
be fixed.)
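
A minimal sketch in C of the bit manipulation described above: randomize all six bytes, then clear the multicast bit and set the locally administered bit in the first byte, which leaves 46 random bits. random_local_mac() is a hypothetical name, and rand() stands in for a proper random source purely for illustration. As far as I know the kernel's eth_random_addr() helper applies the same two masks.

```c
#include <stdint.h>
#include <stdlib.h>

#define ETH_ALEN 6

/*
 * Fill addr with a random unicast, locally administered MAC address.
 * (Hypothetical helper; rand() is used only for illustration and is
 * not a strong random number generator.)
 */
void random_local_mac(uint8_t addr[ETH_ALEN])
{
    for (int i = 0; i < ETH_ALEN; i++)
        addr[i] = (uint8_t)(rand() & 0xff);

    addr[0] &= 0xfe;  /* clear bit 0: unicast, not multicast */
    addr[0] |= 0x02;  /* set bit 1: locally administered */
}
```

No vendor OUI is copied over the first three bytes, so the result does not claim to belong to any registered prefix.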

David


KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)

2014-02-17 Thread Michael S. Tsirkin

On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
 Hi
 
 Please, send any topic that you are interested in covering.
 
 * Should we change anything to get more people to sign for the call?
   There hasn't been a call in quite a long time.  Ideas?
 
 Thanks, Juan.
 
 Call details:
 
 09:00 AM to 10:00 AM EDT
 Every two weeks
 
 If you need phone number details,  contact me privately

How about discussing plans for x2apic?


Re: [Qemu-devel] KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)

2014-02-17 Thread Peter Maydell
On 17 February 2014 14:19, Michael S. Tsirkin m...@redhat.com wrote:

 On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
 Hi

 Please, send any topic that you are interested in covering.

 * Should we change anything to get more people to sign for the call?
   There hasn't been a call in quite a long time.  Ideas?

 Thanks, Juan.

 Call details:

 09:00 AM to 10:00 AM EDT
 Every two weeks

 If you need phone number details,  contact me privately

 How about discussing plans for x2apic?

We were going to discuss the release timetable/plans
this week as well, weren't we?

thanks
-- PMM


Re: [Qemu-devel] KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)

2014-02-17 Thread Michael S. Tsirkin
On Mon, Feb 17, 2014 at 02:17:21PM +0000, Peter Maydell wrote:
 On 17 February 2014 14:19, Michael S. Tsirkin m...@redhat.com wrote:
 
  On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
  Hi
 
  Please, send any topic that you are interested in covering.
 
  * Should we change anything to get more people to sign for the call?
There hasn't been a call in quite a long time.  Ideas?
 
  Thanks, Juan.
 
  Call details:
 
  09:00 AM to 10:00 AM EDT
  Every two weeks
 
  If you need phone number details,  contact me privately
 
  How about discussing plans for x2apic?
 
 We were going to discuss the release timetable/plans
 this week as well, weren't we?
 
 thanks
 -- PMM

I haven't seen the agenda on list - did I miss it?
But it sure makes sense.



Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces

2014-02-17 Thread Zoltan Kiss

There is a valid scenario to put IP addresses on the backend VIFs:

http://wiki.xen.org/wiki/Xen_Networking#Routing

Also, the backend is not necessarily Dom0; you can connect two guests
with backend/frontend pairs.


Zoli

On 15/02/14 02:59, Luis R. Rodriguez wrote:

From: Luis R. Rodriguez mcg...@suse.com

The xen-netback driver is used only to provide a backend
interface for the frontend. The link is the only thing we
use, and that is used internally for letting us know when the
xen-netfront is ready, when it switches to XenbusStateConnected.

Note that only when both xen-netfront and xen-netback are in
state XenbusStateConnected will xen-netback allow userspace on
the host (backend) to bring up the interface. Enabling and
disabling the interface will simply enable or disable NAPI
respectively, and that's used for IRQ communication set up with
the xen event channels.

Cc: Paul Durrant paul.durr...@citrix.com
Cc: Ian Campbell ian.campb...@citrix.com
Cc: Wei Liu wei.l...@citrix.com
Cc: xen-de...@lists.xenproject.org
Cc: net...@vger.kernel.org
Signed-off-by: Luis R. Rodriguez mcg...@suse.com
---
  drivers/net/xen-netback/interface.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index d380e3f..07e6fd2 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -351,7 +351,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
 
 	eth_hw_addr_random(dev);
 	memcpy(dev->dev_addr, xen_oui, 3);
-	dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
+	dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;
 	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
 
 	netif_carrier_off(dev);





Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge

2014-02-17 Thread Zoltan Kiss

On 15/02/14 02:59, Luis R. Rodriguez wrote:

From: Luis R. Rodriguez mcg...@suse.com

It doesn't make sense for some interfaces to become a root bridge
at any point in time. One example is virtual backend interfaces
which rely on other entities on the bridge for actual physical
connectivity. They only provide virtual access.


It is possible for a guest to bridge together two VIFs, either from the same
Dom0 bridge or from different ones. In that case using STP on VIFs sounds
sensible to me.


Zoli


Re: RFC: ioapic polarity vs. qemu os-x guest

2014-02-17 Thread Gabriel L. Somlo
On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
 On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
  Well there is a bigger issue: any interrupt with
  multiple sources is broken.
  
  __kvm_irq_line_state does a logical OR of all sources,
  before XOR with polarity.
  
  This makes no sense if polarity is active low.
 
 So, do you think something like this would make sense, to address
 active-low polarity in __kvm_irq_line_state ?
 (this would be independent of the subsequent xor in
 kvm_ioapic_set_irq()):
 
Make that rather:

-static inline int __kvm_irq_line_state(unsigned long *irq_state,
+static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
 				       int irq_source_id, int level)
 {
-	/* Logical OR for level trig interrupt */
 	if (level)
 		__set_bit(irq_source_id, irq_state);
 	else
 		__clear_bit(irq_source_id, irq_state);
 
-	return !!(*irq_state);
+	if (polarity) {
+		/* Logical AND for level trig interrupt, active-low */
+		return !~(*irq_state);
+	} else {
+		/* Logical OR for level trig interrupt, active-high */
+		return !!(*irq_state);
+	}
 }

Thanks, and sorry for the noise :)
--Gabriel


Re: RFC: ioapic polarity vs. qemu os-x guest

2014-02-17 Thread Gabriel L. Somlo
On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
 Well there is a bigger issue: any interrupt with
 multiple sources is broken.
 
 __kvm_irq_line_state does a logical OR of all sources,
 before XOR with polarity.
 
 This makes no sense if polarity is active low.

So, do you think something like this would make sense, to address
active-low polarity in __kvm_irq_line_state ?
(this would be independent of the subsequent xor in
kvm_ioapic_set_irq()):

-static inline int __kvm_irq_line_state(unsigned long *irq_state,
+static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
 				       int irq_source_id, int level)
 {
-	/* Logical OR for level trig interrupt */
 	if (level)
 		__set_bit(irq_source_id, irq_state);
 	else
 		__clear_bit(irq_source_id, irq_state);
 
-	return !!(*irq_state);
+	if (polarity) {
+		/* Logical OR for level trig interrupt, active-high */
+		return !!(*irq_state);
+	} else { // active-low
+		/* Logical AND for level trig interrupt, active-low */
+		return !~(*irq_state);
+	}
}
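
To make the objection to "OR, then XOR with polarity" concrete, here is a small user-space model in C. It rests on an assumption the thread does not spell out: that each bit of irq_state holds the physical level driven by its source, so on an active-low line a source asserts by driving 0. The function names are hypothetical.

```c
/* Line level as computed today: logical OR of all source bits. */
int line_level_or(unsigned long irq_state)
{
    return irq_state != 0;
}

/* Polarity applied afterwards: 1 means active-low, so the level is inverted. */
int asserted_after_xor(int level, int polarity)
{
    return level ^ polarity;
}
```

Take two sources on an active-low line: source 0 drives 0 (asserting) and source 1 idles at 1, so irq_state is 0x2. The OR yields level 1, and the XOR with polarity 1 yields 0, i.e. "not asserted", even though one source is asserting. Under the stated assumption the combined result only becomes "asserted" once every source drives 0.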

Thanks,
--Gabriel


Re: RFC: ioapic polarity vs. qemu os-x guest

2014-02-17 Thread Paolo Bonzini

On 17/02/2014 19:01, Gabriel L. Somlo wrote:
> On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
>> On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
>>> Well there is a bigger issue: any interrupt with
>>> multiple sources is broken.
>>>
>>> __kvm_irq_line_state does a logical OR of all sources,
>>> before XOR with polarity.
>>>
>>> This makes no sense if polarity is active low.
>>
>> So, do you think something like this would make sense, to address
>> active-low polarity in __kvm_irq_line_state ?
>> (this would be independent of the subsequent xor in
>> kvm_ioapic_set_irq()):
>
> Make that rather:
>
> -static inline int __kvm_irq_line_state(unsigned long *irq_state,
> +static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
>  				       int irq_source_id, int level)
>  {
> -	/* Logical OR for level trig interrupt */
>  	if (level)
>  		__set_bit(irq_source_id, irq_state);
>  	else
>  		__clear_bit(irq_source_id, irq_state);
>
> -	return !!(*irq_state);
> +	if (polarity) {
> +		/* Logical AND for level trig interrupt, active-low */
> +		return !~(*irq_state);

This is ~*irq_state == 0, i.e. *irq_state == ~0.

What if high-order bits of *irq_state are never used?  That is, do you
need to consider the maximum valid irq_source_id too?

> +	} else {
> +		/* Logical OR for level trig interrupt, active-high */
> +		return !!(*irq_state);

Better rewrite this as *irq_state != 0.

Paolo

> +	}
>  }
>
> Thanks, and sorry for the noise :)
> --Gabriel




Re: RFC: ioapic polarity vs. qemu os-x guest

2014-02-17 Thread Gabriel L. Somlo
On Mon, Feb 17, 2014 at 07:06:11PM +0100, Paolo Bonzini wrote:
 On 17/02/2014 19:01, Gabriel L. Somlo wrote:
 On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
 On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
 Well there is a bigger issue: any interrupt with
 multiple sources is broken.
 
 __kvm_irq_line_state does a logical OR of all sources,
 before XOR with polarity.
 
 This makes no sense if polarity is active low.
 
 So, do you think something like this would make sense, to address
 active-low polarity in __kvm_irq_line_state ?
 (this would be independent of the subsequent xor in
 kvm_ioapic_set_irq()):
 
 -return !!(*irq_state);
 +if (polarity) {
 +/* Logical AND for level trig interrupt, active-low */
 +return !~(*irq_state);
 
 This is ~*irq_state == 0, i.e. *irq_state == ~0.
 
 What if high-order bits of *irq_state are never used?  That is, do
 you need to consider the maximum valid irq_source_id too?

Oh, I think I'm starting to comprehend the problem here. The bits of
*irq_state are indexed by irq_source_id, which is dynamically
assigned by kvm_request_irq_source_id().

So, doing the OR thing when assuming always-active-high makes
sense. Doing AND based on an active-low assumption doesn't make
sense, because there could ALWAYS be 0 bits that just weren't
allocated (yet), and I'm having trouble imagining how I'd keep
track of where the current allocation boundary is in a sane way :)

Which I *think* was Michael's original point...
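
For reference, a small user-space sketch in C of why the AND test breaks down. all_bits_set() mirrors the !~(*irq_state) expression from the proposed patch; all_allocated_set() is a hypothetical variant that needs an allocated_mask input, which is exactly the information that is hard to track here.

```c
#include <stdbool.h>

/* The proposed test: true only when every bit of the word is set. */
bool all_bits_set(unsigned long irq_state)
{
    return !~irq_state;   /* same as irq_state == ~0UL */
}

/*
 * Hypothetical variant: AND over the allocated sources only.
 * allocated_mask is an assumed input; nothing in the thread says such
 * a per-line mask is available.
 */
bool all_allocated_set(unsigned long irq_state, unsigned long allocated_mask)
{
    return (irq_state & allocated_mask) == allocated_mask;
}
```

With only sources 0 and 1 allocated and both set, irq_state is 0x3: all_bits_set() is false because the unallocated high bits are 0, while all_allocated_set(0x3, 0x3) is true.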

--Gabriel


[PATCH 4/4] kvm/vfio: Support for DMA coherent IOMMUs

2014-02-17 Thread Alex Williamson
VFIO now has support for using the IOMMU_CACHE flag and a mechanism
for an external user to test the current operating mode of the IOMMU.
Add support for this to the kvm-vfio pseudo device so that we only
register noncoherent DMA when necessary.
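
The decision rule can be sketched in user-space C as follows. struct group and needs_noncoherent_dma() are hypothetical stand-ins for illustration, not the kernel structures; the coherent field represents whatever a per-group coherency probe would report.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one kvm-vfio group entry. */
struct group {
    bool coherent;   /* result of a per-group coherency probe */
};

/* Non-coherent DMA is needed as soon as one group is not coherent. */
bool needs_noncoherent_dma(const struct group *groups, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (!groups[i].coherent)
            return true;
    }
    return false;
}
```

In this sketch an empty group list yields false, so nothing is registered until a non-coherent group shows up.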

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
---
 virt/kvm/vfio.c |   27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index b4f9507..ba1a93f 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -59,6 +59,22 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }
 
+static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
+{
+	long (*fn)(struct vfio_group *, unsigned long);
+	long ret;
+
+	fn = symbol_get(vfio_external_check_extension);
+	if (!fn)
+		return false;
+
+	ret = fn(vfio_group, VFIO_DMA_CC_IOMMU);
+
+	symbol_put(vfio_external_check_extension);
+
+	return ret > 0;
+}
+
 /*
  * Groups can use the same or different IOMMU domains.  If the same then
  * adding a new group may change the coherency of groups we've previously
@@ -75,13 +91,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_lock(&kv->lock);
 
 	list_for_each_entry(kvg, &kv->group_list, node) {
-		/*
-		 * TODO: We need an interface to check the coherency of
-		 * the IOMMU domain this group is using.  For now, assume
-		 * it's always noncoherent.
-		 */
-		noncoherent = true;
-		break;
+		if (!kvm_vfio_group_is_coherent(kvg->vfio_group)) {
+			noncoherent = true;
+			break;
+		}
 	}
 
 	if (noncoherent != kv->noncoherent) {



[PATCH 3/4] vfio: Add external user check extension interface

2014-02-17 Thread Alex Williamson
This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using
the external user interface, allowing KVM to probe IOMMU coherency.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c  |    6 ++++++
 include/linux/vfio.h |    2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 21271d8..512f479 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1413,6 +1413,12 @@ int vfio_external_user_iommu_id(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);
 
+long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
+{
+	return vfio_ioctl_check_extension(group->container, arg);
+}
+EXPORT_SYMBOL_GPL(vfio_external_check_extension);
+
 /**
  * Module/class support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 24579a0..81022a52 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -96,5 +96,7 @@ extern void vfio_unregister_iommu_driver(
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
+extern long vfio_external_check_extension(struct vfio_group *group,
+ unsigned long arg);
 
 #endif /* VFIO_H */



[PATCH 2/4] vfio/type1: Add extension to test DMA cache coherence of IOMMU

2014-02-17 Thread Alex Williamson
Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to
be able to test whether coherency is currently enforced.  Add an
extension for this.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio_iommu_type1.c |   21 +++++++++++++++++++++
 include/uapi/linux/vfio.h       |    5 +++++
 2 files changed, 26 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 8c7bb9b..1f90344 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -867,6 +867,23 @@ static void vfio_iommu_type1_release(void *iommu_data)
 	kfree(iommu);
 }
 
+static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *domain;
+	int ret = 1;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		if (!(domain->prot & IOMMU_CACHE)) {
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -878,6 +895,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		case VFIO_TYPE1_IOMMU:
 		case VFIO_TYPE1v2_IOMMU:
 			return 1;
+		case VFIO_DMA_CC_IOMMU:
+			if (!iommu)
+				return 0;
+			return vfio_domains_have_iommu_cache(iommu);
 		default:
 			return 0;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 460fdf2..cb9023d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -24,6 +24,11 @@
 #define VFIO_TYPE1_IOMMU   1
 #define VFIO_SPAPR_TCE_IOMMU   2
 #define VFIO_TYPE1v2_IOMMU 3
+/*
+ * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping).  This
+ * capability is subject to change as groups are added or removed.
+ */
+#define VFIO_DMA_CC_IOMMU  4
 
 /*
  * The IOCTL interface is designed for extensibility by embedding the



[PATCH 0/4] vfio: type1 multi-domain support & kvm-vfio coherency checking

2014-02-17 Thread Alex Williamson
This series switches the vfio type1 IOMMU backend to support multiple
IOMMU domains per container.  As outlined in 1/4, this provides several
advantages, including supporting features like IOMMU_CACHE and allowing
bus_type independence.  With such support, we're able to provide an
interface to indicate the container is fully cache coherent and export
it for use by the external user interface.  This allows the kvm-vfio
pseudo device to properly test for non-coherent DMA rather than assume
it whenever a kvm-vfio device is registered.

A change introduced by 1/4 creates a new, v2 type1 IOMMU backend.  This
type differs subtly in unmap behavior as described in the patch itself.
For QEMU use of VFIO, there's no change and switching to v2 is
transparent.  I'd appreciate any feedback on whether we should simply
call the previous behavior broken or if we should do like implemented
here and support a compatible mode.  Comments welcome.  Thanks,

Alex

---

Alex Williamson (4):
  vfio/iommu_type1: Multi-IOMMU domain support
  vfio/type1: Add extension to test DMA cache coherence of IOMMU
  vfio: Add external user check extension interface
  kvm/vfio: Support for DMA coherent IOMMUs


 drivers/vfio/vfio.c             |    6
 drivers/vfio/vfio_iommu_type1.c |  656
 include/linux/vfio.h            |    2
 include/uapi/linux/vfio.h       |    6
 virt/kvm/vfio.c                 |   27
 5 files changed, 389 insertions(+), 308 deletions(-)


[PATCH 1/4] vfio/iommu_type1: Multi-IOMMU domain support

2014-02-17 Thread Alex Williamson
We currently have a problem that we cannot support advanced features
of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee
that those features will be supported by all of the hardware units
involved with the domain over its lifetime.  For instance, the Intel
VT-d architecture does not require that all DRHDs support snoop
control.  If we create a domain based on a device behind a DRHD that
does support snoop control and enable SNP support via the IOMMU_CACHE
mapping option, we cannot then add a device behind a DRHD which does
not support snoop control or we'll get reserved bit faults from the
SNP bit in the pagetables.  To add to the complexity, we can't know
the properties of a domain until a device is attached.

We could pass this problem off to userspace and require that a
separate vfio container be used, but we don't know how to handle page
accounting in that case.  How do we know that a page pinned in one
container is the same page as a different container and avoid double
billing the user for the page.

The solution is therefore to support multiple IOMMU domains per
container.  In the majority of cases, only one domain will be required
since hardware is typically consistent within a system.  However, this
provides us the ability to validate compatibility of domains and
support mixed environments where page table flags can be different
between domains.

To do this, our DMA tracking needs to change.  We currently try to
coalesce user mappings into as few tracking entries as possible.  The
problem then becomes that we lose granularity of user mappings.  We've
never guaranteed that a user is able to unmap at a finer granularity
than the original mapping, but we must honor the granularity of the
original mapping.  This coalescing code is therefore removed, allowing
only unmaps covering complete maps.  The change in accounting is
fairly small here, a typical QEMU VM will start out with roughly a
dozen entries, so it's arguable if this coalescing was ever needed.
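
One possible reading of "only unmaps covering complete maps", as a user-space sketch in C: an unmap range is acceptable if every tracked mapping is either entirely inside it or entirely outside it. struct mapping and unmap_covers_whole_maps() are hypothetical names, and address overflow is ignored for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tracking entry: one user mapping, kept uncoalesced. */
struct mapping {
    uint64_t iova;
    uint64_t size;
};

/*
 * True if [iova, iova + size) does not cut through any mapping:
 * each mapping is either fully inside the range or fully outside it.
 */
bool unmap_covers_whole_maps(const struct mapping *maps, size_t n,
                             uint64_t iova, uint64_t size)
{
    uint64_t end = iova + size;

    for (size_t i = 0; i < n; i++) {
        uint64_t m_start = maps[i].iova;
        uint64_t m_end = m_start + maps[i].size;
        bool overlaps = m_start < end && iova < m_end;
        bool contained = iova <= m_start && m_end <= end;

        if (overlaps && !contained)
            return false;
    }
    return true;
}
```

With mappings at 0x1000 (size 0x2000) and 0x4000 (size 0x1000), unmapping 0x1000..0x3000 or 0x0..0x5000 passes the check, while unmapping only 0x1000..0x2000 does not because it would split the first mapping.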

We also move IOMMU domain creation to the point where a group is
attached to the container.  An interesting side-effect of this is that
we now have access to the device at the time of domain creation and
can probe the devices within the group to determine the bus_type.
This finally makes vfio_iommu_type1 completely device/bus agnostic.
In fact, each IOMMU domain can host devices on different buses managed
by different physical IOMMUs, and present a single DMA mapping
interface to the user.  When a new domain is created, mappings are
replayed to bring the IOMMU pagetables up to the state of the current
container.  And of course, DMA mapping and unmapping automatically
traverse all of the configured IOMMU domains.
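
The replay step can be pictured with a small user-space sketch in C. Everything here is a hypothetical stand-in (struct domain, domain_map(), domain_replay(), the fixed-size table); it only shows the idea of walking the container's tracked mappings and installing each one in the new domain.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPS 16

struct mapping {
    uint64_t iova;
    uint64_t size;
};

/* Hypothetical stand-in for one IOMMU domain's page tables. */
struct domain {
    struct mapping maps[MAX_MAPS];
    size_t nmaps;
};

/* Install one mapping in a domain; returns 0, or -1 if the table is full. */
int domain_map(struct domain *d, struct mapping m)
{
    if (d->nmaps == MAX_MAPS)
        return -1;
    d->maps[d->nmaps++] = m;
    return 0;
}

/* Bring a freshly created domain up to the container's current state. */
int domain_replay(struct domain *fresh, const struct mapping *tracked, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (domain_map(fresh, tracked[i]) != 0)
            return -1;
    }
    return 0;
}
```

A later map or unmap request would then loop over every domain in the container in the same way, so all domains stay in step.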

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Varun Sethi varun.se...@freescale.com
---
 drivers/vfio/vfio_iommu_type1.c |  637
 include/uapi/linux/vfio.h       |    1
 2 files changed, 336 insertions(+), 302 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4fb7a8f..8c7bb9b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,7 +30,6 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
-#include <linux/pci.h>		/* pci_bus_type */
 #include <linux/rbtree.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -55,11 +54,17 @@ MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");
 
 struct vfio_iommu {
-	struct iommu_domain	*domain;
+	struct list_head	domain_list;
 	struct mutex		lock;
 	struct rb_root		dma_list;
+	bool v2;
+};
+
+struct vfio_domain {
+	struct iommu_domain	*domain;
+	struct list_head	next;
 	struct list_head	group_list;
-	bool			cache;
+	int			prot;		/* IOMMU_CACHE */
 };
 
 struct vfio_dma {
@@ -99,7 +104,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 	return NULL;
 }
 
-static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
+static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
 	struct vfio_dma *dma;
@@ -118,7 +123,7 @@ static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 	rb_insert_color(&new->node, &iommu->dma_list);
 }
 
-static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
+static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 {
 	rb_erase(&old->node, &iommu->dma_list);
 }
@@ -322,32 +327,39 @@ static long vfio_unpin_pages(unsigned long pfn, long npage,
 	return unlocked;
 }
 }
 
-static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
-   dma_addr_t iova, size_t *size)
+static 

Re: [RFC v2 2/4] net: enables interface option to skip IP

2014-02-17 Thread Dan Williams
On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
 From: Luis R. Rodriguez mcg...@suse.com
 
 Some interfaces do not need to have any IPv4 or IPv6
 addresses, so enable an option to specify this. One
 example where this is observed are virtualization
 backend interfaces which just use the net_device
 constructs to help with their respective frontends.
 
 This should optimize boot time and complexity on
 virtualization environments for each backend interface
 while also avoiding triggering SLAAC and DAD, which is
 simply pointless for these type of interfaces.

Would it not be better/cleaner to use disable_ipv6 and then add a
disable_ipv4 sysctl, then use those with that interface?  The
IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is
already doing.
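
For reference, a minimal C sketch of driving the existing per-interface disable_ipv6 sysctl through /proc. The helper names are hypothetical. A disable_ipv4 counterpart does not exist today, so it is not shown.

```c
#include <stdio.h>

/*
 * Build the per-interface disable_ipv6 sysctl path.
 * Returns 0 on success, -1 if the buffer is too small.
 * (Hypothetical helper name.)
 */
int disable_ipv6_path(char *buf, size_t len, const char *ifname)
{
    int n = snprintf(buf, len, "/proc/sys/net/ipv6/conf/%s/disable_ipv6",
                     ifname);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write "1" to the sysctl; needs root and an existing interface. */
int disable_ipv6(const char *ifname)
{
    char path[128];
    FILE *f;

    if (disable_ipv6_path(path, sizeof(path), ifname) != 0)
        return -1;
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs("1\n", f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

A toolstack or hotplug script could call disable_ipv6() on each backend interface right after it appears, which keeps the policy in userspace instead of in a new device flag.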

Dan

 Cc: David S. Miller da...@davemloft.net
 Cc: Alexey Kuznetsov kuz...@ms2.inr.ac.ru
 Cc: James Morris jmor...@namei.org
 Cc: Hideaki YOSHIFUJI yoshf...@linux-ipv6.org
 Cc: Patrick McHardy ka...@trash.net
 Cc: net...@vger.kernel.org
 Cc: linux-ker...@vger.kernel.org
 Signed-off-by: Luis R. Rodriguez mcg...@suse.com
 ---
  include/uapi/linux/if.h | 1 +
  net/ipv4/devinet.c  | 3 +++
  net/ipv6/addrconf.c | 6 ++
  3 files changed, 10 insertions(+)
 
 diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
 index 8d10382..566d856 100644
 --- a/include/uapi/linux/if.h
 +++ b/include/uapi/linux/if.h
 @@ -85,6 +85,7 @@
  					 * change when it's running */
  #define IFF_MACVLAN 0x200000		/* Macvlan device */
  #define IFF_BRIDGE_NON_ROOT 0x400000	/* Don't consider for root bridge */
 +#define IFF_SKIP_IP 0x800000		/* Skip IPv4, IPv6 */
  
  
  #define IF_GET_IFACE 0x0001		/* for querying only */
 diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
 index a1b5bcb..8e9ef07 100644
 --- a/net/ipv4/devinet.c
 +++ b/net/ipv4/devinet.c
 @@ -1342,6 +1342,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
  
  	ASSERT_RTNL();
  
 +	if (dev->priv_flags & IFF_SKIP_IP)
 +		goto out;
 +
  	if (!in_dev) {
  		if (event == NETDEV_REGISTER) {
  			in_dev = inetdev_init(dev);
 diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
 index 4b6b720..57f58e3 100644
 --- a/net/ipv6/addrconf.c
 +++ b/net/ipv6/addrconf.c
 @@ -314,6 +314,9 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
  
  	ASSERT_RTNL();
  
 +	if (dev->priv_flags & IFF_SKIP_IP)
 +		return NULL;
 +
  	if (dev->mtu < IPV6_MIN_MTU)
  		return NULL;
  
 @@ -2749,6 +2752,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
  	int run_pending = 0;
  	int err;
  
 +	if (dev->priv_flags & IFF_SKIP_IP)
 +		return NOTIFY_OK;
 +
  	switch (event) {
  	case NETDEV_REGISTER:
  		if (!idev && dev->mtu >= IPV6_MIN_MTU) {




Re: RFC: ioapic polarity vs. qemu os-x guest

2014-02-17 Thread Gabriel L. Somlo
On Mon, Feb 17, 2014 at 02:38:09PM -0500, Gabriel L. Somlo wrote:
 Oh, I think I'm starting to comprehend the problem here. The bits of
 *irq_state are indexed by irq_source_id, which is dynamically
 assigned by kvm_request_irq_source_id().
 
 So, doing the OR thing when assuming always-active-high makes
 sense. Doing AND based on an active-low assumption doesn't make
 sense, because there could ALWAYS be 0 bits that just weren't
 allocated (yet), and I'm having trouble imagining how I'd keep
 track of where the current allocation boundary is in a sane way :)

Hmm, I thought maybe I could use kvm-arch.irq_sources_bitmap, but
that's global across the whole VM, whereas irq_state belongs to
one given GSI. So, the per-GSI bitmap is sparse, so it's at least
as bad as I thought earlier, if not worse :)

Am I missing anything that would put this in a better light ?

Thanks,
--Gabriel


hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time

2014-02-17 Thread Zhanghaoyu (A)
Hi, all

The VM gets stuck for a while (about 6s for a VM with 20GB of memory) when a
pass-through PCI card is attached to a non-pass-through VM for the first
time.
The reason is that the host builds the whole VT-d GPA->HPA DMAR page table,
which takes a long time. During this time the qemu_global_mutex lock is held
by the main thread; if the vcpu thread returns from its ioctl, it blocks
waiting for the main thread to release qemu_global_mutex, so the VM gets
stuck.
The contention between the QEMU main thread and the vcpu thread is shown below:

   QEMU-main-thread                        vcpu-thread
          |                                     |
 qemu_mutex_lock_iothread          qemu_mutex_lock(qemu_global_mutex)
          |                                     |
 +------loop-------------------+   +------loop-------------------+
 |                             |   |                             |
 | qemu_mutex_unlock_iothread  |   | qemu_mutex_unlock_iothread  |
 |                             |   |                             |
 | poll                        |   | kvm_vcpu_ioctl(KVM_RUN)     |
 |                             |   |                             |
 | qemu_mutex_lock_iothread    |   |                             |
 |                             |   |                             |
 |-----------------------------|---|-----------------------------|
 |                             |   | qemu_mutex_lock_iothread    |
 | kvm_device_pci_assign       |   |                             |
 |                             |   | blocked waiting for the     |
 | about 6 sec for 20GB memory |   | main thread to release the  |
 |                             |   | qemu lock                   |
 |                             |   |                             |
 +-----------------------------+   +-----------------------------+

Any advice?

Thanks,
Zhang Haoyu
