Re: Using block device instead of character device for virtio-serial
On Sat, Feb 15, 2014 at 10:30:13AM +0530, Jobin Raju George wrote:
> On Fri, Feb 14, 2014 at 4:05 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> > On Sun, Feb 09, 2014 at 11:39:19PM +0530, Jobin Raju George wrote:
> > > On Sun, Feb 9, 2014 at 2:42 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
> > > > On Thu, Feb 06, 2014 at 12:22:36PM +0530, Jobin Raju George wrote:
> > > > > I am trying to establish a communication mechanism between the
> > > > > guest and its host using virtio-serial. For this I am using the
> > > > > following to boot the VM:
> > > > >
> > > > > qemu-system-x86_64 -m 1024 \
> > > > >   -name ubuntu_vm \
> > > > >   -hda ubuntu \
> > > > >   -device virtio-serial \
> > > > >   -chardev socket,path=/tmp/virt_socket,server,nowait,id=virt_socket \
> > > > >   -device virtconsole,name=v_soc,chardev=virt_socket,name=ubuntu_vm_soc
> > > > >
> > > > > This creates a character device on the guest machine and a UNIX
> > > > > socket on the host machine.
> > > > >
> > > > > 1) Is there a way I can create sockets on the host as well as the
> > > > > guest?
> > > >
> > > > The syntax is documented on the QEMU man page. Try:
> > > >
> > > > -chardev socket,port=1234,server,nowait,id=virt_socket
> > >
> > > I did not try this out, but would this create a socket instead of a
> > > character device (/dev/hvc0) on the guest?
> >
> > Things should be unchanged inside the guest. This just creates a TCP
> > socket on the host.
>
> My main concern is creating a socket in the guest.

The virtio-serial Linux guest driver is a character device, not a socket. So if a socket is really necessary you'd need a small program to forward data between a socket and the character device (like netcat/socat).

Stefan

--
To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
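As a rough illustration of the "small program to forward data" idea, here is a minimal C sketch of the copy loop such a forwarder would run in one direction. The function name forward_fd is hypothetical and not part of QEMU or the kernel; it assumes the caller has already opened both descriptors (for example the guest character device on one side and an accepted socket on the other). In practice socat already does this job.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy everything readable from src to dst until
 * EOF on src. Returns the number of bytes forwarded, or -1 on error.
 */
long forward_fd(int src, int dst)
{
	char buf[4096];
	long total = 0;

	for (;;) {
		ssize_t n = read(src, buf, sizeof(buf));

		if (n == 0)
			return total;		/* EOF on the source */
		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry the read */
			return -1;
		}

		/* write() may be partial, so loop until the chunk is out. */
		for (ssize_t off = 0; off < n; ) {
			ssize_t w = write(dst, buf + off, (size_t)(n - off));

			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -1;
			}
			off += w;
		}
		total += n;
	}
}
```

A real forwarder needs traffic in both directions, so it would run two such loops (one per direction, in separate threads or processes) or multiplex both descriptors with poll().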
[Bug 69361] Host call trace and guest hang after create guest.
https://bugzilla.kernel.org/show_bug.cgi?id=69361

robert...@intel.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |robert...@intel.com

--- Comment #8 from robert...@intel.com ---
Is this patch in upstream now? any update?

--
You are receiving this mail because:
You are watching the assignee of the bug.
Re: [Xen-devel] [RFC v2 0/4] net: bridge / ip optimizations for virtual net backends
On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: Luis R. Rodriguez mcg...@suse.com
>
> This v2 series changes the approach from my original virtualization
> multicast patch series [0] by abandoning completely the multicast
> issues and instead generalizing an approach for virtualization
> backends. There are two things in common with virtualization backends:
>
> 0) they should not become the root bridge
> 1) they don't need ipv4 / ipv6 interfaces

Why? There's no real difference between a backend network device and a physical device (from the point of view of the backend domain).

I do not think these are intrinsic properties of backend devices. I can see these being useful knobs for administrators (or management toolstacks) to turn on, on a per-device basis.

David
Re: [Xen-devel] [RFC v2 3/4] xen-netback: use a random MAC address
On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: Luis R. Rodriguez mcg...@suse.com
>
> The purpose of using a static MAC address of FE:FF:FF:FF:FF:FF was to
> prevent our backend interfaces from being used by the bridge and
> nominating our interface as a root bridge. This was possible given that
> the bridge code will use the lowest MAC address for a port once a new
> interface gets added to the bridge. The bridge code has a generic
> feature now to allow interfaces to opt out from root bridge
> nominations, use that instead.
[...]
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -42,6 +42,8 @@
>  #define XENVIF_QUEUE_LENGTH 32
>  #define XENVIF_NAPI_WEIGHT 64
>
> +static const u8 xen_oui[3] = { 0x00, 0x16, 0x3e };

You shouldn't use a vendor prefix with a random MAC address. You should set the locally administered bit and clear the multicast/unicast bit and randomize the remaining 46 bits.

(If existing VIF scripts are doing something similar, they also need to be fixed.)

David
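The bit handling described above can be sketched in plain userspace C. This is only an illustration: random_local_mac is a hypothetical name, and rand() stands in for a proper random source. As far as I know, the kernel's own random-address helpers apply the same two bit operations to the first octet.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: fill mac[6] with a random, locally administered,
 * unicast Ethernet address (46 random bits plus two fixed bits).
 */
void random_local_mac(uint8_t mac[6])
{
	for (int i = 0; i < 6; i++)
		mac[i] = (uint8_t)(rand() & 0xff);	/* rand() for illustration only */

	mac[0] &= 0xfe;		/* clear bit 0: unicast, not multicast */
	mac[0] |= 0x02;		/* set bit 1: locally administered */
}
```

Overwriting the first three octets with a vendor OUI afterwards, as the quoted hunk does with xen_oui, would clear the locally administered bit again (0x00 has bit 1 unset), which is the objection raised above.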
KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)
On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
> Hi
>
> Please, send any topic that you are interested in covering.
>
> * Should we change anything to get more people to sign for the call?
>   There hasn't been a call in quite a long time. Ideas?
>
> Thanks, Juan.
>
> Call details:
>
> 09:00 AM to 10:00 AM EDT
> Every two weeks
>
> If you need phone number details, contact me privately

How about discussing plans for x2apic?
Re: [Qemu-devel] KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)
On 17 February 2014 14:19, Michael S. Tsirkin m...@redhat.com wrote:
> On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
> > Hi
> >
> > Please, send any topic that you are interested in covering.
> >
> > * Should we change anything to get more people to sign for the call?
> >   There hasn't been a call in quite a long time. Ideas?
> >
> > Thanks, Juan.
> >
> > Call details:
> >
> > 09:00 AM to 10:00 AM EDT
> > Every two weeks
> >
> > If you need phone number details, contact me privately
>
> How about discussing plans for x2apic?

We were going to discuss the release timetable/plans this week as well, weren't we?

thanks
-- PMM
Re: [Qemu-devel] KVM call agenda for 2014-02-17 (was Re: KVM call agenda for 2014-02-04)
On Mon, Feb 17, 2014 at 02:17:21PM +, Peter Maydell wrote:
> On 17 February 2014 14:19, Michael S. Tsirkin m...@redhat.com wrote:
> > On Mon, Feb 03, 2014 at 01:57:02PM +0100, Juan Quintela wrote:
> > > Hi
> > >
> > > Please, send any topic that you are interested in covering.
> > >
> > > * Should we change anything to get more people to sign for the call?
> > >   There hasn't been a call in quite a long time. Ideas?
> > >
> > > Thanks, Juan.
> > >
> > > Call details:
> > >
> > > 09:00 AM to 10:00 AM EDT
> > > Every two weeks
> > >
> > > If you need phone number details, contact me privately
> >
> > How about discussing plans for x2apic?
>
> We were going to discuss the release timetable/plans this week as well,
> weren't we?
>
> thanks
> -- PMM

I haven't seen the agenda on list - did I miss it? But it sure makes sense.
Re: [Xen-devel] [RFC v2 4/4] xen-netback: skip IPv4 and IPv6 interfaces
There is a valid scenario to put IP addresses on the backend VIFs:

http://wiki.xen.org/wiki/Xen_Networking#Routing

Also, the backend is not necessarily Dom0; you can connect two guests with backend/frontend pairs.

Zoli

On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: Luis R. Rodriguez mcg...@suse.com
>
> The xen-netback driver is used only to provide a backend interface for
> the frontend. The link is the only thing we use, and that is used
> internally for letting us know when the xen-netfront is ready, when it
> switches to XenbusStateConnected.
>
> Note that only when both the xen-netfront and xen-netback are in state
> XenbusStateConnected will xen-netback allow userspace on the host
> (backend) to bring up the interface. Enabling and disabling the
> interface will simply enable or disable NAPI respectively, and that's
> used for IRQ communication set up with the xen event channels.
>
> Cc: Paul Durrant paul.durr...@citrix.com
> Cc: Ian Campbell ian.campb...@citrix.com
> Cc: Wei Liu wei.l...@citrix.com
> Cc: xen-de...@lists.xenproject.org
> Cc: net...@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez mcg...@suse.com
> ---
>  drivers/net/xen-netback/interface.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index d380e3f..07e6fd2 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -351,7 +351,7 @@ struct xenvif *xenvif_alloc(struct device *parent, domid_t domid,
>  	eth_hw_addr_random(dev);
>  	memcpy(dev->dev_addr, xen_oui, 3);
>
> -	dev->priv_flags |= IFF_BRIDGE_NON_ROOT;
> +	dev->priv_flags |= IFF_BRIDGE_NON_ROOT | IFF_SKIP_IP;
>
>  	netif_napi_add(dev, &vif->napi, xenvif_poll, XENVIF_NAPI_WEIGHT);
>  	netif_carrier_off(dev);
Re: [Xen-devel] [RFC v2 1/4] bridge: enable interfaces to opt out from becoming the root bridge
On 15/02/14 02:59, Luis R. Rodriguez wrote:
> From: Luis R. Rodriguez mcg...@suse.com
>
> It doesn't make sense for some interfaces to become a root bridge at
> any point in time. One example is virtual backend interfaces which rely
> on other entities on the bridge for actual physical connectivity. They
> only provide virtual access.

It is possible for a guest to bridge together two VIFs, either from the same Dom0 bridge or from different ones. In that case using STP on VIFs sounds sensible to me.

Zoli
Re: RFC: ioapic polarity vs. qemu os-x guest
On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
> On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
> > Well there is a bigger issue: any interrupt with multiple sources is
> > broken. __kvm_irq_line_state does a logical OR of all sources, before
> > XOR with polarity. This makes no sense if polarity is active low.
>
> So, do you think something like this would make sense, to address
> active-low polarity in __kvm_irq_line_state ? (this would be
> independent of the subsequent xor in kvm_ioapic_set_irq()):

Make that rather:

-static inline int __kvm_irq_line_state(unsigned long *irq_state,
+static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
 					int irq_source_id, int level)
 {
-	/* Logical OR for level trig interrupt */
 	if (level)
 		__set_bit(irq_source_id, irq_state);
 	else
 		__clear_bit(irq_source_id, irq_state);

-	return !!(*irq_state);
+	if (polarity) {
+		/* Logical AND for level trig interrupt, active-low */
+		return !~(*irq_state);
+	} else {
+		/* Logical OR for level trig interrupt, active-high */
+		return !!(*irq_state);
+	}
 }

Thanks, and sorry for the noise :)
--Gabriel
Re: RFC: ioapic polarity vs. qemu os-x guest
On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
> Well there is a bigger issue: any interrupt with multiple sources is
> broken. __kvm_irq_line_state does a logical OR of all sources, before
> XOR with polarity. This makes no sense if polarity is active low.

So, do you think something like this would make sense, to address active-low polarity in __kvm_irq_line_state ? (this would be independent of the subsequent xor in kvm_ioapic_set_irq()):

-static inline int __kvm_irq_line_state(unsigned long *irq_state,
+static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
 					int irq_source_id, int level)
 {
-	/* Logical OR for level trig interrupt */
 	if (level)
 		__set_bit(irq_source_id, irq_state);
 	else
 		__clear_bit(irq_source_id, irq_state);

-	return !!(*irq_state);
+	if (polarity) {
+		/* Logical OR for level trig interrupt, active-high */
+		return !!(*irq_state);
+	} else { // active-low
+		/* Logical AND for level trig interrupt, active-low */
+		return !~(*irq_state);
+	}
 }

Thanks,
--Gabriel
Re: RFC: ioapic polarity vs. qemu os-x guest
On 17/02/2014 19:01, Gabriel L. Somlo wrote:
> On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
> > On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
> > > Well there is a bigger issue: any interrupt with multiple sources
> > > is broken. __kvm_irq_line_state does a logical OR of all sources,
> > > before XOR with polarity. This makes no sense if polarity is active
> > > low.
> >
> > So, do you think something like this would make sense, to address
> > active-low polarity in __kvm_irq_line_state ? (this would be
> > independent of the subsequent xor in kvm_ioapic_set_irq()):
>
> Make that rather:
>
> -static inline int __kvm_irq_line_state(unsigned long *irq_state,
> +static inline int __kvm_irq_line_state(unsigned long *irq_state, int polarity,
>  					int irq_source_id, int level)
>  {
> -	/* Logical OR for level trig interrupt */
>  	if (level)
>  		__set_bit(irq_source_id, irq_state);
>  	else
>  		__clear_bit(irq_source_id, irq_state);
>
> -	return !!(*irq_state);
> +	if (polarity) {
> +		/* Logical AND for level trig interrupt, active-low */
> +		return !~(*irq_state);

This is ~*irq_state == 0, i.e. *irq_state == ~0. What if high-order bits of *irq_state are never used? That is, do you need to consider the maximum valid irq_source_id too?

> +	} else {
> +		/* Logical OR for level trig interrupt, active-high */
> +		return !!(*irq_state);

Better rewrite this as *irq_state != 0.

Paolo

> +	}
>  }
>
> Thanks, and sorry for the noise :)
> --Gabriel
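To make the concern about unused high-order bits concrete, here is a small standalone C sketch. Both function names and the valid_mask parameter are hypothetical and do not exist in KVM; the second function only shows what an AND restricted to known source IDs would look like if the caller could supply such a mask.

```c
/*
 * AND over every bit of the word, as in the quoted patch: the result is
 * 1 only when all bits of the unsigned long are set.
 */
int line_state_and_all(unsigned long irq_state)
{
	return !~irq_state;
}

/*
 * Hypothetical variant: AND over only the bits named in valid_mask,
 * i.e. the source IDs assumed to be in use for this line.
 */
int line_state_and_masked(unsigned long irq_state, unsigned long valid_mask)
{
	return (irq_state & valid_mask) == valid_mask;
}
```

With three sources all asserted (irq_state == 0x7), the first function still reports 0 because the remaining bits of the word are clear, while the masked variant reports 1 for valid_mask == 0x7. The sketch assumes somebody can provide that mask; nothing in the quoted code does.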
Re: RFC: ioapic polarity vs. qemu os-x guest
On Mon, Feb 17, 2014 at 07:06:11PM +0100, Paolo Bonzini wrote:
> Il 17/02/2014 19:01, Gabriel L. Somlo ha scritto:
> > On Mon, Feb 17, 2014 at 12:57:00PM -0500, Gabriel L. Somlo wrote:
> > > On Sun, Feb 16, 2014 at 06:23:11PM +0200, Michael S. Tsirkin wrote:
> > > > Well there is a bigger issue: any interrupt with multiple sources
> > > > is broken. __kvm_irq_line_state does a logical OR of all sources,
> > > > before XOR with polarity. This makes no sense if polarity is
> > > > active low.
> > >
> > > So, do you think something like this would make sense, to address
> > > active-low polarity in __kvm_irq_line_state ? (this would be
> > > independent of the subsequent xor in kvm_ioapic_set_irq()):
> >
> > -	return !!(*irq_state);
> > +	if (polarity) {
> > +		/* Logical AND for level trig interrupt, active-low */
> > +		return !~(*irq_state);
>
> This is ~*irq_state == 0, i.e. *irq_state == ~0. What if high-order
> bits of *irq_state are never used? That is, do you need to consider the
> maximum valid irq_source_id too?

Oh, I think I'm starting to comprehend the problem here. The bits of *irq_state are indexed by irq_source_id, which is dynamically assigned by kvm_request_irq_source_id().

So, doing the OR thing when assuming always-active-high makes sense. Doing AND based on an active-low assumption doesn't make sense, because there could ALWAYS be 0 bits that just weren't allocated (yet), and I'm having trouble imagining how I'd keep track of where the current allocation boundary is in a sane way :)

Which I *think* was Michael's original point...

--Gabriel
[PATCH 4/4] kvm/vfio: Support for DMA coherent IOMMUs
VFIO now has support for using the IOMMU_CACHE flag and a mechanism for an external user to test the current operating mode of the IOMMU. Add support for this to the kvm-vfio pseudo device so that we only register noncoherent DMA when necessary.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Gleb Natapov g...@kernel.org
Cc: Paolo Bonzini pbonz...@redhat.com
---
 virt/kvm/vfio.c | 27 ---
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/virt/kvm/vfio.c b/virt/kvm/vfio.c
index b4f9507..ba1a93f 100644
--- a/virt/kvm/vfio.c
+++ b/virt/kvm/vfio.c
@@ -59,6 +59,22 @@ static void kvm_vfio_group_put_external_user(struct vfio_group *vfio_group)
 	symbol_put(vfio_group_put_external_user);
 }

+static bool kvm_vfio_group_is_coherent(struct vfio_group *vfio_group)
+{
+	long (*fn)(struct vfio_group *, unsigned long);
+	long ret;
+
+	fn = symbol_get(vfio_external_check_extension);
+	if (!fn)
+		return false;
+
+	ret = fn(vfio_group, VFIO_DMA_CC_IOMMU);
+
+	symbol_put(vfio_external_check_extension);
+
+	return ret > 0;
+}
+
 /*
  * Groups can use the same or different IOMMU domains. If the same then
  * adding a new group may change the coherency of groups we've previously
@@ -75,13 +91,10 @@ static void kvm_vfio_update_coherency(struct kvm_device *dev)
 	mutex_lock(&kv->lock);

 	list_for_each_entry(kvg, &kv->group_list, node) {
-		/*
-		 * TODO: We need an interface to check the coherency of
-		 * the IOMMU domain this group is using. For now, assume
-		 * it's always noncoherent.
-		 */
-		noncoherent = true;
-		break;
+		if (!kvm_vfio_group_is_coherent(kvg->vfio_group)) {
+			noncoherent = true;
+			break;
+		}
 	}

 	if (noncoherent != kv->noncoherent) {
[PATCH 3/4] vfio: Add external user check extension interface
This lets us check extensions, particularly VFIO_DMA_CC_IOMMU using the external user interface, allowing KVM to probe IOMMU coherency.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c  |    6 ++
 include/linux/vfio.h |    2 ++
 2 files changed, 8 insertions(+)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 21271d8..512f479 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -1413,6 +1413,12 @@ int vfio_external_user_iommu_id(struct vfio_group *group)
 }
 EXPORT_SYMBOL_GPL(vfio_external_user_iommu_id);

+long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
+{
+	return vfio_ioctl_check_extension(group->container, arg);
+}
+EXPORT_SYMBOL_GPL(vfio_external_check_extension);
+
 /**
  * Module/class support
  */
diff --git a/include/linux/vfio.h b/include/linux/vfio.h
index 24579a0..81022a52 100644
--- a/include/linux/vfio.h
+++ b/include/linux/vfio.h
@@ -96,5 +96,7 @@ extern void vfio_unregister_iommu_driver(
 extern struct vfio_group *vfio_group_get_external_user(struct file *filep);
 extern void vfio_group_put_external_user(struct vfio_group *group);
 extern int vfio_external_user_iommu_id(struct vfio_group *group);
+extern long vfio_external_check_extension(struct vfio_group *group,
+					  unsigned long arg);

 #endif /* VFIO_H */
[PATCH 2/4] vfio/type1: Add extension to test DMA cache coherence of IOMMU
Now that the type1 IOMMU backend can support IOMMU_CACHE, we need to be able to test whether coherency is currently enforced. Add an extension for this.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio_iommu_type1.c | 21 +
 include/uapi/linux/vfio.h       |  5 +
 2 files changed, 26 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 8c7bb9b..1f90344 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -867,6 +867,23 @@ static void vfio_iommu_type1_release(void *iommu_data)
 	kfree(iommu);
 }

+static int vfio_domains_have_iommu_cache(struct vfio_iommu *iommu)
+{
+	struct vfio_domain *domain;
+	int ret = 1;
+
+	mutex_lock(&iommu->lock);
+	list_for_each_entry(domain, &iommu->domain_list, next) {
+		if (!(domain->prot & IOMMU_CACHE)) {
+			ret = 0;
+			break;
+		}
+	}
+	mutex_unlock(&iommu->lock);
+
+	return ret;
+}
+
 static long vfio_iommu_type1_ioctl(void *iommu_data,
 				   unsigned int cmd, unsigned long arg)
 {
@@ -878,6 +895,10 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 		case VFIO_TYPE1_IOMMU:
 		case VFIO_TYPE1v2_IOMMU:
 			return 1;
+		case VFIO_DMA_CC_IOMMU:
+			if (!iommu)
+				return 0;
+			return vfio_domains_have_iommu_cache(iommu);
 		default:
 			return 0;
 		}
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 460fdf2..cb9023d 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -24,6 +24,11 @@
 #define VFIO_TYPE1_IOMMU		1
 #define VFIO_SPAPR_TCE_IOMMU		2
 #define VFIO_TYPE1v2_IOMMU		3
+/*
+ * IOMMU enforces DMA cache coherence (ex. PCIe NoSnoop stripping). This
+ * capability is subject to change as groups are added or removed.
+ */
+#define VFIO_DMA_CC_IOMMU		4

 /*
  * The IOCTL interface is designed for extensibility by embedding the
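The rule the new extension reports (coherent only if every domain has the cache flag) can be modelled outside the kernel. The sketch below is a userspace approximation, not the driver code: struct model_domain, MODEL_IOMMU_CACHE and its value are placeholders invented for illustration, and an array replaces the kernel's locked linked list.

```c
#include <stddef.h>

/* Placeholder flag for illustration; not the kernel's definition. */
#define MODEL_IOMMU_CACHE 0x4

/* Hypothetical stand-in for a per-domain record carrying its prot flags. */
struct model_domain {
	int prot;
};

/*
 * Returns 1 when every domain has the cache flag set, 0 as soon as one
 * does not. An empty set yields 1, matching the patch's initial ret = 1.
 */
int domains_have_iommu_cache(const struct model_domain *d, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!(d[i].prot & MODEL_IOMMU_CACHE))
			return 0;
	}
	return 1;
}
```

A single domain without the flag is enough to make the whole container report non-coherent, which is why the uapi comment warns that the capability can change as groups are added or removed.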
[PATCH 0/4] vfio: type1 multi-domain support kvm-vfio coherency checking
This series switches the vfio type1 IOMMU backend to support multiple IOMMU domains per container. As outlined in 1/4, this provides several advantages, including supporting features like IOMMU_CACHE and allowing bus_type independence. With such support, we're able to provide an interface to indicate the container is fully cache coherent and export it for use by the external user interface. This allows the kvm-vfio pseudo device to properly test for non-coherent DMA rather than assume it whenever a kvm-vfio device is registered.

A change introduced by 1/4 creates a new, v2 type1 IOMMU backend. This type differs subtly in unmap behavior as described in the patch itself. For QEMU use of VFIO, there's no change and switching to v2 is transparent. I'd appreciate any feedback on whether we should simply call the previous behavior broken or whether we should, as implemented here, keep supporting a compatible mode.

Comments welcome. Thanks,

Alex

---

Alex Williamson (4):
      vfio/iommu_type1: Multi-IOMMU domain support
      vfio/type1: Add extension to test DMA cache coherence of IOMMU
      vfio: Add external user check extension interface
      kvm/vfio: Support for DMA coherent IOMMUs

 drivers/vfio/vfio.c             |    6
 drivers/vfio/vfio_iommu_type1.c |  656 +--
 include/linux/vfio.h            |    2
 include/uapi/linux/vfio.h       |    6
 virt/kvm/vfio.c                 |   27 +-
 5 files changed, 389 insertions(+), 308 deletions(-)
[PATCH 1/4] vfio/iommu_type1: Multi-IOMMU domain support
We currently have a problem that we cannot support advanced features of an IOMMU domain (ex. IOMMU_CACHE), because we have no guarantee that those features will be supported by all of the hardware units involved with the domain over its lifetime. For instance, the Intel VT-d architecture does not require that all DRHDs support snoop control. If we create a domain based on a device behind a DRHD that does support snoop control and enable SNP support via the IOMMU_CACHE mapping option, we cannot then add a device behind a DRHD which does not support snoop control or we'll get reserved bit faults from the SNP bit in the pagetables. To add to the complexity, we can't know the properties of a domain until a device is attached.

We could pass this problem off to userspace and require that a separate vfio container be used, but we don't know how to handle page accounting in that case. How do we know that a page pinned in one container is the same page as in a different container, and avoid double billing the user for the page?

The solution is therefore to support multiple IOMMU domains per container. In the majority of cases, only one domain will be required since hardware is typically consistent within a system. However, this provides us the ability to validate compatibility of domains and support mixed environments where page table flags can be different between domains.

To do this, our DMA tracking needs to change. We currently try to coalesce user mappings into as few tracking entries as possible. The problem then becomes that we lose granularity of user mappings. We've never guaranteed that a user is able to unmap at a finer granularity than the original mapping, but we must honor the granularity of the original mapping. This coalescing code is therefore removed, allowing only unmaps covering complete maps. The change in accounting is fairly small here, a typical QEMU VM will start out with roughly a dozen entries, so it's arguable if this coalescing was ever needed.

We also move IOMMU domain creation to the point where a group is attached to the container. An interesting side-effect of this is that we now have access to the device at the time of domain creation and can probe the devices within the group to determine the bus_type. This finally makes vfio_iommu_type1 completely device/bus agnostic. In fact, each IOMMU domain can host devices on different buses managed by different physical IOMMUs, and present a single DMA mapping interface to the user. When a new domain is created, mappings are replayed to bring the IOMMU pagetables up to the state of the current container. And of course, DMA mapping and unmapping automatically traverse all of the configured IOMMU domains.

Signed-off-by: Alex Williamson alex.william...@redhat.com
Cc: Varun Sethi varun.se...@freescale.com
---
 drivers/vfio/vfio_iommu_type1.c | 637 +--
 include/uapi/linux/vfio.h       |   1
 2 files changed, 336 insertions(+), 302 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 4fb7a8f..8c7bb9b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -30,7 +30,6 @@
 #include <linux/iommu.h>
 #include <linux/module.h>
 #include <linux/mm.h>
-#include <linux/pci.h>		/* pci_bus_type */
 #include <linux/rbtree.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
@@ -55,11 +54,17 @@ MODULE_PARM_DESC(disable_hugepages,
 		 "Disable VFIO IOMMU support for IOMMU hugepages.");

 struct vfio_iommu {
-	struct iommu_domain	*domain;
+	struct list_head	domain_list;
 	struct mutex		lock;
 	struct rb_root		dma_list;
+	bool v2;
+};
+
+struct vfio_domain {
+	struct iommu_domain	*domain;
+	struct list_head	next;
 	struct list_head	group_list;
-	bool			cache;
+	int			prot;		/* IOMMU_CACHE */
 };

 struct vfio_dma {
@@ -99,7 +104,7 @@ static struct vfio_dma *vfio_find_dma(struct vfio_iommu *iommu,
 	return NULL;
 }

-static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
+static void vfio_link_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 {
 	struct rb_node **link = &iommu->dma_list.rb_node, *parent = NULL;
 	struct vfio_dma *dma;
@@ -118,7 +123,7 @@ static void vfio_insert_dma(struct vfio_iommu *iommu, struct vfio_dma *new)
 	rb_insert_color(&new->node, &iommu->dma_list);
 }

-static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
+static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
 {
 	rb_erase(&old->node, &iommu->dma_list);
 }
@@ -322,32 +327,39 @@ static long vfio_unpin_pages(unsigned long pfn, long npage,
 	return unlocked;
 }

-static int vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
-			    dma_addr_t iova, size_t *size)
+static
Re: [RFC v2 2/4] net: enables interface option to skip IP
On Fri, 2014-02-14 at 18:59 -0800, Luis R. Rodriguez wrote:
> From: Luis R. Rodriguez mcg...@suse.com
>
> Some interfaces do not need to have any IPv4 or IPv6 addresses, so
> enable an option to specify this. One example where this is observed
> are virtualization backend interfaces which just use the net_device
> constructs to help with their respective frontends.
>
> This should optimize boot time and complexity on virtualization
> environments for each backend interface while also avoiding triggering
> SLAAC and DAD, which is simply pointless for these type of interfaces.

Would it not be better/cleaner to use disable_ipv6 and then add a disable_ipv4 sysctl, then use those with that interface? The IFF_SKIP_IP seems to duplicate at least part of what disable_ipv6 is already doing.

Dan

> Cc: David S. Miller da...@davemloft.net
> Cc: Alexey Kuznetsov kuz...@ms2.inr.ac.ru
> Cc: James Morris jmor...@namei.org
> Cc: Hideaki YOSHIFUJI yoshf...@linux-ipv6.org
> Cc: Patrick McHardy ka...@trash.net
> Cc: net...@vger.kernel.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Luis R. Rodriguez mcg...@suse.com
> ---
>  include/uapi/linux/if.h | 1 +
>  net/ipv4/devinet.c      | 3 +++
>  net/ipv6/addrconf.c     | 6 ++
>  3 files changed, 10 insertions(+)
>
> diff --git a/include/uapi/linux/if.h b/include/uapi/linux/if.h
> index 8d10382..566d856 100644
> --- a/include/uapi/linux/if.h
> +++ b/include/uapi/linux/if.h
> @@ -85,6 +85,7 @@
>  					 * change when it's running */
>  #define IFF_MACVLAN 0x20		/* Macvlan device */
>  #define IFF_BRIDGE_NON_ROOT 0x40	/* Don't consider for root bridge */
> +#define IFF_SKIP_IP 0x80		/* Skip IPv4, IPv6 */
>
>  #define IF_GET_IFACE 0x0001		/* for querying only */
> diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c
> index a1b5bcb..8e9ef07 100644
> --- a/net/ipv4/devinet.c
> +++ b/net/ipv4/devinet.c
> @@ -1342,6 +1342,9 @@ static int inetdev_event(struct notifier_block *this, unsigned long event,
>
>  	ASSERT_RTNL();
>
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		goto out;
> +
>  	if (!in_dev) {
>  		if (event == NETDEV_REGISTER) {
>  			in_dev = inetdev_init(dev);
> diff --git a/net/ipv6/addrconf.c b/net/ipv6/addrconf.c
> index 4b6b720..57f58e3 100644
> --- a/net/ipv6/addrconf.c
> +++ b/net/ipv6/addrconf.c
> @@ -314,6 +314,9 @@ static struct inet6_dev *ipv6_add_dev(struct net_device *dev)
>
>  	ASSERT_RTNL();
>
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		return NULL;
> +
>  	if (dev->mtu < IPV6_MIN_MTU)
>  		return NULL;
>
> @@ -2749,6 +2752,9 @@ static int addrconf_notify(struct notifier_block *this, unsigned long event,
>  	int run_pending = 0;
>  	int err;
>
> +	if (dev->priv_flags & IFF_SKIP_IP)
> +		return NOTIFY_OK;
> +
>  	switch (event) {
>  	case NETDEV_REGISTER:
>  		if (!idev && dev->mtu >= IPV6_MIN_MTU) {
Re: RFC: ioapic polarity vs. qemu os-x guest
On Mon, Feb 17, 2014 at 02:38:09PM -0500, Gabriel L. Somlo wrote:
> Oh, I think I'm starting to comprehend the problem here. The bits of
> *irq_state are indexed by irq_source_id, which is dynamically assigned
> by kvm_request_irq_source_id().
>
> So, doing the OR thing when assuming always-active-high makes sense.
> Doing AND based on an active-low assumption doesn't make sense, because
> there could ALWAYS be 0 bits that just weren't allocated (yet), and I'm
> having trouble imagining how I'd keep track of where the current
> allocation boundary is in a sane way :)

Hmm, I thought maybe I could use kvm->arch.irq_sources_bitmap, but that's global across the whole VM, whereas irq_state belongs to one given GSI. So, the per-GSI bitmap is sparse, so it's at least as bad as I thought earlier, if not worse :)

Am I missing anything that would put this in a better light ?

Thanks,
--Gabriel
hotplug: VM got stuck when attaching a pass-through device to the non-pass-through VM for the first time
Hi all,

The VM gets stuck for a while (about 6 s for a VM with 20 GB of memory) when a pass-through PCI card is attached to a non-pass-through VM for the first time.

The reason is that the host builds the whole VT-d GPA->HPA DMAR page table, which takes a long time. During this time qemu_global_mutex is held by the main thread. If the vcpu thread's ioctl returns, the vcpu thread blocks waiting for the main thread to release qemu_global_mutex, so the VM gets stuck.

The race between the QEMU main thread and the vcpu thread is shown below:

QEMU main thread                        vcpu thread
      |                                      |
qemu_mutex_lock_iothread                qemu_mutex_lock(qemu_global_mutex)
      |                                      |
+-- loop ----------------------+        +-- loop ----------------------+
| qemu_mutex_unlock_iothread   |        | qemu_mutex_unlock_iothread   |
| poll                         |        | kvm_vcpu_ioctl(KVM_RUN)      |
| qemu_mutex_lock_iothread     |        | qemu_mutex_lock_iothread     |
| kvm_device_pci_assign        |        |   (blocked waiting for the   |
|   (about 6 sec for 20GB      |        |    main thread to release    |
|    memory)                   |        |    the qemu lock)            |
+------------------------------+        +------------------------------+

Any advice?

Thanks,
Zhang Haoyu