Re: [PATCH] [RFC] Add support for a USB audio device model
On Fri, Sep 10, 2010 at 10:47 PM, H. Peter Anvin <h...@linux.intel.com> wrote:
> diff --git a/hw/usb-audio.c b/hw/usb-audio.c
> new file mode 100644
> index 000..d4cf488
> --- /dev/null
> +++ b/hw/usb-audio.c
> @@ -0,0 +1,702 @@
> +/*
> + * QEMU USB Net devices
> + *
> + * Copyright (c) 2006 Thomas Sailer
> + * Copyright (c) 2008 Andrzej Zaborowski

Want to update this for usb-audio?

Stefan
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
Playing with rlimit on data path, transparently to the application in this way looks strange to me, I suspect this has unexpected security implications. Further, applications may have other uses for locked memory besides mpassthru - you should not just take it because it's there. Can we have an ioctl that lets userspace configure how much memory to lock? This ioctl will decrement the rlimit and store the data in the device structure so we can do accounting internally. Put it back on close or on another ioctl.

Yes, we can decrement the rlimit once in the ioctl to keep it off the data path.

Need to be careful for when this operation gets called again with 0 or another small value while we have locked memory - maybe just fail with EBUSY? or wait until it gets unlocked? Maybe 0 can be special-cased and deactivate zero-copy?

How about we don't use a new ioctl, but just check the rlimit in the MPASSTHRU_BINDDEV ioctl? If we find the mp device breaks the rlimit, then we fail the bind ioctl, and thus can't do zero copy any more.

In fact, if we choose RLIMIT_MEMLOCK to limit the locked memory, the default value is only 16 pages. That is too small for the device to work, so we always have to configure it with a large value. I think that if the rlimit value after the decrement is 0, then deactivating zero-copy is better. 0 may be OK.

+
+	if (ctor->lock_pages + count > lock_limit && npages) {
+		printk(KERN_INFO "exceed the locked memory rlimit.");
+		return NULL;
+	}
+
+	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);

You seem to fill in all memory, why zalloc? this is data path ...

OK, let me check this.

+
+	if (!info)
+		return NULL;
+
+	for (i = j = 0; i < count; i++) {
+		base = (unsigned long)iov[i].iov_base;
+		len = iov[i].iov_len;
+
+		if (!len)
+			continue;
+		n = ((base & ~PAGE_MASK) + len + ~PAGE_MASK) >> PAGE_SHIFT;
+
+		rc = get_user_pages_fast(base, n, npages ? 1 : 0,

npages controls whether this is a write? Why?

We use npages as a flag here. In mp_sendmsg(), we call alloc_page_info() with npages = 0.

+				&info->pages[j]);
+		if (rc != n)
+			goto failed;
+
+		while (n--) {
+			frags[j].offset = base & ~PAGE_MASK;
+			frags[j].size = min_t(int, len,
+					PAGE_SIZE - frags[j].offset);
+			len -= frags[j].size;
+			base += frags[j].size;
+			j++;
+		}
+	}
+
+#ifdef CONFIG_HIGHMEM
+	if (npages && !(dev->features & NETIF_F_HIGHDMA)) {
+		for (i = 0; i < j; i++) {
+			if (PageHighMem(info->pages[i]))
+				goto failed;
+		}
+	}
+#endif

Are non-highdma devices worth bothering with? If yes - are there other limitations devices might have that we need to handle? E.g. what about non-s/g devices or no checksum offloading?

Basically I think there are no limitations for either, but let me check.

+	skb_push(skb, ETH_HLEN);
+
+	if (skb_is_gso(skb)) {
+		hdr.hdr.hdr_len = skb_headlen(skb);
+		hdr.hdr.gso_size = shinfo->gso_size;
+		if (shinfo->gso_type & SKB_GSO_TCPV4)
+			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV4;
+		else if (shinfo->gso_type & SKB_GSO_TCPV6)
+			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_TCPV6;
+		else if (shinfo->gso_type & SKB_GSO_UDP)
+			hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_UDP;
+		else
+			BUG();
+		if (shinfo->gso_type & SKB_GSO_TCP_ECN)
+			hdr.hdr.gso_type |= VIRTIO_NET_HDR_GSO_ECN;
+
+	} else
+		hdr.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE;
+
+	if (skb->ip_summed == CHECKSUM_PARTIAL) {
+		hdr.hdr.flags = VIRTIO_NET_HDR_F_NEEDS_CSUM;
+		hdr.hdr.csum_start =
+			skb->csum_start - skb_headroom(skb);
+		hdr.hdr.csum_offset = skb->csum_offset;
+	}

We have this code in tun, macvtap and packet socket already. Could this be a good time to move these into networking core? I'm not asking you to do this right now, but could this generic virtio-net to skb stuff be encapsulated in functions?

It seems reasonable.
--
MST
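The page arithmetic discussed in the review above can be tried outside the kernel. The following is a minimal userspace sketch, not the driver code: it assumes 4 KiB pages, and the names `pages_spanned`, `split_into_frags`, `would_exceed_limit` and the `SK_*` macros are hypothetical stand-ins for the kernel's `PAGE_SHIFT`/`PAGE_MASK` and the patch's `frags[]` loop. The limit check mirrors the `lock_pages + count > lock_limit` test as reconstructed from the quoted patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed 4 KiB pages for this sketch; the kernel uses PAGE_SHIFT/PAGE_MASK. */
#define SK_PAGE_SHIFT 12
#define SK_PAGE_SIZE  (1UL << SK_PAGE_SHIFT)
#define SK_PAGE_MASK  (~(SK_PAGE_SIZE - 1))

struct sk_frag {
	unsigned long offset;	/* offset of the fragment inside its page */
	unsigned long size;	/* bytes of the buffer that fall in that page */
};

/* Number of pages touched by the byte range [base, base + len). */
static unsigned long pages_spanned(unsigned long base, unsigned long len)
{
	return ((base & ~SK_PAGE_MASK) + len + ~SK_PAGE_MASK) >> SK_PAGE_SHIFT;
}

/*
 * Split [base, base + len) into per-page fragments, following the shape of
 * the while (n--) loop in the quoted patch. Returns the number of fragments
 * written, or 0 if the range is empty or does not fit in max_frags.
 */
static size_t split_into_frags(unsigned long base, unsigned long len,
			       struct sk_frag *frags, size_t max_frags)
{
	unsigned long n = pages_spanned(base, len);
	size_t j = 0;

	assert(frags != NULL);
	if (len == 0 || n > max_frags)
		return 0;

	while (n--) {
		unsigned long room;

		frags[j].offset = base & ~SK_PAGE_MASK;
		room = SK_PAGE_SIZE - frags[j].offset;
		frags[j].size = len < room ? len : room;
		len -= frags[j].size;
		base += frags[j].size;
		j++;
	}
	return j;
}

/* Hypothetical accounting check: would pinning `count` more pages pass the limit? */
static int would_exceed_limit(unsigned long locked, unsigned long count,
			      unsigned long limit)
{
	return locked + count > limit;
}
```

With the default RLIMIT_MEMLOCK of 16 pages mentioned in the thread, `would_exceed_limit(10, 7, 16)` is true, which illustrates why the limit has to be raised before the device is useful.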
[RFC PATCH v10 10/16] Add a hook to intercept external buffers from NIC driver.
From: Xin Xiaohui <xiaohui@intel.com>

The hook is called in netif_receive_skb().

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Signed-off-by: Zhao Yu <yzhao81...@gmail.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 net/core/dev.c |   35 +++
 1 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 636f11b..4b379b1 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2517,6 +2517,37 @@ err:
 EXPORT_SYMBOL(netdev_mp_port_prep);
 #endif
 
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+/* Add a hook to intercept mediate passthru(zero-copy) packets,
+ * and insert it to the socket queue owned by mp_port specially.
+ */
+static inline struct sk_buff *handle_mpassthru(struct sk_buff *skb,
+					       struct packet_type **pt_prev,
+					       int *ret,
+					       struct net_device *orig_dev)
+{
+	struct mpassthru_port *mp_port = NULL;
+	struct sock *sk = NULL;
+
+	if (!dev_is_mpassthru(skb->dev))
+		return skb;
+	mp_port = skb->dev->mp_port;
+
+	if (*pt_prev) {
+		*ret = deliver_skb(skb, *pt_prev, orig_dev);
+		*pt_prev = NULL;
+	}
+
+	sk = mp_port->sock->sk;
+	skb_queue_tail(&sk->sk_receive_queue, skb);
+	sk->sk_state_change(sk);
+
+	return NULL;
+}
+#else
+#define handle_mpassthru(skb, pt_prev, ret, orig_dev)	(skb)
+#endif
+
 /**
  *	netif_receive_skb - process receive buffer from network
  *	@skb: buffer to process
@@ -2598,6 +2629,10 @@ int netif_receive_skb(struct sk_buff *skb)
 ncls:
 #endif
 
+	/* To intercept mediate passthru(zero-copy) packets here */
+	skb = handle_mpassthru(skb, &pt_prev, &ret, orig_dev);
+	if (!skb)
+		goto out;
 	skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
 	if (!skb)
 		goto out;
-- 
1.5.4.4
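The hook follows the same convention as handle_bridge(): a handler returns the packet to let processing continue, or NULL once it has consumed it. A minimal userspace model of that convention is sketched below. Everything in it (`struct pkt`, `struct pkt_queue`, `intercept`, `receive`) is a hypothetical stand-in chosen for illustration, not kernel API, and the boolean field replaces dev_is_mpassthru(skb->dev).

```c
#include <assert.h>
#include <stddef.h>

struct pkt {
	int from_zero_copy_dev;	/* stand-in for dev_is_mpassthru(skb->dev) */
	struct pkt *next;
};

/* Stand-in for the socket receive queue owned by the mp port. */
struct pkt_queue {
	struct pkt *head;
	struct pkt *tail;
	size_t len;
};

static void queue_tail(struct pkt_queue *q, struct pkt *p)
{
	p->next = NULL;
	if (q->tail)
		q->tail->next = p;
	else
		q->head = p;
	q->tail = p;
	q->len++;
}

/* Returns NULL when the packet was consumed, else hands it back unchanged. */
static struct pkt *intercept(struct pkt *p, struct pkt_queue *q)
{
	assert(p != NULL);
	if (!p->from_zero_copy_dev)
		return p;
	queue_tail(q, p);
	return NULL;
}

/* Returns 1 if the packet reached the normal stack, 0 if it was intercepted. */
static int receive(struct pkt *p, struct pkt_queue *q)
{
	p = intercept(p, q);
	if (!p)
		return 0;
	/* Bridge, macvlan and protocol handlers would run here. */
	return 1;
}
```

Placing the interception before the bridge step means a bound device never feeds the host bridge or protocol stack, which matches the ordering in the hunk above.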
[RFC PATCH v10 16/16] An example how to alloc user buffer based on napi_gro_frags() interface.
From: Xin Xiaohui <xiaohui@intel.com>

This example is made on ixgbe driver which using napi_gro_frags().
It can get buffers from guest side directly using netdev_alloc_page()
and release guest buffers using netdev_free_page().

---
 drivers/net/ixgbe/ixgbe_main.c |   25 +
 1 files changed, 21 insertions(+), 4 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 905d6d2..0977f2f 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -691,7 +691,14 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
 static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
 					struct net_device *dev)
 {
-	return true;
+	return dev_is_mpassthru(dev);
+}
+
+static u32 get_page_skb_offset(struct net_device *dev)
+{
+	if (!dev_is_mpassthru(dev))
+		return 0;
+	return dev->mp_port->vnet_hlen;
 }
 
 /**
@@ -764,7 +771,8 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 				adapter->alloc_rx_page_failed++;
 				goto no_buffers;
 			}
-			bi->page_skb_offset = 0;
+			bi->page_skb_offset =
+				get_page_skb_offset(adapter->netdev);
 			bi->dma = pci_map_page(pdev, bi->page_skb,
 					bi->page_skb_offset,
 					(PAGE_SIZE / 2),
@@ -899,8 +907,10 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			len = le16_to_cpu(rx_desc->wb.upper.length);
 		}
 
-		if (is_no_buffer(rx_buffer_info))
+		if (is_no_buffer(rx_buffer_info)) {
+			printk("no buffers\n");
 			break;
+		}
 
 		cleaned = true;
 
@@ -959,6 +969,12 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 						rx_buffer_info->page_skb,
 						rx_buffer_info->page_skb_offset,
 						len);
+			if (dev_is_mpassthru(netdev) &&
+					netdev->mp_port->hash)
+				skb_shinfo(skb)->destructor_arg =
+					netdev->mp_port->hash(netdev,
+					rx_buffer_info->page_skb);
+
 			rx_buffer_info->page_skb = NULL;
 			skb->len += len;
 			skb->data_len += len;
@@ -976,7 +992,8 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 					   upper_len);
 
 			if ((rx_ring->rx_buf_len > (PAGE_SIZE / 2)) ||
-			    (page_count(rx_buffer_info->page) != 1))
+			    (page_count(rx_buffer_info->page) != 1) ||
+				dev_is_mpassthru(netdev))
 				rx_buffer_info->page = NULL;
 			else
 				get_page(rx_buffer_info->page);
-- 
1.5.4.4
[RFC PATCH v10 15/16] An example how to modify NIC driver to use napi_gro_frags() interface
From: Xin Xiaohui <xiaohui@intel.com>

This example is made on ixgbe driver.
It provides API is_rx_buffer_mapped_as_page() to indicate
if the driver use napi_gro_frags() interface or not.
The example allocates 2 pages for DMA for one ring descriptor
using netdev_alloc_page(). When packets is coming, using
napi_gro_frags() to allocate skb and to receive the packets.

---
 drivers/net/ixgbe/ixgbe.h      |    3 +
 drivers/net/ixgbe/ixgbe_main.c |  151
 2 files changed, 125 insertions(+), 29 deletions(-)

diff --git a/drivers/net/ixgbe/ixgbe.h b/drivers/net/ixgbe/ixgbe.h
index 79c35ae..fceffc5 100644
--- a/drivers/net/ixgbe/ixgbe.h
+++ b/drivers/net/ixgbe/ixgbe.h
@@ -131,6 +131,9 @@ struct ixgbe_rx_buffer {
 	struct page *page;
 	dma_addr_t page_dma;
 	unsigned int page_offset;
+	u16 mapped_as_page;
+	struct page *page_skb;
+	unsigned int page_skb_offset;
 };
 
 struct ixgbe_queue_stats {
diff --git a/drivers/net/ixgbe/ixgbe_main.c b/drivers/net/ixgbe/ixgbe_main.c
index 6c00ee4..905d6d2 100644
--- a/drivers/net/ixgbe/ixgbe_main.c
+++ b/drivers/net/ixgbe/ixgbe_main.c
@@ -688,6 +688,12 @@ static inline void ixgbe_release_rx_desc(struct ixgbe_hw *hw,
 	IXGBE_WRITE_REG(hw, IXGBE_RDT(rx_ring->reg_idx), val);
 }
 
+static bool is_rx_buffer_mapped_as_page(struct ixgbe_rx_buffer *bi,
+					struct net_device *dev)
+{
+	return true;
+}
+
 /**
  * ixgbe_alloc_rx_buffers - Replace used receive buffers; packet split
  * @adapter: address of board private structure
@@ -704,13 +710,17 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 	i = rx_ring->next_to_use;
 	bi = &rx_ring->rx_buffer_info[i];
 
+
 	while (cleaned_count--) {
 		rx_desc = IXGBE_RX_DESC_ADV(*rx_ring, i);
 
+		bi->mapped_as_page =
+			is_rx_buffer_mapped_as_page(bi, adapter->netdev);
+
 		if (!bi->page_dma &&
 		    (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED)) {
 			if (!bi->page) {
-				bi->page = alloc_page(GFP_ATOMIC);
+				bi->page = netdev_alloc_page(adapter->netdev);
 				if (!bi->page) {
 					adapter->alloc_rx_page_failed++;
 					goto no_buffers;
@@ -727,7 +737,7 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 			                            PCI_DMA_FROMDEVICE);
 		}
 
-		if (!bi->skb) {
+		if (!bi->mapped_as_page && !bi->skb) {
 			struct sk_buff *skb;
 			/* netdev_alloc_skb reserves 32 bytes up front!! */
 			uint bufsz = rx_ring->rx_buf_len + SMP_CACHE_BYTES;
@@ -747,6 +757,19 @@ static void ixgbe_alloc_rx_buffers(struct ixgbe_adapter *adapter,
 			                         rx_ring->rx_buf_len,
 			                         PCI_DMA_FROMDEVICE);
 		}
+
+		if (bi->mapped_as_page && !bi->page_skb) {
+			bi->page_skb = netdev_alloc_page(adapter->netdev);
+			if (!bi->page_skb) {
+				adapter->alloc_rx_page_failed++;
+				goto no_buffers;
+			}
+			bi->page_skb_offset = 0;
+			bi->dma = pci_map_page(pdev, bi->page_skb,
+					bi->page_skb_offset,
+					(PAGE_SIZE / 2),
+					PCI_DMA_FROMDEVICE);
+		}
 		/* Refresh the desc even if buffer_addrs didn't change because
 		 * each write-back erases this info. */
 		if (rx_ring->flags & IXGBE_RING_RX_PS_ENABLED) {
@@ -823,6 +846,13 @@ struct ixgbe_rsc_cb {
 	dma_addr_t dma;
 };
 
+static bool is_no_buffer(struct ixgbe_rx_buffer *rx_buffer_info)
+{
+	return (!rx_buffer_info->skb ||
+		!rx_buffer_info->page_skb) &&
+		!rx_buffer_info->page;
+}
+
 #define IXGBE_RSC_CB(skb) ((struct ixgbe_rsc_cb *)(skb)->cb)
 
 static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
@@ -832,6 +862,7 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 	struct ixgbe_adapter *adapter = q_vector->adapter;
 	struct net_device *netdev = adapter->netdev;
 	struct pci_dev *pdev = adapter->pdev;
+	struct napi_struct *napi = &q_vector->napi;
 	union ixgbe_adv_rx_desc *rx_desc, *next_rxd;
 	struct ixgbe_rx_buffer *rx_buffer_info, *next_buffer;
 	struct sk_buff *skb;
@@ -868,29 +899,71 @@ static bool ixgbe_clean_rx_irq(struct ixgbe_q_vector *q_vector,
 			len = le16_to_cpu(rx_desc->wb.upper.length);
 		}
 
+		if
[RFC PATCH v10 14/16] Provides multiple submits and asynchronous notifications.
From: Xin Xiaohui <xiaohui@intel.com>

The vhost-net backend now only supports synchronous send/recv
operations. The patch provides multiple submits and asynchronous
notifications. This is needed for zero-copy case.

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
---
 drivers/vhost/net.c   |  348 +
 drivers/vhost/vhost.c |   79 +++
 drivers/vhost/vhost.h |   15 ++
 3 files changed, 414 insertions(+), 28 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index b38abc6..c4bc815 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -24,6 +24,8 @@
 #include <linux/if_arp.h>
 #include <linux/if_tun.h>
 #include <linux/if_macvlan.h>
+#include <linux/mpassthru.h>
+#include <linux/aio.h>
 
 #include <net/sock.h>
 
@@ -39,6 +41,8 @@ enum {
 	VHOST_NET_VQ_MAX = 2,
 };
 
+static struct kmem_cache *notify_cache;
+
 enum vhost_net_poll_state {
 	VHOST_NET_POLL_DISABLED = 0,
 	VHOST_NET_POLL_STARTED = 1,
@@ -49,6 +53,7 @@ struct vhost_net {
 	struct vhost_dev dev;
 	struct vhost_virtqueue vqs[VHOST_NET_VQ_MAX];
 	struct vhost_poll poll[VHOST_NET_VQ_MAX];
+	struct kmem_cache *cache;
 	/* Tells us whether we are polling a socket for TX.
 	 * We only do this when socket buffer fills up.
 	 * Protected by tx vq lock. */
@@ -93,11 +98,190 @@ static void tx_poll_start(struct vhost_net *net, struct socket *sock)
 	net->tx_poll_state = VHOST_NET_POLL_STARTED;
 }
 
+struct kiocb *notify_dequeue(struct vhost_virtqueue *vq)
+{
+	struct kiocb *iocb = NULL;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vq->notify_lock, flags);
+	if (!list_empty(&vq->notifier)) {
+		iocb = list_first_entry(&vq->notifier,
+				struct kiocb, ki_list);
+		list_del(&iocb->ki_list);
+	}
+	spin_unlock_irqrestore(&vq->notify_lock, flags);
+	return iocb;
+}
+
+static void handle_iocb(struct kiocb *iocb)
+{
+	struct vhost_virtqueue *vq = iocb->private;
+	unsigned long flags;
+
+	spin_lock_irqsave(&vq->notify_lock, flags);
+	list_add_tail(&iocb->ki_list, &vq->notifier);
+	spin_unlock_irqrestore(&vq->notify_lock, flags);
+}
+
+static int is_async_vq(struct vhost_virtqueue *vq)
+{
+	return (vq->link_state == VHOST_VQ_LINK_ASYNC);
+}
+
+static void handle_async_rx_events_notify(struct vhost_net *net,
+					  struct vhost_virtqueue *vq,
+					  struct socket *sock)
+{
+	struct kiocb *iocb = NULL;
+	struct vhost_log *vq_log = NULL;
+	int rx_total_len = 0;
+	unsigned int head, log, in, out;
+	int size;
+	int count;
+
+	struct virtio_net_hdr_mrg_rxbuf hdr = {
+		.hdr.flags = 0,
+		.hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE
+	};
+
+	if (!is_async_vq(vq))
+		return;
+
+	if (sock->sk->sk_data_ready)
+		sock->sk->sk_data_ready(sock->sk, 0);
+
+	vq_log = unlikely(vhost_has_feature(&net->dev, VHOST_F_LOG_ALL)) ?
+		vq->log : NULL;
+
+	while ((iocb = notify_dequeue(vq)) != NULL) {
+		if (!iocb->ki_left) {
+			vhost_add_used_and_signal(&net->dev, vq,
+					iocb->ki_pos, iocb->ki_nbytes);
+			size = iocb->ki_nbytes;
+			head = iocb->ki_pos;
+			rx_total_len += iocb->ki_nbytes;
+
+			if (iocb->ki_dtor)
+				iocb->ki_dtor(iocb);
+			kmem_cache_free(net->cache, iocb);
+
+			/* when log is enabled, recomputing the log is needed,
+			 * since these buffers are in async queue, may not get
+			 * the log info before.
+			 */
+			if (unlikely(vq_log)) {
+				if (!log)
+					__vhost_get_desc(&net->dev, vq, vq->iov,
+							ARRAY_SIZE(vq->iov),
+							&out, &in, vq_log,
+							&log, head);
+				vhost_log_write(vq, vq_log, log, size);
+			}
+			if (unlikely(rx_total_len >= VHOST_NET_WEIGHT)) {
+				vhost_poll_queue(&vq->poll);
+				break;
+			}
+		} else {
+			int i = 0;
+			int count = iocb->ki_left;
+			int hc = count;
+			while (count--) {
+				if (iocb) {
+					vq->heads[i].id = iocb->ki_pos;
+					vq->heads[i].len = iocb->ki_nbytes;
+
[RFC PATCH v10 13/16] Add mp(mediate passthru) device.
From: Xin Xiaohui <xiaohui@intel.com>

The patch adds the mp (mediate passthru) device, which is now based on
the vhost-net backend driver and provides proto_ops to send/receive
guest buffer data from/to the guest virtio-net driver.

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Signed-off-by: Zhao Yu <yzhao81...@gmail.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 drivers/vhost/mpassthru.c | 1407 +
 1 files changed, 1407 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/mpassthru.c

diff --git a/drivers/vhost/mpassthru.c b/drivers/vhost/mpassthru.c
new file mode 100644
index 000..d86d94c
--- /dev/null
+++ b/drivers/vhost/mpassthru.c
@@ -0,0 +1,1407 @@
+/*
+ *  MPASSTHRU - Mediate passthrough device.
+ *  Copyright (C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ *
+ */
+
+#define DRV_NAME        "mpassthru"
+#define DRV_DESCRIPTION "Mediate passthru device driver"
+#define DRV_COPYRIGHT   "(C) 2009 ZhaoYu, XinXiaohui, Dike, Jeffery G"
+
+#include <linux/compat.h>
+#include <linux/module.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/major.h>
+#include <linux/slab.h>
+#include <linux/smp_lock.h>
+#include <linux/poll.h>
+#include <linux/fcntl.h>
+#include <linux/init.h>
+#include <linux/aio.h>
+
+#include <linux/skbuff.h>
+#include <linux/netdevice.h>
+#include <linux/etherdevice.h>
+#include <linux/miscdevice.h>
+#include <linux/ethtool.h>
+#include <linux/rtnetlink.h>
+#include <linux/if.h>
+#include <linux/if_arp.h>
+#include <linux/if_ether.h>
+#include <linux/crc32.h>
+#include <linux/nsproxy.h>
+#include <linux/uaccess.h>
+#include <linux/virtio_net.h>
+#include <linux/mpassthru.h>
+#include <net/net_namespace.h>
+#include <net/netns/generic.h>
+#include <net/rtnetlink.h>
+#include <net/sock.h>
+
+#include <asm/system.h>
+
+/* Uncomment to enable debugging */
+/* #define MPASSTHRU_DEBUG 1 */
+
+#ifdef MPASSTHRU_DEBUG
+static int debug;
+
+#define DBG  if (mp->debug) printk
+#define DBG1 if (debug == 2) printk
+#else
+#define DBG(a...)
+#define DBG1(a...)
+#endif
+
+#define COPY_THRESHOLD (L1_CACHE_BYTES * 4)
+#define COPY_HDR_LEN   (L1_CACHE_BYTES < 64 ? 64 : L1_CACHE_BYTES)
+
+struct frag {
+	u16     offset;
+	u16     size;
+};
+
+#define	HASH_BUCKETS	(8192*2)
+
+struct page_info {
+	struct list_head	list;
+	struct page_info	*next;
+	struct page_info	*prev;
+	struct page		*pages[MAX_SKB_FRAGS];
+	struct sk_buff		*skb;
+	struct page_ctor	*ctor;
+
+	/* The pointer relayed to skb, to indicate
+	 * it's a external allocated skb or kernel
+	 */
+	struct skb_ext_page	ext_page;
+
+#define INFO_READ		0
+#define INFO_WRITE		1
+	unsigned		flags;
+	unsigned		pnum;
+
+	/* The fields after that is for backend
+	 * driver, now for vhost-net.
+	 */
+
+	struct kiocb		*iocb;
+	unsigned int		desc_pos;
+	struct iovec		hdr[2];
+	struct iovec		iov[MAX_SKB_FRAGS];
+};
+
+static struct kmem_cache *ext_page_info_cache;
+
+struct page_ctor {
+	struct list_head	readq;
+	int			wq_len;
+	int			rq_len;
+	spinlock_t		read_lock;
+	/* record the locked pages */
+	int			lock_pages;
+	struct rlimit		o_rlim;
+	struct net_device	*dev;
+	struct mpassthru_port	port;
+	struct page_info	**hash_table;
+};
+
+struct mp_struct {
+	struct mp_file		*mfile;
+	struct net_device	*dev;
+	struct page_ctor	*ctor;
+	struct socket		socket;
+
+#ifdef MPASSTHRU_DEBUG
+	int debug;
+#endif
+};
+
+struct mp_file {
+	atomic_t count;
+	struct mp_struct *mp;
+	struct net *net;
+};
+
+struct mp_sock {
+	struct sock		sk;
+	struct mp_struct	*mp;
+};
+
+static int mp_dev_change_flags(struct net_device *dev, unsigned flags)
+{
+	int ret = 0;
+
+	rtnl_lock();
+	ret = dev_change_flags(dev, flags);
+	rtnl_unlock();
+
+	if (ret < 0)
+		printk(KERN_ERR "failed to change dev state of %s", dev->name);
+
+	return ret;
+}
+
+/* The main function to allocate external buffers */
+static struct skb_ext_page *page_ctor(struct mpassthru_port *port,
+
[RFC PATCH v10 12/16] Add a kconfig entry and make entry for mp device.
From: Xin Xiaohui <xiaohui@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 drivers/vhost/Kconfig  |   10 ++
 drivers/vhost/Makefile |    2 ++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index e4e2fd1..a6b8cbf 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -9,3 +9,13 @@ config VHOST_NET
 	  To compile this driver as a module, choose M here: the module will
 	  be called vhost_net.
 
+config MEDIATE_PASSTHRU
+	tristate "mediate passthru network driver (EXPERIMENTAL)"
+	depends on VHOST_NET
+	---help---
+	  Zero-copy network I/O support. We call it mediate passthru to
+	  distinguish it from hardware passthru.
+
+	  To compile this driver as a module, choose M here: the module will
+	  be called mpassthru.
+
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index 72dd020..c18b9fc 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -1,2 +1,4 @@
 obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
+
+obj-$(CONFIG_MEDIATE_PASSTHRU) += mpassthru.o
-- 
1.5.4.4
[RFC PATCH v10 00/16] Provide a zero-copy method on KVM virtio-net.
We provide a zero-copy method by which the driver side may get external
buffers to DMA into. Here "external" means the driver doesn't use kernel
space to allocate skb buffers. Currently the external buffers can come
from the guest virtio-net driver.

The idea is simple: pin the guest VM user space and then let the host
NIC driver DMA to it directly.
The patches are based on the vhost-net backend driver. We add a device
which provides proto_ops such as sendmsg/recvmsg to vhost-net to
send/recv directly to/from the NIC driver. A KVM guest that uses the
vhost-net backend may bind any ethX interface on the host side to
get copyless data transfer through the guest virtio-net frontend.

patch 01-10:	net core and kernel changes.
patch 11-13:	new device as interface to manipulate external buffers.
patch 14:	for vhost-net.
patch 15:	An example of modifying a NIC driver to use napi_gro_frags().
patch 16:	An example of how to get guest buffers in a driver that uses
		napi_gro_frags().

The guest virtio-net driver submits multiple requests through the
vhost-net backend driver to the kernel. The requests are queued and then
completed after the corresponding actions in h/w are done.

For read, user space buffers are dispensed to the NIC driver for rx when
a page constructor API is invoked. This means NICs can allocate user
buffers from a page constructor. We add a hook in netif_receive_skb() to
intercept the incoming packets and notify the zero-copy device.

For write, the zero-copy device may allocate a new host skb, put the
payload on skb_shinfo(skb)->frags, and copy the header to skb->data.
The request remains pending until the skb is transmitted by h/w.

We provide multiple submits and asynchronous notification to
vhost-net too.

Our goal is to improve the bandwidth and reduce the CPU usage.
Exact performance data will be provided later.

What we have not done yet:
	Performance tuning

what we have done in v1:
	polish the RCU usage
	deal with write logging in asynchronous mode in vhost
	add notifier block for mp device
	rename page_ctor to mp_port in netdevice.h to make it look generic
	add mp_dev_change_flags() for mp device to change NIC state
	add CONFIG_VHOST_MPASSTHRU to limit the usage when module is not loaded
	a small fix for missing dev_put on failure
	using dynamic minor instead of static minor number
	a __KERNEL__ protect to mp_get_sock()

what we have done in v2:
	remove most of the RCU usage, since the ctor pointer is only
	changed by BIND/UNBIND ioctl, and during that time, NIC will be
	stopped to get good cleanup (all outstanding requests are finished),
	so the ctor pointer cannot be raced into wrong situation.

	Replace the struct vhost_notifier with struct kiocb.
	Let vhost-net backend alloc/free the kiocb and transfer them
	via sendmsg/recvmsg.

	use get_user_pages_fast() and set_page_dirty_lock() when read.

	Add some comments for netdev_mp_port_prep() and handle_mpassthru().

what we have done in v3:
	the async write logging is rewritten
	a drafted synchronous write function for qemu live migration
	a limit for locked pages from get_user_pages_fast() to prevent DoS
	by using RLIMIT_MEMLOCK

what we have done in v4:
	add iocb completion callback from vhost-net to queue iocb in mp device
	replace vq->receiver by mp_sock_data_ready()
	remove stuff in mp device which accesses structures from vhost-net
	modify skb_reserve() to ignore host NIC driver reserved space
	rebase to the latest vhost tree
	split large patches into small pieces, especially for net core part.

what we have done in v5:
	address Arnd Bergmann's comments
		-remove IFF_MPASSTHRU_EXCL flag in mp device
		-Add CONFIG_COMPAT macro
		-remove mp_release ops
	move dev_is_mpassthru() as inline func
	fix a bug in memory relinquish
	Apply to current git (2.6.34-rc6) tree.

what we have done in v6:
	move create_iocb() out of page_dtor which may happen in interrupt context
		-This removes the potential issue of a lock being taken in
		 interrupt context
	make the cache used by mp, vhost static, and created/destroyed during
	module init/exit functions.
		-This allows multiple mp guests to be created at the same time.

what we have done in v7:
	some cleanup prepared to support PS mode

what we have done in v8:
	discarding the modifications to point skb->data to guest buffer directly.
	Add code to modify driver to support napi_gro_frags() with Herbert's comments.
	To support PS mode.
	Add mergeable buffer support in mp device.
	Add GSO/GRO support in mp device.
	Address comments from Eric Dumazet about cache line and rcu usage.

what we have done in v9:
	v8
[RFC PATCH v10 02/16] Add a new struct for device to manipulate external buffer.
From: Xin Xiaohui <xiaohui@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Signed-off-by: Zhao Yu <yzhao81...@gmail.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 include/linux/netdevice.h |   22 +-
 1 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index fa8b476..ba582e1 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -530,6 +530,25 @@ struct netdev_queue {
 	unsigned long		tx_dropped;
 } ____cacheline_aligned_in_smp;
 
+/* Add a structure in structure net_device, the new field is
+ * named as mp_port. It's for mediate passthru (zero-copy).
+ * It contains the capability for the net device driver,
+ * a socket, and an external buffer creator, external means
+ * skb buffer belongs to the device may not be allocated from
+ * kernel space.
+ */
+struct mpassthru_port	{
+	int		hdr_len;
+	int		data_len;
+	int		npages;
+	unsigned	flags;
+	struct socket	*sock;
+	int		vnet_hlen;
+	struct skb_ext_page *(*ctor)(struct mpassthru_port *,
+				struct sk_buff *, int);
+	struct skb_ext_page *(*hash)(struct net_device *,
+				struct page *);
+};
 
 /*
  * This structure defines the management hooks for network devices.
@@ -952,7 +971,8 @@ struct net_device {
 	struct macvlan_port	*macvlan_port;
 	/* GARP */
 	struct garp_port	*garp_port;
-
+	/* mpassthru */
+	struct mpassthru_port	*mp_port;
 	/* class/net/name entry */
 	struct device		dev;
 	/* space for optional device, statistics, and wireless sysfs groups */
-- 
1.5.4.4
[RFC PATCH v10 04/16] Add a function make external buffer owner to query capability.
From: Xin Xiaohui <xiaohui@intel.com>

The external buffer owner can use the functions to get
the capability of the underlying NIC driver.

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Signed-off-by: Zhao Yu <yzhao...@gmail.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 include/linux/netdevice.h |    2 +
 net/core/dev.c            |   49 +
 2 files changed, 51 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index aba0308..5f192de 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1599,6 +1599,8 @@ extern gro_result_t	napi_frags_finish(struct napi_struct *napi,
 					    gro_result_t ret);
 extern struct sk_buff *	napi_frags_skb(struct napi_struct *napi);
 extern gro_result_t	napi_gro_frags(struct napi_struct *napi);
+extern int netdev_mp_port_prep(struct net_device *dev,
+				struct mpassthru_port *port);
 
 static inline void napi_free_frags(struct napi_struct *napi)
 {
diff --git a/net/core/dev.c b/net/core/dev.c
index 264137f..636f11b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2468,6 +2468,55 @@ void netif_nit_deliver(struct sk_buff *skb)
 	rcu_read_unlock();
 }
 
+/* To support meidate passthru(zero-copy) with NIC driver,
+ * we'd better query NIC driver for the capability it can
+ * provide, especially for packet split mode, now we only
+ * query for the header size, and the payload a descriptor
+ * may carry. If a driver does not use the API to export,
+ * then we may try to use a default value, currently,
+ * we use the default value from an IGB driver. Now,
+ * it's only called by mpassthru device.
+ */
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+int netdev_mp_port_prep(struct net_device *dev,
+		struct mpassthru_port *port)
+{
+	int rc;
+	int npages, data_len;
+	const struct net_device_ops *ops = dev->netdev_ops;
+
+	if (ops->ndo_mp_port_prep) {
+		rc = ops->ndo_mp_port_prep(dev, port);
+		if (rc)
+			return rc;
+	} else {
+		/* If the NIC driver did not report this,
+		 * then we try to use default value.
+		 */
+		port->hdr_len = 128;
+		port->data_len = 2048;
+		port->npages = 1;
+	}
+
+	if (port->hdr_len <= 0)
+		goto err;
+
+	npages = port->npages;
+	data_len = port->data_len;
+	if (npages <= 0 || npages > MAX_SKB_FRAGS ||
+			(data_len < PAGE_SIZE * (npages - 1) ||
+			 data_len > PAGE_SIZE * npages))
+		goto err;
+
+	return 0;
+err:
+	dev_warn(&dev->dev, "invalid page constructor parameters\n");
+
+	return -EINVAL;
+}
+EXPORT_SYMBOL(netdev_mp_port_prep);
+#endif
+
 /**
  *	netif_receive_skb - process receive buffer from network
  *	@skb: buffer to process
-- 
1.5.4.4
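The sanity check at the end of netdev_mp_port_prep() can be exercised in isolation. Below is a small userspace sketch of that validation. It assumes 4 KiB pages and a fragment limit of 18, both placeholders for `PAGE_SIZE` and `MAX_SKB_FRAGS`, and the comparison operators are a reading of the archived patch, whose `<`, `>` and `<=` characters did not survive extraction. The names `port_caps`, `port_caps_default` and `port_caps_check` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SK_PAGE_SIZE 4096	/* assumed page size */
#define SK_MAX_FRAGS 18		/* assumed stand-in for MAX_SKB_FRAGS */

/* The three capability fields the patch validates. */
struct port_caps {
	int hdr_len;
	int data_len;
	int npages;
};

/* Defaults used by the patch when the driver reports nothing. */
static void port_caps_default(struct port_caps *c)
{
	assert(c != NULL);
	c->hdr_len = 128;
	c->data_len = 2048;
	c->npages = 1;
}

/*
 * Returns 0 when the capabilities are plausible, -1 otherwise.
 * data_len must need exactly npages pages: more than npages - 1 pages
 * could hold at most, and no more than npages pages can hold.
 */
static int port_caps_check(const struct port_caps *c)
{
	if (c->hdr_len <= 0)
		return -1;
	if (c->npages <= 0 || c->npages > SK_MAX_FRAGS)
		return -1;
	if (c->data_len < SK_PAGE_SIZE * (c->npages - 1) ||
	    c->data_len > SK_PAGE_SIZE * c->npages)
		return -1;
	return 0;
}
```

Under these assumptions the defaults (128, 2048, 1) pass, while a driver that reported two pages for a 2048-byte payload would be rejected.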
[RFC PATCH v10 05/16] Add a function to indicate if device use external buffer.
From: Xin Xiaohui <xiaohui@intel.com>

Signed-off-by: Xin Xiaohui <xiaohui@intel.com>
Signed-off-by: Zhao Yu <yzhao81...@gmail.com>
Reviewed-by: Jeff Dike <jd...@linux.intel.com>
---
 include/linux/netdevice.h |    5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 5f192de..23d6ec0 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -1602,6 +1602,11 @@ extern gro_result_t	napi_gro_frags(struct napi_struct *napi);
 extern int netdev_mp_port_prep(struct net_device *dev,
 				struct mpassthru_port *port);
 
+static inline bool dev_is_mpassthru(struct net_device *dev)
+{
+	return dev && dev->mp_port;
+}
+
 static inline void napi_free_frags(struct napi_struct *napi)
 {
 	kfree_skb(napi->skb);
-- 
1.5.4.4
[RFC PATCH v10 08/16] Modify netdev_free_page() to release external buffer
From: Xin Xiaohui xiaohui@intel.com

Currently, it can get external buffers from mp device.

Signed-off-by: Xin Xiaohui xiaohui@intel.com
Signed-off-by: Zhao Yu yzhao81...@gmail.com
Reviewed-by: Jeff Dike jd...@linux.intel.com
---
 include/linux/skbuff.h |    4 +++-
 net/core/skbuff.c      |   24 ++++++++++++++++++++++++
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ab29675..3d7f70e 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1512,9 +1512,11 @@ static inline struct page *netdev_alloc_page(struct net_device *dev)
 	return __netdev_alloc_page(dev, GFP_ATOMIC);
 }
 
+extern void __netdev_free_page(struct net_device *dev, struct page *page);
+
 static inline void netdev_free_page(struct net_device *dev, struct page *page)
 {
-	__free_page(page);
+	__netdev_free_page(dev, page);
 }
 
 /**
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1a61e2b..bbf4707 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -306,6 +306,30 @@ struct page *__netdev_alloc_page(struct net_device *dev, gfp_t gfp_mask)
 }
 EXPORT_SYMBOL(__netdev_alloc_page);
 
+void netdev_free_ext_page(struct net_device *dev, struct page *page)
+{
+	struct skb_ext_page *ext_page = NULL;
+	if (dev_is_mpassthru(dev) && dev->mp_port->hash) {
+		ext_page = dev->mp_port->hash(dev, page);
+		if (ext_page)
+			ext_page->dtor(ext_page);
+		else
+			__free_page(page);
+	}
+}
+EXPORT_SYMBOL(netdev_free_ext_page);
+
+void __netdev_free_page(struct net_device *dev, struct page *page)
+{
+	if (dev_is_mpassthru(dev)) {
+		netdev_free_ext_page(dev, page);
+		return;
+	}
+
+	__free_page(page);
+}
+EXPORT_SYMBOL(__netdev_free_page);
+
 void skb_add_rx_frag(struct sk_buff *skb, int i, struct page *page, int off,
 		int size)
 {
-- 
1.5.4.4
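To make the control flow of __netdev_free_page() and netdev_free_ext_page() easier to follow, here is a small userspace model of the dispatch. Everything in it is a stand-in: struct net_dev, struct mp_port, lookup_one() and free_page_sketch() are hypothetical, the hash callback takes the port rather than the device, and the function only reports which path would be taken instead of freeing anything. It mirrors the patch as posted, including the case where a passthru device has no hash callback and the page is neither handed to a destructor nor freed.

```c
#include <assert.h>
#include <stddef.h>

struct page { int id; };                 /* stand-in for the kernel's struct page */

struct ext_page {
    struct page *page;
    void (*dtor)(struct ext_page *);
};

struct mp_port {
    /* maps a page back to its external-buffer descriptor, or NULL */
    struct ext_page *(*hash)(struct mp_port *, struct page *);
    struct ext_page *known;              /* single tracked buffer, for the sketch */
};

struct net_dev { struct mp_port *mp_port; };

enum free_path { FREED_BY_KERNEL, FREED_BY_DTOR, NOT_FREED };

static int dtor_calls;

/* Destructor that only counts how often it ran. */
static void count_dtor(struct ext_page *e) { (void)e; dtor_calls++; }

/* Toy lookup: the port knows about at most one external page. */
static struct ext_page *lookup_one(struct mp_port *port, struct page *page)
{
    return (port->known && port->known->page == page) ? port->known : NULL;
}

/* Report which release path the posted code would take for this page. */
static enum free_path free_page_sketch(struct net_dev *dev, struct page *page)
{
    struct ext_page *e;

    if (!dev || !dev->mp_port)
        return FREED_BY_KERNEL;          /* plain __free_page() path */
    if (!dev->mp_port->hash)
        return NOT_FREED;                /* posted code does nothing here */
    e = dev->mp_port->hash(dev->mp_port, page);
    if (e) {
        e->dtor(e);                      /* external buffer: owner cleans up */
        return FREED_BY_DTOR;
    }
    return FREED_BY_KERNEL;              /* unknown page on a passthru device */
}
```

The NOT_FREED branch is worth a second look when reviewing the patch: whether a passthru port without a hash callback can exist in practice is not stated in the posting.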
[RFC PATCH v10 11/16] Add header file for mp device.
From: Xin Xiaohui xiaohui@intel.com

Signed-off-by: Xin Xiaohui xiaohui@intel.com
Signed-off-by: Zhao Yu yzhao81...@gmail.com
Reviewed-by: Jeff Dike jd...@linux.intel.com
---
 include/linux/mpassthru.h |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)
 create mode 100644 include/linux/mpassthru.h

diff --git a/include/linux/mpassthru.h b/include/linux/mpassthru.h
new file mode 100644
index 000..ba8f320
--- /dev/null
+++ b/include/linux/mpassthru.h
@@ -0,0 +1,25 @@
+#ifndef __MPASSTHRU_H
+#define __MPASSTHRU_H
+
+#include <linux/types.h>
+#include <linux/if_ether.h>
+
+/* ioctl defines */
+#define MPASSTHRU_BINDDEV      _IOW('M', 213, int)
+#define MPASSTHRU_UNBINDDEV    _IO('M', 214)
+
+#ifdef __KERNEL__
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+struct socket *mp_get_socket(struct file *);
+#else
+#include <linux/err.h>
+#include <linux/errno.h>
+struct file;
+struct socket;
+static inline struct socket *mp_get_socket(struct file *f)
+{
+	return ERR_PTR(-EINVAL);
+}
+#endif /* CONFIG_MEDIATE_PASSTHRU */
+#endif /* __KERNEL__ */
+#endif /* __MPASSTHRU_H */
-- 
1.5.4.4
[RFC PATCH v10 06/16] Use callback to deal with skb_release_data() specially.
From: Xin Xiaohui xiaohui@intel.com

If buffer is external, then use the callback to destruct
buffers.

Signed-off-by: Xin Xiaohui xiaohui@intel.com
Signed-off-by: Zhao Yu yzhao81...@gmail.com
Reviewed-by: Jeff Dike jd...@linux.intel.com
---
 include/linux/skbuff.h |    3 ++-
 net/core/skbuff.c      |    8 ++++++++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 74af06c..ab29675 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -197,10 +197,11 @@ struct skb_shared_info {
 	union skb_shared_tx tx_flags;
 	struct sk_buff	*frag_list;
 	struct skb_shared_hwtstamps hwtstamps;
-	skb_frag_t	frags[MAX_SKB_FRAGS];
 	/* Intermediate layers must ensure that destructor_arg
 	 * remains valid until skb destructor */
 	void *		destructor_arg;
+
+	skb_frag_t	frags[MAX_SKB_FRAGS];
 };
 
 /* The structure is for a skb which pages may point to
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 93c4e06..117d82b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -217,6 +217,7 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 	shinfo->gso_type = 0;
 	shinfo->ip6_frag_id = 0;
 	shinfo->tx_flags.flags = 0;
+	shinfo->destructor_arg = NULL;
 	skb_frag_list_init(skb);
 	memset(&shinfo->hwtstamps, 0, sizeof(shinfo->hwtstamps));
 
@@ -350,6 +351,13 @@ static void skb_release_data(struct sk_buff *skb)
 		if (skb_has_frags(skb))
 			skb_drop_fraglist(skb);
 
+		if (skb->dev && dev_is_mpassthru(skb->dev)) {
+			struct skb_ext_page *ext_page =
+				skb_shinfo(skb)->destructor_arg;
+			if (ext_page && ext_page->dtor)
+				ext_page->dtor(ext_page);
+		}
+
 		kfree(skb->head);
 	}
 }
-- 
1.5.4.4
[RFC PATCH v10 03/16] Add a ndo_mp_port_prep pointer to net_device_ops.
From: Xin Xiaohui xiaohui@intel.com

If the driver wants to allocate external buffers, then it can export
its capabilities, such as the skb buffer header length, the page length
that can be DMA'd, etc. The external buffers owner may utilize this.

Signed-off-by: Xin Xiaohui xiaohui@intel.com
Signed-off-by: Zhao Yu yzhao81...@gmail.com
Reviewed-by: Jeff Dike jd...@linux.intel.com
---
 include/linux/netdevice.h |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index ba582e1..aba0308 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -710,6 +710,10 @@ struct net_device_ops {
 	int			(*ndo_fcoe_get_wwn)(struct net_device *dev,
 						    u64 *wwn, int type);
 #endif
+#if defined(CONFIG_MEDIATE_PASSTHRU) || defined(CONFIG_MEDIATE_PASSTHRU_MODULE)
+	int			(*ndo_mp_port_prep)(struct net_device *dev,
+						struct mpassthru_port *port);
+#endif
 };
 
 /*
-- 
1.5.4.4
[RFC PATCH v10 01/16] Add a new structure for skb buffer from external.
From: Xin Xiaohui xiaohui@intel.com

Signed-off-by: Xin Xiaohui xiaohui@intel.com
Signed-off-by: Zhao Yu yzhao81...@gmail.com
Reviewed-by: Jeff Dike jd...@linux.intel.com
---
 include/linux/skbuff.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 124f90c..74af06c 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -203,6 +203,15 @@ struct skb_shared_info {
 	void *		destructor_arg;
 };
 
+/* The structure is for a skb which pages may point to
+ * an external buffer, which is not allocated from kernel space.
+ * It also contains a destructor for itself.
+ */
+struct skb_ext_page {
+	struct page *page;
+	void (*dtor)(struct skb_ext_page *);
+};
+
 /* We divide dataref into two halves.  The higher 16 bits hold references
  * to the payload part of skb->data.  The lower 16 bits hold references to
  * the entire skb->data.  A clone of a headerless skb holds the length of
-- 
1.5.4.4
RE: [RFC PATCH v9 12/16] Add mp(mediate passthru) device.
>>> +
>>> +	if (ctor->lock_pages + count > lock_limit && npages) {
>>> +		printk(KERN_INFO "exceed the locked memory rlimit.");
>>> +		return NULL;
>>> +	}
>>> +
>>> +	info = kmem_cache_zalloc(ext_page_info_cache, GFP_KERNEL);
>>
>> You seem to fill in all memory, why zalloc? this is data path ...
>
> Ok, Let me check this.

It's mainly for info->next and info->prev, these two fields will be used
in hash functions. But you are right, since most fields will be refilled.
The new version includes the fix.

Thanks
Xiaohui
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On 10.09.2010, at 17:38, Joerg Roedel wrote:

> This patch adds the svm cpuid feature flags to the qemu initialization
> path.
>
> Signed-off-by: Joerg Roedel joerg.roe...@amd.com
> ---
>  target-i386/cpu.h   |   12 +++
>  target-i386/cpuid.c |   80 ---
>  target-i386/kvm.c   |    3 ++
>  3 files changed, 78 insertions(+), 17 deletions(-)
>
> diff --git a/target-i386/cpu.h b/target-i386/cpu.h
> index 1144d4e..77eeab1 100644
> --- a/target-i386/cpu.h
> +++ b/target-i386/cpu.h
> @@ -405,6 +405,17 @@
>  #define CPUID_EXT3_IBS     (1 << 10)
>  #define CPUID_EXT3_SKINIT  (1 << 12)
>
> +#define CPUID_SVM_NPT          (1 << 0)
> +#define CPUID_SVM_LBRV         (1 << 1)
> +#define CPUID_SVM_SVMLOCK      (1 << 2)
> +#define CPUID_SVM_NRIPSAVE     (1 << 3)
> +#define CPUID_SVM_TSCSCALE     (1 << 4)
> +#define CPUID_SVM_VMCBCLEAN    (1 << 5)
> +#define CPUID_SVM_FLUSHASID    (1 << 6)
> +#define CPUID_SVM_DECODEASSIST (1 << 7)
> +#define CPUID_SVM_PAUSEFILTER  (1 << 10)
> +#define CPUID_SVM_PFTHRESHOLD  (1 << 12)
> +
>  #define CPUID_VENDOR_INTEL_1 0x756e6547 /* "Genu" */
>  #define CPUID_VENDOR_INTEL_2 0x49656e69 /* "ineI" */
>  #define CPUID_VENDOR_INTEL_3 0x6c65746e /* "ntel" */
> @@ -702,6 +713,7 @@ typedef struct CPUX86State {
>      uint8_t has_error_code;
>      uint32_t sipi_vector;
>      uint32_t cpuid_kvm_features;
> +    uint32_t cpuid_svm_features;
>
>      /* in order to simplify APIC support, we leave this pointer to the
>         user */
> diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
> index 3fcf78f..ea1ac73 100644
> --- a/target-i386/cpuid.c
> +++ b/target-i386/cpuid.c
> @@ -79,6 +79,17 @@ static const char *kvm_feature_name[] = {
>      NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL,
>  };
>
> +static const char *svm_feature_name[] = {
> +    "npt", "lbrv", "svm_lock", "nrip_save",
> +    "tsc_scale", "vmcb_clean", "flushbyasid", "decodeassists",
> +    NULL, NULL, "pause_filter", NULL,
> +    "pfthreshold", NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +    NULL, NULL, NULL, NULL,
> +};
> +
>  /* collects per-function cpuid data
>   */
>  typedef struct model_features_t {
> @@ -192,13 +203,15 @@ static void add_flagname_to_bitmaps(const char *flagname, uint32_t *features,
>                                      uint32_t *ext_features,
>                                      uint32_t *ext2_features,
>                                      uint32_t *ext3_features,
> -                                    uint32_t *kvm_features)
> +                                    uint32_t *kvm_features,
> +                                    uint32_t *svm_features)
>  {
>      if (!lookup_feature(features, flagname, NULL, feature_name) &&
>         !lookup_feature(ext_features, flagname, NULL, ext_feature_name) &&
>         !lookup_feature(ext2_features, flagname, NULL, ext2_feature_name) &&
>         !lookup_feature(ext3_features, flagname, NULL, ext3_feature_name) &&
> -       !lookup_feature(kvm_features, flagname, NULL, kvm_feature_name))
> +       !lookup_feature(kvm_features, flagname, NULL, kvm_feature_name) &&
> +       !lookup_feature(svm_features, flagname, NULL, svm_feature_name))
>              fprintf(stderr, "CPU feature %s not found\n", flagname);
>  }
>
> @@ -210,7 +223,8 @@ typedef struct x86_def_t {
>      int family;
>      int model;
>      int stepping;
> -    uint32_t features, ext_features, ext2_features, ext3_features, kvm_features;
> +    uint32_t features, ext_features, ext2_features, ext3_features;
> +    uint32_t kvm_features, svm_features;
>      uint32_t xlevel;
>      char model_id[48];
>      int vendor_override;
> @@ -253,6 +267,7 @@ typedef struct x86_def_t {
>            CPUID_EXT2_PDPE1GB */
>  #define TCG_EXT3_FEATURES (CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM | \
>            CPUID_EXT3_CR8LEG | CPUID_EXT3_ABM | CPUID_EXT3_SSE4A)
> +#define TCG_SVM_FEATURES 0
>
>  /* maintains list of cpu model definitions
>   */
> @@ -278,6 +293,8 @@ static x86_def_t builtin_x86_defs[] = {
>              CPUID_EXT2_LM | CPUID_EXT2_SYSCALL | CPUID_EXT2_NX,
>          .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
>              CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
> +        .svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | CPUID_SVM_NRIPSAVE |
> +            CPUID_SVM_VMCBCLEAN,
>          .xlevel = 0x800A,
>          .model_id = "QEMU Virtual CPU version " QEMU_VERSION,
>      },
> @@ -305,6 +322,8 @@ static x86_def_t builtin_x86_defs[] = {
>                      CPUID_EXT3_OSVW, CPUID_EXT3_IBS */
>          .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
>              CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
> +        .svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | CPUID_SVM_NRIPSAVE |
> +            CPUID_SVM_VMCBCLEAN,

Does that phenom already do all those? It does NPT, but I'm not sure
about NRIPSAVE for example.

>          .xlevel = 0x801A,
>          .model_id = "AMD Phenom(tm) 9550 Quad-Core Processor"
>      },
> @@ -505,6 +524,15 @@ static int cpu_x86_fill_host(x86_def_t *x86_cpu_def)
>      cpu_x86_fill_model_id(x86_cpu_def->model_id);
Re: [Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model
On Fri, 10 Sep 2010, H. Peter Anvin wrote:

> On 09/10/2010 07:47 PM, H. Peter Anvin wrote:
> > On 09/10/2010 06:08 PM, H. Peter Anvin wrote:
> >>
> >> Any remotely recent stock distro should have support for it.
> >
> > I say should, because I haven't actually tested it with a Linux guest
> > yet.  I'll try to do that later; I have to leave now.
> >
>
> Just tested it on a stock Fedora 13 64 bit VM; it behaves exactly the
> same way as under Win7.

Sorry but I have no idea what stock Fedora 13 64 bit VM is.

> Just for the sake of completeness, the Qemu command line was:
>
> ~/qemu/git-kvm/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m
> 1024 -usb -soundhw usb -hda qemu-fc13-64.img -serial stdio
>
> ... and this was with the usb-audio patch applied against top of the
> qemu-kvm git tree (the patch applies against the top of the main qemu
> tree too, but because KVM isn't very stable there it was faster to use
> the KVM tree.)
  ^^^ this doesn't parse, somewhere QEMU was replaced by KVM i think

Anywho, i tried it with linux-test and custom/minimal compiled 2.6.32 [1]

x86_64-softmmu/qemu-system-x86_64 -kernel \
~/x/bld/linux-2.6.32/arch/x86_64/boot/bzImage -append root=/dev/hda \
-vnc :0 -soundhw usb ~/x/img/linux-0.2.img -usb [-enable-kvm]
                                                 ^^^ this has no consequence [2]

Inside the guest `$ madplay 20thfull.mp2' and things sounded fine with
OSS, with ALSA the story is somewhat different, it stuttered for a while
but then settled and things went back to smooth playback.

So i need a reproduction scenario

[1] .config available on request
[2] Well actually it has - on the speed `-enable-kvm' makes boot sluggish
    for whatever reason

-- 
mailto:av1...@comtv.ru
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On Sat, Sep 11, 2010 at 03:43:02PM +0200, Alexander Graf wrote:
> > @@ -305,6 +322,8 @@ static x86_def_t builtin_x86_defs[] = {
> >                      CPUID_EXT3_OSVW, CPUID_EXT3_IBS */
> >          .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
> >              CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
> > +        .svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | CPUID_SVM_NRIPSAVE |
> > +            CPUID_SVM_VMCBCLEAN,
>
> Does that phenom already do all those? It does NPT, but I'm not sure
> about NRIPSAVE for example.

Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
Phenoms don't have it. For the SVM features it is not that important
what the host hardware supports but what KVM can emulate. VMCBCLEAN can
be emulated without supporting it in the host for example.

> > +        /*
> > +         * Every SVM feature requires emulation support in KVM - so we can't just
> > +         * read the host features here. KVM might even support SVM features not
> > +         * available on the host hardware
> > +         */
> > +        x86_cpu_def->svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV |
> > +            CPUID_SVM_NRIPSAVE | CPUID_SVM_VMCBCLEAN;
>
> Hrm. Wouldn't it make more sense to declare this to -1? This will
> still go through the kernel space matcher which tells us which
> features are available anyways, right?

Yeah, that would make sense. I thought about it while porting the
patches but could not actually make myself do it because I am not
entirely sure that this is a good idea. But I may revisit that,
especially after your question :-)

> > -
> > -        if (kvm_enabled()) {
> > -            /* Nested SVM not yet supported in upstream QEMU */
> > -            *ecx &= ~CPUID_EXT3_SVM;
> > -        }
>
> Have you made sure that the default cpu type doesn't enable the SVM
> bit? I couldn't find any trace of an override to kvm64 as default type
> when KVM is used.

No, the default CPU type has SVM still enabled by default. I thought
about removing the SVM flag from the qemu64 cpu definition but that
breaks on TCG where SVM is emulated too.
What I implemented in this patch is to enable SVM by default and mask it
out if KVM does not support it on the given machine. Problem here is
that KVM is currently buggy because it always reports support for SVM,
even on Intel machines. I fixed that with patch 29 of my npt-virt
patch-set. The patch will hopefully make it into the various stable
trees and then we have a clean solution.

> The rest looks good :). Thanks a lot for this patch set!

Great, thanks :-)

	Joerg
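The "-1 and let the kernel filter" idea from this exchange comes down to a bitwise AND between what the CPU model requests and what KVM reports it can emulate. The sketch below illustrates only that arithmetic; svm_guest_features() and its kvm_supported argument are hypothetical and do not correspond to an actual QEMU or KVM interface. The bit positions follow the CPUID_SVM_* defines from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID_SVM_NPT        (1u << 0)
#define CPUID_SVM_LBRV       (1u << 1)
#define CPUID_SVM_NRIPSAVE   (1u << 3)
#define CPUID_SVM_VMCBCLEAN  (1u << 5)

/*
 * Bits the guest ends up seeing: what the CPU model asks for, limited to
 * what the hypervisor says it can emulate.  `kvm_supported` is a
 * hypothetical input standing in for whatever the kernel reports.
 */
static uint32_t svm_guest_features(uint32_t requested, uint32_t kvm_supported)
{
    return requested & kvm_supported;
}
```

Requesting (uint32_t)-1 simply yields the kernel-supported set, which is why an explicit list in the CPU definition only matters when it is narrower than what KVM offers.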
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On 11.09.2010, at 16:20, Joerg Roedel wrote:

> On Sat, Sep 11, 2010 at 03:43:02PM +0200, Alexander Graf wrote:
>>> @@ -305,6 +322,8 @@ static x86_def_t builtin_x86_defs[] = {
>>>                     CPUID_EXT3_OSVW, CPUID_EXT3_IBS */
>>>         .ext3_features = CPUID_EXT3_LAHF_LM | CPUID_EXT3_SVM |
>>>             CPUID_EXT3_ABM | CPUID_EXT3_SSE4A,
>>> +        .svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV | CPUID_SVM_NRIPSAVE |
>>> +            CPUID_SVM_VMCBCLEAN,
>>
>> Does that phenom already do all those? It does NPT, but I'm not sure
>> about NRIPSAVE for example.
>
> Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
> Phenoms don't have it. For the SVM features it is not that important
> what the host hardware supports but what KVM can emulate. VMCBCLEAN can
> be emulated without supporting it in the host for example.

That particular one was my workstation - a Phenom 9550 which is one of
the early 4-core ones.

>>> +        /*
>>> +         * Every SVM feature requires emulation support in KVM - so we can't just
>>> +         * read the host features here. KVM might even support SVM features not
>>> +         * available on the host hardware
>>> +         */
>>> +        x86_cpu_def->svm_features = CPUID_SVM_NPT | CPUID_SVM_LBRV |
>>> +            CPUID_SVM_NRIPSAVE | CPUID_SVM_VMCBCLEAN;
>>
>> Hrm. Wouldn't it make more sense to declare this to -1? This will
>> still go through the kernel space matcher which tells us which
>> features are available anyways, right?
>
> Yeah, that would make sense. I thought about it while porting the
> patches but could not actually make myself do it because I am not
> entirely sure that this is a good idea. But I may revisit that,
> especially after your question :-)
>
>>> -
>>> -        if (kvm_enabled()) {
>>> -            /* Nested SVM not yet supported in upstream QEMU */
>>> -            *ecx &= ~CPUID_EXT3_SVM;
>>> -        }
>>
>> Have you made sure that the default cpu type doesn't enable the SVM
>> bit? I couldn't find any trace of an override to kvm64 as default type
>> when KVM is used.
>
> No, the default CPU type has SVM still enabled by default. I thought
> about removing the SVM flag from the qemu64 cpu definition but that
> breaks on TCG where SVM is emulated too.
> What I implemented in this patch is to enable SVM by default and mask it
> out if KVM does not support it on the given machine. Problem here is
> that KVM is currently buggy because it always reports support for SVM,
> even on Intel machines. I fixed that with patch 29 of my npt-virt
> patch-set. The patch will hopefully make it into the various stable
> trees and then we have a clean solution.

It still won't be clean as it breaks cross vendor migration :(. The real
fix would be to set the default machine to kvm64 instead of qemu64 in
pc.c when kvm_enabled().

Alex
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On 11.09.2010, at 16:36, Joerg Roedel wrote:

> On Sat, Sep 11, 2010 at 04:29:18PM +0200, Alexander Graf wrote:
>>> Depends on which Phenom you have. A Phenom II has NRIPSAVE but the old
>>> Phenoms don't have it. For the SVM features it is not that important
>>> what the host hardware supports but what KVM can emulate. VMCBCLEAN can
>>> be emulated without supporting it in the host for example.
>>
>> That particular one was my workstation - a Phenom 9550 which is one
>> of the early 4-core ones.
>
> Yes, the 9550 doesn't have the nripsave feature.
>
>>> No, the default CPU type has SVM still enabled by default. I thought
>>> about removing the SVM flag from the qemu64 cpu definition but that
>>> breaks on TCG where SVM is emulated too.
>>> What I implemented in this patch is to enable SVM by default and mask it
>>> out if KVM does not support it on the given machine. Problem here is
>>> that KVM is currently buggy because it always reports support for SVM,
>>> even on Intel machines. I fixed that with patch 29 of my npt-virt
>>> patch-set. The patch will hopefully make it into the various stable
>>> trees and then we have a clean solution.
>>
>> It still won't be clean as it breaks cross vendor migration :(. The
>> real fix would be to set the default machine to kvm64 instead of
>> qemu64 in pc.c when kvm_enabled().
>
> I am not sure that I am the right person to do such an invasive change.
> At least not in this patch-set. I could think of removing SVM from the
> qemu64 definition and add it again in the TCG specific path.

It's not an invasive change and IMHO the only correct one. I'm not even
sure why it's not done yet - after all the reason for the kvm* cpu types
is exactly that. Please just add it as an early patch in your series.

Alex
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On Sat, Sep 11, 2010 at 04:38:51PM +0200, Alexander Graf wrote:
> > I am not sure that I am the right person to do such an invasive change.
> > At least not in this patch-set. I could think of removing SVM from the
> > qemu64 definition and add it again in the TCG specific path.
>
> It's not an invasive change and IMHO the only correct one. I'm not
> even sure why it's not done yet - after all the reason for the kvm*
> cpu types is exactly that. Please just add it as an early patch in
> your series.

Okay, if you say it's ok I will change it. But if anyone comes to me
with regressions I will send them straight to you :-P

	Joerg
Re: [Qemu-devel] [PATCH 2/2] qemu-kvm: Add svm cpuid features
On 11.09.2010, at 16:42, Joerg Roedel wrote:

> On Sat, Sep 11, 2010 at 04:38:51PM +0200, Alexander Graf wrote:
>>> I am not sure that I am the right person to do such an invasive change.
>>> At least not in this patch-set. I could think of removing SVM from the
>>> qemu64 definition and add it again in the TCG specific path.
>>
>> It's not an invasive change and IMHO the only correct one. I'm not
>> even sure why it's not done yet - after all the reason for the kvm*
>> cpu types is exactly that. Please just add it as an early patch in
>> your series.
>
> Okay, if you say its ok I will change it. But if anyone comes to me
> with regressions I will send them straight to you :-P

Feel free to do so :). We really need to have a different default CPU
for "migration safe KVM" and "This is what TCG can emulate". They don't
match semantically.

Alex
Re: [Qemu-devel] Re: [PATCH] [RFC] Add support for a USB audio device model
I meant just take the Fedora 13 DVD and install it onto a virtual hard
disk. More later when I'm at a real computer.

malc av1...@comtv.ru wrote:

>On Fri, 10 Sep 2010, H. Peter Anvin wrote:
>
>> On 09/10/2010 07:47 PM, H. Peter Anvin wrote:
>> > On 09/10/2010 06:08 PM, H. Peter Anvin wrote:
>> >>
>> >> Any remotely recent stock distro should have support for it.
>> >
>> > I say should, because I haven't actually tested it with a Linux guest
>> > yet.  I'll try to do that later; I have to leave now.
>> >
>>
>> Just tested it on a stock Fedora 13 64 bit VM; it behaves exactly the
>> same way as under Win7.
>
>Sorry but I have no idea what stock Fedora 13 64 bit VM is.
>
>> Just for the sake of completeness, the Qemu command line was:
>>
>> ~/qemu/git-kvm/x86_64-softmmu/qemu-system-x86_64 -enable-kvm -smp 2 -m
>> 1024 -usb -soundhw usb -hda qemu-fc13-64.img -serial stdio
>>
>> ... and this was with the usb-audio patch applied against top of the
>> qemu-kvm git tree (the patch applies against the top of the main qemu
>> tree too, but because KVM isn't very stable there it was faster to use
>> the KVM tree.)
>  ^^^ this doesn't parse, somewhere QEMU was replaced by KVM i think
>
>Anywho, i tried it with linux-test and custom/minimal compiled 2.6.32 [1]
>
>x86_64-softmmu/qemu-system-x86_64 -kernel \
>~/x/bld/linux-2.6.32/arch/x86_64/boot/bzImage -append root=/dev/hda \
>-vnc :0 -soundhw usb ~/x/img/linux-0.2.img -usb [-enable-kvm]
>                                                 ^^^ this has no consequence [2]
>
>Inside the guest `$ madplay 20thfull.mp2' and things sounded fine with
>OSS, with ALSA the story is somewhat different, it stuttered for a while
>but then settled and things went back to smooth playback.
>
>So i need a reproduction scenario
>
>[1] .config available on request
>[2] Well actually it has - on the speed `-enable-kvm' makes boot sluggish
>    for whatever reason
>
>-- 
>mailto:av1...@comtv.ru

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.
Re: [PATCH] [RFC] Add support for a USB audio device model
It would certainly be a worthwhile project.

Alexander Graf ag...@suse.de wrote:

>On 11.09.2010, at 03:08, H. Peter Anvin wrote:
>
>[snip]
>
>> I know.  Someone else is welcome to do that... since it would require
>> knowing both the VirtualBox and the Qemu sound subsystem interfaces and
>> in what ways they are similar or different.
>
>They should be reasonably close. About 80% of the VBox device model
>consists of qemu, mangled through an OS2'ifier.
>
>Alex

-- 
Sent from my mobile phone.  Please pardon any lack of formatting.
Re: [PATCH 2/3] S390: Add virtio hotplug add support
On 24.08.2010, at 15:48, Alexander Graf wrote:

> The one big missing feature in s390-virtio was hotplugging. This is no
> more. This patch implements hotplug add support, so you can on the fly
> add new devices in the guest.
>
> Keep in mind that this needs a patch for qemu to actually leverage the
> functionality.
>
> Signed-off-by: Alexander Graf ag...@suse.de

ping (on the patch set)?

Alex
[PATCH 05/11] arch/powerpc: Remove pr_level uses of KERN_level
Signed-off-by: Joe Perches j...@perches.com
---
 arch/powerpc/kvm/emulate.c |    4 ++--
 arch/powerpc/sysdev/pmi.c  |    2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index 4568ec3..b83ba58 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -145,7 +145,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 	/* this default type might be overwritten by subcategories */
 	kvmppc_set_exit_type(vcpu, EMULATED_INST_EXITS);
 
-	pr_debug(KERN_INFO "Emulating opcode %d / %d\n", get_op(inst), get_xop(inst));
+	pr_debug("Emulating opcode %d / %d\n", get_op(inst), get_xop(inst));
 
 	switch (get_op(inst)) {
 	case OP_TRAP:
@@ -275,7 +275,7 @@ int kvmppc_emulate_instruction(struct kvm_run *run, struct kvm_vcpu *vcpu)
 			{
 				u64 jd = get_tb() - vcpu->arch.dec_jiffies;
 				kvmppc_set_gpr(vcpu, rt, vcpu->arch.dec - jd);
-				pr_debug(KERN_INFO "mfDEC: %x - %llx = %lx\n",
+				pr_debug("mfDEC: %x - %llx = %lx\n",
 					 vcpu->arch.dec, jd,
 					 kvmppc_get_gpr(vcpu, rt));
 				break;
diff --git a/arch/powerpc/sysdev/pmi.c b/arch/powerpc/sysdev/pmi.c
index 24a0bb9..4260f36 100644
--- a/arch/powerpc/sysdev/pmi.c
+++ b/arch/powerpc/sysdev/pmi.c
@@ -114,7 +114,7 @@ static void pmi_notify_handlers(struct work_struct *work)
 
 	spin_lock(&data->handler_spinlock);
 	list_for_each_entry(handler, &data->handler, node) {
-		pr_debug(KERN_INFO "pmi: notifying handler %p\n", handler);
+		pr_debug("pmi: notifying handler %p\n", handler);
 		if (handler->type == data->msg.type)
 			handler->handle_pmi_message(data->msg);
 	}
-- 
1.7.3.rc1
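The patch carries no changelog, so a short illustration of why the KERN_INFO inside pr_debug() is redundant may help. The sketch assumes the level prefixes were plain string literals ("<6>", "<7>") as in kernels of that period, and SKETCH_PR_DEBUG is a simplified, hypothetical stand-in for pr_debug() that formats into a buffer instead of calling printk().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Level prefixes as string literals (assumed to match kernels of that era). */
#define KERN_INFO  "<6>"
#define KERN_DEBUG "<7>"

/*
 * Simplified stand-in for pr_debug(): the macro itself prepends the level,
 * so the caller's format string is pasted directly after KERN_DEBUG.
 * `buf` must be a char array, since sizeof(buf) is used for the bound.
 */
#define SKETCH_PR_DEBUG(buf, fmt) \
    snprintf((buf), sizeof(buf), KERN_DEBUG fmt)
```

Passing KERN_INFO "pmi: notifying handler\n" through such a macro produces a message that starts with "<7><6>", that is, a second level marker embedded in the text; dropping KERN_INFO leaves the single "<7>" prefix the macro already supplies.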
Re: [PATCH] [RFC] Add support for a USB audio device model
On 09/11/2010 12:41 AM, Stefan Hajnoczi wrote:
> On Fri, Sep 10, 2010 at 10:47 PM, H. Peter Anvin h...@linux.intel.com wrote:
>> diff --git a/hw/usb-audio.c b/hw/usb-audio.c
>> new file mode 100644
>> index 000..d4cf488
>> --- /dev/null
>> +++ b/hw/usb-audio.c
>> @@ -0,0 +1,702 @@
>> +/*
>> + * QEMU USB Net devices
>> + *
>> + * Copyright (c) 2006 Thomas Sailer
>> + * Copyright (c) 2008 Andrzej Zaborowski
>
> Want to update this for usb-audio?
>
> Stefan

Yeah, obviously...

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.