Re: [PATCH for-3.19] vhost/net: fix up num_buffers endian-ness
On Wed, Feb 04, 2015 at 01:59:42PM -0800, David Miller wrote: > From: "Michael S. Tsirkin" > Date: Tue, 3 Feb 2015 11:07:06 +0200 > > > In virtio 1.0 mode, when mergeable buffers are enabled on a big-endian > > host, num_buffers wasn't byte-swapped correctly, so large incoming > > packets got corrupted. > > > > To fix, fill it in within hdr - this also makes sure it gets > > the correct type. > > > > Signed-off-by: Michael S. Tsirkin > > Applied. FWIW, conflict with iov_iter patches is trivial; once it shows up in your kernel.org tree I can either rebase the series or just push #merge-candidate - whichever you prefer. Linus usually prefers the second variant, but then he seriously dislikes rebasing of any sort; I've no idea what your preferences are in that area... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 16/18] vhost: don't bother with copying iovec in handle_tx()
From: Al Viro just advance the msg.msg_iter and be done with that. Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Cc: virtualizat...@lists.linux-foundation.org Signed-off-by: Al Viro --- drivers/vhost/net.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 6906f76..d86cc9b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -336,7 +336,7 @@ static void handle_tx(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; - unsigned out, in, s; + unsigned out, in; int head; struct msghdr msg = { .msg_name = NULL, @@ -395,16 +395,17 @@ static void handle_tx(struct vhost_net *net) break; } /* Skip header. TODO: support TSO. */ - s = move_iovec_hdr(vq->iov, nvq->hdr, hdr_size, out); len = iov_length(vq->iov, out); iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); + iov_iter_advance(&msg.msg_iter, hdr_size); /* Sanity check */ - if (!len) { + if (!iov_iter_count(&msg.msg_iter)) { vq_err(vq, "Unexpected header len for TX: " "%zd expected %zd\n", - iov_length(nvq->hdr, s), hdr_size); + len, hdr_size); break; } + len = iov_iter_count(&msg.msg_iter); zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN && (nvq->upend_idx + 1) % UIO_MAXIOV != -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 15/18] vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()
From: Al Viro Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Cc: virtualizat...@lists.linux-foundation.org Signed-off-by: Al Viro --- drivers/vhost/vhost.c | 6 -- include/linux/uio.h | 1 - lib/iovec.c | 25 - 3 files changed, 4 insertions(+), 28 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index cb807d0..2ee2826 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1125,6 +1125,7 @@ static int get_indirect(struct vhost_virtqueue *vq, struct vring_desc desc; unsigned int i = 0, count, found = 0; u32 len = vhost32_to_cpu(vq, indirect->len); + struct iov_iter from; int ret; /* Sanity check */ @@ -1142,6 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq, vq_err(vq, "Translation failure %d in indirect.\n", ret); return ret; } + iov_iter_init(&from, READ, vq->indirect, ret, len); /* We will use the result as an address to read from, so most * architectures only need a compiler barrier here. */ @@ -1164,8 +1166,8 @@ static int get_indirect(struct vhost_virtqueue *vq, i, count); return -EINVAL; } - if (unlikely(memcpy_fromiovec((unsigned char *)&desc, - vq->indirect, sizeof desc))) { + if (unlikely(copy_from_iter(&desc, sizeof(desc), &from) != +sizeof(desc))) { vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n", i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc); return -EINVAL; diff --git a/include/linux/uio.h b/include/linux/uio.h index 1c5e453..af3439f 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -135,7 +135,6 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); -int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, int offset, int len); int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata, diff --git a/lib/iovec.c b/lib/iovec.c index 2d99cb4..4a90875 100644 --- a/lib/iovec.c +++ b/lib/iovec.c @@ -3,31 +3,6 @@ #include /* - * Copy iovec to kernel. Returns -EFAULT on error. - * - * Note: this modifies the original iovec. - */ - -int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len) -{ - while (len > 0) { - if (iov->iov_len) { - int copy = min_t(unsigned int, len, iov->iov_len); - if (copy_from_user(kdata, iov->iov_base, copy)) - return -EFAULT; - len -= copy; - kdata += copy; - iov->iov_base += copy; - iov->iov_len -= copy; - } - iov++; - } - - return 0; -} -EXPORT_SYMBOL(memcpy_fromiovec); - -/* * Copy kernel to iovec. Returns -EFAULT on error. */ -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 18/18] vhost: vhost_scsi_handle_vq() should just use copy_from_user()
From: Al Viro it has just verified that it asks no more than the length of the first segment of iovec. And with that the last user of stuff in lib/iovec.c is gone. RIP. Cc: Michael S. Tsirkin Cc: Nicholas A. Bellinger Cc: kvm@vger.kernel.org Cc: virtualizat...@lists.linux-foundation.org Signed-off-by: Al Viro --- drivers/vhost/scsi.c | 2 +- include/linux/uio.h | 2 -- lib/Makefile | 2 +- lib/iovec.c | 36 4 files changed, 2 insertions(+), 40 deletions(-) delete mode 100644 lib/iovec.c diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index d695b16..dc78d87 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1079,7 +1079,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) req_size, vq->iov[0].iov_len); break; } - ret = memcpy_fromiovecend(req, &vq->iov[0], 0, req_size); + ret = copy_from_user(req, vq->iov[0].iov_base, req_size); if (unlikely(ret)) { vq_err(vq, "Faulted on virtio_scsi_cmd_req\n"); break; diff --git a/include/linux/uio.h b/include/linux/uio.h index 02bd8a9..3e0cb4e 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -135,6 +135,4 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); -int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, - int offset, int len); #endif diff --git a/lib/Makefile b/lib/Makefile index 3c3b30b..1071d06 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -24,7 +24,7 @@ obj-y += lockref.o obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \ bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \ -gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \ +gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \ bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \ percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o obj-y += string_helpers.o diff --git a/lib/iovec.c b/lib/iovec.c deleted file mode 100644 index d8f17a9..000 --- a/lib/iovec.c +++ /dev/null @@ -1,36 +0,0 @@ -#include -#include -#include - -/* - * Copy iovec to kernel. Returns -EFAULT on error. - */ - -int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, - int offset, int len) -{ - /* No data? Done! */ - if (len == 0) - return 0; - - /* Skip over the finished iovecs */ - while (offset >= iov->iov_len) { - offset -= iov->iov_len; - iov++; - } - - while (len > 0) { - u8 __user *base = iov->iov_base + offset; - int copy = min_t(unsigned int, len, iov->iov_len - offset); - - offset = 0; - if (copy_from_user(kdata, base, copy)) - return -EFAULT; - len -= copy; - kdata += copy; - iov++; - } - - return 0; -} -EXPORT_SYMBOL(memcpy_fromiovecend); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()
From: Al Viro Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Cc: virtualizat...@lists.linux-foundation.org Signed-off-by: Al Viro --- drivers/vhost/net.c | 82 +++-- include/linux/uio.h | 3 -- lib/iovec.c | 26 - 3 files changed, 23 insertions(+), 88 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d86cc9b..e022cc4 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -84,10 +84,6 @@ struct vhost_net_ubuf_ref { struct vhost_net_virtqueue { struct vhost_virtqueue vq; - /* hdr is used to store the virtio header. -* Since each iovec has >= 1 byte length, we never need more than -* header length entries to store the header. */ - struct iovec hdr[sizeof(struct virtio_net_hdr_mrg_rxbuf)]; size_t vhost_hlen; size_t sock_hlen; /* vhost zerocopy support fields below: */ @@ -235,44 +231,6 @@ static bool vhost_sock_zcopy(struct socket *sock) sock_flag(sock->sk, SOCK_ZEROCOPY); } -/* Pop first len bytes from iovec. Return number of segments used. */ -static int move_iovec_hdr(struct iovec *from, struct iovec *to, - size_t len, int iov_count) -{ - int seg = 0; - size_t size; - - while (len && seg < iov_count) { - size = min(from->iov_len, len); - to->iov_base = from->iov_base; - to->iov_len = size; - from->iov_len -= size; - from->iov_base += size; - len -= size; - ++from; - ++to; - ++seg; - } - return seg; -} -/* Copy iovec entries for len bytes from iovec. */ -static void copy_iovec_hdr(const struct iovec *from, struct iovec *to, - size_t len, int iovcount) -{ - int seg = 0; - size_t size; - - while (len && seg < iovcount) { - size = min(from->iov_len, len); - to->iov_base = from->iov_base; - to->iov_len = size; - len -= size; - ++from; - ++to; - ++seg; - } -} - /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM @@ -570,9 +528,9 @@ static void handle_rx(struct vhost_net *net) .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr_mrg_rxbuf hdr = { - .hdr.flags = 0, - .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr hdr = { + .flags = 0, + .gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t total_len = 0; int err, mergeable; @@ -580,6 +538,7 @@ static void handle_rx(struct vhost_net *net) size_t vhost_hlen, sock_hlen; size_t vhost_len, sock_len; struct socket *sock; + struct iov_iter fixup; mutex_lock(&vq->mutex); sock = vq->private_data; @@ -624,14 +583,19 @@ static void handle_rx(struct vhost_net *net) break; } /* We don't need to be notified again. */ - if (unlikely((vhost_hlen))) - /* Skip header. TODO: support TSO. */ - move_iovec_hdr(vq->iov, nvq->hdr, vhost_hlen, in); - else - /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF: -* needed because recvmsg can modify msg_iov. */ - copy_iovec_hdr(vq->iov, nvq->hdr, sock_hlen, in); - iov_iter_init(&msg.msg_iter, READ, vq->iov, in, sock_len); + iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len); + fixup = msg.msg_iter; + if (unlikely((vhost_hlen))) { + /* We will supply the header ourselves +* TODO: support TSO. +*/ + iov_iter_advance(&msg.msg_iter, vhost_hlen); + } else { + /* It'll come from socket; we'll need to patch +* ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF +*/ + iov_iter_advance(&fixup, sizeof(hdr)); + } err = sock->ops->recvmsg(NULL, sock, &msg, sock_len, MSG_DONTWAIT | MSG_TRUNC); /* Userspace might have consumed the packet meanwhile: @@ -643,18 +607,18 @@ static void handle_rx(struct vhost_net *net) vhost_discard_vq_desc(vq, headcount);
Re: [PATCH-v3 5/9] vhost/scsi: Add ANY_LAYOUT vhost_virtqueue callback
On Tue, Feb 03, 2015 at 06:29:59AM +, Nicholas A. Bellinger wrote: > + * Copy over the virtio-scsi request header, which when > + * ANY_LAYOUT is enabled may span multiple iovecs, or a > + * single iovec may contain both the header + outgoing > + * WRITE payloads. > + * > + * copy_from_iter() is modifying the iovecs as copies over > + * req_size bytes into req, so the returned out_iter.iov[0] > + * will contain the correct start + offset of the outgoing > + * WRITE payload, if DMA_TO_DEVICE is set. It does no such thing. What it does, though, is changing out_iter so that subsequent copy_from_iter() will return the data you want. Note that out_iter.iov[0] will contain the correct _segment_ of that vector, with the data you want at out_iter.iov_offset bytes from the beginning of that segment. .iov may come to point to subsequent segments and .iov_offset keeps changing, but segments themselves are never changed. > + */ > + iov_iter_init(&out_iter, READ, &vq->iov[0], out, WRITE, please - as in "this is the source of some write, we'll be copying _from_ it". READ would be "destination of some read, we'll be copying into it". > + (data_direction == DMA_TO_DEVICE) ? > + req_size + exp_data_len : req_size); > + > + ret = copy_from_iter(req, req_size, &out_iter); ... > + /* > + * Determine start of T10_PI or data payload iovec in ANY_LAYOUT > + * mode based upon data_direction. > + * > + * For DMA_TO_DEVICE, this is iov_out from copy_from_iter() > + * with the already recalculated iov_base + iov_len. ITYM "this is out_iter, which is already pointing to the right place" AFAICS, the actual use is correct, it's just that the comments are confused. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()
On Tue, Feb 03, 2015 at 04:21:54PM +0100, Michael S. Tsirkin wrote: > > Hmm having second thoughts here. > > Will this modify the iov in vq->iov? > > If yes, how will recvmsg fill it? > > OK that was just me misunderstanding what the > function does. As it doesn't modify the iovec itself, > I think there's no issue, my ack stands. That, BTW, was the point of switching ->sendmsg() and ->recvmsg() to iov_iter primitives - not only do memcpy_toiovec() et.al. change the iovec, the way it's done was protocol-dependent, up to and including the things like "have sent 60 caller-supplied bytes, iovec modified with only 6 bytes consumed". What's worse, on TCP sendmsg we usually left iovec unchanged, but sometimes it got drained. Granted, that happened only after setsockopt() playing caller with TCP_REPAIR (i.e. checkpoint/restart stuff), but... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 15/18] vhost: switch vhost get_indirect() to iov_iter, kill memcpy_fromiovec()
From: Al Viro Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Signed-off-by: Al Viro --- drivers/vhost/vhost.c | 6 -- include/linux/uio.h | 1 - lib/iovec.c | 25 - 3 files changed, 4 insertions(+), 28 deletions(-) diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index cb807d0..2ee2826 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -1125,6 +1125,7 @@ static int get_indirect(struct vhost_virtqueue *vq, struct vring_desc desc; unsigned int i = 0, count, found = 0; u32 len = vhost32_to_cpu(vq, indirect->len); + struct iov_iter from; int ret; /* Sanity check */ @@ -1142,6 +1143,7 @@ static int get_indirect(struct vhost_virtqueue *vq, vq_err(vq, "Translation failure %d in indirect.\n", ret); return ret; } + iov_iter_init(&from, READ, vq->indirect, ret, len); /* We will use the result as an address to read from, so most * architectures only need a compiler barrier here. */ @@ -1164,8 +1166,8 @@ static int get_indirect(struct vhost_virtqueue *vq, i, count); return -EINVAL; } - if (unlikely(memcpy_fromiovec((unsigned char *)&desc, - vq->indirect, sizeof desc))) { + if (unlikely(copy_from_iter(&desc, sizeof(desc), &from) != +sizeof(desc))) { vq_err(vq, "Failed indirect descriptor: idx %d, %zx\n", i, (size_t)vhost64_to_cpu(vq, indirect->addr) + i * sizeof desc); return -EINVAL; diff --git a/include/linux/uio.h b/include/linux/uio.h index 1c5e453..af3439f 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -135,7 +135,6 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); -int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len); int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, int offset, int len); int memcpy_toiovecend(const struct iovec *v, unsigned char *kdata, diff --git a/lib/iovec.c b/lib/iovec.c index 2d99cb4..4a90875 100644 --- a/lib/iovec.c +++ b/lib/iovec.c @@ -3,31 +3,6 @@ #include /* - * Copy iovec to kernel. Returns -EFAULT on error. - * - * Note: this modifies the original iovec. - */ - -int memcpy_fromiovec(unsigned char *kdata, struct iovec *iov, int len) -{ - while (len > 0) { - if (iov->iov_len) { - int copy = min_t(unsigned int, len, iov->iov_len); - if (copy_from_user(kdata, iov->iov_base, copy)) - return -EFAULT; - len -= copy; - kdata += copy; - iov->iov_base += copy; - iov->iov_len -= copy; - } - iov++; - } - - return 0; -} -EXPORT_SYMBOL(memcpy_fromiovec); - -/* * Copy kernel to iovec. Returns -EFAULT on error. */ -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 18/18] vhost: vhost_scsi_handle_vq() should just use copy_from_user()
From: Al Viro it has just verified that it asks no more than the length of the first segment of iovec. And with that the last user of stuff in lib/iovec.c is gone. RIP. Cc: Michael S. Tsirkin Cc: Nicholas A. Bellinger Cc: kvm@vger.kernel.org Signed-off-by: Al Viro --- drivers/vhost/scsi.c | 2 +- include/linux/uio.h | 2 -- lib/Makefile | 2 +- lib/iovec.c | 36 4 files changed, 2 insertions(+), 40 deletions(-) delete mode 100644 lib/iovec.c diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c index d695b16..dc78d87 100644 --- a/drivers/vhost/scsi.c +++ b/drivers/vhost/scsi.c @@ -1079,7 +1079,7 @@ vhost_scsi_handle_vq(struct vhost_scsi *vs, struct vhost_virtqueue *vq) req_size, vq->iov[0].iov_len); break; } - ret = memcpy_fromiovecend(req, &vq->iov[0], 0, req_size); + ret = copy_from_user(req, vq->iov[0].iov_base, req_size); if (unlikely(ret)) { vq_err(vq, "Faulted on virtio_scsi_cmd_req\n"); break; diff --git a/include/linux/uio.h b/include/linux/uio.h index 02bd8a9..3e0cb4e 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -135,6 +135,4 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) size_t csum_and_copy_to_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); -int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, - int offset, int len); #endif diff --git a/lib/Makefile b/lib/Makefile index 3c3b30b..1071d06 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -24,7 +24,7 @@ obj-y += lockref.o obj-y += bcd.o div64.o sort.o parser.o halfmd4.o debug_locks.o random32.o \ bust_spinlocks.o hexdump.o kasprintf.o bitmap.o scatterlist.o \ -gcd.o lcm.o list_sort.o uuid.o flex_array.o iovec.o clz_ctz.o \ +gcd.o lcm.o list_sort.o uuid.o flex_array.o clz_ctz.o \ bsearch.o find_last_bit.o find_next_bit.o llist.o memweight.o kfifo.o \ percpu-refcount.o percpu_ida.o rhashtable.o reciprocal_div.o obj-y += string_helpers.o diff --git a/lib/iovec.c b/lib/iovec.c deleted file mode 100644 index d8f17a9..000 --- a/lib/iovec.c +++ /dev/null @@ -1,36 +0,0 @@ -#include -#include -#include - -/* - * Copy iovec to kernel. Returns -EFAULT on error. - */ - -int memcpy_fromiovecend(unsigned char *kdata, const struct iovec *iov, - int offset, int len) -{ - /* No data? Done! */ - if (len == 0) - return 0; - - /* Skip over the finished iovecs */ - while (offset >= iov->iov_len) { - offset -= iov->iov_len; - iov++; - } - - while (len > 0) { - u8 __user *base = iov->iov_base + offset; - int copy = min_t(unsigned int, len, iov->iov_len - offset); - - offset = 0; - if (copy_from_user(kdata, base, copy)) - return -EFAULT; - len -= copy; - kdata += copy; - iov++; - } - - return 0; -} -EXPORT_SYMBOL(memcpy_fromiovecend); -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 17/18] vhost: don't bother copying iovecs in handle_rx(), kill memcpy_toiovecend()
From: Al Viro Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Signed-off-by: Al Viro --- drivers/vhost/net.c | 79 ++--- include/linux/uio.h | 3 -- lib/iovec.c | 26 -- 3 files changed, 20 insertions(+), 88 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d86cc9b..73c0ebf 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -84,10 +84,6 @@ struct vhost_net_ubuf_ref { struct vhost_net_virtqueue { struct vhost_virtqueue vq; - /* hdr is used to store the virtio header. -* Since each iovec has >= 1 byte length, we never need more than -* header length entries to store the header. */ - struct iovec hdr[sizeof(struct virtio_net_hdr_mrg_rxbuf)]; size_t vhost_hlen; size_t sock_hlen; /* vhost zerocopy support fields below: */ @@ -235,44 +231,6 @@ static bool vhost_sock_zcopy(struct socket *sock) sock_flag(sock->sk, SOCK_ZEROCOPY); } -/* Pop first len bytes from iovec. Return number of segments used. */ -static int move_iovec_hdr(struct iovec *from, struct iovec *to, - size_t len, int iov_count) -{ - int seg = 0; - size_t size; - - while (len && seg < iov_count) { - size = min(from->iov_len, len); - to->iov_base = from->iov_base; - to->iov_len = size; - from->iov_len -= size; - from->iov_base += size; - len -= size; - ++from; - ++to; - ++seg; - } - return seg; -} -/* Copy iovec entries for len bytes from iovec. */ -static void copy_iovec_hdr(const struct iovec *from, struct iovec *to, - size_t len, int iovcount) -{ - int seg = 0; - size_t size; - - while (len && seg < iovcount) { - size = min(from->iov_len, len); - to->iov_base = from->iov_base; - to->iov_len = size; - len -= size; - ++from; - ++to; - ++seg; - } -} - /* In case of DMA done not in order in lower device driver for some reason. * upend_idx is used to track end of used idx, done_idx is used to track head * of used idx. Once lower device DMA done contiguously, we will signal KVM @@ -570,9 +528,9 @@ static void handle_rx(struct vhost_net *net) .msg_controllen = 0, .msg_flags = MSG_DONTWAIT, }; - struct virtio_net_hdr_mrg_rxbuf hdr = { - .hdr.flags = 0, - .hdr.gso_type = VIRTIO_NET_HDR_GSO_NONE + struct virtio_net_hdr hdr = { + .flags = 0, + .gso_type = VIRTIO_NET_HDR_GSO_NONE }; size_t total_len = 0; int err, mergeable; @@ -580,6 +538,7 @@ static void handle_rx(struct vhost_net *net) size_t vhost_hlen, sock_hlen; size_t vhost_len, sock_len; struct socket *sock; + struct iov_iter fixup; mutex_lock(&vq->mutex); sock = vq->private_data; @@ -624,14 +583,17 @@ static void handle_rx(struct vhost_net *net) break; } /* We don't need to be notified again. */ - if (unlikely((vhost_hlen))) - /* Skip header. TODO: support TSO. */ - move_iovec_hdr(vq->iov, nvq->hdr, vhost_hlen, in); - else - /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF: -* needed because recvmsg can modify msg_iov. */ - copy_iovec_hdr(vq->iov, nvq->hdr, sock_hlen, in); - iov_iter_init(&msg.msg_iter, READ, vq->iov, in, sock_len); + iov_iter_init(&msg.msg_iter, READ, vq->iov, in, vhost_len); + fixup = msg.msg_iter; + if (unlikely((vhost_hlen))) { + /* We will supply the header ourselves +* TODO: support TSO. */ + iov_iter_advance(&msg.msg_iter, vhost_hlen); + } else { + /* It'll come from socket; we'll need to patch +* ->num_buffers over if VIRTIO_NET_F_MRG_RXBUF */ + iov_iter_advance(&fixup, sizeof(hdr)); + } err = sock->ops->recvmsg(NULL, sock, &msg, sock_len, MSG_DONTWAIT | MSG_TRUNC); /* Userspace might have consumed the packet meanwhile: @@ -643,18 +605,17 @@ static void handle_rx(struct vhost_net *net) vhost_discard_vq_desc(vq, headcount); continue; } + /* Supply virtio_net_hdr if VHOST_NET_F_VIRT
[PATCH v2 16/18] vhost: don't bother with copying iovec in handle_tx()
From: Al Viro just advance the msg.msg_iter and be done with that. Cc: Michael S. Tsirkin Cc: kvm@vger.kernel.org Signed-off-by: Al Viro --- drivers/vhost/net.c | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 6906f76..d86cc9b 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -336,7 +336,7 @@ static void handle_tx(struct vhost_net *net) { struct vhost_net_virtqueue *nvq = &net->vqs[VHOST_NET_VQ_TX]; struct vhost_virtqueue *vq = &nvq->vq; - unsigned out, in, s; + unsigned out, in; int head; struct msghdr msg = { .msg_name = NULL, @@ -395,16 +395,17 @@ static void handle_tx(struct vhost_net *net) break; } /* Skip header. TODO: support TSO. */ - s = move_iovec_hdr(vq->iov, nvq->hdr, hdr_size, out); len = iov_length(vq->iov, out); iov_iter_init(&msg.msg_iter, WRITE, vq->iov, out, len); + iov_iter_advance(&msg.msg_iter, hdr_size); /* Sanity check */ - if (!len) { + if (!iov_iter_count(&msg.msg_iter)) { vq_err(vq, "Unexpected header len for TX: " "%zd expected %zd\n", - iov_length(nvq->hdr, s), hdr_size); + len, hdr_size); break; } + len = iov_iter_count(&msg.msg_iter); zcopy_used = zcopy && len >= VHOST_GOODCOPY_LEN && (nvq->upend_idx + 1) % UIO_MAXIOV != -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-v2 01/11] lib/iovec: Add memcpy_fromiovec_out library function
On Mon, Feb 02, 2015 at 04:44:12AM +, Al Viro wrote: > On Mon, Feb 02, 2015 at 04:06:24AM +, Nicholas A. Bellinger wrote: > > From: Nicholas Bellinger > > > > This patch adds a new memcpy_fromiovec_out() library function which modifies > > the passed *iov following memcpy_fromiovec(), but also returns the next > > current > > iovec pointer via **iov_out. > > > > This is useful for vhost ANY_LAYOUT support when guests are allowed to > > generate > > incoming virtio request headers combined with subsequent SGL payloads into a > > single iovec. > > Please, don't. Just use copy_from_iter(); you are open-coding an uglier > variant of such. PS: see vfs.git#for-davem (or postings on netdev with the same stuff). I really hope to bury memcpy_...iovec...() crap for good; please, don't reintroduce more of it. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH-v2 01/11] lib/iovec: Add memcpy_fromiovec_out library function
On Mon, Feb 02, 2015 at 04:06:24AM +, Nicholas A. Bellinger wrote: > From: Nicholas Bellinger > > This patch adds a new memcpy_fromiovec_out() library function which modifies > the passed *iov following memcpy_fromiovec(), but also returns the next > current > iovec pointer via **iov_out. > > This is useful for vhost ANY_LAYOUT support when guests are allowed to > generate > incoming virtio request headers combined with subsequent SGL payloads into a > single iovec. Please, don't. Just use copy_from_iter(); you are open-coding an uglier variant of such. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vfio-pci: Use fdget() rather than eventfd_fget()
On Tue, Aug 20, 2013 at 01:18:07PM -0600, Alex Williamson wrote: > eventfd_fget() tests to see whether the file is an eventfd file, which > we then immediately pass to eventfd_ctx_fileget(), which again tests > whether the file is an eventfd file. Simplify slightly by using > fdget() so that we only test that we're looking at an eventfd once. > fget() could also be used, but fdget() makes use of fget_light() for > another slight optimization. Umm... > @@ -210,8 +210,8 @@ fail: > if (ctx && !IS_ERR(ctx)) > eventfd_ctx_put(ctx); > > - if (file && !IS_ERR(file)) > - fput(file); > + if (irqfd.file) > + fdput(irqfd); > > kfree(virqfd); IMO it's a bad style; you have three failure exits leading here, and those ifs are nothing but "how far did we get before we'd failed". fail3: eventfd_ctx_put(ctx); fail2: fdput(irqfd); fail1: kfree(virqfd); is much easier to analyse. It's a very common pattern and IME it's more robust than this kind of "flexibility"... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vfio-pci: PCI hot reset interface
On Wed, Aug 14, 2013 at 04:42:14PM -0600, Bjorn Helgaas wrote: > [+cc Al, linux-fsdevel for fdget/fdput usage] fdget/fdput use looks sane, the only thing is that I would rather have an explicit include of linux/file.h instead of relying upon linux/eventfd.h pulling it. Incidentally, there are only 5 files that include the latter without an explicit include of the former - drivers/vfio/pci/vfio_pci.c, drivers/vhost/scsi.c, kernel/cgroup.c, mm/memcontrol.c and mm/vmpressure.c. And only kernel/cgroup.c (and, with this patch, vfio_pci.c) really wants anything from linux/file.h, so I'd rather kill that indirect include in eventfd.h and slapped an explicit include of file.h in these two files... BTW, most of the eventfd_fget() users might as well be using fget() (or fdget(), for that matter). They tend to be immediately followed by eventfd_ctx_fileget(), which repeats the "is that an eventfd file?" check anyway. Completely untested patch below does that to kernel/cgroup.c; Tejun, Davide - do you have any objections against the following? Kill indirect include of file.h from eventfd.h, use fdget() in cgroup.c kernel/cgroup.c is the only place in the tree that relies on eventfd.h pulling file.h; move that include there. Switch from eventfd_fget()/fput() to fdget()/fdput(), while we are at it - eventfd_ctx_fileget() will fail on non-eventfd descriptors just fine, no need to do that check twice... Signed-off-by: Al Viro --- diff --git a/include/linux/eventfd.h b/include/linux/eventfd.h index cf5d2af..ff0b981 100644 --- a/include/linux/eventfd.h +++ b/include/linux/eventfd.h @@ -9,7 +9,6 @@ #define _LINUX_EVENTFD_H #include -#include #include /* @@ -26,6 +25,8 @@ #define EFD_SHARED_FCNTL_FLAGS (O_CLOEXEC | O_NONBLOCK) #define EFD_FLAGS_SET (EFD_SHARED_FCNTL_FLAGS | EFD_SEMAPHORE) +struct file; + #ifdef CONFIG_EVENTFD struct file *eventfd_file_create(unsigned int count, int flags); diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 781845a..f88ecaf 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -60,6 +60,7 @@ #include #include /* used in cgroup_attach_task */ #include +#include #include @@ -3969,8 +3970,8 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft, struct cgroup_event *event = NULL; struct cgroup *cgrp_cfile; unsigned int efd, cfd; - struct file *efile = NULL; - struct file *cfile = NULL; + struct fd efile; + struct fd cfile; char *endp; int ret; @@ -3993,31 +3994,31 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft, init_waitqueue_func_entry(&event->wait, cgroup_event_wake); INIT_WORK(&event->remove, cgroup_event_remove); - efile = eventfd_fget(efd); - if (IS_ERR(efile)) { - ret = PTR_ERR(efile); - goto fail; + efile = fdget(efd); + if (!efile.file) { + ret = -EBADF; + goto fail1; } - event->eventfd = eventfd_ctx_fileget(efile); + event->eventfd = eventfd_ctx_fileget(efile.file); if (IS_ERR(event->eventfd)) { ret = PTR_ERR(event->eventfd); - goto fail; + goto fail2; } - cfile = fget(cfd); - if (!cfile) { + cfile = fdget(cfd); + if (!cfile.file) { ret = -EBADF; - goto fail; + goto fail3; } /* the process need read permission on control file */ /* AV: shouldn't we check that it's been opened for read instead? */ - ret = inode_permission(file_inode(cfile), MAY_READ); + ret = inode_permission(file_inode(cfile.file), MAY_READ); if (ret < 0) goto fail; - event->cft = __file_cft(cfile); + event->cft = __file_cft(cfile.file); if (IS_ERR(event->cft)) { ret = PTR_ERR(event->cft); goto fail; @@ -4027,7 +4028,7 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft, * The file to be monitored must be in the same cgroup as * cgroup.event_control is. */ - cgrp_cfile = __d_cgrp(cfile->f_dentry->d_parent); + cgrp_cfile = __d_cgrp(cfile.file->f_dentry->d_parent); if (cgrp_cfile != cgrp) { ret = -EINVAL; goto fail; @@ -4043,7 +4044,7 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft, if (ret) goto fail; - efile->f_op->poll(efile, &event->pt); + efile.file->f_op->poll(efile.file, &event->pt); /* * Events should be removed after rmdir of cgroup directory, but before @@ -4056,21 +4057,18 @@ static int cgroup_write_event_control(struct cgroup *cgrp, struct cftype *cft, list_add(&ev
Re: [patch 1/2] vhost: potential integer overflows
On Mon, Oct 11, 2010 at 07:22:57PM +0200, Dan Carpenter wrote: > I did an audit for potential integer overflows of values which get passed > to access_ok() and here are the results. FWIW, UINT_MAX is wrong here. What you want is maximal size_t value. > Signed-off-by: Dan Carpenter > > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index dd3d6f7..c2aa12c 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -429,6 +429,14 @@ static int vq_access_ok(unsigned int num, > struct vring_avail __user *avail, > struct vring_used __user *used) > { > + > + if (num > UINT_MAX / sizeof *desc) > + return 0; > + if (num > UINT_MAX / sizeof *avail->ring - sizeof *avail) > + return 0; > + if (num > UINT_MAX / sizeof *used->ring - sizeof *used) > + return 0; > + > return access_ok(VERIFY_READ, desc, num * sizeof *desc) && > access_ok(VERIFY_READ, avail, >sizeof *avail + num * sizeof *avail->ring) && > @@ -447,6 +455,9 @@ int vhost_log_access_ok(struct vhost_dev *dev) > /* Caller should have vq mutex and device mutex */ > static int vq_log_access_ok(struct vhost_virtqueue *vq, void __user > *log_base) > { > + if (vq->num > UINT_MAX / sizeof *vq->used->ring - sizeof *vq->used) > + return 0; > + > return vq_memory_access_ok(log_base, vq->dev->memory, > vhost_has_feature(vq->dev, VHOST_F_LOG_ALL)) && > (!vq->log_used || log_access_ok(log_base, vq->log_addr, > @@ -606,12 +617,17 @@ static long vhost_set_vring(struct vhost_dev *d, int > ioctl, void __user *argp) > } > > /* Also validate log access for used ring if enabled. */ > - if ((a.flags & (0x1 << VHOST_VRING_F_LOG)) && > - !log_access_ok(vq->log_base, a.log_guest_addr, > + if (a.flags & (0x1 << VHOST_VRING_F_LOG)) { > + if (vq->num > UINT_MAX / sizeof *vq->used->ring > - sizeof *vq->used) { > + r = -EINVAL; > + break; > + } > + if (!log_access_ok(vq->log_base, > a.log_guest_addr, > sizeof *vq->used + > vq->num * sizeof *vq->used->ring)) { > - r = -EINVAL; > - break; > + r = -EINVAL; > + break; > + } > } > } > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v4 1/2] eventfd: export eventfd interfaces for module use
On Mon, May 04, 2009 at 01:57:45PM -0400, Gregory Haskins wrote: > @@ -56,6 +57,7 @@ int eventfd_signal(struct file *file, int n) > > return n; > } > +EXPORT_SYMBOL_GPL(eventfd_signal); perhaps, but... > @@ -197,6 +199,7 @@ struct file *eventfd_fget(int fd) > > return file; > } > +EXPORT_SYMBOL_GPL(eventfd_fget); this one looks very odd. Could you show legitimate users? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface
On Sun, May 03, 2009 at 01:11:39PM -0700, Davide Libenzi wrote: > On Sun, 3 May 2009, Al Viro wrote: > > > IOW, the sane solution would be to export something that returns your > > struct file *. And stop playing with fd completely. > > This builds but it's not tested at all. > > - Make all the work of the old anon_inode_getfd(), done by a new > anon_inode_getfile(), with anon_inode_getfd() using its services > > - Make all the work done by sys_eventfd(), done by a new > eventfd_file_create() (which in turn uses anon_inode_getfile()), with > sys_eventfd() using its services > > IRQfd can use eventfd_file_create(), fget(), get_unused_fd_flags() and > fd_install() just before returning. > Is that what you had in mind? More or less, but I'd like to see the irqfd side of that... -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
file descriptor abuses
On Sun, May 03, 2009 at 08:01:36PM +0100, Al Viro wrote: > General rules: > * descriptor you've generated is fit only for return to userland; > * descriptor you've got from userland is fit only for *single* > fget() or equivalent, unless you are one of the core syscalls manipulating > the descriptor table itself (dup2, etc.) > * once file is installed in descriptor table, you'd better be past > the last failure exit; sys_close() on cleanup path is not acceptable. > That's what reserving descriptors is for. > > IOW, the sane solution would be to export something that returns your > struct file *. And stop playing with fd completely. Speaking of which, quick look through fget() callers shows this turd: static int p9_socket_open(struct p9_client *client, struct socket *csocket) { int fd, ret; fd = sock_map_fd(csocket, 0); . ret = p9_fd_open(client, fd, fd); if (ret < 0) { P9_EPRINTK(KERN_ERR, "p9_socket_open: failed to open fd\n"); sockfd_put(csocket); return ret; } . return 0; } where p9_fd_open() calls fget() on its 2nd and 3rd arguments. Which does worse than just a leak, AFAICT - on failure exit it leaves a dangling pointer from descriptor table. On the almost unrelated note, we have (in drivers/sneak_in^Wstaging/usbip) sockfd_to_socket(), with all callers leaking struct file, AFAICS. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface
On Sun, May 03, 2009 at 11:07:26AM -0700, Davide Libenzi wrote: > On Sun, 3 May 2009, Al Viro wrote: > > On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > > > + /* We re-use eventfd for irqfd */ > > > + fd = sys_eventfd2(0, 0); > > > + if (fd < 0) { > > > + ret = fd; > > > + goto fail; > > > + } > > > + > > > + /* We maintain a reference to eventfd for the irqfd lifetime */ > > > + file = eventfd_fget(fd); > > > + if (IS_ERR(file)) { > > > + ret = PTR_ERR(file); > > > + goto fail; > > > + } > > > + > > > + irqfd->file = file; > > > > This is just plain wrong. You have no promise whatsoever that caller of > > that sucker won't race with e.g. dup2(). IOW, you can't assume that > > file will be of the expected kind. > > The eventfd_fget() checks for the file_operations pointer, before > returning the file*, and fails if the fd in not an eventfd. Or you have > other concerns? OK, but... it's still wrong. Descriptor numbers are purely for interaction with userland; using them that way violates very general race-prevention rules, even if you do paper over the worst of consequences with check in eventfd_fget(). General rules: * descriptor you've generated is fit only for return to userland; * descriptor you've got from userland is fit only for *single* fget() or equivalent, unless you are one of the core syscalls manipulating the descriptor table itself (dup2, etc.) * once file is installed in descriptor table, you'd better be past the last failure exit; sys_close() on cleanup path is not acceptable. That's what reserving descriptors is for. IOW, the sane solution would be to export something that returns your struct file *. And stop playing with fd completely. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 2/2] kvm: add support for irqfd via eventfd-notification interface
On Mon, Apr 27, 2009 at 02:33:34PM -0400, Gregory Haskins wrote: > + /* We re-use eventfd for irqfd */ > + fd = sys_eventfd2(0, 0); > + if (fd < 0) { > + ret = fd; > + goto fail; > + } > + > + /* We maintain a reference to eventfd for the irqfd lifetime */ > + file = eventfd_fget(fd); > + if (IS_ERR(file)) { > + ret = PTR_ERR(file); > + goto fail; > + } > + > + irqfd->file = file; This is just plain wrong. You have no promise whatsoever that caller of that sucker won't race with e.g. dup2(). IOW, you can't assume that file will be of the expected kind. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html