Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
* Masami Hiramatsu wrote: > (2013/11/21 2:36), Frank Ch. Eigler wrote: [ ... ] > > one needs to resort to something like: > > > > # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do > >perf probe $symbol > > done > > > > then wait for a few hours for that to finish. Then, or while the loop > > is still running, run > > > > # perf record -e 'probe:*' -aR sleep 1 > > > > to take a kernel down. > > Um, indeed, current blacklist is not perfect. [...] Then it needs to be fixed ASAP! > [...] As I reported in this series, I've found 2 more patterns. I > guess there still have some others. > > But anyway, I don't think it is good to fix all such bugs in this > series. Fixing these bugs is far more important than any cleanup work. We can apply the fixes together with your cleanup series to make it all simpler, but the bug fixing absolutely needs to happen right now. Thanks, Ingo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
On Wed, Nov 20, 2013 at 04:22:00PM +0100, Heinz Graalfs wrote: > Hi, > > when an active virtio block device is hot-unplugged from a KVM guest, running > affected guest user applications are not aware of any errors that occur due > to the lost device. This patch-set adds code to avoid further request queueing > when a lost block device is detected, resulting in appropriate error info. > > On System z there exists no handshake mechanism between host and guest > when a device is hot-unplugged. The device is removed and no further I/O > is possible. > > When an online channel device disappears on System z the kernel's CIO layer > informs the driver (virtio_ccw) about the lost device. > > It's just the block device drivers that care to provide a notify > callback. > > Here are some more error details: > > For a particular block device virtio's request function virtblk_request() > is called by the block layer to queue requests to be handled by the host. > In case of a lost device requests can still be queued, but an appropriate > subsequent host kick usually fails. This leads to situations where no error > feedback is shown. > > In order to prevent request queueing for lost devices appropriate settings > in the block layer should be made. Exploiting System z's CIO notify handler > callback, and adding a corresponding new virtio_driver notify() handler to > 'inform' the block layer, solve this task. > > Patch 1 adds an optional notify() callback to virtio_driver. > > Patch 2 adds a new notify() callback for the virtio_blk driver. When called > for a lost device settings are made to prevent future request queueing. > > Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to > pass > on the lost device info to virtio's backend driver virtio_blk. Question: I guess remove callback is invoked eventually? Could you please clarify why isn't this sufficient? > Heinz Graalfs (3): > virtio: add notify() callback to virtio_driver > virtio_blk: add virtblk_notify() as virtio_driver's notify() callback > virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification > > drivers/block/virtio_blk.c| 14 ++ > drivers/s390/kvm/virtio_ccw.c | 14 -- > drivers/virtio/virtio.c | 8 > include/linux/virtio.h| 10 ++ > 4 files changed, 44 insertions(+), 2 deletions(-) > > -- > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver
On Wed, Nov 20, 2013 at 04:22:01PM +0100, Heinz Graalfs wrote: > Add an optional notify() callback to virtio_driver. A backend > driver can provide this callback to perform actions for a lost > device. > > notify() event values are inherited from virtio_ccw's notify() > callback. We might want to support even more of them lateron. > > notify() return values are defined in include/linux/notifier.h. > > Signed-off-by: Heinz Graalfs > --- > drivers/virtio/virtio.c | 8 > include/linux/virtio.h | 10 ++ > 2 files changed, 18 insertions(+) > > diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c > index ee59b74..a09abb4 100644 > --- a/drivers/virtio/virtio.c > +++ b/drivers/virtio/virtio.c > @@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver > *driver) > } > EXPORT_SYMBOL_GPL(unregister_virtio_driver); > > +int notify_virtio_device(struct virtio_device *vdev, int event) > +{ > + struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver); > + > + return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE; Also pls include linux/notifier.h for this value. > +} > +EXPORT_SYMBOL_GPL(notify_virtio_device); > + > int register_virtio_device(struct virtio_device *dev) > { > int err; > diff --git a/include/linux/virtio.h b/include/linux/virtio.h > index f15f6e7..da18e9a 100644 > --- a/include/linux/virtio.h > +++ b/include/linux/virtio.h > @@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev); > void unregister_virtio_device(struct virtio_device *dev); > > /** > + * notify event values > + * @VDEV_GONE: device gone > + */ > +enum { > + VDEV_GONE = 1, Seems a bit short, can lead to namespace pollution. Let's rename to VIRTIO_DEVICE_GONE ? > +}; > +int notify_virtio_device(struct virtio_device *dev, int event); > + > +/** > * virtio_driver - operations for a virtio I/O driver > * @driver: underlying device driver (populate name and owner). > * @id_table: the ids serviced by this driver. > @@ -129,6 +138,7 @@ struct virtio_driver { > void (*scan)(struct virtio_device *dev); > void (*remove)(struct virtio_device *dev); > void (*config_changed)(struct virtio_device *dev); > + int (*notify)(struct virtio_device *dev, int event); > #ifdef CONFIG_PM > int (*freeze)(struct virtio_device *dev); > int (*restore)(struct virtio_device *dev); > -- > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote: > Add virtblk_notify() as virtio_driver's notify() callback. > > When a transport driver is notified that a device disappeared it > should invoke this callback to prevent further request queueing. > > Subsequent block layer calls of virtio_blk's request function will > fail, resulting in appropriate I/O errors. > > Signed-off-by: Heinz Graalfs > --- > drivers/block/virtio_blk.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 2d43be4..7fc1d62 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev) > ida_simple_remove(&vd_index_ida, index); > } > > +static int virtblk_notify(struct virtio_device *vdev, int event) > +{ > + struct virtio_blk *vblk = vdev->priv; > + > + if (event == VDEV_GONE) { > + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue); > + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue); > + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES, > + vblk->disk->queue); > + } > + return NOTIFY_DONE; Also pls include linux/notifier.h for this value. > +} > + > #ifdef CONFIG_PM > static int virtblk_freeze(struct virtio_device *vdev) > { > @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = { > .probe = virtblk_probe, > .remove = virtblk_remove, > .config_changed = virtblk_config_changed, > + .notify = virtblk_notify, > #ifdef CONFIG_PM > .freeze = virtblk_freeze, > .restore= virtblk_restore, > -- > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
On Wed, Nov 20, 2013 at 04:22:02PM +0100, Heinz Graalfs wrote: > Add virtblk_notify() as virtio_driver's notify() callback. > > When a transport driver is notified that a device disappeared it > should invoke this callback to prevent further request queueing. > > Subsequent block layer calls of virtio_blk's request function will > fail, resulting in appropriate I/O errors. > > Signed-off-by: Heinz Graalfs > --- > drivers/block/virtio_blk.c | 14 ++ > 1 file changed, 14 insertions(+) > > diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c > index 2d43be4..7fc1d62 100644 > --- a/drivers/block/virtio_blk.c > +++ b/drivers/block/virtio_blk.c > @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev) > ida_simple_remove(&vd_index_ida, index); > } > > +static int virtblk_notify(struct virtio_device *vdev, int event) > +{ > + struct virtio_blk *vblk = vdev->priv; > + > + if (event == VDEV_GONE) { > + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue); > + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue); > + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES, > + vblk->disk->queue); > + } > + return NOTIFY_DONE; > +} > + But what serializes with the block layer? And is unlocked really safe here? Don't we need to take the queue lock? > #ifdef CONFIG_PM > static int virtblk_freeze(struct virtio_device *vdev) > { > @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = { > .probe = virtblk_probe, > .remove = virtblk_remove, > .config_changed = virtblk_config_changed, > + .notify = virtblk_notify, > #ifdef CONFIG_PM > .freeze = virtblk_freeze, > .restore= virtblk_restore, > -- > 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC 1/3] virtio: add notify() callback to virtio_driver
Heinz Graalfs writes: > Add an optional notify() callback to virtio_driver. A backend > driver can provide this callback to perform actions for a lost > device. > > notify() event values are inherited from virtio_ccw's notify() > callback. We might want to support even more of them lateron. > > notify() return values are defined in include/linux/notifier.h. > > Signed-off-by: Heinz Graalfs These patches seem sensible. I've applied them in my pending queue, but it'd be nice to have some feedback on the virtio_blk.c patch. Cheers, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers
On 11/20/2013 10:20 PM, Michael S. Tsirkin wrote: > On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote: >> >> - 原始邮件 - >>> On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: When mergeable buffer were used, we only put the first page buf leave the rest of buffers in the virt queue. This will cause the driver could not get the correct head buffer any more. Fix this by dropping the rest of buffers for this packet. The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 (virtio_net: Defer skb allocation in receive path). Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Michael Dalton Cc: Eric Dumazet Cc: Shirley Ma Signed-off-by: Jason Wang >>> Just to clarify my previous comment: it was not about the >>> idea of adding drop_mergeable_buffer - rather, I think that >>> adding knowledge about mergeable buffers into page_to_skb creates an >>> ugly internal API. >>> >>> Let's move the call to page_to_skb within receive_mergeable instead: >>> it's also nice that int offset = buf - page_address(page) logic >>> is not spread around like it was. >>> >>> Also, it's not nice that we ignore length errors when we drop >>> packets because of OOM. >>> >>> So I came up with the following - it seems to work but I didn't >>> stress test yet. >> I've no objection on this. But I've rather like my small and direct patch >> to be applied to -net first. It has lower risk and was much more easier to >> be backported to stable trees. Then we can do the re-factor like this in >> net-next. > It makes the interfaces too messy. Do you mean receive_mergeable() should call page_to_skb()? It has been there several years. And even with your changes, for big and small packets, page_to_skb() were still called directly receive_buf() and the error handling were done there. > We are not in code freeze in -net - > only feature freeze, so no reason to make code like spagetty, > and it's only 25 lines changed as compared to 40. > It's not a huge refactoring. The problem is your patch mixes several bugs fixing with re-factoring, only one of the bug fixing was really needed for stable. At least we should make incremental patches on this. > > It's just as easy to backport my patch too. > You just drop the goto in the new code path we added. It may not be so easy for the people who are not familiar with virtio-net and the s/skb->dev/dev/g changes looks irrelevant here. > Let me show you (untested) - your patch is not smaller. > > Signed-off-by: Michael S. Tsirkin > > > > commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb > Author: Michael S. Tsirkin > Date: Wed Nov 20 12:44:14 2013 +0200 > > virtio_net: fix resource leak on alloc failure > > virtio net got confused, started dropping packets. > > Signed-off-by: Michael S. Tsirkin > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 9fbdfcd..df4b9d0 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue > *rq, > return skb; > } > > -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) > +static struct sk_buff *receive_mergeable(struct net_device *dev, > + struct receive_queue *rq, > + struct page *page, > + unsigned int len) > { > - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb); > - struct page *page; > - int num_buf, i, len; > + struct skb_vnet_hdr *hdr = page_address(buf); > + int num_buf = hdr->mhdr.num_buffers; > + struct sk_buff *skb = page_to_skb(rq, page, len); > + int i; > + > + skb = page_to_skb(rq, page, len); > + > + if (unlikely(!skb)) > + goto err_skb; > + > > - num_buf = hdr->mhdr.num_buffers; > while (--num_buf) { > i = skb_shinfo(skb)->nr_frags; > if (i >= MAX_SKB_FRAGS) { > @@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, > struct sk_buff *skb) > } > page = virtqueue_get_buf(rq->vq, &len); > if (!page) { > - pr_debug("%s: rx error: %d buffers missing\n", > - skb->dev->name, hdr->mhdr.num_buffers); > - skb->dev->stats.rx_length_errors++; > - return -EINVAL; > + pr_debug("%s: rx error: %d buffers %d missing\n", > + dev->name, hdr->mhdr.num_buffers, num_buf); > + dev->stats.rx_length_errors++; > + goto err_buf; > } > > if (len > PAGE_SIZE) > @@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, > struct sk_buff *skb) > > --rq->num; > } > - return 0; > + return skb
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
(2013/11/21 2:36), Frank Ch. Eigler wrote: > Hi - > >>> Does this new blacklist cover enough that the kernel now survives a >>> broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? >> >> That's generally the purpose of the annotations - if it doesn't then >> that's a bug. > > AFAIK, no kernel since kprobes was introduced has ever stood up to > that test. perf probe lacks the wildcarding powers of systemtap, so > one needs to resort to something like: > > # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do >perf probe $symbol > done > > then wait for a few hours for that to finish. Then, or while the loop > is still running, run > > # perf record -e 'probe:*' -aR sleep 1 > > to take a kernel down. Um, indeed, current blacklist is not perfect. As I reported in this series, I've found 2 more patterns. I guess there still have some others. But anyway, I don't think it is good to fix all such bugs in this series. This is just the first step to do it. :) Thank you, -- Masami HIRAMATSU IT Management Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu...@hitachi.com ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
On 11/20/2013 09:56 AM, Steven Rostedt wrote: > On Wed, 20 Nov 2013 12:36:00 -0500 > "Frank Ch. Eigler" wrote: > >> Hi - >> Does this new blacklist cover enough that the kernel now survives a broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? >>> >>> That's generally the purpose of the annotations - if it doesn't then >>> that's a bug. >> >> AFAIK, no kernel since kprobes was introduced has ever stood up to >> that test. perf probe lacks the wildcarding powers of systemtap, so >> one needs to resort to something like: >> >> # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do >>perf probe $symbol >> done > > I'm curious to why one would do that. IIUC, perf now has function > tracing support. Then consider something like probing all inline "call" sites, which will be sprinkled in the middle where ftrace doesn't apply. The point is not whether there's an alternative - kprobes really ought to be wholly safe regardless. Slow, if you did such broad probing, sure, but still safe. And a real use-case probably wouldn't probe *all* functions/inlines, but it illustrates that there are at least a few in the full set that don't behave well. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
Hi - > > Does this new blacklist cover enough that the kernel now survives a > > broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? > > That's generally the purpose of the annotations - if it doesn't then > that's a bug. AFAIK, no kernel since kprobes was introduced has ever stood up to that test. perf probe lacks the wildcarding powers of systemtap, so one needs to resort to something like: # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do perf probe $symbol done then wait for a few hours for that to finish. Then, or while the loop is still running, run # perf record -e 'probe:*' -aR sleep 1 to take a kernel down. - FChE ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
masami.hiramatsu.pt wrote: > [...] This series also includes a change which prohibits probing on > the address in .entry.text because the code is used for very > low-level sensitive interrupt/syscall entries. Probing such code may > cause unexpected result (actually most of that area is already in > the kprobe blacklist). So I've decide to prohibit probing all of > them. [...] Does this new blacklist cover enough that the kernel now survives a broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? - FChE ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
On Wed, 20 Nov 2013 12:36:00 -0500 "Frank Ch. Eigler" wrote: > Hi - > > > > Does this new blacklist cover enough that the kernel now survives a > > > broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? > > > > That's generally the purpose of the annotations - if it doesn't then > > that's a bug. > > AFAIK, no kernel since kprobes was introduced has ever stood up to > that test. perf probe lacks the wildcarding powers of systemtap, so > one needs to resort to something like: > > # cat /proc/kallsyms | grep ' [tT] ' | while read addr type symbol; do >perf probe $symbol > done I'm curious to why one would do that. IIUC, perf now has function tracing support. -- Steve > > then wait for a few hours for that to finish. Then, or while the loop > is still running, run > > # perf record -e 'probe:*' -aR sleep 1 > > to take a kernel down. > > > - FChE ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Wed, Nov 20, 2013 at 08:14:21AM -0800, Eric Dumazet wrote: > On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote: > > > Hmm some kind of disconnect here. > > I got you rmanagement about bufferbloat. > > > > What I am saying is that maybe we should drop packets more > > aggressively: when we drop one packet of a flow, why not > > drop everything that's queued and is for the same flow? > > I really hope your TCP flows use SACK ;) > > Please read the rfc 2018 introduction for details. > > If a packet is dropped, it doesn't mean following packets _have_ to be > dropped. Got it, thanks. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Wed, 2013-11-20 at 18:06 +0200, Michael S. Tsirkin wrote: > Hmm some kind of disconnect here. > I got you rmanagement about bufferbloat. > > What I am saying is that maybe we should drop packets more > aggressively: when we drop one packet of a flow, why not > drop everything that's queued and is for the same flow? I really hope your TCP flows use SACK ;) Please read the rfc 2018 introduction for details. If a packet is dropped, it doesn't mean following packets _have_ to be dropped. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Wed, Nov 20, 2013 at 07:16:33AM -0800, Eric Dumazet wrote: > On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote: > > On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote: > > > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: > > > > > > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a > > > > it didn't drop packets received from host as far as I can tell. > > > > virtio is more like a pipe than a real NIC in this respect. > > > > > > Prior/after to this patch, you were not posting buffers, so if packets > > > were received on a physical NIC, you were dropping the packets anyway. > > > > > > It makes no difference at all, adding a cushion might make you feel > > > better, but its really not worth it. > > > > > > Under memory stress, it makes better sense to drop a super big GRO > > > packet (The one needing frag_list extension ...) > > > > > > It gives a better signal to the sender to reduce its pressure, and gives > > > opportunity to free more of your memory. > > > > > > > OK, but in that case one wonders whether we should do more to free memory? > > > > E.g. imagine that we dropped a packet of a specific TCP flow > > because we couldn't allocate a new packet. > > > > What happens now is that the old packet is freed as well. > > > > So quite likely the next packet in queue will get processed > > since it will reuse the memory we have just freed. > > > > The next packet and the next after it etc all will have to go through > > the net stack until they get at the socket and are dropped then > > because we missed a segment. Even worse, GRO gets disabled so the load > > on receiver goes up instead of down. > > > > Sounds like a problem doesn't it? > > I see no problem at all. GRO is a hint for high rates (and obviously > when there is enough memory) > > > > > GRO actually detects it's the same flow and can see packet is > > out of sequence. Why doesn't it drop the packet then? > > Alternatively, we could (for example using the pre-allocated skb > > like I suggested) notify GRO that it should start dropping packets > > of this flow. > > > > What do you think? > > > > I think we disagree a lot on memory management on networking stacks. > > We did a lot of work in TCP stack and Qdisc layers to lower memory > pressure (and bufferbloat), an you seem to try hard to introduce yet > another layer of buffer bloat in virtio_net. > > So add whatever you want to proudly state to your management : > > "Look how smart we are : we drop no packets in our layer" > Hmm some kind of disconnect here. I got you rmanagement about bufferbloat. What I am saying is that maybe we should drop packets more aggressively: when we drop one packet of a flow, why not drop everything that's queued and is for the same flow? ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH -tip v3 00/23] kprobes: introduce NOKPROBE_SYMBOL() and general cleaning of kprobe blacklist
* Frank Ch. Eigler wrote: > masami.hiramatsu.pt wrote: > > > [...] This series also includes a change which prohibits probing > > on the address in .entry.text because the code is used for very > > low-level sensitive interrupt/syscall entries. Probing such code > > may cause unexpected result (actually most of that area is already > > in the kprobe blacklist). So I've decide to prohibit probing all > > of them. [...] > > Does this new blacklist cover enough that the kernel now survives a > broadly wildcarded perf-probe, e.g. over e.g. all of its kallsyms? That's generally the purpose of the annotations - if it doesn't then that's a bug. Thanks, Ingo ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC 3/3] virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification
virtio_ccw's notify() callback for the common IO layer invokes virtio_driver's notify() callback to pass-on information to a backend driver if an online device disappeared. Signed-off-by: Heinz Graalfs --- drivers/s390/kvm/virtio_ccw.c | 14 -- 1 file changed, 12 insertions(+), 2 deletions(-) diff --git a/drivers/s390/kvm/virtio_ccw.c b/drivers/s390/kvm/virtio_ccw.c index 35b9aaa..420010d 100644 --- a/drivers/s390/kvm/virtio_ccw.c +++ b/drivers/s390/kvm/virtio_ccw.c @@ -1064,8 +1064,18 @@ out_free: static int virtio_ccw_cio_notify(struct ccw_device *cdev, int event) { - /* TODO: Check whether we need special handling here. */ - return 0; + int rc; + struct virtio_ccw_device *vcdev = dev_get_drvdata(&cdev->dev); + + switch (event) { + case CIO_GONE: + rc = notify_virtio_device(&vcdev->vdev, VDEV_GONE); + break; + default: + rc = NOTIFY_DONE; + break; + } + return rc; } static struct ccw_device_id virtio_ids[] = { -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC 2/3] virtio_blk: add virtblk_notify() as virtio_driver's notify() callback
Add virtblk_notify() as virtio_driver's notify() callback. When a transport driver is notified that a device disappeared it should invoke this callback to prevent further request queueing. Subsequent block layer calls of virtio_blk's request function will fail, resulting in appropriate I/O errors. Signed-off-by: Heinz Graalfs --- drivers/block/virtio_blk.c | 14 ++ 1 file changed, 14 insertions(+) diff --git a/drivers/block/virtio_blk.c b/drivers/block/virtio_blk.c index 2d43be4..7fc1d62 100644 --- a/drivers/block/virtio_blk.c +++ b/drivers/block/virtio_blk.c @@ -901,6 +901,19 @@ static void virtblk_remove(struct virtio_device *vdev) ida_simple_remove(&vd_index_ida, index); } +static int virtblk_notify(struct virtio_device *vdev, int event) +{ + struct virtio_blk *vblk = vdev->priv; + + if (event == VDEV_GONE) { + queue_flag_set_unlocked(QUEUE_FLAG_DYING, vblk->disk->queue); + queue_flag_set_unlocked(QUEUE_FLAG_NOMERGES, vblk->disk->queue); + queue_flag_set_unlocked(QUEUE_FLAG_NOXMERGES, + vblk->disk->queue); + } + return NOTIFY_DONE; +} + #ifdef CONFIG_PM static int virtblk_freeze(struct virtio_device *vdev) { @@ -961,6 +974,7 @@ static struct virtio_driver virtio_blk = { .probe = virtblk_probe, .remove = virtblk_remove, .config_changed = virtblk_config_changed, + .notify = virtblk_notify, #ifdef CONFIG_PM .freeze = virtblk_freeze, .restore= virtblk_restore, -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC 0/3] virtio: add new notify() callback to virtio_driver
Hi, when an active virtio block device is hot-unplugged from a KVM guest, running affected guest user applications are not aware of any errors that occur due to the lost device. This patch-set adds code to avoid further request queueing when a lost block device is detected, resulting in appropriate error info. On System z there exists no handshake mechanism between host and guest when a device is hot-unplugged. The device is removed and no further I/O is possible. When an online channel device disappears on System z the kernel's CIO layer informs the driver (virtio_ccw) about the lost device. It's just the block device drivers that care to provide a notify callback. Here are some more error details: For a particular block device virtio's request function virtblk_request() is called by the block layer to queue requests to be handled by the host. In case of a lost device requests can still be queued, but an appropriate subsequent host kick usually fails. This leads to situations where no error feedback is shown. In order to prevent request queueing for lost devices appropriate settings in the block layer should be made. Exploiting System z's CIO notify handler callback, and adding a corresponding new virtio_driver notify() handler to 'inform' the block layer, solve this task. Patch 1 adds an optional notify() callback to virtio_driver. Patch 2 adds a new notify() callback for the virtio_blk driver. When called for a lost device settings are made to prevent future request queueing. Patch 3 modifies the CIO notify handler in virtio_ccw's transport layer to pass on the lost device info to virtio's backend driver virtio_blk. Heinz Graalfs (3): virtio: add notify() callback to virtio_driver virtio_blk: add virtblk_notify() as virtio_driver's notify() callback virtio_ccw: invoke virtio_driver's notify() on CIO_GONE notification drivers/block/virtio_blk.c| 14 ++ drivers/s390/kvm/virtio_ccw.c | 14 -- drivers/virtio/virtio.c | 8 include/linux/virtio.h| 10 ++ 4 files changed, 44 insertions(+), 2 deletions(-) -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC 1/3] virtio: add notify() callback to virtio_driver
Add an optional notify() callback to virtio_driver. A backend driver can provide this callback to perform actions for a lost device. notify() event values are inherited from virtio_ccw's notify() callback. We might want to support even more of them lateron. notify() return values are defined in include/linux/notifier.h. Signed-off-by: Heinz Graalfs --- drivers/virtio/virtio.c | 8 include/linux/virtio.h | 10 ++ 2 files changed, 18 insertions(+) diff --git a/drivers/virtio/virtio.c b/drivers/virtio/virtio.c index ee59b74..a09abb4 100644 --- a/drivers/virtio/virtio.c +++ b/drivers/virtio/virtio.c @@ -186,6 +186,14 @@ void unregister_virtio_driver(struct virtio_driver *driver) } EXPORT_SYMBOL_GPL(unregister_virtio_driver); +int notify_virtio_device(struct virtio_device *vdev, int event) +{ + struct virtio_driver *drv = drv_to_virtio(vdev->dev.driver); + + return drv->notify ? drv->notify(vdev, event) : NOTIFY_DONE; +} +EXPORT_SYMBOL_GPL(notify_virtio_device); + int register_virtio_device(struct virtio_device *dev) { int err; diff --git a/include/linux/virtio.h b/include/linux/virtio.h index f15f6e7..da18e9a 100644 --- a/include/linux/virtio.h +++ b/include/linux/virtio.h @@ -110,6 +110,15 @@ int register_virtio_device(struct virtio_device *dev); void unregister_virtio_device(struct virtio_device *dev); /** + * notify event values + * @VDEV_GONE: device gone + */ +enum { + VDEV_GONE = 1, +}; +int notify_virtio_device(struct virtio_device *dev, int event); + +/** * virtio_driver - operations for a virtio I/O driver * @driver: underlying device driver (populate name and owner). * @id_table: the ids serviced by this driver. @@ -129,6 +138,7 @@ struct virtio_driver { void (*scan)(struct virtio_device *dev); void (*remove)(struct virtio_device *dev); void (*config_changed)(struct virtio_device *dev); + int (*notify)(struct virtio_device *dev, int event); #ifdef CONFIG_PM int (*freeze)(struct virtio_device *dev); int (*restore)(struct virtio_device *dev); -- 1.8.3.1 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Wed, 2013-11-20 at 10:58 +0200, Michael S. Tsirkin wrote: > On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote: > > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: > > > > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a > > > it didn't drop packets received from host as far as I can tell. > > > virtio is more like a pipe than a real NIC in this respect. > > > > Prior/after to this patch, you were not posting buffers, so if packets > > were received on a physical NIC, you were dropping the packets anyway. > > > > It makes no difference at all, adding a cushion might make you feel > > better, but its really not worth it. > > > > Under memory stress, it makes better sense to drop a super big GRO > > packet (The one needing frag_list extension ...) > > > > It gives a better signal to the sender to reduce its pressure, and gives > > opportunity to free more of your memory. > > > > OK, but in that case one wonders whether we should do more to free memory? > > E.g. imagine that we dropped a packet of a specific TCP flow > because we couldn't allocate a new packet. > > What happens now is that the old packet is freed as well. > > So quite likely the next packet in queue will get processed > since it will reuse the memory we have just freed. > > The next packet and the next after it etc all will have to go through > the net stack until they get at the socket and are dropped then > because we missed a segment. Even worse, GRO gets disabled so the load > on receiver goes up instead of down. > > Sounds like a problem doesn't it? I see no problem at all. GRO is a hint for high rates (and obviously when there is enough memory) > > GRO actually detects it's the same flow and can see packet is > out of sequence. Why doesn't it drop the packet then? > Alternatively, we could (for example using the pre-allocated skb > like I suggested) notify GRO that it should start dropping packets > of this flow. > > What do you think? > I think we disagree a lot on memory management on networking stacks. We did a lot of work in TCP stack and Qdisc layers to lower memory pressure (and bufferbloat), an you seem to try hard to introduce yet another layer of buffer bloat in virtio_net. So add whatever you want to proudly state to your management : "Look how smart we are : we drop no packets in our layer" ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
2014 World Conference on Information Systems and Technologies; Submission: Nov. 29
Apologies if you are receiving this mail more than once... ** WorldCIST'14 The 2014 World Conference on Information Systems and Technologies April 15 - 18, Madeira Island, Portugal http://www.aisti.eu/worldcist14/ ** The 2014 World Conference on Information Systems and Technologies (WorldCIST'14: http://www.aisti.eu/worldcist14) is a global forum for researchers and practitioners to present and discuss the most recent innovations, trends, results, experiences and concerns in the several perspectives of Information Systems and Technologies. We are pleased to invite you to submit your papers to WorldCISTI'14. All submissions will be reviewed on the basis of relevance, originality, importance and clarity. THEMES Submitted papers should be related with one or more of the main themes proposed for the Conference: A) Information and Knowledge Management (IKM); B) Organizational Models and Information Systems (OMIS); C) Intelligent and Decision Support Systems (IDSS); D) Software Systems, Architectures, Applications and Tools (SSAAT); E) Computer Networks, Mobility and Pervasive Systems (CNMPS); F) Human-Computer Interaction (HCI); G) Health Informatics (HIS); H) Information Technologies in Education (ITE). TYPES OF SUBMISSIONS AND DECISIONS Four types of papers can be submitted: Full paper: Finished or consolidated R&D works, to be included in one of the Conference themes. These papers are assigned a 10-page limit. Short paper: Ongoing works with relevant preliminary results, open to discussion. These papers are assigned a 7-page limit. Poster paper: Initial work with relevant ideas, open to discussion. These papers are assigned to a 4-page limit. Company paper: Companies' papers that show practical experience, R & D, tools, etc., focused on some topics of the conference. These papers are assigned to a 4-page limit. Submitted papers must comply with the format of Advances in Intelligent Systems and Computing Series (see Instructions for Authors at Springer Website or download a DOC example) be written in English, must not have been published before, not be under review for any other conference or publication and not include any information leading to the authors identification. Therefore, the authors names, affiliations and bibliographic references should not be included in the version for evaluation by the Program Committee. This information should only be included in the camera-ready version, saved in Word or Latex format and also in PDF format. These files must be accompanied by the Consent to Publication form filled out, in a ZIP file, and uploaded at the conference management system. All papers will be subjected to a double-blind review by at least two members of the Program Committee. Based on Program Committee evaluation, a paper can be rejected or accepted by the Conference Chairs. In the later case, it can be accepted as the type originally submitted or as another type. Thus, full papers can be accepted as short papers or poster papers only. Similarly, short papers can be accepted as poster papers only. In these cases, the authors will be allowed to maintain the original number of pages in the camera-ready version. The authors of accepted poster papers must also build and print a poster to be exhibited during the Conference. This poster must follow an A1 or A2 vertical format. The Conference includes Work Sessions where these posters are presented and orally discussed, with a 5 minute limit per poster. The authors of accepted full papers will have 15 minutes to present their work in a Conference Work Session; approximately 5 minutes of discussion will follow each presentation. The authors of accepted short papers and company papers will have 11 minutes to present their work in a Conference Work Session; approximately 4 minutes of discussion will follow each presentation. PUBLICATION AND INDEXING To ensure that a full paper, short paper, poster paper or company paper is published in the Proceedings, at least one of the authors must be fully registered by the 24th of January 2014, and the paper must comply with the suggested layout and page-limit. Additionally, all recommended changes must be addressed by the authors before they submit the camera-ready version. No more than one paper per registration will be published in the Conference Proceedings. An extra fee must be paid for publication of additional papers, with a maximum of one additional paper per registration. Full and short papers will be published in Proceedings by Springer, in Advances in Intelligent Systems and Computing Series. Poster and company papers will be published in Proceedings by AISTI. Published full and short papers will be submitted
Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers
On Wed, Nov 20, 2013 at 08:54:02AM -0500, Jason Wang wrote: > > > - 原始邮件 - > > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > > > When mergeable buffer were used, we only put the first page buf leave the > > > rest > > > of buffers in the virt queue. This will cause the driver could not get the > > > correct head buffer any more. Fix this by dropping the rest of buffers for > > > this > > > packet. > > > > > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > > > (virtio_net: Defer skb allocation in receive path). > > > > > > Cc: Rusty Russell > > > Cc: Michael S. Tsirkin > > > Cc: Michael Dalton > > > Cc: Eric Dumazet > > > Cc: Shirley Ma > > > Signed-off-by: Jason Wang > > > > Just to clarify my previous comment: it was not about the > > idea of adding drop_mergeable_buffer - rather, I think that > > adding knowledge about mergeable buffers into page_to_skb creates an > > ugly internal API. > > > > Let's move the call to page_to_skb within receive_mergeable instead: > > it's also nice that int offset = buf - page_address(page) logic > > is not spread around like it was. > > > > Also, it's not nice that we ignore length errors when we drop > > packets because of OOM. > > > > So I came up with the following - it seems to work but I didn't > > stress test yet. > > I've no objection on this. But I've rather like my small and direct patch > to be applied to -net first. It has lower risk and was much more easier to > be backported to stable trees. Then we can do the re-factor like this in > net-next. It makes the interfaces too messy. We are not in code freeze in -net - only feature freeze, so no reason to make code like spagetty, and it's only 25 lines changed as compared to 40. It's not a huge refactoring. It's just as easy to backport my patch too. You just drop the goto in the new code path we added. Let me show you (untested) - your patch is not smaller. Signed-off-by: Michael S. Tsirkin commit 9b442fe970d5c71311d4314edef26ee2eb16e7fb Author: Michael S. Tsirkin Date: Wed Nov 20 12:44:14 2013 +0200 virtio_net: fix resource leak on alloc failure virtio net got confused, started dropping packets. Signed-off-by: Michael S. Tsirkin diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 9fbdfcd..df4b9d0 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -297,13 +297,22 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, return skb; } -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) +static struct sk_buff *receive_mergeable(struct net_device *dev, +struct receive_queue *rq, +struct page *page, +unsigned int len) { - struct skb_vnet_hdr *hdr = skb_vnet_hdr(skb); - struct page *page; - int num_buf, i, len; + struct skb_vnet_hdr *hdr = page_address(buf); + int num_buf = hdr->mhdr.num_buffers; + struct sk_buff *skb = page_to_skb(rq, page, len); + int i; + + skb = page_to_skb(rq, page, len); + + if (unlikely(!skb)) + goto err_skb; + - num_buf = hdr->mhdr.num_buffers; while (--num_buf) { i = skb_shinfo(skb)->nr_frags; if (i >= MAX_SKB_FRAGS) { @@ -313,10 +322,10 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) } page = virtqueue_get_buf(rq->vq, &len); if (!page) { - pr_debug("%s: rx error: %d buffers missing\n", -skb->dev->name, hdr->mhdr.num_buffers); - skb->dev->stats.rx_length_errors++; - return -EINVAL; + pr_debug("%s: rx error: %d buffers %d missing\n", +dev->name, hdr->mhdr.num_buffers, num_buf); + dev->stats.rx_length_errors++; + goto err_buf; } if (len > PAGE_SIZE) @@ -326,7 +335,25 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *skb) --rq->num; } - return 0; + return skb; +err_skb: + put_page(page); +err_buf: + dev->stats.rx_dropped++; + dev_kfree_skb(head_skb); + while (--num_buf) { + buf = virtqueue_get_buf(rq->vq, &len); + if (unlikely(!buf)) { + pr_debug("%s: rx error: %d buffers missing\n", +dev->name, num_buf); + dev->stats.rx_length_errors++; + break; + } + page = buf; + give_pages(rq, page); + --rq->num; + } + return NULL; } static void receive_buf(struct receive_queue *rq, void *buf, unsigned i
Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure
- 原始邮件 - > On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote: > > > > > > - 原始邮件 - > > > On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote: > > > > We need decrease the rq->num after we can get a buf through > > > > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, > > > > the > > > > refill routine won't be triggered under heavy memory stress since the > > > > driver may > > > > still think there's enough room. > > > > > > > > This bug was introduced by commit > > > > 2613af0ed18a11d5c566a81f9a6510b73180660a > > > > (virtio_net: migrate mergeable rx buffers to page frag allocators). > > > > > > > > Cc: Rusty Russell > > > > Cc: Michael S. Tsirkin > > > > Cc: Michael Dalton > > > > Cc: Eric Dumazet > > > > Signed-off-by: Jason Wang > > > > > > So let's wrap virtqueue_get_buf to make sure we get it right? > > > > > > > Ok. good idea. > > So I did this (below) but the compiler is not smart enough to > avoid two branches on data path. > So I'm not sure anymore: with my patch it's pretty clear how > the code works. > > Signed-off-by: Michael S. Tsirkin > > but I don't think we need to apply this. > True, another point is we'd better to handle both increasing and decreasing in the same way. We do not increase it in a helper. > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 11d9cc9..1cc2e43 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue > *rq, > return skb; > } > > +static void *rq_get_buf(struct receive_queue *rq, unsigned int *len) > +{ > + void *buf = virtqueue_get_buf(rq->vq, len); > + if (buf) > + rq->num--; > + return buf; > +} > + > static struct sk_buff *receive_mergeable(struct net_device *dev, >struct receive_queue *rq, >void *buf, > @@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct > net_device *dev, > while (--num_buf) { > int num_skb_frags; > > - buf = virtqueue_get_buf(rq->vq, &len); > + buf = rq_get_buf(rq, &len); > if (unlikely(!buf)) { > pr_debug("%s: rx error: %d buffers out of %d missing\n", >dev->name, num_buf, hdr->mhdr.num_buffers); > @@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct > net_device *dev, > } > > page = virt_to_head_page(buf); > - --rq->num; > > num_skb_frags = skb_shinfo(curr_skb)->nr_frags; > if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) { > @@ -370,7 +377,7 @@ err_buf: > dev->stats.rx_dropped++; > dev_kfree_skb(head_skb); > while (--num_buf) { > - buf = virtqueue_get_buf(rq->vq, &len); > + buf = rq_get_buf(rq, &len); > if (unlikely(!buf)) { > pr_debug("%s: rx error: %d buffers missing\n", >dev->name, num_buf); > @@ -379,7 +386,6 @@ err_buf: > } > page = virt_to_head_page(buf); > put_page(page); > - --rq->num; > } > return NULL; > } > @@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int > budget) > > again: > while (received < budget && > -(buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { > +(buf = rq_get_buf(rq, &len)) != NULL) { > receive_buf(rq, buf, len); > - --rq->num; > received++; > } > > -- > To unsubscribe from this list: send the line "unsubscribe netdev" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH RFC] virtio_net: fix error handling for mergeable buffers
- 原始邮件 - > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > > When mergeable buffer were used, we only put the first page buf leave the > > rest > > of buffers in the virt queue. This will cause the driver could not get the > > correct head buffer any more. Fix this by dropping the rest of buffers for > > this > > packet. > > > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > > (virtio_net: Defer skb allocation in receive path). > > > > Cc: Rusty Russell > > Cc: Michael S. Tsirkin > > Cc: Michael Dalton > > Cc: Eric Dumazet > > Cc: Shirley Ma > > Signed-off-by: Jason Wang > > Just to clarify my previous comment: it was not about the > idea of adding drop_mergeable_buffer - rather, I think that > adding knowledge about mergeable buffers into page_to_skb creates an > ugly internal API. > > Let's move the call to page_to_skb within receive_mergeable instead: > it's also nice that int offset = buf - page_address(page) logic > is not spread around like it was. > > Also, it's not nice that we ignore length errors when we drop > packets because of OOM. > > So I came up with the following - it seems to work but I didn't > stress test yet. I've no objection on this. But I've rather like my small and direct patch to be applied to -net first. It has lower risk and was much more easier to be backported to stable trees. Then we can do the re-factor like this in net-next. > > commit ebffb3fe4335ffe07124e4518e76d6e05844fa18 > Author: Michael S. Tsirkin > Date: Wed Nov 20 14:41:29 2013 +0200 > > virtio_net: fix error handling for mergeable buffers > > Eric Dumazet noticed that if we encounter an error > when processing a mergeable buffer, we don't > dequeue all of the buffers from this packet, > the result is almost sure to be loss of networking. > > Jason Wang noticed that we also leak a page and that we don't decrement > the rq buf count, so we won't repost buffers (a resource leak). > > Cc: Rusty Russell > Cc: Michael Dalton > Reported-by: Eric Dumazet > Reported-by: Jason Wang > Signed-off-by: Michael S. Tsirkin > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 01f4eb5..42f6a1e 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue > *rq, > return skb; > } > > -static int receive_mergeable(struct receive_queue *rq, struct sk_buff > *head_skb) > +static struct sk_buff *receive_mergeable(struct net_device *dev, > + struct receive_queue *rq, > + void *buf, > + unsigned int len) > { > - struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb); > + struct skb_vnet_hdr *hdr = buf; > + int num_buf = hdr->mhdr.num_buffers; > + struct page *page = virt_to_head_page(buf); > + int offset = buf - page_address(page); > + struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, > +MAX_PACKET_LEN); > struct sk_buff *curr_skb = head_skb; > - char *buf; > - struct page *page; > - int num_buf, len, offset; > > - num_buf = hdr->mhdr.num_buffers; > - while (--num_buf) { > - int num_skb_frags = skb_shinfo(curr_skb)->nr_frags; > + if (unlikely(!curr_skb)) > + goto err_skb; > + > + while (--num_buf) { > + int num_skb_frags; > + > buf = virtqueue_get_buf(rq->vq, &len); > if (unlikely(!buf)) { > - pr_debug("%s: rx error: %d buffers missing\n", > - head_skb->dev->name, hdr->mhdr.num_buffers); > - head_skb->dev->stats.rx_length_errors++; > - return -EINVAL; > + pr_debug("%s: rx error: %d buffers out of %d missing\n", > + dev->name, num_buf, hdr->mhdr.num_buffers); > + dev->stats.rx_length_errors++; > + goto err_buf; > } > if (unlikely(len > MAX_PACKET_LEN)) { > pr_debug("%s: rx error: merge buffer too long\n", > - head_skb->dev->name); > + dev->name); > len = MAX_PACKET_LEN; > } > + > + page = virt_to_head_page(buf); > + --rq->num; > + > + num_skb_frags = skb_shinfo(curr_skb)->nr_frags; > if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) { > struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC); > - if (unlikely(!nskb)) { > - head_skb->dev->stats.rx_dropped++; > - return -ENOMEM; > - } > + > + if (unlikely(!ns
Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure
On Wed, Nov 20, 2013 at 07:08:50AM -0500, Jason Wang wrote: > > > - 原始邮件 - > > On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote: > > > We need decrease the rq->num after we can get a buf through > > > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the > > > refill routine won't be triggered under heavy memory stress since the > > > driver may > > > still think there's enough room. > > > > > > This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a > > > (virtio_net: migrate mergeable rx buffers to page frag allocators). > > > > > > Cc: Rusty Russell > > > Cc: Michael S. Tsirkin > > > Cc: Michael Dalton > > > Cc: Eric Dumazet > > > Signed-off-by: Jason Wang > > > > So let's wrap virtqueue_get_buf to make sure we get it right? > > > > Ok. good idea. So I did this (below) but the compiler is not smart enough to avoid two branches on data path. So I'm not sure anymore: with my patch it's pretty clear how the code works. Signed-off-by: Michael S. Tsirkin but I don't think we need to apply this. diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 11d9cc9..1cc2e43 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -296,6 +296,14 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, return skb; } +static void *rq_get_buf(struct receive_queue *rq, unsigned int *len) +{ + void *buf = virtqueue_get_buf(rq->vq, len); + if (buf) + rq->num--; + return buf; +} + static struct sk_buff *receive_mergeable(struct net_device *dev, struct receive_queue *rq, void *buf, @@ -315,7 +323,7 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, while (--num_buf) { int num_skb_frags; - buf = virtqueue_get_buf(rq->vq, &len); + buf = rq_get_buf(rq, &len); if (unlikely(!buf)) { pr_debug("%s: rx error: %d buffers out of %d missing\n", dev->name, num_buf, hdr->mhdr.num_buffers); @@ -329,7 +337,6 @@ static struct sk_buff *receive_mergeable(struct net_device *dev, } page = virt_to_head_page(buf); - --rq->num; num_skb_frags = skb_shinfo(curr_skb)->nr_frags; if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) { @@ -370,7 +377,7 @@ err_buf: dev->stats.rx_dropped++; dev_kfree_skb(head_skb); while (--num_buf) { - buf = virtqueue_get_buf(rq->vq, &len); + buf = rq_get_buf(rq, &len); if (unlikely(!buf)) { pr_debug("%s: rx error: %d buffers missing\n", dev->name, num_buf); @@ -379,7 +386,6 @@ err_buf: } page = virt_to_head_page(buf); put_page(page); - --rq->num; } return NULL; } @@ -675,9 +681,8 @@ static int virtnet_poll(struct napi_struct *napi, int budget) again: while (received < budget && - (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { + (buf = rq_get_buf(rq, &len)) != NULL) { receive_buf(rq, buf, len); - --rq->num; received++; } ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb
On Wed, Nov 20, 2013 at 07:08:02AM -0500, Jason Wang wrote: > > > - 原始邮件 - > > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > > > When mergeable buffer were used, we only put the first page buf leave the > > > rest > > > of buffers in the virt queue. This will cause the driver could not get the > > > correct head buffer any more. Fix this by dropping the rest of buffers for > > > this > > > packet. > > > > > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > > > (virtio_net: Defer skb allocation in receive path). > > > > > > Cc: Rusty Russell > > > Cc: Michael S. Tsirkin > > > Cc: Michael Dalton > > > Cc: Eric Dumazet > > > Cc: Shirley Ma > > > Signed-off-by: Jason Wang > > > --- > > > This patch was needed for stable > > > --- > > > drivers/net/virtio_net.c | 18 +- > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index 7bab4de..24fd502 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq) > > > netif_wake_subqueue(vi->dev, vq2txq(vq)); > > > } > > > > > > +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf) > > > +{ > > > + char *buf; > > > + int len; > > > + > > > + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { > > > + --rq->num; > > > + put_page(virt_to_head_page(buf)); > > > + } > > > +} > > > + > > > > This is the same code we have in receive_mergeable anyway. > > So let's reuse that. > > > > > > receive_mergeable() was called after page_to_skb() was called and > there's lots of conditions check there. I'm not sure how could we > reuse them. I posted a patch showing how :) > > > /* Called from bottom half context */ > > > static struct sk_buff *page_to_skb(struct receive_queue *rq, > > > struct page *page, unsigned int offset, > > > @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct > > > receive_queue *rq, > > > > > > /* copy small packet so we can reuse these pages for small data */ > > > skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN); > > > - if (unlikely(!skb)) > > > + if (unlikely(!skb)) { > > > + if (vi->mergeable_rx_bufs) { > > > + hdr = (struct skb_vnet_hdr *)p; > > > + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers); > > > + } > > > return NULL; > > > + } > > > > > > hdr = skb_vnet_hdr(skb); > > > > > > -- > > > 1.8.3.2 > > -- > > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > > the body of a message to majord...@vger.kernel.org > > More majordomo info at http://vger.kernel.org/majordomo-info.html > > Please read the FAQ at http://www.tux.org/lkml/ > > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH RFC] virtio_net: fix error handling for mergeable buffers
On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > When mergeable buffer were used, we only put the first page buf leave the rest > of buffers in the virt queue. This will cause the driver could not get the > correct head buffer any more. Fix this by dropping the rest of buffers for > this > packet. > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > (virtio_net: Defer skb allocation in receive path). > > Cc: Rusty Russell > Cc: Michael S. Tsirkin > Cc: Michael Dalton > Cc: Eric Dumazet > Cc: Shirley Ma > Signed-off-by: Jason Wang Just to clarify my previous comment: it was not about the idea of adding drop_mergeable_buffer - rather, I think that adding knowledge about mergeable buffers into page_to_skb creates an ugly internal API. Let's move the call to page_to_skb within receive_mergeable instead: it's also nice that int offset = buf - page_address(page) logic is not spread around like it was. Also, it's not nice that we ignore length errors when we drop packets because of OOM. So I came up with the following - it seems to work but I didn't stress test yet. commit ebffb3fe4335ffe07124e4518e76d6e05844fa18 Author: Michael S. Tsirkin Date: Wed Nov 20 14:41:29 2013 +0200 virtio_net: fix error handling for mergeable buffers Eric Dumazet noticed that if we encounter an error when processing a mergeable buffer, we don't dequeue all of the buffers from this packet, the result is almost sure to be loss of networking. Jason Wang noticed that we also leak a page and that we don't decrement the rq buf count, so we won't repost buffers (a resource leak). Cc: Rusty Russell Cc: Michael Dalton Reported-by: Eric Dumazet Reported-by: Jason Wang Signed-off-by: Michael S. Tsirkin diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 01f4eb5..42f6a1e 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -296,41 +296,53 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, return skb; } -static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb) +static struct sk_buff *receive_mergeable(struct net_device *dev, +struct receive_queue *rq, +void *buf, +unsigned int len) { - struct skb_vnet_hdr *hdr = skb_vnet_hdr(head_skb); + struct skb_vnet_hdr *hdr = buf; + int num_buf = hdr->mhdr.num_buffers; + struct page *page = virt_to_head_page(buf); + int offset = buf - page_address(page); + struct sk_buff *head_skb = page_to_skb(rq, page, offset, len, + MAX_PACKET_LEN); struct sk_buff *curr_skb = head_skb; - char *buf; - struct page *page; - int num_buf, len, offset; - num_buf = hdr->mhdr.num_buffers; - while (--num_buf) { - int num_skb_frags = skb_shinfo(curr_skb)->nr_frags; + if (unlikely(!curr_skb)) + goto err_skb; + + while (--num_buf) { + int num_skb_frags; + buf = virtqueue_get_buf(rq->vq, &len); if (unlikely(!buf)) { - pr_debug("%s: rx error: %d buffers missing\n", -head_skb->dev->name, hdr->mhdr.num_buffers); - head_skb->dev->stats.rx_length_errors++; - return -EINVAL; + pr_debug("%s: rx error: %d buffers out of %d missing\n", +dev->name, num_buf, hdr->mhdr.num_buffers); + dev->stats.rx_length_errors++; + goto err_buf; } if (unlikely(len > MAX_PACKET_LEN)) { pr_debug("%s: rx error: merge buffer too long\n", -head_skb->dev->name); +dev->name); len = MAX_PACKET_LEN; } + + page = virt_to_head_page(buf); + --rq->num; + + num_skb_frags = skb_shinfo(curr_skb)->nr_frags; if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) { struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC); - if (unlikely(!nskb)) { - head_skb->dev->stats.rx_dropped++; - return -ENOMEM; - } + + if (unlikely(!nskb)) + goto err_skb; if (curr_skb == head_skb) skb_shinfo(curr_skb)->frag_list = nskb; else curr_skb->next = nskb; - curr_skb = nskb; head_skb->truesize += nskb->truesize; + curr_skb = ns
Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure
- 原始邮件 - > On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote: > > We need decrease the rq->num after we can get a buf through > > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the > > refill routine won't be triggered under heavy memory stress since the > > driver may > > still think there's enough room. > > > > This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a > > (virtio_net: migrate mergeable rx buffers to page frag allocators). > > > > Cc: Rusty Russell > > Cc: Michael S. Tsirkin > > Cc: Michael Dalton > > Cc: Eric Dumazet > > Signed-off-by: Jason Wang > > So let's wrap virtqueue_get_buf to make sure we get it right? > Ok. good idea. > > --- > > The patch was needed for 3.12 stable. > > --- > > drivers/net/virtio_net.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 24fd502..de1d6ca 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, > > struct sk_buff *head_skb) > > head_skb->dev->stats.rx_length_errors++; > > return -EINVAL; > > } > > + --rq->num; > > if (unlikely(len > MERGE_BUFFER_LEN)) { > > pr_debug("%s: rx error: merge buffer too long\n", > > head_skb->dev->name); > > @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, > > struct sk_buff *head_skb) > > skb_add_rx_frag(curr_skb, num_skb_frags, page, > > offset, len, MERGE_BUFFER_LEN); > > } > > - --rq->num; > > } > > return 0; > > } > > -- > > 1.8.3.2 > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb
- 原始邮件 - > On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > > When mergeable buffer were used, we only put the first page buf leave the > > rest > > of buffers in the virt queue. This will cause the driver could not get the > > correct head buffer any more. Fix this by dropping the rest of buffers for > > this > > packet. > > > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > > (virtio_net: Defer skb allocation in receive path). > > > > Cc: Rusty Russell > > Cc: Michael S. Tsirkin > > Cc: Michael Dalton > > Cc: Eric Dumazet > > Cc: Shirley Ma > > Signed-off-by: Jason Wang > > --- > > This patch was needed for stable > > --- > > drivers/net/virtio_net.c | 18 +- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 7bab4de..24fd502 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq) > > netif_wake_subqueue(vi->dev, vq2txq(vq)); > > } > > > > +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf) > > +{ > > + char *buf; > > + int len; > > + > > + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { > > + --rq->num; > > + put_page(virt_to_head_page(buf)); > > + } > > +} > > + > > This is the same code we have in receive_mergeable anyway. > So let's reuse that. > > receive_mergeable() was called after page_to_skb() was called and there's lots of conditions check there. I'm not sure how could we reuse them. > > /* Called from bottom half context */ > > static struct sk_buff *page_to_skb(struct receive_queue *rq, > >struct page *page, unsigned int offset, > > @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct > > receive_queue *rq, > > > > /* copy small packet so we can reuse these pages for small data */ > > skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN); > > - if (unlikely(!skb)) > > + if (unlikely(!skb)) { > > + if (vi->mergeable_rx_bufs) { > > + hdr = (struct skb_vnet_hdr *)p; > > + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers); > > + } > > return NULL; > > + } > > > > hdr = skb_vnet_hdr(skb); > > > > -- > > 1.8.3.2 > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure
On Wed, Nov 20, 2013 at 05:07:26PM +0800, Jason Wang wrote: > We need decrease the rq->num after we can get a buf through > virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the > refill routine won't be triggered under heavy memory stress since the driver > may > still think there's enough room. > > This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a > (virtio_net: migrate mergeable rx buffers to page frag allocators). > > Cc: Rusty Russell > Cc: Michael S. Tsirkin > Cc: Michael Dalton > Cc: Eric Dumazet > Signed-off-by: Jason Wang So let's wrap virtqueue_get_buf to make sure we get it right? > --- > The patch was needed for 3.12 stable. > --- > drivers/net/virtio_net.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 24fd502..de1d6ca 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, > struct sk_buff *head_skb) > head_skb->dev->stats.rx_length_errors++; > return -EINVAL; > } > + --rq->num; > if (unlikely(len > MERGE_BUFFER_LEN)) { > pr_debug("%s: rx error: merge buffer too long\n", >head_skb->dev->name); > @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, > struct sk_buff *head_skb) > skb_add_rx_frag(curr_skb, num_skb_frags, page, > offset, len, MERGE_BUFFER_LEN); > } > - --rq->num; > } > return 0; > } > -- > 1.8.3.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb
On Wed, Nov 20, 2013 at 05:07:25PM +0800, Jason Wang wrote: > When mergeable buffer were used, we only put the first page buf leave the rest > of buffers in the virt queue. This will cause the driver could not get the > correct head buffer any more. Fix this by dropping the rest of buffers for > this > packet. > > The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 > (virtio_net: Defer skb allocation in receive path). > > Cc: Rusty Russell > Cc: Michael S. Tsirkin > Cc: Michael Dalton > Cc: Eric Dumazet > Cc: Shirley Ma > Signed-off-by: Jason Wang > --- > This patch was needed for stable > --- > drivers/net/virtio_net.c | 18 +- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > index 7bab4de..24fd502 100644 > --- a/drivers/net/virtio_net.c > +++ b/drivers/net/virtio_net.c > @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq) > netif_wake_subqueue(vi->dev, vq2txq(vq)); > } > > +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf) > +{ > + char *buf; > + int len; > + > + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { > + --rq->num; > + put_page(virt_to_head_page(buf)); > + } > +} > + This is the same code we have in receive_mergeable anyway. So let's reuse that. > /* Called from bottom half context */ > static struct sk_buff *page_to_skb(struct receive_queue *rq, > struct page *page, unsigned int offset, > @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct receive_queue > *rq, > > /* copy small packet so we can reuse these pages for small data */ > skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN); > - if (unlikely(!skb)) > + if (unlikely(!skb)) { > + if (vi->mergeable_rx_bufs) { > + hdr = (struct skb_vnet_hdr *)p; > + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers); > + } > return NULL; > + } > > hdr = skb_vnet_hdr(skb); > > -- > 1.8.3.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net 1/3] virtio-net: drop the rest of buffers when we can't allocate skb
When mergeable buffer were used, we only put the first page buf leave the rest of buffers in the virt queue. This will cause the driver could not get the correct head buffer any more. Fix this by dropping the rest of buffers for this packet. The bug was introduced by commit 9ab86bbcf8be755256f0a5e994e0b38af6b4d399 (virtio_net: Defer skb allocation in receive path). Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Michael Dalton Cc: Eric Dumazet Cc: Shirley Ma Signed-off-by: Jason Wang --- This patch was needed for stable --- drivers/net/virtio_net.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7bab4de..24fd502 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -222,6 +222,17 @@ static void skb_xmit_done(struct virtqueue *vq) netif_wake_subqueue(vi->dev, vq2txq(vq)); } +static void drop_mergeable_buffer(struct receive_queue *rq, int num_buf) +{ + char *buf; + int len; + + while (--num_buf && (buf = virtqueue_get_buf(rq->vq, &len)) != NULL) { + --rq->num; + put_page(virt_to_head_page(buf)); + } +} + /* Called from bottom half context */ static struct sk_buff *page_to_skb(struct receive_queue *rq, struct page *page, unsigned int offset, @@ -237,8 +248,13 @@ static struct sk_buff *page_to_skb(struct receive_queue *rq, /* copy small packet so we can reuse these pages for small data */ skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN); - if (unlikely(!skb)) + if (unlikely(!skb)) { + if (vi->mergeable_rx_bufs) { + hdr = (struct skb_vnet_hdr *)p; + drop_mergeable_buffer(rq, hdr->mhdr.num_buffers); + } return NULL; + } hdr = skb_vnet_hdr(skb); -- 1.8.3.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net 3/3] virtio-net: fix resources leaking when fail to allocate frag skb
When we fail to allocate a frag skb, we need put the refcnt of the page and drop the rest of the buffers. Otherwise the page was leaked and driver won't get a correct head buffer after this failure. This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx buffers to page frag allocators). Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Michael Dalton Cc: Eric Dumazet Signed-off-by: Jason Wang --- The patch was needed for 3.12 stable. --- drivers/net/virtio_net.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index de1d6ca..f6f1e20 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -339,9 +339,12 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb) head_skb->dev->name); len = MERGE_BUFFER_LEN; } + page = virt_to_head_page(buf); if (unlikely(num_skb_frags == MAX_SKB_FRAGS)) { struct sk_buff *nskb = alloc_skb(0, GFP_ATOMIC); if (unlikely(!nskb)) { + put_page(page); + drop_mergeable_buffer(rq, num_buf); head_skb->dev->stats.rx_dropped++; return -ENOMEM; } @@ -358,7 +361,6 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb) head_skb->len += len; head_skb->truesize += MERGE_BUFFER_LEN; } - page = virt_to_head_page(buf); offset = buf - (char *)page_address(page); if (skb_can_coalesce(curr_skb, num_skb_frags, page, offset)) { put_page(page); -- 1.8.3.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[PATCH net 2/3] virtio-net: fix num calculation on frag skb allocation failure
We need decrease the rq->num after we can get a buf through virtqueue_get_buf() even if we could not allocate frag skb. Otherwise, the refill routine won't be triggered under heavy memory stress since the driver may still think there's enough room. This bug was introduced by commit 2613af0ed18a11d5c566a81f9a6510b73180660a (virtio_net: migrate mergeable rx buffers to page frag allocators). Cc: Rusty Russell Cc: Michael S. Tsirkin Cc: Michael Dalton Cc: Eric Dumazet Signed-off-by: Jason Wang --- The patch was needed for 3.12 stable. --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 24fd502..de1d6ca 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -333,6 +333,7 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb) head_skb->dev->stats.rx_length_errors++; return -EINVAL; } + --rq->num; if (unlikely(len > MERGE_BUFFER_LEN)) { pr_debug("%s: rx error: merge buffer too long\n", head_skb->dev->name); @@ -367,7 +368,6 @@ static int receive_mergeable(struct receive_queue *rq, struct sk_buff *head_skb) skb_add_rx_frag(curr_skb, num_skb_frags, page, offset, len, MERGE_BUFFER_LEN); } - --rq->num; } return 0; } -- 1.8.3.2 ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 01:38:16PM -0800, Michael Dalton wrote: > Great catch Jason. I agree this now raises the larger issue of how to > handle a memory alloc failure in the middle of receive. As Eric mentioned, > we can drop the packet and free the remaining (num_buf) frags. > > Michael, perhaps I'm missing something, but why would you prefer > pre-allocating buffers in this case? If the guest kernel is OOM'ing, > dropping packets should provide backpressure. > > Also, we could just as easily fail the initial skb alloc in page_to_skb, > and I think that case also needs to be handled now in the same fashion as > a memory allocation failure in receive_mergeable. > > Best, > > Mike Yes I missed this last night. Thanks a lot Eric and Michael for pointing this out. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 05:34:16PM -0800, Michael Dalton wrote: > Hi, > > After further reflection I think we're looking at two related issues: > (a) a memory leak that Jason has identified that occurs when a memory > allocation fails in receive_mergeable. Jasons commit solves this issue. > (b) virtio-net does not dequeue all buffers for a packet in the > case that an error occurs on receive and mergeable receive buffers is > enabled. > > For (a), this bug is new and due to changes in 2613af0ed18a, and the > net impact is memory leak on the physical page. However, I believe (b) > has always been possible in some form because if page_to_skb() returns > NULL (e.g., due to SKB allocation failure), receive_mergeable is never > called. AFAICT this is also the behavior prior to 2613af0ed18a. > > The net impact of (b) would be that virtio-net would interpret a packet > buffer that is in the middle of a mergeable packet as the start of a > new packet, which is definitely also a bug (and the buffer contents > could contain bytes that resembled a valid virtio-net header). > > A solution for (b) will require handling both the page_to_skb memory > allocation failures and the memory allocation failures in > receive_mergeable introduced by 2613af0ed18a. > > Best, > > Mike Absolutely. I missed this fact yesterday night but I can see it clearly in the morning. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: [PATCH net] virtio-net: fix page refcnt leaking when fail to allocate frag skb
On Tue, Nov 19, 2013 at 02:00:11PM -0800, Eric Dumazet wrote: > On Tue, 2013-11-19 at 23:53 +0200, Michael S. Tsirkin wrote: > > > Which NIC? Virtio? Prior to 2613af0ed18a11d5c566a81f9a6510b73180660a > > it didn't drop packets received from host as far as I can tell. > > virtio is more like a pipe than a real NIC in this respect. > > Prior/after to this patch, you were not posting buffers, so if packets > were received on a physical NIC, you were dropping the packets anyway. > > It makes no difference at all, adding a cushion might make you feel > better, but its really not worth it. > > Under memory stress, it makes better sense to drop a super big GRO > packet (The one needing frag_list extension ...) > > It gives a better signal to the sender to reduce its pressure, and gives > opportunity to free more of your memory. > OK, but in that case one wonders whether we should do more to free memory? E.g. imagine that we dropped a packet of a specific TCP flow because we couldn't allocate a new packet. What happens now is that the old packet is freed as well. So quite likely the next packet in queue will get processed since it will reuse the memory we have just freed. The next packet and the next after it etc all will have to go through the net stack until they get at the socket and are dropped then because we missed a segment. Even worse, GRO gets disabled so the load on receiver goes up instead of down. Sounds like a problem doesn't it? GRO actually detects it's the same flow and can see packet is out of sequence. Why doesn't it drop the packet then? Alternatively, we could (for example using the pre-allocated skb like I suggested) notify GRO that it should start dropping packets of this flow. What do you think? -- MST ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization